Azure / azure-sdk-for-node

Result has cut bytes from retrieved data #2210

Closed miroslavvojtus closed 5 years ago

miroslavvojtus commented 7 years ago

I am currently trying to integrate Data Lake as file storage in a Node.js app. Setting aside the typings issue that is already reported here, I have some problems with reading files.

I am using code similar to the following:

import * as fs from "fs";
import * as msRestAzure from "ms-rest-azure";
import { DataLakeStoreFileSystemClient } from "azure-arm-datalake-store";

// Collect every chunk emitted by a readable stream into a single Buffer.
const streamToBuffer = (stream: NodeJS.ReadableStream): Promise<Buffer> => {
    return new Promise((resolve, reject) => {
        const bufs: Buffer[] = [];
        stream.on("data", (d: Buffer) => bufs.push(d));
        stream.on("error", reject);
        stream.on("end", () => resolve(Buffer.concat(bufs)));
    });
};

(async () => {
    const clientId: string = "clientId";
    const domain: string = "domain";
    const secret: string = "secret";

    const credentials = await msRestAzure.loginWithServicePrincipalSecret(clientId, secret, domain);
    const client = new DataLakeStoreFileSystemClient(credentials);
    const dlfs = client.fileSystem;

    fs.writeFileSync("a_file", await streamToBuffer(await dlfs.open("acount_name", "a_file")));
})();

The problem is that the file written to disk is missing its first 16384 bytes relative to its remote source. It looks as if the first data emission has already happened by the time the listener is attached, so stream.on("data", (d) => bufs.push(d)); only receives the rest of the file.

The issue seems to be in the interaction between the promise-based API and the streaming code.
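
My best guess at a minimal reproduction of the timing problem, using a plain PassThrough stream as a stand-in for the response (this assumes the stream the client hands back is already in flowing mode by the time the promise resolves):

import { PassThrough } from "stream";

const stream = new PassThrough();
stream.resume();                 // flowing mode, but no "data" listener attached yet
stream.write("first chunk");     // emitted on a later tick and silently discarded

setImmediate(() => {
    const chunks: Buffer[] = [];
    stream.on("data", (c: Buffer) => chunks.push(c));
    stream.write("second chunk");
    stream.end();
    stream.on("end", () => console.log(Buffer.concat(chunks).toString()));
    // prints only "second chunk"; the first chunk was lost while nobody was listening
});

If that is what happens inside the client, anything that delays attaching the listeners by even one tick, such as resolving a promise first, drops whatever was emitted in the meantime.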

When I use the callback variant without a promise, it seems to work fine:

...
    dlfs.open("acount_name", "a_file", async (err, data) => {
        fs.writeFileSync("a_file", await streamToBuffer(data));
    });
...

Also, manually wrapping the returned stream seems to work:

...
    // Readable is imported from the Node.js "stream" module.
    const data = await new Promise<NodeJS.ReadableStream>((resolve, reject) => {
        dlfs.open("acount_name", "a_file", (err, data) => {
            if (err) {
                return reject(err);
            }
            // Wrap the stream synchronously in the callback, before any chunks can be dropped.
            resolve(new Readable().wrap(data));
        });
    });
    fs.writeFileSync("a_file", await streamToBuffer(data));
...
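
For reference, the wrapping workaround can be pulled out into a small promise-returning helper. This is only a sketch: openWrapped is a made-up name, and the callback signature is assumed to match the one above.

import { Readable } from "stream";

// Sketch of a promise wrapper around the callback-style open().
// The stream is re-wrapped synchronously inside the callback, so it is safe
// to consume after an await without losing the first chunks.
const openWrapped = (
    dlfs: any,                 // client.fileSystem from a DataLakeStoreFileSystemClient
    accountName: string,
    path: string
): Promise<NodeJS.ReadableStream> => {
    return new Promise((resolve, reject) => {
        dlfs.open(accountName, path, (err: Error | null, data: NodeJS.ReadableStream) => {
            if (err) {
                return reject(err);
            }
            resolve(new Readable().wrap(data));
        });
    });
};

// usage: fs.writeFileSync("a_file", await streamToBuffer(await openWrapped(dlfs, "acount_name", "a_file")));
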
jdthorpe commented 6 years ago

TL;DR: I'm pretty sure this is related to this issue with the underlying request library that handles the HTTP requests.


I've also had this issue, where the first 16 KB of an Azure Data Lake file is lost when using the promise/await code. For example, this code works just fine:

const fs = require("fs")
const msRestAzure = require("ms-rest-azure")
const dataLakeClient = require("azure-arm-datalake-store")

const localPath = "c:/some/place/nice.csv"
const accountName = "my-adl-account"
const remotePath = "/my/favorite/file.csv"

msRestAzure.interactiveLogin(function(err, credentials) {
    const filesystemClient = new dataLakeClient.DataLakeStoreFileSystemClient(credentials);

    const file = fs.createWriteStream(localPath)
    file.on("open", () => {
        filesystemClient.fileSystem.open(accountName, remotePath,
            (error, source) => {
                if (error) { return }
                // piping synchronously inside the open() callback keeps the whole file
                source.pipe(file)
            })
    })
})

but the promise-based approach loses the first 16 KB:

// same requires and constants as above
const localPath = "c:/some/place/nice.csv"
const accountName = "my-adl-account"
const remotePath = "/my/favorite/file.csv"

msRestAzure.interactiveLogin(function(err, credentials) {
    const filesystemClient = new dataLakeClient.DataLakeStoreFileSystemClient(credentials);

    const file = fs.createWriteStream(localPath)
    file.on("open", () => {
        filesystemClient.fileSystem.open(accountName, remotePath)
            .then(source => {
                // by the time this runs, the first chunks have already been emitted and dropped
                source.pipe(file)
            })
    })
})

To add to the discussion: I'm using Node v8.9.3 with TypeScript 3.0.1, and my tsconfig includes the following:

{
  "compilerOptions": {
    "target": "es5",                          
    "module": "commonjs",                     
    "lib": ["ES2015"],                        
    "strict": true,                           
    "esModuleInterop": true                   
  }
}

and my package-lock.json file can be found here.