Azure / azure-data-lake-store-net

Azure Data Lake Store .Net SDK
MIT License

BulkUpload fails with error message "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection." #46

Open · MullenStudio opened this issue 3 years ago

MullenStudio commented 3 years ago

The issue occurs when the file is larger than 240MB (so it must be uploaded as multiple chunks) and the line that crosses the 240MB chunk boundary is close to, but less than, 4MB long (non-binary upload, where chunks are split at newlines).

The problem is in this line: https://github.com/Azure/azure-data-lake-store-net/blob/dde6a757976108cc09282a640498e129494e95ba/AdlsDotNetSDK/FileTransfer/Jobs/CopyFileJob.cs#L287

Normally bufferOffset starts at 0. With a 4MB BuffSize and an 8KB ReadForwardBuffSize, this keeps bufferOffset a multiple of 8KB (until the end of the stream). Because 4MB is an exact multiple of 8KB (4MB = 512 × 8KB), bufferOffset + ReadForwardBuffSize can then never exceed BuffSize.
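To make the invariant concrete, here is a simplified Python model of the arithmetic (not the SDK's C# code). It assumes, for the worst case, that every read returns a full 8KB chunk and no newline is found; `read_windows` and the constant names are hypothetical stand-ins for BuffSize and ReadForwardBuffSize.

```python
BUFF_SIZE = 4 * 1024 * 1024          # stands in for BuffSize (4MB)
READ_FORWARD_BUFF_SIZE = 8 * 1024    # stands in for ReadForwardBuffSize (8KB)

def read_windows(start_offset: int):
    """Yield (offset, end) for each read-forward request.

    Hypothetical worst-case model: every read returns a full chunk and no
    newline is found, so reading continues until the buffer is full.
    """
    offset = start_offset
    while offset < BUFF_SIZE:
        yield offset, offset + READ_FORWARD_BUFF_SIZE
        offset += READ_FORWARD_BUFF_SIZE

windows = list(read_windows(0))
print(len(windows))                       # 512 reads
print(max(end for _, end in windows))     # 4194304, exactly BuffSize
```

Starting at 0, every offset is a multiple of 8KB and the last request ends exactly at the 4MB boundary, never past it.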

However, in the case above, the method is called as ReadForwardTillNewLine(readStream, readBytes, residualDataSize), and residualDataSize (which becomes bufferOffset) is not necessarily a multiple of 8KB. As reading advances, bufferOffset + ReadForwardBuffSize can therefore exceed BuffSize, and the read fails with the error in the title.
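The failure can be reproduced in a simplified Python model (not the SDK's code). `checked_read` is a hypothetical helper that mimics the kind of (offset, count) bounds check a .NET stream read performs on its target buffer; as before, it assumes every read returns a full chunk and no newline is found.

```python
BUFF_SIZE = 4 * 1024 * 1024          # stands in for BuffSize (4MB)
READ_FORWARD_BUFF_SIZE = 8 * 1024    # stands in for ReadForwardBuffSize (8KB)

def checked_read(buffer_len: int, offset: int, count: int) -> int:
    """Hypothetical bounds check: reject requests that run past the buffer."""
    if offset < 0 or count < 0 or offset + count > buffer_len:
        raise ValueError(
            f"offset {offset} + count {count} > buffer length {buffer_len}")
    return count  # worst case: the stream always fills the request

def read_forward_unclamped(start_offset: int) -> int:
    """Model of the current behavior: always request a full 8KB."""
    offset = start_offset
    while offset < BUFF_SIZE:
        offset += checked_read(BUFF_SIZE, offset, READ_FORWARD_BUFF_SIZE)
    return offset

print(read_forward_unclamped(0))  # 4194304: a start of 0 ends exactly at BuffSize

try:
    read_forward_unclamped(100)   # residual data that is not a multiple of 8KB
except ValueError as err:
    print("failed:", err)         # failed: offset 4186212 + count 8192 > buffer length 4194304
```

With a 100-byte residual, the last full 8KB request starts 8092 bytes before the end of the buffer and overruns it by exactly those 100 bytes.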

Suggested fix: clamp the requested length so the read never runs past the end of the buffer:

int totBytesRead = ReadDataIntoBuffer(readStream, buffer, bufferOffset, Math.Min(ReadForwardBuffSize, BuffSize - bufferOffset));
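The effect of the suggested clamp can be checked with the same simplified Python model (hypothetical helpers, worst case of full reads and no newline). The `min(...)` line plays the role of `Math.Min(ReadForwardBuffSize, BuffSize - bufferOffset)`.

```python
BUFF_SIZE = 4 * 1024 * 1024          # stands in for BuffSize (4MB)
READ_FORWARD_BUFF_SIZE = 8 * 1024    # stands in for ReadForwardBuffSize (8KB)

def checked_read(buffer_len: int, offset: int, count: int) -> int:
    """Hypothetical bounds check: reject requests that run past the buffer."""
    if offset < 0 or count < 0 or offset + count > buffer_len:
        raise ValueError(
            f"offset {offset} + count {count} > buffer length {buffer_len}")
    return count  # worst case: the stream always fills the request

def read_forward_clamped(start_offset: int) -> int:
    """Model of the suggested fix: never request more than the space left."""
    offset = start_offset
    while offset < BUFF_SIZE:
        count = min(READ_FORWARD_BUFF_SIZE, BUFF_SIZE - offset)
        offset += checked_read(BUFF_SIZE, offset, count)
    return offset

for start in (0, 100, 8191, BUFF_SIZE - 1):
    print(start, read_forward_clamped(start))  # every start ends at 4194304
```

Any starting residual now fills the buffer exactly; the only change is that the final request may be shorter than 8KB.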