Azure / azure-data-lake-store-net

Azure Data Lake Store .Net SDK
MIT License

Fix download of large files on linux #63

Closed akharit closed 1 year ago

akharit commented 1 year ago

This diff adds code to skip the MarkFileSparse method on non-Windows platforms.

This appears to be a day-0 bug in the bulk downloader code. The downloader calls the MarkFileSparse method when downloading a large file. The method marks the local file as sparse through a Win32 API call (https://msdn.microsoft.com/en-us/library/windows/desktop/aa364596(v=vs.85).aspx) as a performance improvement. Because that call is Windows-only, large-file downloads break on Linux.

    // https://msdn.microsoft.com/en-us/library/windows/desktop/aa364596(v=vs.85).aspx
    // P/Invoke declaration used to mark the file as sparse
    [DllImport("Kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    private static extern bool DeviceIoControl(
        SafeFileHandle hDevice, int dwIoControlCode, IntPtr inBuffer, int nInBufferSize, IntPtr outBuffer, int nOutBufferSize, ref int pBytesReturned, [In] ref NativeOverlapped lpOverlapped);
    // Marks the local file as sparse when it is created during download. Otherwise, when threads write to the
    // same file at different offsets, the gaps are backfilled with zeros, so the file is effectively written
    // twice and we get half the performance.
    internal void MarkFileSparse(SafeFileHandle fileHandle)
    {
        int bytesReturned = 0;
        NativeOverlapped lpOverlapped = new NativeOverlapped();
        bool result = DeviceIoControl(fileHandle, 590020, // FSCTL_SET_SPARSE (0x000900C4)
                IntPtr.Zero, 0, IntPtr.Zero, 0, ref bytesReturned, ref lpOverlapped);
        if (result == false)
        {
            throw new Win32Exception();
        }
    }
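
The platform check described at the top of this PR could look roughly like the sketch below. This is not the actual diff: `SparseFileGuard`, `ShouldMarkSparse` and `MarkSparseIfSupported` are hypothetical names introduced for illustration, and the sketch assumes `RuntimeInformation.IsOSPlatform` is available on the target framework.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the SDK: gates a Windows-only action.
internal static class SparseFileGuard
{
    // True only on Windows, where the Kernel32.dll P/Invoke can be resolved.
    internal static bool ShouldMarkSparse()
    {
        return RuntimeInformation.IsOSPlatform(OSPlatform.Windows);
    }

    // Runs the supplied action on Windows and skips it on every other platform.
    // Returns true when the action was invoked.
    internal static bool MarkSparseIfSupported(Action markFileSparse)
    {
        if (markFileSparse == null)
        {
            throw new ArgumentNullException(nameof(markFileSparse));
        }

        if (!ShouldMarkSparse())
        {
            // Non-Windows: skip the call; the download proceeds without the sparse hint.
            return false;
        }

        markFileSparse();
        return true;
    }
}
```

A caller in the downloader might then write something like `SparseFileGuard.MarkSparseIfSupported(() => MarkFileSparse(stream.SafeFileHandle));`, where `stream` stands in for whatever `FileStream` the downloader opens. Skipping the call only drops the performance hint; the file contents are unaffected.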