coryrwest / B2.NET

.NET library for Backblaze's B2 Cloud Storage
MIT License

LargeFileUploadTest() wrong file size #22

Closed · JM63 closed this issue 5 years ago

JM63 commented 5 years ago

Congratulations, fantastic library. When I upload a file "test_1.zip" that is 6335KB in size with B2Client.Files.Upload(), the file ends up in Backblaze B2 at the same size, but if I do it with LargeFileUploadTest() the file ends up in B2 at roughly double the size and corrupted. Any suggestions?

coryrwest commented 5 years ago

Thanks, I'm glad it's useful.

I'm not quite clear on what you are doing. Are you uploading the test_1 file using the LargeFiles API? If so, that is not what LargeFiles is for. The API does not assume anything about the type of file you are uploading; you have to determine in your own code whether it is a large file or not. According to the B2 docs, a large file must be at least 5MB in size.
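Something like the following is what I mean by making that determination yourself (a rough sketch, not the library's exact overloads, so check the README for the real `Files.Upload` signature; `filePath` is just a placeholder):

    // Rough sketch: pick the API based on size yourself, the library won't do it for you.
    // 5MB comes from the B2 docs (minimum part size for large files).
    const long minLargeFileSize = 5 * 1024 * 1024;

    var fileData = File.ReadAllBytes(filePath);
    if (fileData.LongLength < minLargeFileSize)
    {
        // Small file: a single Files API upload is enough.
        var file = client.Files.Upload(fileData, Path.GetFileName(filePath), _b2BucketId).Result;
    }
    else
    {
        // Large file: split it into parts of at least 5MB (except the last one) and drive
        // StartLargeFile / GetUploadPartUrl / UploadPart / FinishLargeFile yourself.
    }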

JM63 commented 5 years ago

Thanks for the quick response.

This is the code I use to upload the Teste_2.zip file, which is 6.18MB; in the future I want to upload much bigger files, this is only a test file. This file ends up in the Backblaze B2 bucket at 16.1MB.

    public static string UploadLargeFileB2(string bucketFolder_, string fileFullName_)
    {
        var client = new B2Client(_b2AccountId, _b2ApplicationKey);

        List<byte[]> parts = new List<byte[]>();
        using (FileStream fileStream = File.OpenRead(fileFullName_))
        {
            //fileStream.Seek(0, SeekOrigin.Begin);
            using (var stream = new StreamReader(fileStream))
            {
                while (stream.Peek() >= 0)
                {
                    char[] c = new char[1024 * (5 * 1024)];
                    stream.Read(c, 0, c.Length);

                    parts.Add(Encoding.UTF8.GetBytes(c));
                }
            }
        }

        var shas = new List<string>();
        foreach (var part in parts)
        {
            string hash = Utilities.GetSHA1Hash(part);
            shas.Add(hash);
        }

        B2File start = null;
        B2File finish = null;
        try
        {
            start = client.LargeFiles.StartLargeFile(bucketFolder_ + Path.GetFileName(fileFullName_), "", _b2BucketId).Result;

            for (int i = 0; i < parts.Count; i++)
            {
                var uploadUrl = client.LargeFiles.GetUploadPartUrl(start.FileId).Result;
                var part = client.LargeFiles.UploadPart(parts[i], i + 1, uploadUrl).Result;
            }

            finish = client.LargeFiles.FinishLargeFile(start.FileId, shas.ToArray()).Result;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            CancelUploadLargeFile(start);
        }

        return (start.FileId == finish.FileId) ? finish.FileId : null;
    }
mattdwen commented 5 years ago

I think it might be to do with the UTF8 encoding. If you set a breakpoint on your code and inspect parts, each part has a different length, around 9900000. Each part should be a fixed length of 5242880, as you have it configured.
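You can see the blow-up with a quick round-trip (just an illustration, not library code): arbitrary binary bytes are not valid UTF-8, so decoding them to chars and re-encoding produces a different, usually larger, byte sequence.

    // Quick illustration of why the re-encoded parts change size: invalid UTF-8 bytes
    // are decoded to the U+FFFD replacement character, which re-encodes as 3 bytes each.
    byte[] raw = { 0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE, 0x80 }; // arbitrary binary data
    char[] chars = Encoding.UTF8.GetChars(raw);
    byte[] reEncoded = Encoding.UTF8.GetBytes(chars);
    Console.WriteLine($"{raw.Length} bytes in, {reEncoded.Length} bytes out"); // 7 in, 13 out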

I read the raw bytes straight from the FileStream into a buffer:

using (var stream = File.Open(localPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    stream.Seek(part.offset, SeekOrigin.Begin);
    stream.Read(buffer, 0, part.length);
}

Then upload that raw byte buffer:

var partResult = b2.LargeFiles.UploadPart(buffer, part.partNo, uploadUrl).Result;

I have a multi-threaded upload setup, so I pre-calculate the part offsets (and the size of the last part, usually) based on the recommended part size and the number of threads I want to run.
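For reference, the pre-calculation itself is nothing fancy; roughly something like this (the `PartInfo` helper and the names here are my own, not part of B2.NET):

    // Sketch of pre-calculating part offsets/lengths (PartInfo is my own helper type).
    class PartInfo
    {
        public int partNo;
        public long offset;
        public int length;
    }

    static List<PartInfo> CalculateParts(long fileLength, int partSize)
    {
        var parts = new List<PartInfo>();
        long offset = 0;
        int partNo = 1;
        while (offset < fileLength)
        {
            // Every part gets the full partSize except the last, which gets the remainder.
            int length = (int)Math.Min(partSize, fileLength - offset);
            parts.Add(new PartInfo { partNo = partNo++, offset = offset, length = length });
            offset += length;
        }
        return parts;
    }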

JM63 commented 5 years ago

I want to thank you for the answers given; because of issues related to another project, only now have I returned to this problem. With the help of "mattdwen", I was able to make some changes to my function and now I can upload large zip files that end up exactly the same size in the Backblaze bucket. Everything seems to be fine, however: the zip files work normally and can be decompressed without problems before being sent through this function, but after being downloaded from Backblaze they seem to be corrupted, giving the error (bad offset to local header) when I try to decompress them. What can be wrong in my function, or what am I doing wrong? Any tips?

public static async Task<string> UploadLargeFileB2Async(string bucketFolder_, string fileFullName_)
{
   var client = new B2Client(_b2AccountId, _b2ApplicationKey);

   List<byte[]> parts = new List<byte[]>();
   using (var stream = File.Open(fileFullName_, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
   {
       stream.Seek(0, SeekOrigin.Begin);

       // Read and verify the data.
       byte[] buffer = new byte[1024 * (5 * 1024)];
       for (int i = 0; i < stream.Length; i++)
       {
          // Resize the byte[] array with the size of the remainder of the stream
           if (stream.Length - i < buffer.Length)
               buffer = new byte[stream.Length - i];

           stream.Read(buffer, 0, buffer.Length);
           i += buffer.Length;

           parts.Add(buffer);
       }
   }

   var shas = new List<string>();
   foreach (var part in parts)
   {
       string hash = Utilities.GetSHA1Hash(part);
       shas.Add(hash);
   }

   B2File start = null;
   B2File finish = null;
   try
   {
       start = await client.LargeFiles.StartLargeFile(bucketFolder_ + Path.GetFileName(fileFullName_), "", _b2BucketId);
       for (int i = 0; i < parts.Count; i++)
       {
           var uploadUrl = await client.LargeFiles.GetUploadPartUrl(start.FileId);
           var partResult = await client.LargeFiles.UploadPart(parts[i], i + 1, uploadUrl);
       }
       finish = await client.LargeFiles.FinishLargeFile(start.FileId, shas.ToArray());
   }
   catch (Exception)
   {
       CancelUploadLargeFile(start);
   }
   return (start.FileId == finish.FileId) ? finish.FileId : null;
}
coryrwest commented 5 years ago

I'm not 100% sure, but I think the problem might be in how you are splitting the file. I can't quite intuit how this would function (and don't have time right now to spin up a test), but I suspect that there could be an off-by-one error in the loop, or something similar.

       byte[] buffer = new byte[1024 * (5 * 1024)];
       for (int i = 0; i < stream.Length; i++)
       {
          // Resize the byte[] array with the size of the remainder of the stream
           if (stream.Length - i < buffer.Length)
               buffer = new byte[stream.Length - i];

           stream.Read(buffer, 0, buffer.Length);
           i += buffer.Length;

           parts.Add(buffer);
       }

Take a look at how the large files test does it. It uses a while loop so there is no indexing.
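Roughly, the while-loop version looks something like this (a sketch of the idea, not the exact test code). Allocating a fresh buffer for every part also avoids the two things that look suspicious in your loop above: `i` being advanced twice per iteration (once by the `for` and once by `i += buffer.Length`) and the same `buffer` instance being added to `parts` on every full-sized read.

    var parts = new List<byte[]>();
    const int partSize = 1024 * (5 * 1024); // the same 5MB part size you are using

    using (var stream = File.OpenRead(fileFullName_))
    {
        while (stream.Position < stream.Length)
        {
            // A fresh array per part, sized to whatever is actually left in the stream.
            long remaining = stream.Length - stream.Position;
            var buffer = new byte[(int)Math.Min(partSize, remaining)];

            // Read() may return fewer bytes than asked for, so loop until the part is full.
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }

            parts.Add(buffer);
        }
    }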

Edit: Just found a bug in the test. Will update this answer later.

Edit again: Tests updated with confirmed working code.

JM63 commented 5 years ago

Greetings, I have tested with this change and it has worked well. I want to thank you for all your help. Great job!

coryrwest commented 5 years ago

I'm glad you got it working.