bailey27 / cppcryptfs

cppcryptfs is an implementation of the gocryptfs encrypted overlay filesystem in C++ for Windows.

Copying large file error #136

Closed wards999 closed 2 years ago

wards999 commented 3 years ago

I'm trying to copy a file that's 3.9 GB in size to a cppcryptfs drive; however, my Windows 10 machine stops during the copy and reports:

Insufficient system resources exist to complete the requested service

(I can copy this file to non-cppcryptfs drives without getting the error.)

bailey27 commented 3 years ago

Hi. I just now made a file with 4194304000 random bytes (4.2 GB, 3.9 GiB) and copied it to a cppcryptfs volume using Windows 10 file manager and there wasn't an error. I copied it back and forth from c: to k: (cppcryptfs).

I also did the md5sum of the file before and after copying, and it was the same.

I tried both my usual settings (12 threads, 1024KB buffer size, encrypt keys in memory and cache keys in memory) also with the defaults, and it worked in both cases.

I also made and tested with a 5GB random file, and it worked also.

That was the 64-bit version on 64-bit Windows 10. cppcryptfs is 1.4.3.10 and Dokany is 1.5.0.3000

I also tried it with the 32-bit version running on 64-bit Windows 10 and it worked also (tried only the default settings).

Is there anything specific about your settings or your setup?

wards999 commented 3 years ago

Hi, I'm not aware of anything unusual or wrong with my setup, but it looks like this must be the case based on your independent tests.

From my side, I've tried 2 different 3.9 GB files between C, D and E (local drives) and N (cppcryptfs), and they always fail when copying to N. The transfer speed slows down to 0 and then I get the error message.

I'll try booting into safe mode to see if it makes any difference.

My machine is running the latest Windows updates and has 32 GB RAM (+ Dokany 1.5.0.3000 / cppcryptfs 1.4.3). Googling the error, some comments suggest it could be related to antivirus software, but if that were my problem, the file presumably wouldn't copy normally to my other drives either.

After posting my earlier comment I did wonder whether it's more likely to be Dokany than cppcryptfs, so I'll do a fresh install as well.

bailey27 commented 3 years ago

Also, what is the exact size in bytes of your (unencrypted) file?

If you right-click on the mounted fs on the mount page and select "Properties", the dialog shows everything about how the fs is mounted, how it was created and what settings are in use (e.g. the encryption method, i.e. GCM vs. SIV). If you could say what's there or just send a screenshot of it, that might help.

wards999 commented 3 years ago

Hmmm. It's something to do with the fact that my cppcryptfs N drive is located on an external Seagate USB drive formatted as exFAT.

I created and mounted a new cppcryptfs on the internal D drive (NTFS), and copying files to this worked as expected.

With N being on an external drive, do I need to change the values for either I/O buffer size or cache time to live?

wards999 commented 3 years ago

Just to add to the last comment:

1) I can copy the 3.9 GB file directly to the USB exFAT drive without a problem
2) I can copy the 3.9 GB file to a cppcryptfs volume on a different USB drive formatted as NTFS without a problem
3) I created a new cppcryptfs volume on a different exFAT USB drive and get the error

Of course, this is only my experience, but the best course of action for me is to just reformat the exFAT drives as NTFS and then my problem has been resolved.

Also, I wouldn't generally use exFAT - it was just the out-of-the-box format that Seagate uses on their OneTouch drives, and up to now all the files under 3.5 GB have copied to the cppcryptfs volume without issue.

bailey27 commented 3 years ago

I've rarely tested cppcryptfs with exFAT.

I think NTFS is the way to go.

The only drive I have handy that can be reformatted is a 16 GB USB 2.0 flash drive.

I formatted it exFAT and created a cppcryptfs volume on it and was able to copy my 3.9GB and 5GB test files there and back without errors.

The problem might happen only on larger drives. I'll try to test with a larger drive.

bailey27 commented 3 years ago

I tried using a 240 GB SSD in a USB 3.0 enclosure.

I formatted it exFAT, and I was able to copy my 3.9GB and my 5GB files to a cppcryptfs volume there.

I did notice the speed drop off substantially from about the middle of the copy to the end.

It went from around 165 MB/sec down to 15-20 MB/sec, but it did complete.

I tried formatting NTFS, and the same thing happens. It also happens if I copy to the unencrypted drive letter (without cppcryptfs even running).

I think the speed drop might be thermal throttling by the USB controller in my notebook.

wards999 commented 3 years ago

Thanks for checking from your side.

My two Seagate devices are OneTouch 5TB drives - and the problem occurs on both.

The speed drops off significantly from about 57% and then after about 20 seconds the Insufficient Resources error is displayed.

Next job is to move all the data off, reformat to NTFS and start again :)

ghost commented 3 years ago

@bailey27 Is there a way to enable verbose logging to see the cause of these errors? I've experienced similar issues, for example with Google Drive (formerly Drive File Stream) and OneDrive, maybe because the virtual drive is FAT/exFAT.

bailey27 commented 3 years ago

@null-python Hi. Here's a build with logging enabled.

cppcryptfs_1.4.3.10_with_logging.zip

You'll need to make a directory

C:\cppcryptfslogs

It will create a log file there, named with the date and time cppcryptfs was started, and append the logging messages to it.

It displays the log file name in a message box upon startup.

It logs pretty much the same stuff dokany mirror does.

It doesn't put data written or read from the files there, but the plaintext filenames will be there.

Please look over the log before uploading it to make sure there's nothing sensitive there.

wards999 commented 3 years ago

Just to jump back in as well, I don't think my original problem was completely file size related.

While moving data off before the NTFS reformat, I noticed a number of files larger than 3.9 GB, some up to 5 GB, that had previously been copied into the cppcryptfs volume. So, at some point the exFAT-based volume did take these files without error. At the end, I was still able to copy files smaller than 3.9 GB into the volume; just anything larger caused the copy slowdown / stop and the error message. Perhaps size and quantity of files played a part?

Unfortunately, I've now reformatted, so I can't use the logging version; hopefully @null-python can provide something useful.

bailey27 commented 3 years ago

@wards999 I'm glad you mentioned that.

I was thinking about what you said and was going to try to mostly fill up my exFAT-formatted 16 GB USB 2.0 flash drive with smaller files before trying to copy a big file there.

What I found was that copying a 1 GB file to cppcryptfs on that drive generally fails at just over 544 MB, or at least it does now.

This drive has a max write speed of only 8 MB/sec or so, so I can see by how fast the first 543 MB of writes complete that Windows is obviously buffering these first writes to RAM, and then when the next write is done, that one blocks waiting for the ones before it to complete. I can see in Task Manager that the drive is continuing to do 8 MB/sec of writes while the 544th 1 MB write to cppcryptfs is stuck waiting in WriteFile().

It is when cppcryptfs is in this state (the last write it was told to do gets stuck waiting for the previous 500 MB or so to complete) that I think Dokany returns the insufficient resources error.

However, I cannot get it to fail with the Dokany mirror sample filesystem using the same drive.

wards999 commented 3 years ago

Googling Dokany Insufficient Resources does throw up some interesting results which look timeout related. Would it be possible to change the timeout value via a setting in cppcryptfs?

bailey27 commented 3 years ago

@wards999

I think this timeout can't be increased. I think it's the Windows OS deciding that the driver (in this case Dokany) is taking way too long to complete a request. This doesn't appear to be Dokany's fault because the problem doesn't occur with the mirror sample fs.

@null-python , is your google drive exFAT?

I can reproduce the insufficient resources error with my USB 2.0 flash drive only with exFAT. It doesn't happen with NTFS or FAT32.

I was trying my 1GB test file first.

What I see with exFAT is that the first 540MB or so of data appears to be written super-fast (to a write-cache in ram) and then the next write operation gets stuck until all (or most of) the previous writes are actually written to disk. At the 45 second mark, the copy fails with the "insufficient resources" error.

Windows is writing the data in 1MB chunks, by the way.

If I use NTFS, then the writes happen as though there is no caching, and I can see the writes happening (watching the debug output in the debugger) at about 8-10 per second from beginning to end.

With FAT32, I see the first 1GB or so of writes happen super-fast, and then (tried a 3.9 GB file to test this on FAT32) the subsequent writes continue at about 8-10 per second, and then it eventually succeeds.

But with exFAT, what I see is the first 500-600MB of writes happen super-fast, and then the next write gets "stuck" for a long time, long enough to time out. I can see in the Task Manager performance tab that the drive is writing 8-10 MB/sec while cppcryptfs is "stuck" on the last write. After 45 seconds the "insufficient resources" error happens.

I have no idea why this doesn't happen with Dokany mirror.

cppcryptfs does use WriteFile a little differently than mirror (cppcryptfs passes the offset to write to in the WriteFile offset parameter, whereas mirror sets the file pointer to the desired offset and then tells WriteFile to write at the current file pointer). I don't see how this could matter. I wrote a test program to mimic what cppcryptfs does when Windows copies the file, and I don't see the caching behavior.
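
For reference, the two WriteFile styles being compared look roughly like this. This is a minimal sketch with placeholder helper names, not the actual cppcryptfs or mirror source:

```cpp
#include <windows.h>

// Sketch only: placeholder helpers, not cppcryptfs's or mirror's code.

// Style A (what cppcryptfs does): pass the target offset in the
// OVERLAPPED structure, so WriteFile writes at that offset directly.
BOOL WriteAtOffsetViaOverlapped(HANDLE h, const void* buf, DWORD len, LONGLONG offset)
{
    OVERLAPPED ov = {};
    ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
    ov.OffsetHigh = (DWORD)(offset >> 32);
    DWORD written = 0;
    return WriteFile(h, buf, len, &written, &ov);
}

// Style B (what the mirror sample does): move the file pointer first,
// then write at the current position.
BOOL WriteAtOffsetViaFilePointer(HANDLE h, const void* buf, DWORD len, LONGLONG offset)
{
    LARGE_INTEGER li;
    li.QuadPart = offset;
    if (!SetFilePointerEx(h, li, nullptr, FILE_BEGIN))
        return FALSE;
    DWORD written = 0;
    return WriteFile(h, buf, len, &written, nullptr);
}
```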

I don't see why Windows would be caching writes to an external drive when I can see that this drive is recognized as external and is set to "Quick Removal" which implies no write-caching. cppcryptfs, at this point, is just a normal user-mode app writing to the D: drive.

The other odd thing is why the exFAT write cache behaves differently than FAT32. I think FAT32 is behaving correctly. Once the write-back cache is full, then subsequent writes complete at the speed of the media.

exFAT is behaving as though it has some sort of high-water-mark/low-water-mark write-caching scheme that is broken. Once the cache is full, then subsequent writes block until I think a low-water mark is reached.

If I add a FlushFileBuffers() call (write all cached data for this file to disk) right after each cppcryptfs write operation, then the problem doesn't happen.
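
In sketch form, the workaround amounts to something like this (illustrative only, not the actual cppcryptfs write path):

```cpp
#include <windows.h>

// Illustrative sketch of the workaround: force cached data for the file
// out to the device after each write so a huge backlog can't build up.
bool WriteThenFlush(HANDLE h, const void* buf, DWORD len)
{
    DWORD written = 0;
    if (!WriteFile(h, buf, len, &written, nullptr))
        return false;
    // FlushFileBuffers() writes any data the OS has cached for this
    // handle to disk before returning.
    return FlushFileBuffers(h) != 0;
}
```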

I'm wondering if I should continue to try to figure out why Windows is caching the writes, or just do the FlushFileBuffers() after every write request if the underlying media is removable, or maybe if the underlying filesystem is exFAT, or maybe if it's either exFAT or FAT32.

I'd think exFAT and FAT32 are not normally used on internal drives, and Windows defaults to no write caching (Quick Removal) on external media.

wards999 commented 3 years ago

@bailey27 thanks for the detailed reply.

I'd looked at https://www.reddit.com/r/Cryptomator/comments/mzsn82/trouble_with_folders_with_many_subfolders/ and thought that as Cryptomator has a timeout option for Dokany, it was a Dokany parameter that was being set, rather than a Cryptomator internal wait value (I've not really used Cryptomator myself)

I'm 90% complete on my copy to the freshly NTFS formatted drive with no problems at all, and as you describe, the issue seems to be exFAT specific. I'd not even realised that Seagate used exFAT for their portable drives - unless it is better for Mac / Linux users? My recent WD portable drive came NTFS formatted.

Is there any side effect to adding the FlushFileBuffers() ?

ghost commented 3 years ago

@bailey27

Google Drive says it's FAT32, but I'm pretty sure that's not accurate because 4 GB+ files work fine. Not sure how to tell what it actually is. My underlying drive is NTFS, so maybe it's NTFS.

I did manage to reproduce the error when copying from Google Drive. The error usually happens when there's a lot of small files on my Drive that have been uncached and then Google Drive tries to download all of them.

I use Cryptomator as well on OneDrive, because it supports limiting path length unlike cppcryptfs, and I have encountered the same error there.

@wards999 thanks for the info, I didn't know Cryptomator supported increased timeouts https://community.cryptomator.org/t/1-5-6-error-accessing-directory-with-17000-files/6013

bailey27 commented 3 years ago

@wards999 Dokany used to have the kernel driver ping the user mode filesystem programs attached to it (e.g. cppcryptfs, Dokany mirror) every so often, and if the user mode filesystem didn't respond in time, then Dokany dismounted the drive. They did this so they could auto-dismount if the user mode filesystem crashed.

They changed it so now they use a magic file handle that the user mode filesystem program opens. If the user mode file system crashes, then the OS closes the handle and the Dokany driver gets called to close it. It sees that the user mode fs must have crashed and dismounts.

So I think a lot of the old posts about "Dokany" and "timeouts" are about the old ping-based timeout.

I think the timeout we're seeing with exFAT is different. Basically the kernel is telling the Dokany driver to do something, but it doesn't respond for a long time because it's waiting on the user mode filesystem that's stuck waiting for a WriteFile() to return. So the timeout in this case is the OS deciding the Dokany driver must be stuck.

I realized my standalone test program that couldn't reproduce the problem lacked one important detail.

My test program wasn't writing the 18 bytes corresponding to the gocryptfs file header to the beginning of the file.

gocryptfs has an 18 byte file header. Every gocryptfs encrypted file that isn't 0 bytes long should have the header.

When Windows goes to copy a file, the first thing it does is reserve the space by extending the destination file to be the same length as the source file.

When cppcryptfs does the extension, it sees that the file will not be 0 bytes long any more, so it writes the 18 byte header.

When I changed my test program to write 18 random bytes to the file on D:\ (my exFAT-formatted USB drive), then do the extension (SetEndOfFile()), then start writing the 1 GB in 1 MB chunks (actually 1,056,768-byte chunks, to emulate the 4,128-byte gocryptfs block size), then I saw the behavior where the first 540 MB or so of writes would complete instantly as though to cache, and then the next one would be stuck and eventually cause a timeout.

If I don't write the 18 bytes, then the problem doesn't happen.
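
Condensed, such a standalone reproduction looks roughly like the sketch below. The path and sizes are placeholders, and the 18 bytes are just filler standing in for the gocryptfs file header:

```cpp
#include <windows.h>
#include <vector>

// Condensed sketch of the standalone reproduction described above.
// Path and sizes are placeholders; the 18 bytes are filler standing in
// for the gocryptfs file header.
int main()
{
    const wchar_t* path = L"D:\\repro.bin";          // exFAT-formatted USB drive
    const LONGLONG fileSize = 1024LL * 1024 * 1024;  // ~1 GB
    const DWORD chunkSize = 1056768;                 // 256 gocryptfs blocks of 4,128 bytes

    HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    // 1. Write 18 bytes at the start (stand-in for the gocryptfs header).
    BYTE header[18] = { 0x42 };
    DWORD written = 0;
    WriteFile(h, header, sizeof(header), &written, nullptr);

    // 2. Extend the file to its final size, like Explorer reserving space
    //    for the destination before the copy starts.
    LARGE_INTEGER end;
    end.QuadPart = fileSize;
    SetFilePointerEx(h, end, nullptr, FILE_BEGIN);
    SetEndOfFile(h);

    // 3. Write the data in large chunks from the beginning of the file.
    std::vector<BYTE> chunk(chunkSize, 0xAB);
    LONGLONG offset = 0;
    while (offset < fileSize) {
        LONGLONG remaining = fileSize - offset;
        DWORD toWrite = remaining < chunkSize ? (DWORD)remaining : chunkSize;
        OVERLAPPED ov = {};
        ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
        ov.OffsetHigh = (DWORD)(offset >> 32);
        if (!WriteFile(h, chunk.data(), toWrite, &written, &ov))
            break;  // on the problem drive, one of these writes stalls long enough to time out
        offset += written;
    }
    CloseHandle(h);
    return 0;
}
```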

The problem appears to come from the fact that exFAT doesn't support sparse files. When you extend a file, then exFAT appears to write 0's to all the empty space in the file. NTFS just marks that space as "return 0's if you try to read from there".

I have no idea why MS does this space reservation when they know exFAT doesn't do sparse files. When you copy a 1GB file to exFAT, you basically end up writing 1GB of 0's to the file plus all the data you're copying.
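
For what it's worth, whether a volume advertises sparse-file support (and what filesystem name it reports) can be checked with GetVolumeInformationW. A minimal sketch, with a placeholder drive letter:

```cpp
#include <windows.h>
#include <stdio.h>

// Minimal sketch: report a volume's filesystem name and whether it
// advertises sparse-file support. The drive letter is a placeholder.
int main()
{
    wchar_t fsName[MAX_PATH + 1] = {};
    DWORD fsFlags = 0;
    if (GetVolumeInformationW(L"D:\\", nullptr, 0, nullptr, nullptr,
                              &fsFlags, fsName, MAX_PATH + 1)) {
        wprintf(L"filesystem: %s, sparse files: %s\n", fsName,
                (fsFlags & FILE_SUPPORTS_SPARSE_FILES) ? L"yes" : L"no");
    }
    return 0;
}
```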

Apparently, if you write a few bytes to the beginning of a file and then extend it (or extend it and then write a few bytes to the beginning; I tried that too and it doesn't matter), then this weird "cache a bunch of writes and then get stuck a long time" problem happens.

I think the best solution is either to always flush to disk after writing if the underlying filesystem isn't NTFS (or doesn't advertise itself as NTFS), or to provide UI so users can turn this behavior on if they wish.

If the OS is doing write-back caching on the drive, then calling FlushFileBuffers() after every write request will have a big negative performance impact. But if the OS is not doing write-back caching, then it shouldn't hurt much if at all.

Personally, I've always used NTFS on high-capacity external drives (anything over 16GB or so), and used FAT32 on small USB sticks. I agree, Seagate must have used exFAT for better interoperability with Mac and Linux.

@null-python , a filesystem can give whatever name it wants to itself. I think Google picked FAT32 for Google drive, but as you observed, it behaves like exFAT (can have files > 4GB).

I'm wondering if the problems you've experienced might be related.

Here is a test build that flushes writes to disk if the underlying filesystem doesn't say it's NTFS.

I'm hoping it won't have the timeout problems you've observed, but I'm also afraid that the performance will noticeably drop (because Google seems to cache data on the local hard drive, which would benefit from write caching).

@null-python could you please try it and see if it solves the problems with Google Drive? cppcryptfs_1.4.3.10_flush_non_ntfs.zip

ghost commented 3 years ago

@bailey27

How to reproduce:

1. Set Google Drive to stream files
2. Clone a bunch of git repos to get a bunch of small files
3. Add the files to the mounted filesystem
4. Delete the Drive cache
5. Restart Google Drive
6. Copy from the filesystem; the errors then appear

Here's my testing results

Normal cppcryptfs

It froze my computer for 10 minutes after producing a bunch of errors; I had to stop the test.

cppcryptfs_1.4.3.10_flush_non_ntfs

Cleared cache again and restarted Google Drive before running

Still doesn't work :(

bailey27 commented 3 years ago

@null-python I'm really sure this problem isn't the same problem @wards999 discovered with writing to exFAT drives.

It looks to me that you're getting the insufficient resources error (which I think is really because of a timeout, so I think it's possibly somewhat related) when you're reading from the encrypted fs. The problem my test build addressed could happen only when writing to the encrypted fs.

The cyclic redundancy check errors indicate data corruption. cppcryptfs returns that error if it can't decrypt a block of data because the AAD check for data integrity fails.
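
For background, gocryptfs-format blocks are authenticated (AES-GCM in the default mode), and a generic decrypt looks roughly like the OpenSSL sketch below. This is not cppcryptfs's actual decryption code; the point is only that if the ciphertext or the additional authenticated data changed after encryption, the final tag check fails and the block is rejected instead of being returned as garbage:

```cpp
#include <openssl/evp.h>

// Generic AES-256-GCM decrypt sketch (not cppcryptfs's actual code).
// Returns false if the 16-byte authentication tag does not verify,
// i.e. the ciphertext or AAD was modified after encryption.
bool GcmDecrypt(const unsigned char* key,
                const unsigned char* iv, int ivLen,
                const unsigned char* aad, int aadLen,
                const unsigned char* ct, int ctLen,
                const unsigned char* tag, unsigned char* ptOut)
{
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    int len = 0;
    bool ok = ctx != nullptr
        && EVP_DecryptInit_ex(ctx, EVP_aes_256_gcm(), nullptr, nullptr, nullptr) == 1
        && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, ivLen, nullptr) == 1
        && EVP_DecryptInit_ex(ctx, nullptr, nullptr, key, iv) == 1
        && EVP_DecryptUpdate(ctx, nullptr, &len, aad, aadLen) == 1   // feed AAD
        && EVP_DecryptUpdate(ctx, ptOut, &len, ct, ctLen) == 1       // decrypt data
        && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_TAG, 16,
                               const_cast<unsigned char*>(tag)) == 1
        && EVP_DecryptFinal_ex(ctx, ptOut + len, &len) == 1;         // tag check happens here
    EVP_CIPHER_CTX_free(ctx);
    return ok;   // false -> caller reports a data-integrity error
}
```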

I tried reproducing your issue with a compiled openssl source tree (about 5,000 files and 1.9 GB). I think I followed your steps, but I didn't see the errors. I used FastCopy as well.

ghost commented 3 years ago

@bailey27

Oh, I see. I usually don't get errors writing to a volume on Google Drive, only when reading from the volume if the data has been uncached

Was your Drive set to streaming when doing the test? Also did you delete the cache after uploading the files to Drive? If the cache was deleted then the Drive icon should be active and show all the conf files being downloaded again when copying from the fs

bailey27 commented 3 years ago

@null-python yes, I have it set to streaming.

I copied files using FastCopy to an encrypted volume on google drive and waited for them to finish uploading (syncing).

Then I exited cppcryptfs, quit google drive and deleted C:\Users\Bailey\AppData\Local\Google\DriveFS

Then I started google drive. It wanted me to authenticate with the browser because I had deleted that directory.

Then I started cppcryptfs and mounted the volume that is on Google Drive.

Then I copied files using FastCopy from the encrypted fs to a dir on C: and then verified that the md5s of all the files were the same in both the copy I had just made and in the dir on C: I had originally copied them from.

ghost commented 3 years ago

@bailey27

Could you try this and see if there's any errors?

  1. Set Drive to Streaming
  2. Make a folder on a drive (Do you have a hard drive you could try this with if you're using an SSD?)
  3. In the folder, git clone https://github.com/giuspen/cherrytree.git
  4. Copy the folder using normal Windows Explorer ~12 times
  5. In Drive, go to the settings and change the cache folder from the default to somewhere else, then restart the Google Drive client
  6. Create a new test cppcryptfs volume in the Google Drive and mount it
  7. Move the folder with all the other folders in it to the mounted volume (Using regular Windows Explorer, not FastCopy)
  8. Wait for everything to finish uploading
  9. Look in the new cache folder that you set above and delete just the folders that begin with "d". (Don't delete the entire Google Drive folder)
  10. Restart Google Drive (no need to unmount cppcryptfs volume)
  11. Use FastCopy and set it to Sync, set source to the folder that was moved to the encrypted fs on the drive, set destination to a new folder on the same drive that the cache folder is in
  12. Run FastCopy

bailey27 commented 3 years ago

@null-python I'm sorry I've been busy lately. I'll run your tests, hopefully by the end of this weekend, and let you know what happened.

bailey27 commented 3 years ago

@null-python

I ran your tests, and I was able to reproduce the errors.

However, I am convinced that this problem is not cppcryptfs's fault.

I was able to reproduce data corruption doing this type of test without cppcryptfs even running, and even without using FastCopy.

I think somehow the data is getting corrupted by Google Drive when it's being restored after deleting the cache.

I think I've proved this with the tests I did and describe below.

I used a 2.5" 7200 RPM HDD in a USB-C enclosure, formatted NTFS and mounted on F:

I made sure Google Drive was set to streaming and changed Google Drive's cache dir to be F:\DRIVE_CACHE

I checked out cherrytree source from github and made 12 additional copies of cherrytree source tree (about 22K files).

Then I copied all of these cherrytree directories to google drive (under G:\My Drive) using Windows Explorer.

Waited for the upload/sync to finish.

Made a list of md5 checksums of all the files in all the cherrytree directories under G:\My Drive.

Deleted all the directories under F:\DRIVE_CACHE\somenumber\content_cache (the directories that begin with "d").

Exited Google Drive, and started it again.

Copied all the cherrytree directories from Google Drive to somewhere on F:\ using Windows Explorer. This is when the files are copied back from the cloud.

Generated a list of md5sums of the copied files.

Sorted and diffed the md5 lists and found many differences.

Example difference

Original file:

F:\google_drive_test\cherrytree\hunspell>md5sum cs_CZ.dic
7116547ebfb1b1ce81d782e01345973f *cs_CZ.dic

From the list of md5sums made after copying to Google Drive and before deleting the cache; the md5 is the same:

F:\google_drive_test>fgrep "/cherrytree/hunspell/cs_CZ.dic" ordinary_copy_before_delete_cache_md5sum.txt
7116547ebfb1b1ce81d782e01345973f *./cherrytree/hunspell/cs_CZ.dic

Now, after deleting the cache and copying all the cherrytree dirs from Google Drive to F: using Windows Explorer:

F:\google_drive_test>fgrep "/cherrytree/hunspell/cs_CZ.dic" ordinary_copy_after_delete_cache_md5sum.txt
01b800d1b4285bc4d6293cc1b742b537 *./cherrytree/hunspell/cs_CZ.dic

So the data isn't the same and must have gotten corrupted.

I notice that if I use FastCopy to do the copy after deleting the cache and restarting Google Drive, it shows about the same number of errors as when I tested with cppcryptfs in the middle. But the errors are different, e.g.:

ReadFile(Reached the end of the file.38) : g:\My Drive\test_data\cherrytree - Copy\.git\objects\pack\pack-4dc41cbc6377dd58f7e9e43f16d0d360b22dcb07.idx
ReadFile(file is modified) : g:\My Drive\test_data\cherrytree - Copy\.git\objects\pack\pack-4dc41cbc6377dd58f7e9e43f16d0d360b22dcb07.pack
ReadFile(file is modified) : g:\My Drive\test_data\cherrytree - Copy\hunspell\cs_CZ.dic

I think the "file is modified" errors become Cyclic Redundancy Check errors with cppcryptfs.

It looks to me that while cppcryptfs is reading the encrypted file (or while FastCopy is reading an ordinary file), Google Drive is writing to the same file. That could cause cppcryptfs to read inconsistent data and return a CRC error. When cppcryptfs isn't used and FastCopy is used to copy the ordinary files back from the cloud, FastCopy notices this and reports the error.

I'm not sure what the "Reached the end of the file.38" errors, which also happen with cppcryptfs, are. I didn't really research it, but I think those errors are returned when you try to read N bytes using a length and offset that is far enough from the end of the file that you should get all N bytes, but for some reason you don't get that many. I'm just guessing, though.

I'm convinced that Google Drive is not resilient to having the cache deleted out from under it, at least not to having just the d* directories deleted as opposed to everything (e.g. when I was deleting C:\Users\Bailey\AppData\Local\Google\DriveFS previously).

What I find particularly odd about this test is this.

In this test, after the files are copied to Google Drive, nobody is writing to those files besides Google Drive. I would think that even if reading the files right when they are being refreshed from the cloud gives inconsistent results, there would be eventual consistency. However, the files appear to remain corrupted in Google Drive forever.

So I am convinced that the data is getting corrupted by Google Drive when it's being restored after deleting the cache.

Because I can reproduce the corruption without using cppcryptfs or even FastCopy, I think there is nothing I can do about this problem.

wards999 commented 3 years ago

@bailey27, just wanted to say many thanks for all your efforts investigating my original problem and then this latter one. It is really appreciated.

With respect to the original issue, have you decided to add a UI option or to do the flush automatically?

"I think the best solution is either to always flush to disk after writing if the underlying filesystem isn't NTFS (or advertised as NTFS) or to provide UI so users can turn this behavior on if they wish"

ghost commented 3 years ago

@bailey27

Thanks for running the tests and for the detailed explanation. It looks like most of the problems I've been facing are due to Google Drive. I'll try to make a report to their support.

bailey27 commented 3 years ago

@wards999 I'm going to add settings for flush after write in the next release.

I need to add a "More Settings" tab. The settings tab is pretty full as it is.

I think I'll have the flush after write UI look like this (with a grouping box around it and allowing multiple boxes to be checked).

Enable Flush after Write:
[ ] NTFS
[ ] not NTFS
[ ] FAT32
[ ] exFAT
[ ] filesystems that don't support sparse files

The setting will be on if the OR of the checked conditions is true; checking the first two boxes would make it always on.
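
In rough sketch form, the OR might be evaluated like this (hypothetical names, not the code that actually shipped):

```cpp
#include <windows.h>
#include <wchar.h>
#include <string>

// Hypothetical settings struct and helper: a sketch of the OR described
// above, not the implementation that shipped in 1.4.3.11.
struct FlushAfterWriteSettings {
    bool ntfs = false;             // flush even on NTFS
    bool notNtfs = false;          // flush on anything that isn't NTFS
    bool fat32 = false;
    bool exfat = false;
    bool noSparseSupport = false;  // filesystems without sparse-file support
};

bool ShouldFlushAfterWrite(const FlushAfterWriteSettings& s,
                           const std::wstring& fsName, DWORD fsFlags)
{
    const bool isNtfs = _wcsicmp(fsName.c_str(), L"NTFS") == 0;
    return (s.ntfs && isNtfs) ||
           (s.notNtfs && !isNtfs) ||
           (s.fat32 && _wcsicmp(fsName.c_str(), L"FAT32") == 0) ||
           (s.exfat && _wcsicmp(fsName.c_str(), L"exFAT") == 0) ||
           (s.noSparseSupport && !(fsFlags & FILE_SUPPORTS_SPARSE_FILES));
}
```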

bailey27 commented 2 years ago

This was released in 1.4.3.11