Closed: hieuhoang closed this issue 5 days ago
Hi @hieuhoang. You are seeing incorrect data on server B because of kernel and attribute caching. Please run blobfuse2 on serverB in direct-io mode to bypass the kernel cache. Update the libfuse and attr_cache sections in the config as below:

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0
  direct-io: true

attr_cache:
  timeout-sec: 0
  no-symlinks: false
@souravgupta-msft thanks. Do I also have to change this too?

file_cache:
  timeout-sec: 120 --> 0
ps. any way you can ban spammer rickoq & krissV2?!
@hieuhoang, yes, you can set the file cache timeout to 0, so that on the other server you will always get the updated data and not the locally cached data.
I have deleted and reported the spam comments.
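Putting the recommended settings together, a fully cache-disabled config for the reader side (serverB) might look like the sketch below. This just combines the keys mentioned in this thread; any other sections (logging, azstorage) are unchanged from your existing config.

```yaml
# Sketch: disable all caching on the reader side.
# direct-io bypasses the kernel page cache; the zero timeouts
# disable blobfuse2's own attribute and file caches.
libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0
  direct-io: true

attr_cache:
  timeout-sec: 0
  no-symlinks: false

file_cache:
  timeout-sec: 0
```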
Thanks. The real question is: is a file cache timeout of 0 required to prevent the file corruption issue?
Is disabling the cache and using direct-io a temporary workaround while you work on a fix, or is it the permanent recommendation to avoid file corruption? Using direct-io is a problem for us, as it prevents us from memory-mapping files.
direct-io is the recommendation when multiple readers/writers are involved. This is because of caching at the kernel level and also on the blobfuse side (attribute and file cache). If one writer updates a file and another reader tries to read it, the reader might not get the updated data due to caching on the reader's side. If your expectation is to get updated data across different blobfuse mounts on different servers, you will have to enable direct-io to bypass the kernel cache and disable caching (both attribute and file cache) in blobfuse.
Is there a way to have some caching but avoid the file corruption problem? E.g., synchronize the file and attributes. It's too slow to be usable without any caching.

We managed to use blobfuse v1 successfully with caching; v2 seems to be a step backwards in this respect. AML fuse had a similar problem and fixed it.
I'm facing the same issue. I've done the above on both machines, and I still have the corruption issue. I only stop having the issue when I mount one machine only.
It's certainly true that disabling all caching stops the corruption. However, command-line responsiveness becomes very slow.
If you're still seeing corruption, my suggestion would be to remount the blobs. Maybe some parameters were not picked up.
The question is how to enable some caching without file corruption.
@dgsouzabr, please disable file caching also and retry.
If the workflow involves having multiple mounts on different servers, but working with one file on one server only, then the attribute and kernel caches (using direct-io) can be disabled. The content cache (or file cache) can be enabled in this case, so that files don't have to be downloaded every time and subsequent reads can be served from the local cache directory.
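Assuming that workflow (each file is written from one server only), a config sketch matching the description might look like the following. The 120-second timeout and 4096 MB size are illustrative values taken from the reporter's original config, not a recommendation from this thread.

```yaml
# Sketch: kernel cache bypassed and attribute cache disabled,
# but the local file (content) cache kept, so repeated reads
# are served from the local cache directory.
libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0
  direct-io: true

attr_cache:
  timeout-sec: 0

file_cache:
  timeout-sec: 120
  max-size-mb: 4096
```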
Closing. For more queries, please reply in this thread.
For reference: it seems like enabling the file and attribute cache but using direct-io prevents corruption.
If file-cache and attribute-cache are enabled then 'direct-io' is nullified, and that might be the reason.
What do you mean by nullified? I know that if I don't use direct-io, corruption occurs.
'direct-io' disables the kernel cache, but as file-cache and attribute-cache are enabled, the caching responsibility shifts to blobfuse2. In either case your reads/writes are cached and not directly getting updates from the container.
OK. I don't think enabling file-cache and attribute-cache shifts caching TOTALLY to blobfuse2. If there's no direct-io flag, caching is still done by the kernel page cache, in addition to blobfuse2.
Not having 'direct-io' means there is an additional kernel page cache being maintained, and most of your reads/writes will be served from it. Otherwise calls will come directly to blobfuse2 and, based on what kind of caching you have enabled, blobfuse2 will act accordingly.
You've explained that already.
Maybe you should have examples of settings in the README that will work in different situations. It's extremely disconcerting for files to be corrupted.
This issue is not related to corruption but is rather a limitation of Blobfuse2 which is clearly documented in the README.
Blobfuse2 is stable, and is supported by Microsoft provided that it is used within its limits documented here. Blobfuse2 supports both reads and writes however, it does not guarantee continuous sync of data written to storage using other APIs or other mounts of Blobfuse2. For data integrity it is recommended that multiple sources do not modify the same blob/file.
As already explained, if multiple readers/writers are working on one file, you will have to disable all types of caching (file, attribute, as well as kernel caching). This will make sure that you get the most up-to-date data on read. But if you are doing read/write operations on a file on one server only, then you can enable caching at the Blobfuse level (file and attribute cache) to prevent network calls for subsequent getProperties or download requests.
Which version of blobfuse was used?
blobfuse2 version 2.3.0
Which OS distribution and version are you using?
Ubuntu 20.04.6 LTS
If relevant, please share your mount command.
blobfuse2 mount $MOUNT_POINT --config-file=$CFG_FILE --allow-other --tmp-path /mnt/blobfuse2/$MOUNT_POINT
Config yaml file:

logging:
  type: syslog
  level: log_debug

components:

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

file_cache:
  timeout-sec: 120
  max-size-mb: 4096

attr_cache:
  timeout-sec: 120
  no-symlinks: false

azstorage:
  type: block
  account-name: xxxstorage2
  container: xxxblob2
  mode: sas
  endpoint: xxx
  sas: xxx
What was the issue encountered?
File corrupted from truncation. Below are the commands to reproduce:
serverA$ echo Hello World > hello
serverA$ echo Goodbye World > goodbye
serverA$ cp hello corrupt

serverB$ ls -l ...

serverA$ cp goodbye corrupt

serverB$ cat corrupt
Goodbye Worl<EOF>
The file 'corrupt' should be a copy of 'goodbye' or 'hello'. However, the corruption occurs because 'corrupt' has the content of 'goodbye' but the length of 'hello'.
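A small shell helper (hypothetical, not part of blobfuse2) can make the symptom explicit: after copying on serverA, run it on serverB against a local reference copy of the expected content. It flags exactly the content-vs-length mismatch described above.

```shell
# check_copy SRC DST: report whether DST is a byte-for-byte copy of SRC.
# A truncated DST (right content, wrong length) is reported as CORRUPT.
check_copy() {
  src="$1"; dst="$2"
  if cmp -s "$src" "$dst"; then
    echo "OK: $dst matches $src"
  else
    echo "CORRUPT: $dst differs from $src ($(wc -c < "$src") vs $(wc -c < "$dst") bytes)"
  fi
}
```

For the repro above, `check_copy goodbye corrupt` on serverB would report CORRUPT, since `corrupt` holds the content of `goodbye` truncated to the length of `hello`.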
P.S. This is similar to an issue with the AML fuse driver on the AML cluster. I'm an MS FTE; we can chat internally if you wish.
Have you found a mitigation/solution?
no
Please share logs if available.
not available