Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Preserve Cache Data During Container Remount with blobfuse2 #1438

Closed malu-alphabot closed 3 months ago

malu-alphabot commented 3 months ago

Which version of blobfuse was used?

blobfuse2 version 2.1.2

Which OS distribution and version are you using?

Ubuntu 22.04.3 LTS

If relevant, please share your mount command.

-

What was the issue encountered?

We use blobfuse2 to mount the container in a cluster outside the Azure network. Because of this, it is important for us to maintain a cache that persists the data for a longer period to reduce bandwidth costs and optimize data consumption, so we don't need to download the data every time we need to read it. We utilize disk-based caching for this purpose.

However, we routinely encounter issues that require us to remount the container. When we unmount the container, we lose the cached data on the disk.

I would like to know if there is a way to remount the container while retaining the cached data on the disk or if it would be possible to implement this feature if it doesn't currently exist.

Have you found a mitigation/solution?

No.

Please share logs if available.

vibhansa-msft commented 3 months ago

Hi, thanks for reaching out to the blobfuse team. Unfortunately, there is no way to unmount and keep the disk-based cache. This protects against stale data in cases where the data is modified in the container through some other means after the unmount. You can configure file-cache to retain data for a long time, but the cache is wiped as soon as you unmount. The only other option is to use a tool like AzCopy to download the data to local disk, use that path for as long as needed, and then repeat the copy when you need fresh data.

When you say you need to remount the container, is there a reason for this? If you keep the container mounted, the cache will remain valid and intact.

malu-alphabot commented 3 months ago

Generally, what forces us to remount the container are input/output errors. They don't happen very often, but when they do we have to download the data again. They occur during an internet or electricity outage, for example.

Regarding the cache, we have already configured it to retain data for a long time, and this helps in most cases since our historical data generally does not change.

vibhansa-msft commented 3 months ago

For disruptions caused by external factors, there is not much Blobfuse can do. Our caching prioritizes data consistency, so on mount we expect an empty temp cache and do not use data that was already there before the mount. Also, allowing this in the past created the wrong expectation among customers that existing cached data, if modified locally, would be uploaded/synced back to the container on mount, which blobfuse2 does not do. Hence this feature was removed in later releases, and we always expect a clean cache on start. I will add this item to our backlog for some time in the future, but as of now there is no way around this.

As a hack, you can try a flag in file-cache that allows you to mount with a non-empty temp cache. Try it out and see if it suits your needs, but again, data consistency is not guaranteed here. I am suggesting this hack only if you hit disruptions quite often and your data is fairly static in nature.
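
For readers following along, here is a rough sketch of what such a file-cache section might look like in the blobfuse2 YAML config. The thread does not name the flag; this assumes it is the `allow-non-empty-temp` option, and the cache path and timeout value are purely illustrative. Please verify the option names against the sample config shipped with your blobfuse2 version before relying on it.

```yaml
# Sketch only: option names should be checked against your blobfuse2 version.
file_cache:
  # Hypothetical local directory used as the disk cache.
  path: /mnt/blobfuse2-cache
  # Keep cached files for a long time (one day here, illustrative value).
  timeout-sec: 86400
  # Assumed name of the flag referred to above: allow mounting even when
  # the cache directory already contains data from a previous mount.
  allow-non-empty-temp: true
```

As noted above, any data reused this way may be stale if the blobs changed in the container while it was unmounted, and local modifications left in the cache are not synced back on mount.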

malu-alphabot commented 3 months ago

Okay, thanks for the answers!