Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Subset of Containers Fail to Mount When Using "mount all" #1360

Closed michaelschmit closed 7 months ago

michaelschmit commented 7 months ago

Which version of blobfuse was used?

blobfuse2-2.2.1-1.x86_64

Which OS distribution and version are you using?

RHEL 8.9

If relevant, please share your mount command.

su -l [user] -c "blobfuse2 mount all /mount/blobfuse --config-file=/mount/config.yaml"

What was the issue encountered?

A handful of the ~130 containers fail to mount with an empty "ERROR: " message. The permissions on those particular mount directories also differ from the ones that mounted successfully. The same number of containers consistently fails to mount, but it is not the same containers each time.

Have you found a mitigation/solution?

No

Please share logs if available.

Please ask for particular logs if needed. /var/log/blobfuse2.log doesn't seem to contain anything useful and the error message is empty as indicated above.

michaelschmit commented 7 months ago

Looks like in the user's .blobfuse2 directory, the containers that fail do not get a pid written to their mount*.pid files.
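
A quick way to spot them (a sketch; this assumes the pid files live directly under ~/.blobfuse2):

    # pid files that came out empty correspond to the failed mounts
    find ~/.blobfuse2 -name 'mount*.pid' -empty
    # pid files that actually got a pid written
    find ~/.blobfuse2 -name 'mount*.pid' -size +0c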

vibhansa-msft commented 7 months ago

Hi @michaelschmit, thanks for reaching out to the Blobfuse team.

Is there any particular reason to mount all 130 containers? This causes 130 instances of blobfuse to run on the same VM/node and may create a resource crunch on the system.

michaelschmit commented 7 months ago

I'll answer your bullets and then add another comment about the debugging I have done.

michaelschmit commented 7 months ago

Here is more information about what I am seeing. When I kick off the "blobfuse2 mount all" operation, the first ~126 containers succeed without issue but the last 5-6 fail with:

Failed to mount container xxxxxxxxx : Error:

This occurs whether I mount using "su -l [user]", "sudo -u [user]", or just run as the current user. With that said, I am going to change the title of this issue to reflect that.

Like I mentioned, when I diff the config_*.yaml files, the only differences are the container names. For a container that failed to mount, I get a .pid file but it is empty. If I manually try to mount a failed container with "blobfuse2 mount /mount/xxxxxxx --config-file=[home dir]/.blobfuse2/config_*.yaml", it still fails with the same empty error.

What I discovered just now is that if I unmount a previously mounted container with "blobfuse2 unmount /mount/xxxxxx", I am able to mount a previously failed container. This likely indicates some threshold issue, where removing one mount allows another to succeed. I am going to do some more research to see if I am hitting a mount threshold in the Azure VM or something within blobfuse.

Another interesting data point: in a previous instance where I was debugging with slightly fewer mounted containers (~128 vs. ~131), 2 containers failed to mount, but I was eventually able to get them to mount individually over many attempts.
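
To illustrate the threshold behavior (a sketch; container names and config paths are placeholders):

    # unmount one container that mounted successfully...
    blobfuse2 unmount /mount/containerA
    # ...then a previously failed container mounts fine
    blobfuse2 mount /mount/containerB --config-file="$HOME/.blobfuse2/config_containerB.yaml"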

michaelschmit commented 7 months ago

Since the error returned is empty, I have been trying to figure out where the failure occurs. I am looking at the individual mount command (mountCmd), since the failure happens on subsequent single mount commands as well; that helps narrow the scope a little. I never see the critical message "Starting Blobfuse2 Mount", so I have to assume for now that it is not getting to that point. I have experimented with disabling monitoring, just in case that had something to do with it, using the config:

health_monitor:
    enable-monitoring: false

But perhaps that is disabled by default.

michaelschmit commented 7 months ago

OK, I see that I was wrong about the stdout/stderr statement. Here are the syslogs during the mount operation:

Mar 12 18:28:50 [host] blobfuse2[526721]: LOG_INFO [mount.go (415)]: mount: Mounting blobfuse2 on /mount/xxxxxx
Mar 12 18:28:50 [host] blobfuse2[526721]: LOG_DEBUG [mount.go (453)]: mount: foreground disabled, child = false
Mar 12 18:28:50 [host] blobfuse2[526721]: LOG_INFO [mount.go (471)]: mount: Child [526732] terminated from /mount/xxxxxx

michaelschmit commented 7 months ago

Had another instance where some of the mounts failed during the initial "mount all", but then I was able to get one of them mounted individually. The rest failed no matter how many attempts were made.

michaelschmit commented 7 months ago

The difference I see between a failed mount and a successful one is the log line:

Failed: LOG_DEBUG [mount.go (453)]: mount: foreground disabled, child = false

Success: LOG_DEBUG [mount.go (453)]: mount: foreground disabled, child = true
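
So counting the failures from syslog is straightforward (a sketch; on RHEL the syslog typically lands in /var/log/messages):

    # each "child = false" line is a mount whose daemonized child never came up
    grep -c 'foreground disabled, child = false' /var/log/messages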

michaelschmit commented 7 months ago

Seems like I may be running into a limitation in the go-daemon package that blobfuse2 imports. I've looked at ulimit -a, and max user processes is 62840, so I am probably not hitting that.

vibhansa-msft commented 7 months ago

Thanks for providing the detailed info.

michaelschmit commented 7 months ago

> Can you enable log debug mode through your config file or CLI so that we get more details on why some mounts fail?

The log lines I posted above are with this setting in the config_*.yaml:

logging:
    level: log_debug
    type: syslog

The only debug line that provides any clue is the LOG_DEBUG [mount.go (453)]: mount: foreground disabled, child = false

125 containers appears to be the hard limit; if I observe otherwise I will make sure to update. I scaled the VM up to see if it was a hardware limitation, but it doesn't appear to be. I also temporarily disabled SELinux, but that had no effect either.
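
(For the SELinux test I just switched it to permissive temporarily, along these lines:)

    getenforce          # prints Enforcing/Permissive/Disabled
    sudo setenforce 0   # permissive until reboot; had no effect on the mounts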

I can try to kick off and background the processes myself; I need to think through the implementation a bit.

vibhansa-msft commented 7 months ago

If you manually mount using a script, does it still hit the same limit of 125 containers? Just to validate the theory that there is some OS/hardware-level limitation.

michaelschmit commented 7 months ago

Another thing I tried today was moving the blobfuse2 testing to a physical server. Interestingly, I was only able to mount 116 containers on the physical system, which is a beefier box (72 logical cores and 192 GB of memory) than what I am using in Azure. I strace'd a working mount and a failing one, and I am looking through the diff for a smoking gun, but haven't found anything yet.
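
For reference, roughly how I captured the traces (a sketch; paths and names are placeholders):

    # trace the mount, following forked children, one output file per pid
    strace -f -ff -o /tmp/blobfuse2-trace blobfuse2 mount /mount/xxxxxxx \
        --config-file="$HOME/.blobfuse2/config_xxxxxxx.yaml"
    # then diff a failing child's trace against a working one
    diff /tmp/blobfuse2-trace.<good_pid> /tmp/blobfuse2-trace.<bad_pid>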

I've tried running the mount from a script and that had no effect.
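
The script was essentially a loop over the per-container config files, something like this (a sketch; it assumes the config_*.yaml naming and a /mount/<name> layout):

    #!/bin/bash
    # mount each container individually instead of using "mount all"
    for cfg in "$HOME"/.blobfuse2/config_*.yaml; do
        name=$(basename "$cfg" .yaml)   # e.g. config_containerA
        name=${name#config_}            # strip the config_ prefix
        mkdir -p "/mount/$name"
        blobfuse2 mount "/mount/$name" --config-file="$cfg"
    done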

michaelschmit commented 7 months ago

It would be helpful if you could try to reproduce this on your side as well. If you do run into a ~100 container limit, then perhaps that should be added to the limitations section of the README.md. I don't think a large number of containers is unusual: from an Azure Blob storage standpoint, if you want to delete data, you can delete an entire container at once or delete a single blob at a time (since directories don't really exist in blob storage), and deleting a container performs a lot faster than deleting hundreds or thousands of blobs.

michaelschmit commented 7 months ago

OK, after digging in the strace I found the culprit: <... inotify_init1 resumed>) = -1 EMFILE (Too many open files)

The setting that is hanging things up is: cat /proc/sys/fs/inotify/max_user_instances

Increasing this value via echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances or via /etc/sysctl.conf allows all the containers to mount.
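
For anyone hitting the same wall, the full workaround looks like this (256 is an arbitrary value that comfortably clears ~130 mounts; the kernel default is commonly 128, which matches the ~125 limit I was seeing):

    # check the current per-user inotify instance limit
    cat /proc/sys/fs/inotify/max_user_instances
    # raise it for the running system
    echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances
    # make it persistent across reboots
    echo 'fs.inotify.max_user_instances = 256' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p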

michaelschmit commented 7 months ago

With that said, the current implementation is probably not scalable for us long term (given the daemon overhead for every mount), but this is at least a workaround in the short/intermediate term to get us by.

vibhansa-msft commented 7 months ago

Thanks for sharing this. This is great insight, and I really appreciate you digging deep to figure it out. If it is indeed about inotify, the limit is likely hit because we register for a file-change notification on the config file, so that the user can change some settings dynamically and blobfuse can reconfigure itself on the fly (not all configs can be modified dynamically).

As I mentioned earlier, mounting an entire storage account in one daemon is the long-term solution, and it is already on our todo list. Let me bring this up with our PM and see if we can prioritize that item.

vibhansa-msft commented 7 months ago

For now blobfuse is working as expected, and I agree that mounting too many containers is an issue. I will close this item here, and we will track mounting an entire account in one shot separately. Feel free to reopen if there is anything else you need from the blobfuse end.

For the documentation part, your feedback is well received, and we will update our README accordingly.

vibhansa-msft commented 7 months ago

@michaelschmit Let me know if the linked PR provides sufficient info on this or not.

michaelschmit commented 7 months ago

The PR seems sufficient for documentation.

If it is the inotify watch on the config file, another option would be a config setting to toggle the dynamic-reload functionality. It is pretty quick and easy to unmount and remount a container to pick up config changes. But don't feel pressured to execute on that; if you don't think it is useful, we can always modify the source ourselves in the future.

vibhansa-msft commented 7 months ago

Unmounting and remounting is not acceptable to many of our customers, as unmounting means wiping out the local cache. There are reasons we chose to load the config dynamically for small changes, while changes like the storage account or container still need a remount. Once mounting an entire storage account is available as a feature, it will also save a lot of resources on a given VM/node.