Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Can't create hard links #599

Closed: tikoflano closed this issue 3 years ago

tikoflano commented 3 years ago

Which version of the blobfuse was used?

1.3.6

Which OS (please include version) are you using?

CloudLinux 7.9

What problem was encountered?

Blobfuse can't create hard links.

Have you found a mitigation/solution?

No.

I know this is a known limitation, but is this feature on the roadmap? I created this issue as a feature request.

vibhansa-msft commented 3 years ago

Can you elaborate on what kind of hard link you are trying to create? Are the source and target both inside the container, or is it some other setup?

tikoflano commented 3 years ago

My exact use case is to use blobfuse to mount my /backup folder to my Blob Storage so it can store my cPanel backups. It seems that cPanel's backups make heavy use of hard links, especially when using incremental backups (which I would like to use).

Answering your question, both source and target are inside the container.

The exact steps to reproduce:

mkdir /backup
# mount the container at /backup with FUSE attribute/entry caching enabled
blobfuse /backup --tmp-path=/mnt/resource/blobfusetmp --config-file=/root/fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120
touch /backup/test
# this is the step that fails
ln /backup/test /backup/test_hl
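
For context, here is what the same sequence does on a local Linux filesystem (the /tmp paths are just for illustration): both names end up sharing one inode, which is the semantics blobfuse would have to emulate.

touch /tmp/test
ln /tmp/test /tmp/test_hl
stat -c 'links=%h inode=%i' /tmp/test /tmp/test_hl
# both lines print links=2 and the same inode number
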
vibhansa-msft commented 3 years ago

@tikoflano: Thanks for sharing your use case. I see that you are using hard links for incremental backups. As of now blobfuse does not support hard-linking. Even if blobfuse were to support it, Azure Storage does not support it directly, hence keeping a count of hard links (as shown in ls -l) would be difficult.

tikoflano commented 3 years ago

Maybe this functionality could be added as an optional mount option (similar to --no-symlinks; a --no-hardlinks option defaulting to false seems OK). This should be optional and default to false because I think it would decrease performance drastically, but for some use cases it might be OK.

When this option is enabled, blobfuse could keep some metadata counting the hard links? If a hard link is created, the count increases; if a file is opened, check the hard-link metadata to get the original file in the Blob Storage; if a file is deleted, first check whether it has any other hard links; etc. This is just how I imagine it would work; I have no clue how a filesystem actually works.
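
As a rough sketch of what that book-keeping could look like at the storage level using blob metadata (purely hypothetical: the container name, metadata keys, and stub-blob approach are assumptions, not anything blobfuse implements):

# Hypothetical scheme for: ln /backup/test /backup/test_hl
az storage blob metadata update -c backup -n test \
    --metadata hardlink_count=2                      # bump the count on the master
az storage blob upload -c backup -n test_hl -f /dev/null \
    --metadata hardlink_target=test                  # empty stub pointing at the master
# On open of test_hl: read hardlink_target and redirect I/O to the blob "test".
# On delete: decrement hardlink_count; remove the data only when it reaches zero.
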

vibhansa-msft commented 3 years ago

@tikoflano: The number of links can be managed with metadata as you suggest, however that would have a lot of impact on performance: on every link creation or deletion we would end up searching all the hard links and updating them. Taking a step back, creating a hard link on Linux does not replicate the file; it is just metadata. I am not aware of any such feature in Azure Storage, which means we would need to copy the data to another file. This extra copy means we use more storage space, and it also defeats the book-keeping of link counts, as the files are then fully independent and not related in any way.
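
For reference, the "copy the data" alternative described above would map to a server-side blob copy, something like the following (container and blob names are assumptions for illustration):

# Server-side copy: test_hl becomes a fully independent blob, so it costs
# extra storage and edits to one file no longer show up in the other.
az storage blob copy start \
    --source-container backup --source-blob test \
    --destination-container backup --destination-blob test_hl
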

tikoflano commented 3 years ago

About the performance impact, I totally agree: this feature should be optional (--no-hardlinks), and true by default. I wrote it the other way around by mistake previously.

About the data duplication, I think that is not needed. This is what I expect to happen, assuming I have blobfuse on /backup and hard links enabled:

vibhansa-msft commented 3 years ago

So when you do "ln /backup/testfile.txt /backup/hardlink.txt", what happens on the storage? If we just increase the hard-link count for "testfile", then we have no reference to the "hardlink.txt" file. So we need to create that file on storage and maintain it. There are two ways to do that: either we handle it just like a soft link, or we copy the entire data. If the data is not copied, deleting "testfile.txt" will lose the content, and all the other linked files will become dangling. The reference-counting principle has to be supported by Azure Storage itself for this feature to work.

vibhansa-msft commented 3 years ago

I hope I am able to explain the problem. "/backup/testfile.txt" is sort of the master file; all other links merely shadow it. Deleting this file will lose the data if we do not create a copy of it. On Linux this works due to reference counting. If we maintain reference counting in metadata, then deleting the master file will lose that metadata as well. If we just decrement the count and keep the blob as is, then listing calls like "ls -l" will still show that file, which is an inconsistent state because the user deleted it.
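
To make the failure mode concrete, this is what kernel reference counting buys you on a local filesystem (illustrative /tmp paths):

touch /tmp/master
ln /tmp/master /tmp/link
rm /tmp/master
cat /tmp/link    # still readable: the inode survives until the last link is gone
# With a soft-link-style stub on Azure Storage there is no such refcount,
# so deleting the "master" blob would leave "link" dangling.
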

tikoflano commented 3 years ago

Thanks for your patience and time!

I can think of some workarounds that seem feasible, but they would probably be too complicated to implement/maintain for such a small use case. You managed to convince me that some changes have to be made on the Azure Storage side first.

I see that other projects (https://github.com/s3fs-fuse/s3fs-fuse#limitations) have the same limitation, probably due to the same problems. I also see that ObjectiveFS supports hard links, but it only works on AWS and GCP, so maybe in the future this could be possible.

vibhansa-msft commented 3 years ago

I have added this item to our backlog, so if this becomes feasible in the future we will support it.