cs3org / reva

WebDAV/gRPC/HTTP high performance server to link high level clients to storage backends
https://reva.link
Apache License 2.0

eos: fids vs inodes #3549

Open aduffeck opened 1 year ago

aduffeck commented 1 year ago

Hey, we are currently playing with eos on the edge branch and got a little confused regarding the usage of inodes vs. fids.

The EOSClient interface defines both GetFileInfoByInode and GetFileInfoByFXID methods for statting files and directories. The binary client implements both, but the grpc client only implements the inode version, and the fxid one isn't actually used anywhere.

We are wondering about the reasons and the implications of that. EOS itself seems a little confusing in that regard as well: directories can be statted using the inode but not using the fid unless the fid is used as the pid in the identifier, and some commands (e.g. attr) only allow addressing entities by their fid, not by their inode.

We are also wondering about the uniqueness of ids and inodes: are they guaranteed not to be reused by the filesystem? And what happens in scenarios like restoring a backup: would it be possible to set either of them to its initial value?

@labkode @gmgigi96 Can one of you shed some light onto this? \cc @butonic

aduffeck commented 1 year ago

Another related question just came up: the inode entry in find and md responses seems to be always set to 0 for both files and directories. We are using the gitlab-registry.cern.ch/dss/eos/eos-all:5.0.31 image referenced by https://gitlab.cern.ch/eos/eos-charts/-/tree/master/server. Do you know if that's supposed to work?

aduffeck commented 1 year ago

After some more research I understand that the inode is derived from the fid using a legacy or a new encoding.

There are still a few remaining questions though:

  1. Are file ids/inodes guaranteed to be unique or are they eventually going to be reused?
  2. Can file ids/inodes be restored e.g. when restoring a backup?
  3. The grpc API always seems to return 0 as the inode, is that a known issue?
  4. Is there any reason not to use the fid (or fxid) instead of the inode in reva? That seems to be the more direct way and would avoid some of the aforementioned obstacles.

labkode commented 1 year ago

Hi @aduffeck,

EOS has two namespaces, one for files and one for directories. Each of these namespaces starts with 0, 1, ..., so the numbers clash; that's why you need to use fid for files and pid for parents. As this is a bit cumbersome, the inode field was introduced. It shifts the directory namespace by some bits so that the inode field can be used for both files and directories without adding extra logic in the clients.
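
To illustrate the general idea of folding two independent id counters into one inode space, here is a minimal Go sketch. It is not the actual EOS encoding: the `dirFlag` bit, the choice of which namespace gets tagged, and the helper names `fileInode`, `dirInode`, and `isDir` are all assumptions made up for this example.

```go
package main

import "fmt"

// dirFlag is a hypothetical marker bit chosen for illustration only.
// It is NOT the bit layout EOS uses for its legacy or new inode encoding.
const dirFlag = uint64(1) << 62

// fileInode maps a file id into the shared inode space unchanged.
func fileInode(fid uint64) uint64 { return fid }

// dirInode maps a directory id into the shared inode space by setting the
// marker bit, so it can never equal an inode derived from a file id
// (as long as file ids stay below dirFlag).
func dirInode(pid uint64) uint64 { return pid | dirFlag }

// isDir reports whether an inode was derived from a directory id.
func isDir(inode uint64) bool { return inode&dirFlag != 0 }

func main() {
	// File 5 and directory 5 clash as raw ids but not as inodes.
	fmt.Println(fileInode(5) == dirInode(5))
	fmt.Println(isDir(fileInode(5)), isDir(dirInode(5)))
}
```

With such a scheme a client can pass around a single number and still tell which namespace it came from, which is the convenience the inode field is described as providing.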

I can talk for the eos binary interface because I wrote it. The EOS inode field is what ownCloud expects as a file ID.

Please note that the ownCloud fileid for files is not the EOS inode of the file, but the EOS inode of the version folder of the file, .sys.v#.myfile. The reason is that many applications (e.g. the sync client) overwrite the existing file, so a new file is created (with a new inode), which would invalidate previous shares, public links, etc.

By pointing to the version folder we avoid that, as we dereference to the latest available version of the file.
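
As a rough sketch of the naming convention mentioned above, the version folder path can be derived from a file path by prefixing the base name with `.sys.v#.`. The helper name `versionFolder` and the example path are hypothetical and only serve the illustration; this is not presented as the code reva uses.

```go
package main

import (
	"fmt"
	"path"
)

// versionPrefix is the prefix of the version folder name, following the
// ".sys.v#.myfile" example from the comment above.
const versionPrefix = ".sys.v#."

// versionFolder returns the path of the version folder that sits next to
// the given file. The helper name is hypothetical.
func versionFolder(p string) string {
	return path.Join(path.Dir(p), versionPrefix+path.Base(p))
}

func main() {
	// The path below is an invented example.
	fmt.Println(versionFolder("/eos/user/a/alice/myfile"))
	// prints "/eos/user/a/alice/.sys.v#.myfile"
}
```

The inode of that folder, rather than the inode of the file itself, is what stays stable when a client replaces the file.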

Let me know if you need more information.

Regarding GRPC, @gmgigi96 would be the one to clarify this part, but he is off until early next year. In the meantime I will focus on the eosbinary client, as it's the one used in production at CERN (the GRPC one is still experimental).

aduffeck commented 1 year ago

Thank you @labkode, that explains a lot of things already.

I'll try to figure out why I don't get the inodes from the grpc API meanwhile.

gmgigi96 commented 1 year ago

Hi @aduffeck, which version of EOS are you using?

aduffeck commented 1 year ago

Hey @gmgigi96, we are deploying eos using the helm chart from https://gitlab.cern.ch/eos/eos-charts, the version it uses currently is 5.0.31.

Please note that we unfortunately had to switch back to the binary client for now, as the GRPC server seems to be a little too experimental. But in case you are curious about the changes we made to the grpc client while working with it in our branch, you can find the commits here.