gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0
4.53k stars 1.07k forks source link

gfid-path by default #139

Closed amarts closed 6 years ago

amarts commented 7 years ago

As of today, glusterfs uses its own uniqueId for inode number like implementation called 'gfid'. Most of the internal features, tools consume this concept of gfid and the way to access the data by just knowing the gfid to provide more features.

But there is no clear (default) way of providing a approach, where if one know the GFID, a tool/command should provide the path for the file in the filesystem. Currently, such feature exists in the case of 'quota' being enabled.

Considering the feature is being used upstream for sometime now, we should make the option default in posix, thus allowing all the components to use the feature.

This helps many admins to know which file has 'split-brain' (and the name of the file to understand if its important or not) etc.

@pranithk @aravindavk @kotreshhr @raghavendrabhat

pranithk commented 7 years ago

I think the only thing we need to agree on is performance impact with and without this feature before doing this default.

ShyamsundarR commented 7 years ago

There was a devel thread that proposed improvements to this [1]

Are those valid and should we consider this as well when enabling this by default?

[1] http://lists.gluster.org/pipermail/gluster-devel/2015-December/047333.html (follow the thread across months, there are a few responses on this one)

kotreshhr commented 7 years ago

I went through devel discussion in http://lists.gluster.org/pipermail/gluster-devel/2015-December/047333.html. As mentioned in the link, the approach of storing basename as the value and having one xattr per hardlink avoids the crawl completely to get the path from gfid but it does raise below questions.

1. Space : This should be fine. The key max is 255 and value which is pargfid/basename is 292. It might be of concern with large number of hardlinks for the same file.

2. Number of xattrs will grow as number of hardlinks This is of concern. According the man page of listxattr VFS imposes a limit of 64kB for the entire list of extended attributes. If it's more than that, it's not longer possible to retrieve. So, considering the key format as "trusted.gfid2path.<hash-of-pargfid-bname" whos length is 50, we can have maximum of 65000/50 = 1300 hardlinks. This number would still reduced considering the other xattrs we use. Eventhough LINK_MAX is defined to be 127, most of the filesystems including xfs ignores it and allows to create more than that. I think it would be equal max value of data type used for link count in inode

  Please read the excerpt from man page below.
   _BUGS
   As noted in xattr(7), the VFS imposes a limit of 64 kB on the size of  the  extended
   attribute  name list returned by listxattr(7).  If the total size of attribute names
   attached to a file exceeds this limit, it is no longer possible to retrieve the list
   of attribute names._

3. Max size issue for xattr value? As per xattr man page, this should not be of concern. Please read the excerpt of the same below

   _The kernel and the filesystem may place limits on the maximum number and size of extended  
   attributes that can be associated with a file.  The  VFS  imposes  limitations  that  an
   attribute names is limited to 255 bytes and an attribute value is limited to 64 kB.  The list of attribute 
   names that can be returned is also limited to 64 kB (see BUGS in listx‐
   attr(2)).
   Some filesystems, such as Reiserfs (and, historically, ext2 and ext3), require the filesystem to be 
   mounted with the user_xattr mount option in order for extended user attributes
   to be used.
   In the current ext2, ext3, and ext4 filesystem implementations, the total bytes used by the names and 
   values of all of a files extended attributes must fit in a single filesystem
   block (1024, 2048 or 4096 bytes, depending on the block size specified when the filesystem was 
   created).
   In the Btrfs, XFS, and Reiserfs filesystem implementations, there is no practical limit on the number 
   of extended attributes associated with a file, and the  algorithms  used  to
   store extended attribute information on disk are scalable.
   In the JFS, XFS, and Reiserfs filesystem implementations, the limit on bytes used in an EA value is 
   the ceiling imposed by the VFS.
   In  the  Btrfs  filesystem  implementation,  the  total  bytes  used  for the name, value, and 
   implementation overhead bytes is limited to the filesystem nodesize value (16 kB by
   default)._

Let me know your thoughts @aravindavk @amarts @pranithk @ShyamsundarR @raghavendrabhat

amarts commented 7 years ago

All the 3 points mentioned above talks about the issues would arise with 'more' number of hardlink.

I am not aware of any application or anyone who would create more than 10 hardlinks for a given file. (If anyone know, please share example to enlighten me). So, as per the gluster's goals, this feature has more benefits than the issues it would cause.

I say go ahead.

amarts commented 7 years ago

Discussion on https://review.gluster.org/#/c/12202/

Patch Set 1:

Rafi, as discussed in Github issue #139, this is a needed feature. You can get it rebased, or check with Kotresh on what is his plan on this.

This patch only enables the existing gfid to path by default. Aravinda and me were working on improving the existing gfid to path as mentioned in #139. There are two approaches.

  1. Make improved gfid-to-path as separate xattrs and separate option. Once the dependent features switch to use this, we can get rid of the old pgfid xattr.

  2. Make it one shot development, where we directly modify the existing pgfid xattr with new xattr format.

I thought of going by first approach, let me know your inputs.

amarts commented 7 years ago

@kotreshhr i am fine with plan 1.

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: [WIP] New gfid2path framework

kotreshhr commented 6 years ago

I think we should restrict the support of gfid to path to max hardlinks of 500. As per my previous comment above, with this feature being enabled we could create around 1300 - 1500 hardlinks after which, the listgetxattr fails. Instead of reaching this scenario at worst case, we can limit the max hardlink support to 500. If the link count is above 500, we no more create gfid2path extended attribute. So we would not be able to convert gfid2path for hardlinks more than 500. This should be acceptable.

Thoughts? @amarts @pranithk @ShyamsundarR @aravindavk

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path framework

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17488 has been posted that references this issue. Commit message: storage/posix: New gfid2path infra

aravindavk commented 6 years ago

Status

ShyamsundarR commented 6 years ago

One concern I had with storing an xattr per backpointer, was the need for different xattrs. Instead storing this in a single xattr would have been preferable.

On discussing this further with @kotreshhr the problems with a single xattr is that we would need some locking support (possibly local locking on the POSIX xlator itself, if ordering or operations is not a concern or is handled elsewhere (in other layers)) to ensure atomic updates.

With the xxhash and separate xattrs, this need goes away, hence is more preferable.

Further if there is ever an xxhash (or any other hash) collusion we would need to ensure we handle that. Just so that value is atomically modified. Is this needed?

Just stating, I am fine with an imposed 500 limit (or even less) for number of hard links that we support.

aravindavk commented 6 years ago

On discussing this further with @kotreshhr the problems with a single xattr is that we would need some locking support (possibly local locking on the POSIX xlator itself, if ordering or operations is not a concern or is handled elsewhere (in other layers)) to ensure atomic updates.

Along with locking, single xattr also adds overhead of reading xattr before every xattr update(Unlink, Rename and Hardlink). In case of Unlink, Read the current value, find which one is deleted, and remove that and update new value.

Further if there is ever an xxhash (or any other hash) collusion we would need to ensure we handle that. Just so that value is atomically modified. Is this needed?

Collusion chances are very very less, collusion can happen only in case of hardlinks. No issues if hash collusion happens for two different file since xattr stored on the file itself.

Just stating, I am fine with an imposed 500 limit (or even less) for number of hard links that we support.

I think we can increase this if required(But I think this is good practical limit), 500 limit is based on earlier implementation with MD5, with xxhash, hash length is reduced.

ShyamsundarR commented 6 years ago

On discussing this further with @kotreshhr the problems with a single xattr is that we would need some locking support (possibly local locking on the POSIX xlator itself, if ordering or operations is not a concern or is handled elsewhere (in other layers)) to ensure atomic updates.

Along with locking, single xattr also adds overhead of reading xattr before every xattr update(Unlink, Rename and Hardlink). In case of Unlink, Read the current value, find which one is deleted, and remove that and update new value.

This overhead should not be present, we should have this information cached in the inode (and updated on edits) and hence not need to read->modify->write this data. In the case where the inode is not cached, we will have this overhead, which IMO should be minimal as the protocol/server resolve and resume routines should have just visited the inode.

Further if there is ever an xxhash (or any other hash) collusion we would need to ensure we handle that. Just so that value is atomically modified. Is this needed?

Collusion chances are very very less, collusion can happen only in case of hardlinks. No issues if hash collusion happens for two different file since xattr stored on the file itself.

I am referring to the same file and hardlinks here. Although rare, if this happens we loose the previous data, hence IMO should have some fencing around this.

ShyamsundarR commented 6 years ago

@kotreshhr So for directories (IOW, mkdir rmdir (and rename cases for directories)) we will rely on the soft link data to track the path back, correct?

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

kotreshhr commented 6 years ago

@ShyamsundarR Yes, that's correct, we are relying on soft links to track path for directories.

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17839 has been posted that references this issue. Commit message: posix/gfid2path: Tool to set GFID to Path xattr in brick backend

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17744 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17839 has been posted that references this issue. Commit message: tools/setgfid2path: Tool to set GFID to Path xattr in brick backend

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17839 has been posted that references this issue. Commit message: tools/setgfid2path: Tool to set GFID to Path xattr in brick backend

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17839 has been posted that references this issue. Commit message: tools/setgfid2path: Tool to set GFID to Path xattr in brick backend

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17785 has been posted that references this issue. Commit message: storage/posix: Add virtual xattr to fetch path from gfid

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17869 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount

gluster-ant commented 6 years ago

A patch https://review.gluster.org/17869 has been posted that references this issue. Commit message: posix/gfid2path: Block access to gfid2path xattr via mount