gluster / glusterfs

Gluster Filesystem: Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

md-cache: Improve readdir(p) performance for fuse #3927

Open mohit84 opened 1 year ago

mohit84 commented 1 year ago

FUSE passes a 4 KB buffer on each readdir(p) call when fetching entries from the server. For a big directory (more than 1M entries), fetching all the entries takes a long time. A 4 KB buffer holds at most 18 to 20 entries, even though Gluster RPC supports fetching a 128 KB buffer from the server.

Gluster already provides the readdir-ahead xlator. It fetches entries in the background, keeps them in a local cache, and serves them as FUSE winds readdirp calls. However, readdir-ahead has two problems:

- It does not handle cache invalidation, so the data it serves may be inconsistent.
- It only uses the readdirp call. If a user has disabled readdirp, readdir-ahead does not work and the user cannot benefit from it.
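
Here is a back-of-the-envelope estimate of the cost. It assumes every entry takes about the same space on the wire and ignores per-reply overhead. The per-entry size of roughly 4096 / 20 ≈ 205 bytes comes from the 18 to 20 entries per 4 KB quoted above. The 1.2M figure is the size of the directory used in the test below.

```latex
\begin{aligned}
\frac{4096\ \text{B}}{20\ \text{entries}} &\approx 205\ \text{B/entry},
  & \frac{128\ \text{KB}}{4\ \text{KB}} &= 32\\
N_{4\,\text{KB}} &\approx \frac{1.2\times10^{6}}{20} = 60{,}000\ \text{calls},
  & N_{128\,\text{KB}} &\approx \frac{1.2\times10^{6}}{32\times 20} = 1{,}875\ \text{calls}
\end{aligned}
```

Under these assumptions, the same listing needs about 32 times fewer round trips with a 128 KB buffer.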

Solution:

1) Implement entry-cache logic in the md-cache xlator. md-cache already provides cache invalidation logic, so there is no consistency issue.
   - Entries are still fetched during the readdir(p) call. The only difference is that md-cache raises the buffer size to 128 KB and saves the entries in a local buffer.
   - When the next readdir(p) call is wound, md-cache returns entries to FUSE from that buffer.
   - The local buffer is valid only between opendir and closedir. When the application calls closedir, the buffer is cleaned up and its memory is freed.

2) Avoid fuse_gfid_set for / in fuse_getattr. FUSE will set the gfid only in case of an error.
   - POSIX uses this key (gfid-req) to validate the gfid link. If the link does not exist, POSIX creates it.
   - Link validation is not required when inode->gfid is not NULL, and for the root (/) the gfid is always present.
   - Setting the key for / therefore generated unnecessary lookup traffic while running "ls -l" on big directories, because md-cache cannot serve requests that carry this key (gfid-req).

Note: To test the patch, I set up one brick on a physical server and created 1.2M files (10 KB each) in a single directory. The results were:

1) With patch, default (readdirp enabled)
   time ls -l /mnt/test/ | wc -l
   1228254
   real 3m21.151s
   user 0m9.207s
   sys  0m25.539s

2) With patch, readdirp disabled
   time ls /mnt/test/ | wc -l
   1228253
   real 0m13.337s
   user 0m3.239s
   sys  0m1.071s

3) Without patch, default (readdirp enabled)
   time ls -l /mnt/test/ | wc -l
   1228251
   real 5m49.055s
   user 0m9.756s
   sys  0m28.599s

4) Without patch, readdirp disabled
   time ls /mnt/test/ | wc -l
   1228253
   real 1m49.754s
   user 0m3.904s
   sys  0m3.462s
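
Converting the `real` times above to seconds gives the wall-clock speedup of the patch in each mode. These ratios come only from the four runs reported here (one brick, one directory), so they are indicative rather than general.

```latex
\begin{aligned}
\text{ls -l, readdirp enabled:}\quad
  \frac{5\text{m}49.055\text{s}}{3\text{m}21.151\text{s}}
  &= \frac{349.055\ \text{s}}{201.151\ \text{s}} \approx 1.74\times
  \quad (\approx 42\%\ \text{less real time})\\
\text{ls, readdirp disabled:}\quad
  \frac{1\text{m}49.754\text{s}}{13.337\ \text{s}}
  &= \frac{109.754\ \text{s}}{13.337\ \text{s}} \approx 8.23\times
  \quad (\approx 88\%\ \text{less real time})
\end{aligned}
```
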
stale[bot] commented 1 year ago

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.