bradyforcier1 opened this issue 3 months ago.
Hey, I can confirm that Mountpoint doesn't cache any LIST responses today, so every readdir operation goes directly to S3. The metadata cache is mainly used for lookup operations. I didn't expect the results to be this much worse, though, because we should only need to call readdir once per directory while traversing the tree.
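Here is a rough sketch of why the gap grows with the number of prefixes. It assumes that each readdir maps to one delimited ListObjectsV2 request per directory, which is an assumption about Mountpoint's internals and is not confirmed in this thread. Under that assumption, a recursive `find` needs at least one LIST round trip per directory, while a flat listing such as `aws s3 ls --recursive` pages through all keys at up to 1,000 per page. `estimate_lists` is a hypothetical helper that reads one object key per line.

```shell
#!/usr/bin/env bash
# estimate_lists: read one S3 object key per line on stdin and print
#   keys       - number of keys read
#   dir_lists  - lower bound on LIST requests if every directory visited by a
#                recursive traversal costs one delimited LIST (assumption:
#                no LIST caching, ignoring pages beyond 1,000 entries)
#   flat_lists - pages needed by a flat recursive listing at 1,000 keys/page
estimate_lists() {
  awk -F/ '
    {
      keys++
      dirs[""] = 1               # the root prefix is always listed
      prefix = ""
      for (i = 1; i < NF; i++) { # every ancestor "directory" of this key
        prefix = prefix $i "/"
        dirs[prefix] = 1
      }
    }
    END {
      n = 0
      for (d in dirs) n++
      printf "keys=%d dir_lists>=%d flat_lists=%d\n", keys, n, int((keys + 999) / 1000)
    }'
}
```

You could feed it from `aws s3 ls --recursive "s3://$BUCKET/<prefix>/" | awk '{print $4}' | estimate_lists`. This simple form only works for keys without spaces, because the key is the fourth whitespace-separated column.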
It would be really helpful to understand the access pattern of the find command, so we will need debug logs from your test. I would also like to understand more about the structure of your bucket, such as how many levels of subdirectories are under the test prefix. Could you share more information about that?
> so we will need debug logs from your test
I'm not comfortable sharing the debug logs, because they would contain bucket names and paths that may include sensitive data. However, the test simply runs the standard find utility from a bash shell to recursively report all files under a prefix.
> Also, I would like to understand more about the structure of your bucket, like how many levels of subdirectory are under the test prefix. Could you share more information about that?
In this case, the content of the root prefix we're recursively listing looks like:
├── prefix1
│   ├── a
│   ├── b
│   └── prefix1.1
│       ├── a
│       ├── b
│       ├── c
│       ├── d
│       ├── e
│       ├── f
│       └── g
├── prefix2
...
There are ~100 top-level prefixes, and each contains ~145 objects spread across its subdirectories. In total, 1,900 prefixes were traversed in this test.
Thanks for sharing the structure. It seems the problem only shows up when there are many prefixes to traverse, since I didn't hit the same issue when I tried to reproduce it with a few subdirectories. I'll bring this back to the team to work out how we can test this and make directory listing faster.
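Because the slowdown seems to depend on scale, a reproduction probably needs many prefixes rather than a few subdirectories. One option is to generate a synthetic tree locally and upload it to a test bucket. This sketch assumes that the single-letter entries in the tree shared earlier are subdirectories holding objects. `make_tree` and its parameters are hypothetical. With 16 files per leaf directory, each top-level prefix gets 9 × 16 = 144 objects, close to the reported ~145.

```shell
#!/usr/bin/env bash
# make_tree ROOT NPREFIX FILES_PER_LEAF
# Build a local tree loosely shaped like the reported hierarchy:
#   prefixN/{a,b}/ and prefixN/prefixN.1/{a..g}/, each leaf holding empty files.
# Assumption: the single-letter entries are subdirectories, not objects.
make_tree() {
  local root=$1 nprefix=$2 per_leaf=$3
  local p leaf f
  for ((p = 1; p <= nprefix; p++)); do
    # 9 leaf directories per top-level prefix
    for leaf in a b "prefix$p.1/"{a,b,c,d,e,f,g}; do
      mkdir -p "$root/prefix$p/$leaf"
      for ((f = 1; f <= per_leaf; f++)); do
        : > "$root/prefix$p/$leaf/obj$f"   # create an empty placeholder object
      done
    done
  done
}
```

For example, `make_tree /tmp/synthetic 100 16` creates 14,400 empty files. You could then upload them with `aws s3 sync /tmp/synthetic "s3://$BUCKET/<test-prefix>/"`, where the bucket and prefix are placeholders, and list them through a mount.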
Background
Testing with the latest version, v1.7.2, I've noticed that LIST performance is significantly worse than with other tools.
Test Setup
The test recursively lists a prefix hierarchy containing ~16,000 objects in total.
Mountpoint command:
sudo mount-s3 --read-only --allow-other --max-cache-size 50000 --cache /tmp/mtpt_cache --metadata-ttl 300 $BUCKET /tmp/mtpt_test
goofys command:
sudo /usr/local/bin/goofys --type-cache-ttl 60s --stat-cache-ttl 60s --file-mode 0555 --dir-mode 0555 -o ro -o allow_other $BUCKET /tmp/goofys_test
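The following sketch times the mounts side by side. It runs each listing twice so you can compare a repeat run inside the cache TTL (`--metadata-ttl 300` for Mountpoint, `--stat-cache-ttl 60s` for goofys) with the first run. `time_listing` is a hypothetical helper. The mount points are the ones from the commands above. The AWS CLI baseline appears only as a comment, because it lists S3 directly rather than through a mount.

```shell
#!/usr/bin/env bash
# time_listing DIR...: time a recursive file listing of each directory twice
# (a first run, then a repeat run) and print "DIR<TAB>RUN<TAB>SECONDS".
# If LIST responses are not cached, a repeat run on Mountpoint still has to
# go to S3 for every directory, even within --metadata-ttl.
time_listing() {
  local dir run elapsed
  for dir in "$@"; do
    for run in first repeat; do
      # TIMEFORMAT=%R makes bash's `time` keyword report only elapsed seconds;
      # find's own output is discarded so only the timing is captured.
      elapsed=$( { TIMEFORMAT=%R; time find "$dir" -type f >/dev/null 2>&1; } 2>&1 )
      printf '%s\t%s\t%s\n' "$dir" "$run" "$elapsed"
    done
  done
}

# Example, using the mount points from the setup:
#   time_listing /tmp/mtpt_test /tmp/goofys_test
# AWS CLI baseline, listing the same prefix directly from S3:
#   time aws s3 ls --recursive "s3://$BUCKET/<prefix>/" >/dev/null
```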
Results
awscli
Mountpoint (caching is enabled, but LIST responses don't appear to be cached, so subsequent listings are still slow)
goofys