fredDJSonos opened 4 months ago
I realise, while writing this, that my situation should improve by raising the `metadata-ttl` (making it unlimited).
Well, in fact, setting `metadata-ttl` to unlimited does not appear to change anything. We still observe a huge number of `ListBucket` requests (45% of all requests), and they account for 90% of the bill.
I’m not an expert on how AWS charges for `ListBucket`, but is it expected that `ListBucket` is so much more expensive than `GetObject` and `HeadObject` (5% of the total cost each)?
Or is there a bug in your usage of `ListBucket`? You might transfer the entire prefix of `/stuff/a/ab` when you just want to test the directoryness of `/stuff/a/ab`. Is that a possibility?
Hey @fredDJSonos,
> I realise, while writing this, that my situation should improve by raising the `metadata-ttl` (making it unlimited).
Yes, if your workload can tolerate stale entries, or it's expected that the bucket content won't change, we'd recommend picking the longest reasonable TTL. If you never expect the content to change during the workload, you can use `--metadata-ttl indefinite`. This caches results for `lookup` FUSE requests, which are used by the kernel to build its own tree of files and directories, but also to serve `open` and `stat` system calls.
> I wonder if you could propose an implementation that does no lookup for intermediate folders. You could pretend to FUSE that all possible directory paths exist, without checking that on S3. When there is a syscall to get a file or list the contents of a dir, then and only then would you call S3.
Thanks for sharing the suggestion. It's something we've considered. Unfortunately, the FUSE request for learning about a directory entry does not include the purpose of the request, so it's not possible to know whether the application wants to learn about a file or a directory. This means that once we tell the kernel that some path component is a directory, it will treat it as a directory from that point on without consulting Mountpoint. It's also a challenge faced in #891, where we want to allow access to directories within a bucket without having access to the paths at the root.
> Well, in fact, setting `metadata-ttl` to unlimited does not appear to change anything. We still observe a huge number of `ListBucket` requests (45% of all requests), and they account for 90% of the bill.
It does depend on the key space. If your workload can tolerate stale entries, or it's expected that the bucket content won't change, we'd recommend picking the longest reasonable TTL. It will ensure that repeated lookups can be served from the cache and don't need to go to S3. That means when opening the files `/stuff/a/ab` and `/stuff/a/ac`, the `lookup` requests (FUSE) for `/stuff/` and `/stuff/a/` can be served from the cache the second time around.
The number of requests for opening a path without metadata caching can be expressed as `O(depth * n)`, where `n` is the number of files and `depth` is the level at which the file is nested. By turning on metadata caching, you can eliminate `depth` here, but you still have lookups for each file.
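As a rough illustration (the depth here is invented; the file count is from the issue below): with 150 million files nested four components deep (say `/stuff/a/ab/file`), opening every file once without caching is on the order of 4 × 150M = 600M `lookup`-driven requests, whereas with a long TTL the shared prefixes are resolved once and the total tends toward one lookup per file, i.e. about 150M.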
If it's possible, performing a listing of the directory before opening the files can help here, as it will perform one listing through the prefix, which allows all the children to be cached.
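For example (assuming a mount point of `/mnt/bucket`, which is not from this thread), something like `ls /mnt/bucket/stuff/a/ab > /dev/null` before opening the files under that directory should issue a single listing of the prefix and warm the cache for all of its children.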
> I’m not an expert on how AWS charges for `ListBucket`, but is it expected that `ListBucket` is so much more expensive than `GetObject` and `HeadObject` (5% of the total cost each)?
ListObjectsV2 (referenced as ListBucket in billing) does cost more than object-level requests. The pricing for your region is available on the pricing page under "Requests & data retrievals": https://aws.amazon.com/s3/pricing/
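For a sense of scale (S3 Standard in us-east-1 at the time of writing; check the pricing page for current numbers): LIST requests are billed at roughly $0.005 per 1,000, versus roughly $0.0004 per 1,000 for GET, so each `ListBucket` call costs on the order of 12.5× a `GetObject` call.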
> Or is there a bug in your usage of `ListBucket`? You might transfer the entire prefix of `/stuff/a/ab` when you just want to test the directoryness of `/stuff/a/ab`. Is that a possibility?
It's not possible to avoid the traversal, although we sure wish the protocol could support it. We actually implement a small amount of caching (1 second) even when caching is turned off, to try to avoid immediately making calls to the same directory again. The best option, if you can, is to extend the metadata TTL for as long a duration as works for your workload.
Ultimately, I'd make the following recommendations:

- Use `--metadata-ttl indefinite`.
- `/stuff/` is a common path at the root. If that's part of your bucket, I'd recommend using the argument `--prefix stuff/`, and then that path never even needs to be looked up in S3.

Thanks for your answer. Just to be clear, our last experiment was with `--metadata-ttl indefinite`, and my previous comment was about the fact that it did not change anything (same amount of `ListBucket`).
New config:

```yaml
mountOptions:
  - allow-other
  - region us-east-1
  - cache /tmp # specify cache directory, relative to root host filesystem
  - metadata-ttl indefinite # https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#metadata-cache
  - max-cache-size 512 # 512MB maximum cache size
  - max-threads 64 # increasing max-threads
```
> I wonder if you could propose an implementation that does no lookup for intermediate folders. You could pretend to FUSE that all possible directory paths exist, without checking that on S3. When there is a syscall to get a file or list the contents of a dir, then and only then would you call S3.
> Thanks for sharing the suggestion. It's something we've considered. Unfortunately, the FUSE request for learning about a directory entry does not include the purpose of the request, so it's not possible to know whether the application wants to learn about a file or a directory. This means that once we tell the kernel that some path component is a directory, it will treat it as a directory from that point on without consulting Mountpoint. It's also a challenge faced in #891, where we want to allow access to directories within a bucket without having access to the paths at the root.
I guess you're talking about the `lookup` handler you have to provide to FUSE. When you reply with `fuse_reply_entry`, the struct `fuse_entry_param` has two fields, `attr_timeout` and `entry_timeout`, that can probably be used to tell the kernel to stop caching anything. The FUSE API has to be robust against any filesystem that mutates on its own (so it is legitimate to have a pseudo imaginary directory that suddenly turns into a regular file).
This would also solve #891.
In the end this gives a weirdo filesystem, where all the possible directories appear to exist. But since directories don't really exist in S3, that's OK.
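To make the idea concrete, here is a minimal, untested sketch (not Mountpoint's actual implementation) of a libfuse low-level `lookup` handler that replies with a synthetic directory and zero timeouts; `inode_for` is a hypothetical helper that a real implementation would back with an inode table:

```c
#define FUSE_USE_VERSION 34
#include <fuse_lowlevel.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical placeholder: a real implementation would allocate and
 * track inodes for (parent, name) pairs. */
static fuse_ino_t inode_for(fuse_ino_t parent, const char *name)
{
    (void)name;
    return parent + 1; /* toy value, for illustration only */
}

/* Pretend every looked-up name is a directory, without calling S3. */
static void sketch_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
{
    struct fuse_entry_param e;
    memset(&e, 0, sizeof(e));

    e.ino = inode_for(parent, name);
    e.attr.st_ino = e.ino;
    e.attr.st_mode = S_IFDIR | 0755; /* synthetic directory */
    e.attr.st_nlink = 2;

    /* Zero timeouts: the kernel re-issues lookup/getattr on every access,
     * so the imaginary directory is never cached and can later turn out
     * to be a regular file without confusing the VFS. */
    e.attr_timeout = 0.0;
    e.entry_timeout = 0.0;

    fuse_reply_entry(req, &e);
}
```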
There might be a problem when the kernel does a `lookup` on a real file and we pretend it is a directory. (Maybe that breaks the `open` syscall; I'm not familiar with the details of the Linux VFS.) In that case there could be an intermediate strategy: we only do a `HeadObject` in `lookup` (no more `ListBucket`). That still works for #891.
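Continuing the sketch above, the intermediate strategy might look like the following, where `head_object_exists` is another hypothetical stand-in, this time for a single S3 `HeadObject` call on the key corresponding to `(parent, name)`:

```c
/* Hypothetical stand-in for one S3 HeadObject on the key for (parent, name);
 * a real implementation would perform the request and return its result. */
static int head_object_exists(fuse_ino_t parent, const char *name)
{
    (void)parent;
    (void)name;
    return 0; /* stub for illustration */
}

/* Lookup that issues only HeadObject, never ListBucket: if the exact key
 * exists it is a file; otherwise pretend it is a directory, with zero
 * timeouts so the kernel never caches the guess. */
static void sketch_lookup_head_only(fuse_req_t req, fuse_ino_t parent,
                                    const char *name)
{
    struct fuse_entry_param e;
    memset(&e, 0, sizeof(e));

    e.ino = inode_for(parent, name); /* hypothetical helper, as above */
    e.attr.st_ino = e.ino;
    e.attr_timeout = 0.0;
    e.entry_timeout = 0.0;

    if (head_object_exists(parent, name)) {
        e.attr.st_mode = S_IFREG | 0644; /* the key exists: a real file;
                                            size/mtime would come from the
                                            HeadObject response */
        e.attr.st_nlink = 1;
    } else {
        e.attr.st_mode = S_IFDIR | 0755; /* otherwise a synthetic directory */
        e.attr.st_nlink = 2;
    }

    fuse_reply_entry(req, &e);
}
```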
Mountpoint for Amazon S3 version
mount-s3 1.7.0
AWS Region
us-east-1
Describe the running environment
Running inside an EKS cluster with mountpoint-s3-csi-driver.
Mountpoint options
What happened?
Our business code essentially opens a file at a given path to read its content. It might `stat` a given path, but no directory listing whatsoever happens. If we were to use S3 directly, we would just call `GetObject` and nothing else.

We investigated using `mountpoint-s3` and discovered that the dominant cost (from Cost Explorer) is the `ListBucket` action.

For historical reasons, we have a folder structure inherited from a real FS. It looks like this:
We have 150 million files distributed in this folder structure.
I’m aware of this issue: https://github.com/awslabs/mountpoint-s3/issues/770. I wonder if you could propose an implementation that does no lookup for intermediate folders. You could pretend to FUSE that all possible directory paths exist, without checking that on S3. When there is a syscall to get a file or list the contents of a dir, then and only then would you call S3.
Relevant log output
No response