fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

How to increase the async HTTP connection limit? #873

Open ion-elgreco opened 4 months ago

ion-elgreco commented 4 months ago

I want to increase the HTTP connection limit to see whether I can saturate my network further, but I don't see how to pass this through the FileSystem. I went through the s3fs code, and aiobotocore as well, but no luck yet. Increasing max_pool_connections already helps a bit, though: it increases IO throughput by about 2x.

Any suggestions on how to increase the concurrency?

martindurant commented 4 months ago

There are many levers to pull, actually. How are you setting the pool, what kind of benchmark are you running, and do you have an idea of what your current bottleneck may be caused by? Since fsspec generally maintains its own IO thread/loop, a significant increase in performance is something I'd be happy to bake in.

ion-elgreco commented 4 months ago

@martindurant I am currently passing this to the S3FileSystem: config_kwargs={"max_pool_connections": 50}.

I checked the peak transfer rate with iftop: it was only 50 Mb/s out of 1 Gbps of network capacity (AKS -> LakeFS on AKS -> Azure Blob). It took around 15 s to read 6000 txt files. I think it could go faster, but I'm not sure :)
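
For readers following along, here is a minimal sketch of the setting described in this comment. It assumes that config_kwargs is forwarded to the botocore client configuration, where max_pool_connections sets the connection pool size. The helper names s3_kwargs and make_fs are hypothetical and only there for illustration; they are not part of s3fs.

```python
def s3_kwargs(max_pool_connections=50):
    """Build keyword arguments for s3fs.S3FileSystem with a larger connection pool.

    Hypothetical helper: only the config_kwargs / max_pool_connections names
    come from the thread.
    """
    return {"config_kwargs": {"max_pool_connections": max_pool_connections}}


def make_fs(max_pool_connections=50):
    """Create the filesystem; s3fs is imported lazily so s3_kwargs works without it."""
    import s3fs

    return s3fs.S3FileSystem(**s3_kwargs(max_pool_connections))


# Usage (needs s3fs, credentials and a reachable endpoint):
# fs = make_fs(50)
# data = fs.cat(list_of_paths)  # a list of paths is fetched concurrently
```

Raising the pool size only helps if enough requests are actually in flight at once, so it is worth changing it together with how many files are requested per call.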

martindurant commented 3 months ago

Would you mind making a graph of max_pool versus throughput? How many files (~ coroutines) are in flight?
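
One possible way to collect the numbers for such a graph is sketched below; it is not a benchmark anyone in this thread actually ran. The function names measure_throughput and read_with_s3fs are hypothetical. The sketch assumes that fs.cat on a list of paths returns a dict of bytes, and that its batch_size argument caps how many coroutines are in flight at once; check both against the fsspec version you have installed.

```python
import time


def measure_throughput(read_all, pool_sizes):
    """Time read_all(pool_size) for each pool size; return (pool_size, MB/s) rows."""
    rows = []
    for size in pool_sizes:
        start = time.perf_counter()
        n_bytes = read_all(size)  # must return the number of bytes read
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        rows.append((size, n_bytes / elapsed / 1e6))
    return rows


def read_with_s3fs(paths, batch_size=None):
    """Return a read_all callable that fetches `paths` with a fresh filesystem."""

    def read_all(pool_size):
        import s3fs  # lazy import: only needed when the benchmark really runs

        fs = s3fs.S3FileSystem(
            config_kwargs={"max_pool_connections": pool_size},
            skip_instance_cache=True,  # do not reuse a cached instance
        )
        # Assumption: batch_size limits the number of concurrent coroutines.
        kwargs = {} if batch_size is None else {"batch_size": batch_size}
        data = fs.cat(paths, **kwargs)
        return sum(len(v) for v in data.values())

    return read_all


# Usage (needs s3fs, credentials and real paths):
# for size, mbps in measure_throughput(read_with_s3fs(paths), [10, 25, 50, 100]):
#     print(size, round(mbps, 1))
```

Each row can be plotted directly as pool size against MB/s. Repeating the sweep with a few batch_size values would show whether the pool or the number of in-flight requests is the limiting factor.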

ion-elgreco commented 3 months ago

@martindurant do you have some examples of how to access these things during execution?

martindurant commented 3 months ago
martindurant commented 2 months ago

Ping, since this just came up on another thread. @ion-elgreco, have you had a chance to do any more benchmarking or testing?

ion-elgreco commented 2 months ago

@martindurant hey, I parked improving it further since it worked "good enough".