LittleFlower2019 / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Opening for READ ONLY could be optimized. #350

Open · GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
This is more of a feature request:

When opening a file on an s3fs mount, s3fs downloads the entire file to a 
temporary file on the local disk before using it. This is fine in general, but 
it is inefficient when the file is opened in read-only ('r') mode.

Use case:
Imagine a bioinformatics firm that stores large (>10GB) genome sequence files 
on S3. Alignment tools will want to only read these large files, scanning 
through them. Currently, they'll have to do extra work to download the file 
from S3 to the local machine. If it's being done in parallel on multiple 
machines then each machine will have to copy the file from S3. Using s3fs makes 
it a bit easier by abstracting away the downloading of the object from S3, but 
it still takes several minutes to download such a large file!

A better way would be for s3fs to recognize that the file is being opened in 
read-only mode and instead STREAM the file from S3 to the application, loading 
reasonably sized chunks from S3 into an in-memory buffer. This would 
significantly reduce the overhead the application sees when opening the file, 
since it avoids downloading the ENTIRE object from S3 up front. The application 
could also seek through the file doing random-access reads, and s3fs could 
fetch just the corresponding chunks from S3.
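
To make the idea concrete, here is a rough sketch of fetching just one chunk of 
an object with an HTTP Range request. This is not s3fs code; the function name 
and URL handling are made up for illustration (s3fs itself uses libcurl, so a 
real implementation would look broadly similar):

```cpp
// Rough sketch, not s3fs code: read only the bytes the application asked for
// by sending an HTTP GET with a Range header. The URL stands in for a signed
// or public S3 object URL; call curl_global_init() once at program startup.
#include <curl/curl.h>
#include <cstdio>
#include <string>

// libcurl write callback: append received bytes to the caller's buffer.
static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata) {
    std::string* buf = static_cast<std::string*>(userdata);
    buf->append(ptr, size * nmemb);
    return size * nmemb;
}

// Fetch `length` bytes starting at `offset` from the object at `url`.
static bool ranged_read(const std::string& url, long long offset,
                        long long length, std::string& out) {
    CURL* curl = curl_easy_init();
    if (!curl) return false;

    // HTTP Range is "first-last" with an inclusive last byte.
    char range[64];
    snprintf(range, sizeof(range), "%lld-%lld", offset, offset + length - 1);

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_RANGE, range);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &out);

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}
```

A FUSE read(path, buf, size, offset) handler could call something like this and 
hand back only the bytes the application asked for, caching chunks in memory as 
needed.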

I'm hoping that we could add this feature to s3fs soon, as it would 
significantly improve the performance of applications like the one I've 
described above.

Original issue reported on code.google.com by jlhawn.p...@gmail.com on 14 Jun 2013 at 6:23

GoogleCodeExporter commented 8 years ago
Hi,

Thank you for this request to s3fs.
There have been similar requests in the past, and I am worried about the 
overhead.
I want to look into this issue for you, but right now I am fixing potential 
bugs in the s3fs code.
I will accept this issue and work on it, but please wait a while until I can 
start.

Regards,

Original comment by ggta...@gmail.com on 20 Jun 2013 at 1:41

GoogleCodeExporter commented 8 years ago
There are many applications where this would make all the difference for me.

 * It would make new things possible - like opening many files on a VPS with low disk space
 * It would save money!

Original comment by yarden...@gmail.com on 8 Aug 2013 at 4:10

GoogleCodeExporter commented 8 years ago
Hi yardenack,

I released a new version, v1.72, today.
This version changes the open/close/upload/download logic.
(If you only open the file, s3fs does not download the object.)

Please check this version.
Regards

Original comment by ggta...@gmail.com on 10 Aug 2013 at 5:27

GoogleCodeExporter commented 8 years ago
I am using s3fs for just this use case: large genome sequence files (10-30GB) 
hosted in a public s3 bucket.

I have caching turned on, so it should still pull the whole file down, but 
there is a 6s delay when running "head" redirected to /dev/null, while reading 
the cache file directly is nearly instantaneous.

Presumably something time-consuming needs to happen before falling through to 
the cache file?

Original comment by scottsmi...@gmail.com on 27 Sep 2013 at 10:37

GoogleCodeExporter commented 8 years ago
Hi,

About the download (GET) logic: for example, if the object is large (over 
20MB) and the requested read area has never been cached, s3fs loads at most 
20MB. So it depends on what has already been cached, and the conditions are 
complicated.

s3fs manages the file cache in small parts, each part being 50MB. When s3fs 
downloads more than 20MB, it issues parallel requests of 10MB each. (These 
sizes are hard coded for now.)

So your first read of the object (the head command) takes the time needed to 
download a 50MB part.
I think that was probably the 6s you saw.

I need to look at making these size parameters into s3fs options.
Right now the 50MB part size is not small.
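
For illustration only, here is a rough sketch of how a read request maps onto 
parts and parallel sub-requests under the sizes above. The constants and 
function names are made up for this example and are not taken from the s3fs 
sources:

```cpp
// Illustration only: the 50MB part size and 10MB sub-request size are just
// the hard-coded values mentioned above; names and structure are invented
// for this sketch and do not come from the s3fs sources.
#include <cstdint>
#include <cstdio>

static const int64_t PART_SIZE = 50LL * 1024 * 1024;  // cache is managed in 50MB parts
static const int64_t SUB_REQ   = 10LL * 1024 * 1024;  // large downloads split into 10MB parallel requests

// Print which parts, and which 10MB sub-requests inside them, a
// read(offset, size) would have to download if nothing were cached yet.
static void plan_read(int64_t offset, int64_t size) {
    int64_t first_part = offset / PART_SIZE;
    int64_t last_part  = (offset + size - 1) / PART_SIZE;
    for (int64_t p = first_part; p <= last_part; ++p) {
        int64_t part_start = p * PART_SIZE;
        printf("part %lld: bytes %lld-%lld\n", (long long)p,
               (long long)part_start, (long long)(part_start + PART_SIZE - 1));
        for (int64_t s = part_start; s < part_start + PART_SIZE; s += SUB_REQ) {
            printf("  sub-request: bytes %lld-%lld\n",
                   (long long)s, (long long)(s + SUB_REQ - 1));
        }
    }
}

int main() {
    // A 4KB read at the start of an uncached file still touches the whole
    // first 50MB part (downloaded as five 10MB requests in this model).
    plan_read(0, 4096);
    return 0;
}
```

As the example in main() shows, even a tiny read at the start of an uncached 
file pulls in the whole first 50MB part, which is roughly the 6 second delay 
reported above.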

If you have an idea about this, please let me know.

Best Regards,

Original comment by ggta...@gmail.com on 27 Sep 2013 at 5:14