derrickchoi / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Enhancement Request: Optimize copying/moving from one bucket to another #173

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Copy or move a large file from one mounted bucket to a different mounted 
bucket owned by the same Amazon account.

What is the expected output? What do you see instead?
1. The process should complete rather quickly, regardless of file size. 
Instead, the entire file is first downloaded and then uploaded again.

What version of the product are you using? On what operating system?
1.40 on openSUSE 11.4 running Fuse 2.8.4

Please provide any additional information below.
Amazon supports copying and/or moving files between buckets owned by the same account without downloading and re-uploading the entire file. Is it possible for s3fs to detect that two mounted buckets belong to the same account and perform the same optimization, copying/moving the files directly on the server instead of downloading and re-uploading them?
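
For reference, the server-side operation in question is S3's "PUT Object - Copy" request, triggered by the `x-amz-copy-source` header. Below is a minimal sketch using libcurl (which s3fs is built on); the bucket and object names are hypothetical, and request signing is omitted for brevity:

```c
/*
 * Minimal sketch of S3's server-side copy (PUT Object - Copy), assuming
 * libcurl. The bucket/key names are hypothetical, and the AWS
 * Authorization (signature) header is omitted; a real request must be
 * signed, which s3fs handles internally.
 */
#include <curl/curl.h>

int server_side_copy(void)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    /* x-amz-copy-source tells S3 to copy the object server-side;
     * the object data itself never travels over the wire. */
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers,
        "x-amz-copy-source: /source-bucket/big-file.bin");

    curl_easy_setopt(curl, CURLOPT_URL,
        "https://dest-bucket.s3.amazonaws.com/big-file.bin");
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);     /* issue a PUT */
    curl_easy_setopt(curl, CURLOPT_INFILESIZE, 0L); /* empty request body */
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

    CURLcode rc = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return (rc == CURLE_OK) ? 0 : 1;
}
```

Because the request body is empty and the copy happens entirely within S3, no object data crosses the network, regardless of the object's size.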

Original issue reported on code.google.com by beamerb...@gmail.com on 16 Mar 2011 at 10:25

GoogleCodeExporter commented 8 years ago

Original comment by dmoore4...@gmail.com on 17 Mar 2011 at 12:47

GoogleCodeExporter commented 8 years ago
I might have a related defect, but this brings me to a larger question: can the 
filesystem even manage how something like `cp` works?

This isn't just between buckets; it's anywhere. When I use a command-line S3 utility to make a copy of a large file (~100 MB) within the same bucket, it completes in a couple of seconds: S3 takes care of making the copy locally on Amazon's servers. However, if I use s3fs and `cp` to make a copy of a file within the same mounted bucket, it downloads the entire file locally and then re-uploads it. This is horribly inefficient. But is there even something that can be done about it? Or is it built into `cp`'s code to do that, and that's just the way it is?

Original comment by beamerb...@gmail.com on 17 Mar 2011 at 2:56

GoogleCodeExporter commented 8 years ago
Project member Moore (I don't know your first name):

Should I report a separate bug regarding `cp` downloading and re-uploading a file within the SAME bucket, or is this the same issue? Any idea whether it can be fixed? I don't mean to rush; I'm just curious whether it's even possible.

I can understand if there's no way around downloading and re-uploading when copying files BETWEEN buckets (even though S3 supports doing it server-side), but I would hope there is a way to avoid it when copying files within the SAME bucket.

Thanks,

Nick

Original comment by beamerb...@gmail.com on 25 Mar 2011 at 8:49

GoogleCodeExporter commented 8 years ago
I'm Dan.

As far as I know, the copy of a file to another file within a bucket is done on the server side. The original author implemented this a long time ago.

Run s3fs in foreground mode and do the copy; you should be able to glean what's going on.
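
For example, mounting with something like `s3fs mybucket /mnt/s3 -f` (the bucket and mountpoint here are placeholders; `-f` is FUSE's standard foreground switch) keeps s3fs attached to the terminal, so the operation trace is printed as the copy runs.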

This *might* be an issue with `use_cache`; I typically do not use `use_cache`, since my usage model is mainly storage rather than a dynamic file system.

I'll take a closer look later.

As for bucket-to-bucket server-side copies, that's something that isn't currently supported. I doubt it would be easy, since s3fs just takes its directions from FUSE (i.e., do this, do that), which provides little information about the larger context.

We might get lucky after a little investigation and find that it's easy, but my gut doesn't tell me so.

Back to the server-side copy: if we find that it works without `use_cache` but not with `use_cache` (or has since been broken in some other way), then that will be a new issue.

Bucket-to-different-bucket copying might be a good candidate for inclusion in 
the s3fs "utility" mode.

...just rambling...

I'm not sure if bucket-to-different-bucket copying is supported across regions 
though. Both buckets must be in the same region for this to work, I think.

Original comment by dmoore4...@gmail.com on 25 Mar 2011 at 9:03

GoogleCodeExporter commented 8 years ago
I take it back: the server-side copy wasn't implemented; it was the server-side rename (which performs a server-side copy). So renames are fast, but copies are not.
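
That is, because FUSE hands the filesystem both paths at once for a rename, the whole operation can be serviced server-side. A rough sketch of that shape, with hypothetical helper names standing in for the real signed HTTP requests (these are not the actual s3fs symbols):

```c
/* Hypothetical helpers (not actual s3fs symbols) standing in for the
 * real signed HTTP requests. */
static int put_object_copy(const char *from, const char *to)
{
    (void)from; (void)to;
    return 0;  /* stub: would issue a PUT with x-amz-copy-source */
}

static int delete_object(const char *path)
{
    (void)path;
    return 0;  /* stub: would issue a DELETE on the source key */
}

/* A rename serviced entirely server-side: both paths arrive in one
 * FUSE callback, so no object data has to be transferred. */
static int rename_via_server_side_copy(const char *from, const char *to)
{
    int rc = put_object_copy(from, to);  /* copy happens inside S3 */
    if (rc != 0)
        return rc;
    return delete_object(from);  /* removing the source completes the rename */
}
```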

As you observed, s3fs downloads, then uploads the file upon a copy.

Again, this is a function of FUSE: FUSE does not have a "copy" operation that passes two paths as arguments. (It does have a rename operation that passes two paths, though.) Check out fuse.h for more info on the available operations.
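
To illustrate, here is a minimal sketch against the FUSE 2.x C API (stub callbacks only): rename receives two paths in a single call, while each of the operations a `cp` generates sees only one path, and the operations table has no copy entry at all.

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>

/* rename gets both paths in one call, so the filesystem can service it
 * with a single server-side copy + delete. */
static int sketch_rename(const char *from, const char *to)
{
    (void)from; (void)to;
    return 0;  /* stub */
}

/* read sees only one path; the filesystem cannot tell that this read is
 * half of a cp. */
static int sketch_read(const char *path, char *buf, size_t size,
                       off_t offset, struct fuse_file_info *fi)
{
    (void)path; (void)buf; (void)size; (void)offset; (void)fi;
    return 0;  /* stub */
}

static struct fuse_operations sketch_ops = {
    .rename = sketch_rename,
    .read   = sketch_read,
    /* there is no ".copy" member in struct fuse_operations */
};
```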

So what FUSE passes to s3fs is the following sequence (here I am copying a file named "9" to a file named "90"):

```
s3fs_getattr[path=/90]
s3fs_getattr[path=/9]
s3fs_getattr[path=/90]
open[path=/9][flags=32768]
   get_local_fd[path=/9]
      downloading[path=/9][fd=5]
s3fs_getattr[path=/90]
s3fs_create[path=/90][mode=33188][flags=32961]
   create_file_object[path=/90][mode=33188]
   get_local_fd[path=/90]
      downloading[path=/90][fd=6]
s3fs_getattr[path=/90]
s3fs_read[path=/9]
s3fs_write[path=/90]
s3fs_getattr[path=/9]
    stat cache hit [path=/9]
s3fs_flush[path=/90][fd=6]
    calling get_headers [path=/90]
   put_local_fd[path=/90][fd=6]
      uploading[path=/90][fd=6][size=2]
s3fs_release[path=/90][fd=6]
s3fs_flush[path=/9][fd=5]
s3fs_release[path=/9][fd=5]
```

s3fs is a slave to FUSE; it doesn't know what "macro" operation is underway.

It doesn't look like a server-side copy is possible for a `cp` command. Luckily we got it for the `mv` command.

Original comment by dmoore4...@gmail.com on 25 Mar 2011 at 11:35

GoogleCodeExporter commented 8 years ago
Sorry, I don't think that this is a possibility, due to how FUSE and s3fs work. Please accept this as a limitation of the system.

Original comment by dmoore4...@gmail.com on 7 Apr 2011 at 2:47

GoogleCodeExporter commented 8 years ago
Alright, thanks. I was afraid of that. Thanks for looking into it.

Nick

Original comment by beamerb...@gmail.com on 30 Apr 2011 at 12:53