Closed GoogleCodeExporter closed 8 years ago
Original comment by dmoore4...@gmail.com
on 17 Mar 2011 at 12:47
I might have a related defect, but this brings me to a larger question: can the
filesystem even manage how something like `cp` works?
This isn't just between buckets. It's anywhere. When I use a command-line S3
utility to make a copy of a large file (~100mb) within the same bucket, it
completes in a couple of seconds. S3 takes care of making the copy locally on
Amazon's servers. However, if I use s3fs and `cp` to make a copy of a file
within the same mounted bucket, it downloads the entire file locally then
re-uploads it. This is horribly inefficient. But is there even something that
can be done about it? Or is it built in to `cp`'s code to do that, and that's
the way it is?
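For reference, the fast copy the command-line utility performs maps onto S3's PUT Object Copy request: a PUT to the destination key carrying an `x-amz-copy-source` header that names the existing object, with no object data in the body. The sketch below only builds an unsigned description of such a request; the function name `build_copy_request` and the dictionary layout are made up for illustration, and authentication is left out entirely.

```python
from urllib.parse import quote


def build_copy_request(bucket, src_key, dst_key):
    """Describe an S3 PUT Object Copy request (unsigned, illustration only)."""
    return {
        "method": "PUT",
        "host": f"{bucket}.s3.amazonaws.com",
        "path": "/" + quote(dst_key),
        "headers": {
            # Names the existing object; S3 performs the copy on its side.
            "x-amz-copy-source": "/" + quote(f"{bucket}/{src_key}"),
        },
        # The client sends no object data, which is why the copy is quick.
        "body": b"",
    }


request = build_copy_request("mybucket", "big.bin", "big-copy.bin")
print(request["headers"]["x-amz-copy-source"])
```

The point is that the client only has to name the source and the destination, so a tool needs to know both keys at the same moment to issue this request.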
Original comment by beamerb...@gmail.com
on 17 Mar 2011 at 2:56
Project member Moore (I don't know your first name):
Should I report a separate bug regarding `cp` downloading and re-uploading a
file within the SAME bucket? Or is this the same issue? Any idea on whether it
can be fixed (I don't mean to rush, I'm just curious whether it's even
possible)?
I can understand if there's no way around it downloading and re-uploading when
copying files BETWEEN buckets (even though S3 supports it without downloading
and re-uploading), but I would hope there was a way around it downloading and
re-uploading when copying files within the SAME bucket.
Thanks,
Nick
Original comment by beamerb...@gmail.com
on 25 Mar 2011 at 8:49
I'm Dan
As far as I know the copy of a file to another file within a bucket is done on
the server side. The original author implemented this a long time ago.
Run s3fs in foreground mode and do the copy; you should be able to glean what's
going on.
This *might* be an issue with use_cache -- I typically do not use use_cache as
my use model is mainly for storage vs. a dynamic file system.
I'll take a closer look later.
As for bucket-to-bucket server side copies, that's something that isn't
currently supported. I doubt that it would be easy since s3fs just takes its
directions from FUSE i.e. do this, do that -- providing little information
about context.
We might get lucky after a little investigation and find that it's easy, but my
gut doesn't tell me so.
Back to the server-side copy: if we find that it works without use_cache but
not with use_cache (or has since been broken in some other way), then that
will be a new issue.
Bucket-to-different-bucket copying might be a good candidate for inclusion in
the s3fs "utility" mode.
...just rambling...
I'm not sure if bucket-to-different-bucket copying is supported across regions
though. Both buckets must be in the same region for this to work, I think.
Original comment by dmoore4...@gmail.com
on 25 Mar 2011 at 9:03
I take it back: the server-side copy wasn't implemented; it was the server-side
rename (which does a server-side copy). So renames are fast, but copies are not.
As you observed, s3fs downloads, then uploads the file upon a copy.
Again, this is a function of FUSE: FUSE does not have a "copy" operation that
takes two paths as arguments. (It does have a rename operation, which is passed
two paths.) Check out fuse.h for more info on available operations.
So what FUSE passes to s3fs is the following sequence (here I am copying a file
named "9" to a file named "90"):
s3fs_getattr[path=/90]
s3fs_getattr[path=/9]
s3fs_getattr[path=/90]
open[path=/9][flags=32768]
get_local_fd[path=/9]
downloading[path=/9][fd=5]
s3fs_getattr[path=/90]
s3fs_create[path=/90][mode=33188][flags=32961]
create_file_object[path=/90][mode=33188]
get_local_fd[path=/90]
downloading[path=/90][fd=6]
s3fs_getattr[path=/90]
s3fs_read[path=/9]
s3fs_write[path=/90]
s3fs_getattr[path=/9]
stat cache hit [path=/9]
s3fs_flush[path=/90][fd=6]
calling get_headers [path=/90]
put_local_fd[path=/90][fd=6]
uploading[path=/90][fd=6][size=2]
s3fs_release[path=/90][fd=6]
s3fs_flush[path=/9][fd=5]
s3fs_release[path=/9][fd=5]
s3fs is a slave to FUSE; it doesn't know what "macro" operation is underway.
It doesn't look like a server-side copy is possible for a cp command. Luckily
we got it for the mv command.
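To make the difference concrete, here is a toy model of the situation. It is not s3fs code: the class `ToyRemoteFs`, its counters, and the `cp`/`mv` helpers are all hypothetical, and the callback sequence is simplified from the trace above. Every callback except rename sees a single path, so the filesystem cannot tell that a read of one file and a write of another add up to a copy.

```python
class ToyRemoteFs:
    """Hypothetical stand-in for an S3-backed FUSE filesystem (not s3fs code)."""

    def __init__(self, objects):
        self.remote = dict(objects)  # key -> bytes held "on the server"
        self.local = {}              # path -> bytearray, local working copies
        self.downloaded = 0          # bytes pulled from the server
        self.uploaded = 0            # bytes pushed to the server
        self.server_copies = 0       # copies done entirely on the server

    # Single-path callbacks: each sees one path and no hint of the overall goal.
    def open(self, path):
        data = self.remote[path]
        self.downloaded += len(data)
        self.local[path] = bytearray(data)

    def create(self, path):
        self.remote[path] = b""
        self.local[path] = bytearray()

    def read(self, path):
        return bytes(self.local[path])

    def write(self, path, data):
        self.local[path] += data

    def flush(self, path):
        data = bytes(self.local[path])
        self.uploaded += len(data)
        self.remote[path] = data

    # The one callback that receives both paths at once.
    def rename(self, src, dst):
        self.remote[dst] = self.remote.pop(src)
        self.server_copies += 1


def cp(fs, src, dst):
    """Callbacks a cp turns into: open, create, read, write, flush."""
    fs.open(src)
    fs.create(dst)
    fs.write(dst, fs.read(src))
    fs.flush(dst)


def mv(fs, src, dst):
    """A mv arrives as a single rename carrying both paths."""
    fs.rename(src, dst)


fs = ToyRemoteFs({"/9": b"hi"})
cp(fs, "/9", "/90")
print(fs.downloaded, fs.uploaded, fs.server_copies)
mv(fs, "/90", "/91")
print(fs.downloaded, fs.uploaded, fs.server_copies)
```

In this model the cp moves every byte down and back up, while the mv transfers nothing, which is the same asymmetry seen between copies and renames in s3fs.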
Original comment by dmoore4...@gmail.com
on 25 Mar 2011 at 11:35
Sorry, I don't think that this is a possibility due to how FUSE and s3fs work.
Please accept this as a limitation of the system.
Original comment by dmoore4...@gmail.com
on 7 Apr 2011 at 2:47
Alright, thanks. I was afraid of that. Thanks for looking into it.
Nick
Original comment by beamerb...@gmail.com
on 30 Apr 2011 at 12:53
Original issue reported on code.google.com by
beamerb...@gmail.com
on 16 Mar 2011 at 10:25