I followed the flow of logic in s3fs.cpp and could see that the libcurl call made by s3fs_mknod returns status 200, i.e. the object is created, but when the error occurs it is because the subsequent libcurl call made by s3fs_getattr (checking for the existence of the object) returns status 404.
Note that these two operations (and more) are triggered by a single dd command.
Sure enough, running "ls /mnt/s3/tw-msg/0/0/28" moments later works fine. The file exists, but is empty.
I suspect a race condition caused by S3 propagation delay, where the S3 server handling s3fs_getattr's HEAD request is (occasionally) not yet aware that the object was created by s3fs_mknod's PUT request to a different server just a few milliseconds earlier.
Here's a brief discussion of S3 propagation delay:
http://developer.amazonwebservices.com/connect/message.jspa?messageID=106354
Most discussion of S3 propagation delay concerns modifications made to existing objects. I'm only speculating that this issue also applies to the creation of new objects.
Original comment by pgarp...@gmail.com
on 22 Jul 2009 at 7:03
I have just now seen the "of=$DN/$M" variant also report the "No such file or directory" error. I have not analyzed the s3fs call sequence for this variant; it may even be the same as the ">$DN/$M" used in the attached script. The different error messages may just reflect a 404 being reported for the newly created object in the various libcurl requests following s3fs_mknod's PUT. The error I scrutinized and described above was a 404 in the first s3fs_getattr.
I made a naive attempt to tolerate the error by adding a loop to s3fs_mknod that calls s3fs_getattr to wait for the object to exist. Even so, the very next call to s3fs_getattr (in the normal sequence of creating a file) occasionally produces a 404. This is another reason I suspect S3 internal propagation delay.
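For illustration, the loop was along these lines (a sketch of the idea only, not the exact code I used; it assumes the standard FUSE-style signatures that s3fs.cpp uses for its handlers):

    // Sketch only -- assumes the FUSE-style s3fs_getattr handler in s3fs.cpp.
    #include <sys/stat.h>
    #include <unistd.h>

    extern int s3fs_getattr(const char *path, struct stat *stbuf); // HEAD request

    // Called from s3fs_mknod after its PUT succeeds: poll with HEAD until the
    // new object becomes visible, or give up after max_attempts tries.
    static int wait_for_object(const char *path, int max_attempts) {
        struct stat st;
        for (int i = 0; i < max_attempts; ++i) {
            if (s3fs_getattr(path, &st) == 0)
                return 0;          // a HEAD saw the object this time
            usleep(100 * 1000);    // back off 100 ms before retrying
        }
        return -1;                 // still not visible
    }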
Possibly related issues:
http://code.google.com/p/s3fs/issues/detail?id=44
http://code.google.com/p/s3fs/issues/detail?id=47
mitchell.penrod: "The issue seems to be with s3fs returning 0 as the st_mode when the file has no mode set via the amz-meta headers and when the content-type is blank."
Based on mitchell.penrod's statement I added a ".jpg" extension to the filenames. That run created over 24,000 files before failing with the "No such file or directory" error. Perhaps just a coincidence.
Original comment by pgarp...@gmail.com
on 22 Jul 2009 at 6:11
Hi, indeed, Amazon S3 eventual consistency is undoubtedly what you're running into here; I and other users have seen this before.
Original comment by rri...@gmail.com
on 22 Jul 2009 at 7:08
I see.
http://www.google.com/search?q=site%3Adeveloper.amazonwebservices.com+s3+eventual+consistency
returns this:
http://developer.amazonwebservices.com/connect/message.jspa?messageID=38373
Colin asks: "1. Assume an object X does not exist. If I PUT X and then GET X, am I guaranteed to get X back instead of a 404 error?"
Ami@AWS answers: "no".
Also this:
http://developer.amazonwebservices.com/connect/click.jspa?searchID=-1&messageID=104149
"endpoints that have incorrectly reported a 404 (over ~5m after the PUT)"
Yikes. I can work around this by retrying create/write in the application layer until there is no error, and retrying at read time will also be necessary, because even if the PUT and a few GETs succeed there is no telling when S3 will eventually be consistent for all subsequent GETs.
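For illustration, the application-layer workaround could look something like this (just a sketch; the helper names and retry counts are invented, and nothing here is part of s3fs itself):

    // Retry a create/write on the s3fs mount until it succeeds.
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    static bool write_with_retry(const char *path, const char *buf, size_t len,
                                 int max_attempts) {
        for (int i = 0; i < max_attempts; ++i) {
            int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd >= 0) {
                ssize_t n = write(fd, buf, len);
                close(fd);
                if (n == (ssize_t)len)
                    return true;   // create + write went through this time
            }
            sleep(1);              // wait before trying again
        }
        return false;
    }

    // Retry a read, since a later GET may be served by a node that has not
    // yet heard of the object, even after earlier reads succeeded.
    static ssize_t read_with_retry(const char *path, char *buf, size_t len,
                                   int max_attempts) {
        for (int i = 0; i < max_attempts; ++i) {
            int fd = open(path, O_RDONLY);
            if (fd >= 0) {
                ssize_t n = read(fd, buf, len);
                close(fd);
                if (n >= 0)
                    return n;      // the object was visible this time
            }
            sleep(1);
        }
        return -1;
    }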
This S3 characteristic largely defeats the purpose of using FUSE to make it look like an ordinary file system. I'm afraid most programmers are not aware of this issue, or assume s3fs deals with it somehow. Not to criticize you, because I don't see any way s3fs could deal with it, but perhaps you could mention this at http://code.google.com/p/s3fs/wiki/FuseOverAmazon under Limitations. Something like:
Due to S3's "eventual consistency" limitations, file creation can and will occasionally fail. Even after a successful create, subsequent reads can fail for an indeterminate time, even after one or more successful reads. Create and read enough files and you will eventually encounter this failure. This is not a flaw in s3fs, and it is not something a FUSE wrapper like s3fs can work around. The retries option does not address this issue. Your application must either tolerate or compensate for these failures, for example by retrying creates or reads. For details see http://code.google.com/p/s3fs/issues/detail?id=61
Just a suggestion. Thanks Randy.
Paul Gardner
Original comment by pgarp...@gmail.com
on 22 Jul 2009 at 9:43
Hi Paul,
I do see this with the script you provided. It would be nice to find a way to mitigate this, even if it's a hit on performance -- personally, I'm more interested in a reliable system than a fast one.
What about implementing a semaphore mechanism within s3fs? For example, upon creation of the object, lock access to it until it actually appears.
...so after the mknod, loop on a read until it returns success (or times out).
Dan
Original comment by dmoore4...@gmail.com
on 21 Dec 2010 at 9:43
"loop on a read until it returns success"
The problem is that when you talk to S3 you're talking to a distributed system. You can make a read request that gets routed to a particular server, which returns success, and then at some indeterminate amount of time (dt) in the future make another request, which, invisibly to you, gets routed to a different server that has not yet heard of your new object and returns failure.
There is no limit on dt. In practice, 99% of your objects may be fully propagated and consistently readable 100 msec later, 99.9% after 1 sec, 99.99% after 10 sec, and so on. There is no way to know definitively when S3 has finally got all its servers into a consistent state.
What would solve the problem is a call that reports whether a given object is fully propagated. You could then loop on that after creating an object.
Original comment by pgarp...@gmail.com
on 22 Dec 2010 at 1:43
Gotcha, that clears things up. Virtually nothing we can do other than put in a (relatively) long wait.
I agree with your recommendation: let users be aware.
Original comment by dmoore4...@gmail.com
on 22 Dec 2010 at 2:18
This article is intriguing:
http://shlomoswidler.com/2009/12/read-after-write-consistency-in-amazon.html
It refers to a new AWS S3 feature (Dec 9, 2010) of "read-after-write consistency" for new objects.
Original comment by dmoore4...@gmail.com
on 22 Dec 2010 at 11:44
Swidler writes: "Read-after-write consistency for AWS S3 is only available in the US-west and EU regions, not the US-Standard region."
If so, then the script I provided above should not fail using s3fs as-is, as long as the bucket being used was created in the US-west or EU region.
Original comment by pgarp...@gmail.com
on 23 Dec 2010 at 12:24
We'll see. I just created a US-west bucket and am running your script.
% date
Wed Dec 22 18:19:41 MST 2010
Note that the eventual consistency issue is not totally mitigated: a "read-after-delete" is not guaranteed to return a "not found" message, as it should.
Original comment by dmoore4...@gmail.com
on 23 Dec 2010 at 1:24
s3test.sh ended up erroring out after writing 24,500+ files, but not due to a "file not found" error; it failed due to too many retries upon a network timeout.
I'm pretty much convinced: the US-west bucket doesn't have the "read-after-write" issue.
Created a wiki page and included info on the main page addressing this issue.
Original comment by dmoore4...@gmail.com
on 24 Dec 2010 at 12:22
I'm sure this has been seen, but the official FAQ lists read-after-write consistency as only available in certain regions, and not in all US regions.
From:
http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in the US West (Northern California), EU (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency.
Original comment by digital...@gmail.com
on 5 May 2011 at 2:05
Perhaps it would be good for s3fs to offer a caching mechanism, so that when a file is added to a mounted directory it is cached locally and served from that cache for a short period of time, covering the window until S3 reaches a consistent state.
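Something along these lines, purely as a sketch of the idea (the names and the time window are invented, and this is not existing s3fs behavior): when an object is created, remember it locally, and if S3 answers 404 for it within a short window, answer from the local record instead.

    #include <ctime>
    #include <map>
    #include <string>
    #include <sys/stat.h>

    struct CachedEntry {
        struct stat st;     // attributes recorded at create time
        time_t      expiry; // stop masking 404s after this moment
    };

    static std::map<std::string, CachedEntry> recently_created;

    // Call right after a successful PUT in the create path.
    static void remember_created(const std::string &path, const struct stat &st,
                                 int window_secs) {
        CachedEntry e;
        e.st = st;
        e.expiry = time(NULL) + window_secs;
        recently_created[path] = e;
    }

    // Call when a HEAD for the path comes back 404: if we created the object
    // moments ago, report the cached attributes instead of the error.
    static bool mask_404_with_cache(const std::string &path, struct stat *stbuf) {
        std::map<std::string, CachedEntry>::iterator it = recently_created.find(path);
        if (it == recently_created.end())
            return false;                  // we never created it locally
        if (time(NULL) > it->second.expiry) {
            recently_created.erase(it);    // window over; trust S3 again
            return false;
        }
        *stbuf = it->second.st;            // serve the cached attributes
        return true;
    }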
Original comment by fisher1...@gmail.com
on 20 Mar 2012 at 5:42
Original issue reported on code.google.com by pgarp...@gmail.com
on 22 Jul 2009 at 6:37
Attachments: