I have tried to replicate this problem, but I have had no luck so far.
Could you try to apply the attached patch to a separate S3QL installation? It
will trigger an artificial error when reading the third object. So if you try
it with a new filesystem:
$ s3ql-test/bin/mkfs.s3ql s3://nikratio-test/
$ s3ql-test/bin/mount.s3ql s3://nikratio-test/ mnt
$ echo data1 > mnt/file1
$ echo data2 > mnt/file2
$ echo data3 > mnt/file3
Then the following two commands should work:
$ cat mnt/file1
$ cat mnt/file2
but the last one should trigger the error:
$ cat mnt/file3
Does this reproduce the above crash on your system? Note that if it does not,
then it will appear to hang instead (because S3QL tries to download the object
again and again). In that case you can just kill -9 the mount.s3ql process and
free the mountpoint with fusermount -u.
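The apparent hang described above is what an unbounded retry loop looks like from the outside. The following is only a minimal sketch of a bounded retry, not S3QL's actual code: the `BadDigest` class, the `read_with_retry` function and its parameters are all hypothetical names chosen for illustration.

```python
import time


class BadDigest(Exception):
    """Hypothetical stand-in for a checksum-mismatch error (not S3QL's class)."""


def read_with_retry(fetch, max_attempts=5, delay=0.0):
    """Call fetch() until it succeeds or max_attempts calls have failed.

    With no upper bound on the attempts, a read that always fails would
    be retried forever and the caller would appear to hang.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except BadDigest:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay)
```

A read that fails on every attempt raises `BadDigest` after `max_attempts` calls instead of blocking the `cat` process indefinitely.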
Original comment by Nikolaus@rath.org
on 13 Jul 2012 at 2:52
Attachments:
I tried it with a new S3 bucket. Running cat on the third uploaded file seemed
to hang, with messages like this appearing in the log:
"[backend] Encountered BadDigest exception (BadDigest: TestError), retrying call to BetterBackend.perform_read...".
But it has not reproduced the earlier crash yet :(
Original comment by shrid...@staff.ownmail.com
on 13 Jul 2012 at 4:19
Hmm. I can only imagine one sequence of events that would lead to the above
crash, but it would most likely imply a bug in Python's httplib.
I have added some extra code that detects if what I think may be happening is
actually happening and then prints extra debug information and tries to recover
the situation. This patch will also be included in the next S3QL release and
should be safe to use for production. Once you have applied it, please keep an
eye on mount.log for error messages of the form
ObjectR.read(): response not closed after end of data
followed by some more information and let me know if they show up. There should
be no file system crash going along with the error, so you really need to look
into mount.log periodically.
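Checking the log by hand is easy to forget, so a small script can do the periodic search. This is a sketch only: the function name `find_marker_lines` is made up for illustration, and the path to mount.log is an assumption that has to be supplied by whoever runs it, since the thread does not say where the file lives.

```python
MARKER = "ObjectR.read(): response not closed after end of data"


def find_marker_lines(log_path, marker=MARKER):
    """Return (line_number, text) pairs for each log line containing marker."""
    hits = []
    # errors="replace" keeps the scan going if the log has undecodable bytes
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if marker in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling `find_marker_lines()` with the location of mount.log returns an empty list while the message has not appeared, so it can be run from a scheduled job that only reports when the list is non-empty.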
Thanks for reporting!
Original comment by Nikolaus@rath.org
on 14 Jul 2012 at 4:25
Attachments:
Committed in revision 62de72165160.
Original comment by Nikolaus@rath.org
on 14 Jul 2012 at 4:28
Hello Nik, I added the patch and upgraded s3ql to the latest version.
But the s3ql partition got disconnected again.
The logs show "ObjectR.read(): response not closed after end of data"
messages.
I am attaching the mount.log file. The crash occurred on 30 Jul.
Regards
Shridhar
Original comment by shrid...@staff.ownmail.com
on 8 Aug 2012 at 12:45
Attachments:
I hope you updated to the latest version and *then* applied the patch? Because
I haven't released a new version since adding the patch, so if you first
installed the patch and then the new version, you removed the patch again.
Original comment by Nikolaus@rath.org
on 8 Aug 2012 at 12:59
:D Initially I was using s3ql 1.10. I then upgraded it to s3ql-1.11.1 and then
applied the patch.
Original comment by shrid...@staff.ownmail.com
on 8 Aug 2012 at 1:18
Original comment by Nikolaus@rath.org
on 13 Aug 2012 at 1:15
Ok, thanks. I actually introduced a bug in the bug handling code, but it really
looks like a Python bug at this point. I have reported it at
http://bugs.python.org/issue15633.
In order to gather more debugging information, could you please apply the
attached second patch? (You will need to revert the first patch using 'patch -R
< s3ql.diff' first). Once you have applied it, please again keep an eye on
mount.log for error messages of the form
ObjectR.read(): response not closed after end of data
followed by some more information and attach the mount.log when this happens.
This time there should really be no file system crash going along with the
error (so you really need to look into mount.log periodically).
Thanks, and sorry that this turns out to be so hard to debug!
Original comment by Nikolaus@rath.org
on 13 Aug 2012 at 1:38
Attachments:
I am sorry for the late reply. I almost forgot that I had some issues with s3ql
that still needed to be worked on. I get an error while trying to apply this
patch.
It says:
patching file src/s3ql/backends/s3c.py
Hunk #2 FAILED at 560.
1 out of 2 hunks FAILED -- saving rejects to file src/s3ql/backends/s3c.py.rej
For your reference, the previous patch updated two files, s3c.py and swift.py,
whereas the current patch tries to update only s3c.py. Is this how it is
supposed to be?
Thanks
Shridhar
Original comment by shrid...@staff.ownmail.com
on 19 Oct 2012 at 10:39
The patch is included in the 1.12 release, so upgrading to S3QL 1.12 should be
enough. After the upgrade, please keep an eye on mount.log for error messages
of the form
ObjectR.read(): response not closed after end of data
followed by some more information and attach the mount.log when this happens.
This time there should really be no file system crash going along with the
error (so you really need to look into mount.log periodically).
Thanks!
Original comment by Nikolaus@rath.org
on 21 Oct 2012 at 11:09
Shridhar, is this problem still happening?
Thanks!
Original comment by Nikolaus@rath.org
on 22 Nov 2012 at 11:01
Hi, I installed the latest version on 1 Nov 2012.
The s3ql share hasn't disconnected since, and there are no messages of the type
"ObjectR.read(): response not closed after end of data" in mount.log.
I will keep you posted if I find any.
Best regards
Shridhar
Original comment by shrid...@staff.ownmail.com
on 23 Nov 2012 at 9:31
Have there been any more occurrences? If not, I'll close this bug for now. Feel
free to reopen if the problem appears again.
Original comment by Nikolaus@rath.org
on 14 Jan 2013 at 6:54
Hi,
S3QL has been running flawlessly. You had asked me to keep an eye on the
logs for messages of the type "ObjectR.read(): response not closed after end of
data". After several months I found one :-)
There was no filesystem crash though.
Thanks
Original comment by shrid...@staff.ownmail.com
on 29 Jan 2013 at 5:01
Attachments:
Alright, thanks for reporting back! Hopefully this will be enough for the
Python guys.
Original comment by Nikolaus@rath.org
on 30 Jan 2013 at 1:39
This issue was closed by revision 6e64ee2b4c7f.
Original comment by Nikolaus@rath.org
on 12 Mar 2013 at 3:09
Original issue reported on code.google.com by
Nikolaus@rath.org
on 12 Jul 2012 at 1:18