google-code-export / s3ql

Automatically exported from code.google.com/p/s3ql
0 stars 0 forks source link

ResponseNotReady exception when retrying after BadDigest #358

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
2012-07-11 21:12:01.492 [7160] Dummy-25: [backends.s3c] 
ObjectR(s3ql_data_534144).close(): MD5 mismatch: 
bec6d28d37eb0f1456e16a556b4a9241 vs 9db3ed869c2c6914431877ed85c9e619
2012-07-11 21:12:01.503 [7160] Dummy-25: [backend] Encountered BadDigest 
exception (BadDigest: ETag header does not agree with calculated MD5), retrying 
call to BetterBucket.perform_read...
2012-07-11 21:12:01.863 [7160] MainThread: [mount] Encountered exception, 
trying to clean up...
2012-07-11 21:12:01.875 [7160] MainThread: [mount] Unmounting file system...
2012-07-11 21:12:03.246 [7160] MainThread: [mount] Exception during cleanup:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/mount.py", line 187, in main
    op()
  File "/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/block_cache.py", line 275, in destroy
    os.rmdir(self.path)
OSError: [Errno 39] Directory not empty: 
'/home/domains/.s3ql/s3:=2F=2Fs3qlownpages=2Fbriefcase-cache'

2012-07-11 21:12:04.120 [7160] MainThread: [root] Uncaught top-level exception. 
Traceback (most recent call last):
[...]
  File '/usr/local/bin/mount.s3ql', line 9, in <module>()
    load_entry_point('s3ql==1.10', 'console_scripts', 'mount.s3ql')()
  Current bindings:
    load_entry_point = <function load_entry_point> 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/mount.py', line 139, in main(args=['--compress', 'none', '--allow-root', '--authfile', '/home/domains/.s3ql/authinfo', 's3://s3qlownpages/briefcase', '/home/briefcase'])
    llfuse.main(options.single)
  Current bindings:
    llfuse = <module 'llfuse' from '/usr/local/lib/python2.7/.../llfuse-0.37.1-py2.7-linux-x86_64.egg/llfuse.so'> (global)
    llfuse.main = <built-in function main> (local)
    options = Namespace(allow_other=False, allow_root=True, au...qlownpages/briefcase', threads=10, upstart=False) 
    options.single = False (local)

  File 'fuse_api.pxi', line 213, in llfuse.main (src/llfuse.c:18034)()
  Current bindings:

  File 'handlers.pxi', line 296, in llfuse.fuse_read (src/llfuse.c:6832)()
  Current bindings:

  File 'handlers.pxi', line 297, in llfuse.fuse_read (src/llfuse.c:6776)()
  Current bindings:

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/fs.py', line 928, in read(self=<s3ql.fs.Operations object>, fh=567947L, offset=0, length=16384)
    tmp = self._read(fh, offset, length)
  Current bindings:
tmp undefined
    self = <s3ql.fs.Operations object> 
    self._read = <bound method Operations._read of <s3ql.fs.Operations object>> (local)
    fh = 567947L 
    offset = 0 
    length = 16384 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/fs.py', line 960, in _read(self=<s3ql.fs.Operations object>, id_=567947L, offset=0, length=16384)
    with self.cache.get(id_, blockno) as fh:
  Current bindings:
    self = <s3ql.fs.Operations object> 
    self.cache = <s3ql.block_cache.BlockCache object> (local)
    self.cache.get = <bound method BlockCache.get of <s3ql.block_cache.BlockCache object>> (local)
    id_ = 567947L 
    blockno = 0 
fh undefined

  File '/usr/local/lib/python2.7/contextlib.py', line 17, in __enter__(self=<contextlib.GeneratorContextManager object>)
    return self.gen.next()
  Current bindings:
    self = <contextlib.GeneratorContextManager object> 
    self.gen = <generator object get> (local)
    self.gen.next = <method-wrapper 'next' of generator object> (local)

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/block_cache.py', line 538, in get(self=<s3ql.block_cache.BlockCache object>, inode=567947L, blockno=0)
    el = bucket.perform_read(do_read, 's3ql_data_%d' % obj_id)
  Current bindings:
    el = None 
    bucket = <s3ql.backends.common.BetterBucket object> 
    bucket.perform_read = <bound method BetterBucket.perform_read of <s3ql.backends.common.BetterBucket object>> (local)
    do_read = <function do_read> 
    obj_id = 534144 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/common.py', line 60, in wrapped(self=<s3ql.backends.common.BetterBucket object>, *a=(<function do_read>, 's3ql_data_534144'), **kw={})
    return fn(self, *a, **kw)
  Current bindings:
    fn = <function perform_read> 
    self = <s3ql.backends.common.BetterBucket object> 
    a = (<function do_read>, 's3ql_data_534144') 
    kw = {} 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/common.py', line 222, in perform_read(self=<s3ql.backends.common.BetterBucket object>, fn=<function do_read>, key='s3ql_data_534144')
    with self.open_read(key) as fh:
  Current bindings:
    self = <s3ql.backends.common.BetterBucket object> 
    self.open_read = <bound method BetterBucket.open_read of <s3ql.backends.common.BetterBucket object>> (local)
    key = 's3ql_data_534144' 
fh undefined

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/common.py', line 471, in open_read(self=<s3ql.backends.common.BetterBucket object>, key='s3ql_data_534144')
    fh = self.bucket.open_read(key)
  Current bindings:
fh undefined
    self = <s3ql.backends.common.BetterBucket object> 
    self.bucket = <s3ql.backends.s3.Bucket object> (local)
    self.bucket.open_read = <bound method Bucket.open_read of <s3ql.backends.s3.Bucket object>> (local)
    key = 's3ql_data_534144' 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/common.py', line 60, in wrapped(self=<s3ql.backends.s3.Bucket object>, *a=('s3ql_data_534144',), **kw={})
    return fn(self, *a, **kw)
  Current bindings:
    fn = <function open_read> 
    self = <s3ql.backends.s3.Bucket object> 
    a = ('s3ql_data_534144',) 
    kw = {} 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/s3c.py', line 283, in open_read(self=<s3ql.backends.s3.Bucket object>, key='s3ql_data_534144')
    resp = self._do_request('GET', '/%s%s' % (self.prefix, key))
  Current bindings:
resp undefined
    self = <s3ql.backends.s3.Bucket object> 
    self._do_request = <bound method Bucket._do_request of <s3ql.backends.s3.Bucket object>> (local)
    self.prefix = 'briefcase' (local)
    key = 's3ql_data_534144' 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/s3c.py', line 353, in _do_request(self=<s3ql.backends.s3.Bucket object>, method='GET', path='/briefcases3ql_data_534144', subres=None, query_string=None, headers={'authorization': 'AWS 175FBKDTAZAT47X0ZF82:wrBFfa4ZIQaqCn/+63Y/8zdnooA=', 'connection': 'keep-alive', 'content-length': '0', 'date': 'Wed, 11 Jul 2012 15:42:01 GMT'}, body=None)
    resp = self._send_request(method, path, headers, subres, query_string, body)
  Current bindings:
resp undefined
    self = <s3ql.backends.s3.Bucket object> 
    self._send_request = <bound method Bucket._send_request of <s3ql.backends.s3.Bucket object>> (local)
    method = 'GET' 
    path = '/briefcases3ql_data_534144' 
    headers = {'authorization': 'AWS 175FBKDTAZAT47X0ZF82:wrBFfa4ZIQaqCn/+63Y/8zdnooA=', 'connection': 'keep-alive', 'content-length': '0', 'date': 'Wed, 11 Jul 2012 15:42:01 GMT'} 
    subres = None 
    query_string = None 
    body = None 

  File '/usr/local/lib/python2.7/site-packages/s3ql-1.10-py2.7-linux-x86_64.egg/s3ql/backends/s3c.py', line 502, in _send_request(self=<s3ql.backends.s3.Bucket object>, method='GET', path='/briefcases3ql_data_534144', headers={'authorization': 'AWS 175FBKDTAZAT47X0ZF82:wrBFfa4ZIQaqCn/+63Y/8zdnooA=', 'connection': 'keep-alive', 'content-length': '0', 'date': 'Wed, 11 Jul 2012 15:42:01 GMT'}, subres=None, query_string=None, body=None)
    return self.conn.getresponse()
  Current bindings:
    self = <s3ql.backends.s3.Bucket object> 
    self.conn = <httplib.HTTPConnection instance> (local)
    self.conn.getresponse = <bound method HTTPConnection.getresponse of <httplib.HTTPConnection instance>> (local)

  File '/usr/local/lib/python2.7/httplib.py', line 1015, in getresponse(self=<httplib.HTTPConnection instance>, buffering=False)
    raise ResponseNotReady()
  Current bindings:
    ResponseNotReady = <class 'httplib.ResponseNotReady'> (global)

Exception: ResponseNotReady: 
  args = ()
  message = ''

Original issue reported on code.google.com by Nikolaus@rath.org on 12 Jul 2012 at 1:18

GoogleCodeExporter commented 9 years ago
I have tried to replicate this problem, but I have had no luck so far.

Could you try to apply the attached patch to a separate S3QL installation? It 
will trigger an artificial error when reading the third object. So if you try 
it with a new filesystem:

$ s3ql-test/bin/mkfs.s3ql s3://nikratio-test/
$ s3ql-test/mount.s3ql s3://nikratio-test/ mnt
$ echo data1 > mnt/file1
$ echo data2 > mnt/file2
$ echo data3 > mnt/file3

Then the following two commands should work:

$ cat mnt/data1
$ cat mnt/data2

but the last one should trigger the error

$ cat mnt/data3

Does this reproduce the above crash on your system? Note that if it does not, 
then it will appear to hang instead (because S3QL tries to download the object 
again and again). In that case you can just kill -9 the mount.s3ql process and 
free the mountpoint with fusermount -u.

Original comment by Nikolaus@rath.org on 13 Jul 2012 at 2:52

Attachments:

GoogleCodeExporter commented 9 years ago
I tried it with new S3 bucket. Doing a cat on 3rd uploaded file seemed to 
hang with messages appearing in the log as 
"[backend] Encountered BadDigest exception (BadDigest: TestError), retrying 
call 
to BetterBackend.perform_read...".

But it did not reproduce the earlier crash yet :(

Original comment by shrid...@staff.ownmail.com on 13 Jul 2012 at 4:19

GoogleCodeExporter commented 9 years ago
Hmm. I can only imagine one sequence of events that would lead to the above 
crash, but it would most likely imply a bug in Python's httplib.

I have added some extra code that detects if what I think may be happening is 
actually happening and then prints extra debug information and tries to recover 
the situation. This patch will also be included in the next S3QL release and 
should be safe to use for production. Once you have applied it, please keep an 
eye on mount.log for error messages of the form 

  ObjectR.read(): response not closed after end of data

followed by some more information and let me know if they show up. There should 
be no file system crash going along with the error, so you really need to look 
into mount.log periodically.

Thanks for reporting!

Original comment by Nikolaus@rath.org on 14 Jul 2012 at 4:25

Attachments:

GoogleCodeExporter commented 9 years ago
Commited in revision 62de72165160.

Original comment by Nikolaus@rath.org on 14 Jul 2012 at 4:28

GoogleCodeExporter commented 9 years ago
Hello Nik, I added the patch and upgraded s3ql to latest version.
But the s3ql partition got disconnected again. 
The logs reveal "ObjectR.read(): response not closed after end of data" 
messages. 

I am attaching the mount.log file, the crash occurred on 30th of Jul.

Regards
Shridhar

Original comment by shrid...@staff.ownmail.com on 8 Aug 2012 at 12:45

Attachments:

GoogleCodeExporter commented 9 years ago
I hope you updated to the latest version and *then* applied the patch? Because 
I haven't released a new version since adding the patch, so if you first 
installed the patch and then the new version, you removed the patch again.

Original comment by Nikolaus@rath.org on 8 Aug 2012 at 12:59

GoogleCodeExporter commented 9 years ago
:D . Initially I was using s3ql 1.10. I then upgraded it to s3ql-1.11.1 and 
then 
applied the patch.

Original comment by shrid...@staff.ownmail.com on 8 Aug 2012 at 1:18

GoogleCodeExporter commented 9 years ago

Original comment by Nikolaus@rath.org on 13 Aug 2012 at 1:15

GoogleCodeExporter commented 9 years ago
Ok, thanks. I actually introduced a bug in the bug handling code, but it really 
looks like a Python bug at this point. I have reported it at 
http://bugs.python.org/issue15633.

In order to gather more debugging information, could you please apply the 
attached second patch? (You will need to revert the first patch using 'patch -R 
< s3ql.diff' first). Once you have applied it, please again keep an eye on 
mount.log for error messages of the form 

  ObjectR.read(): response not closed after end of data

followed by some more information and attach the mount.log when this happens. 
This time there should really be no file system crash going along with the 
error (so you really need to look into mount.log periodically).

Thanks, and sorry that this turns out to be so hard to debug!

Original comment by Nikolaus@rath.org on 13 Aug 2012 at 1:38

Attachments:

GoogleCodeExporter commented 9 years ago
I am sorry for the late reply. I almost forgot that i had some issues with s3ql 
which needed had to be worked upon. I get an error while trying to apply this 
patch.
it says
patching file src/s3ql/backends/s3c.py
Hunk #2 FAILED at 560.
1 out of 2 hunks FAILED -- saving rejects to file src/s3ql/backends/s3c.py.rej

Just for your easy reference, the previous patch updated 2 files s3c.py and 
swift.py whereas the current patch tries to update only s3c.py. Is this how it 
is supposed to be.

Thanks
Shridhar

Original comment by shrid...@staff.ownmail.com on 19 Oct 2012 at 10:39

GoogleCodeExporter commented 9 years ago
The patch is included in the 1.12 release, so upgrading to S3QL 1.12 should be 
enough. After the upgrade, please keep an eye on mount.log for error messages 
of the form 

  ObjectR.read(): response not closed after end of data

followed by some more information and attach the mount.log when this happens. 
This time there should really be no file system crash going along with the 
error (so you really need to look into mount.log periodically).

Thanks!

Original comment by Nikolaus@rath.org on 21 Oct 2012 at 11:09

GoogleCodeExporter commented 9 years ago
Shridhar, is this problem still happening?

Thanks!

Original comment by Nikolaus@rath.org on 22 Nov 2012 at 11:01

GoogleCodeExporter commented 9 years ago
Hi, I had installed the latest version on Nov1 2012.
The s3ql share hasn't disconnected since. and also there are no logs of type 
"ObjectR.read(): response not closed after end of data" in mount.log.

Will keep you posted if i find any.

Best regards
Shridhar

Original comment by shrid...@staff.ownmail.com on 23 Nov 2012 at 9:31

GoogleCodeExporter commented 9 years ago
Have there been any more occurences? If not, I'll close this bug for now. Feel 
free to reopen if the problem appears again. 

Original comment by Nikolaus@rath.org on 14 Jan 2013 at 6:54

GoogleCodeExporter commented 9 years ago
Hi,

The s3ql has been running flawlessly. You had asked me to keep an eye on the 
logs for the message of type "ObjectR.read(): response not closed after end of 
data". after several months i found one :-)

There was no filesystem crash though.

Thanks

Original comment by shrid...@staff.ownmail.com on 29 Jan 2013 at 5:01

Attachments:

GoogleCodeExporter commented 9 years ago
Alright, thanks for reporting back! Hopefully this will be enough for the 
Python guys.

Original comment by Nikolaus@rath.org on 30 Jan 2013 at 1:39

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 6e64ee2b4c7f.

Original comment by Nikolaus@rath.org on 12 Mar 2013 at 3:09