google-code-export / s3ql

Automatically exported from code.google.com/p/s3ql

Malformed data leads to CannotSendRequest exception #424

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Today's joy..

2013-10-15 09:09:18.786 [pid=1889, thread='Dummy-25', module='s3ql.fs', 
fn='_readwrite', line=1110]: Backend returned malformed data for block 0 of 
inode 1974637 (Invalid compressed stream)
2013-10-15 09:09:18.818 [pid=1889, thread='Thread-20', 
module='s3ql.backends.common', fn='wrapped', line=77]: Encountered 
BadStatusLine exception (''), retrying call to Backend._delete_multi...
2013-10-15 09:09:18.821 [pid=1889, thread='Thread-7', module='root', 
fn='excepthook', line=123]: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/mount.py", line 54, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python3.3/threading.py", line 596, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/block_cache.py", line 300, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/block_cache.py", line 319, in _do_upload
    obj_size = backend.perform_write(do_write, 's3ql_data_%d' % obj_id).get_obj_size()
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 65, in wrapped
    return method(*a, **kw)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 402, in perform_write
    return fn(fh)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 858, in __exit__
    self.close()
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 851, in close
    self.fh.close()
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 973, in close
    self.fh.close()
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 65, in wrapped
    return method(*a, **kw)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 695, in close
    headers=self.headers, body=self.fh)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 287, in _do_request
    resp = self._send_request(method, path, headers, subres, query_string, body)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 472, in _send_request
    self.conn.putrequest(method, path)
  File "/usr/lib/python3.3/http/client.py", line 944, in putrequest
    raise CannotSendRequest(self.__state)
http.client.CannotSendRequest: Request-sent
2013-10-15 09:09:19.551 [pid=1889, thread='MainThread', module='s3ql.mount', 
fn='unmount', line=129]: Unmounting file system...
2013-10-15 09:09:24.779 [pid=1889, thread='MainThread', module='root', 
fn='excepthook', line=123]: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/local/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.4', 'console_scripts', 'mount.s3ql')()
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/mount.py", line 166, in main
    llfuse.main(options.single)
  File "fuse_api.pxi", line 252, in llfuse.main (src/llfuse.c:19256)
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:7222)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:7166)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/fs.py", line 1017, in read
    tmp = self._readwrite(fh, offset, length=length)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/fs.py", line 1094, in _readwrite
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python3.3/contextlib.py", line 48, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/block_cache.py", line 553, in get
    backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 65, in wrapped
    return method(*a, **kw)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 389, in perform_read
    with self.open_read(key) as fh:
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 671, in open_read
    fh = self.backend.open_read(key)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/common.py", line 65, in wrapped
    return method(*a, **kw)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 219, in open_read
    resp = self._do_request('GET', '/%s%s' % (self.prefix, key))
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 287, in _do_request
    resp = self._send_request(method, path, headers, subres, query_string, body)
  File "/usr/local/lib/python3.3/dist-packages/s3ql-2.4-py3.3-linux-x86_64.egg/s3ql/backends/s3c.py", line 468, in _send_request
    return self.conn.getresponse()
  File "/usr/lib/python3.3/http/client.py", line 1135, in getresponse
    raise ResponseNotReady(self.__state)
http.client.ResponseNotReady: Request-sent

Thanks
Balazs

Original issue reported on code.google.com by czv...@gmail.com on 15 Oct 2013 at 1:15

GoogleCodeExporter commented 9 years ago
Is this bucket using encryption and compression, or just compression?

In the latter case, this is most likely caused by the same issue responsible for 
the HMAC error that you reported on the mailing list a little while ago (i.e., 
defective hardware on your machine or a problem on the backend server). Of 
course, it could also be a bug in S3QL. Is this happening on the same 
machine, and/or with the same backend as the HMAC error?

That said, this problem still should not result in a crash, so I'll make sure 
to fix that. Thanks for the report!
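
As background on the two exceptions in the traceback: Python's 
http.client.HTTPConnection keeps track of where it is in the request/response 
cycle. It raises CannotSendRequest when a new request is started before the 
previous one has completed, and ResponseNotReady when getresponse() is called 
without a fully sent request. The sketch below provokes both without any 
network traffic. It is not S3QL code, the host name is a placeholder, and 
whether the connection in the report got into its bad state in exactly this way 
is an assumption. The state here is Request-started rather than the 
Request-sent seen in the log, because nothing is actually sent.

```python
import http.client


def provoke_state_errors():
    """Trigger both http.client state errors without touching the network."""
    # Creating the object does not open a socket; "backend.invalid" is a
    # placeholder host that is never contacted in this sketch.
    conn = http.client.HTTPConnection("backend.invalid")
    seen = []

    # Start a request. The request line is only buffered, not sent.
    conn.putrequest("GET", "/first")

    # Starting another request on the same connection is rejected.
    try:
        conn.putrequest("GET", "/second")
    except http.client.CannotSendRequest as exc:
        seen.append(type(exc).__name__)

    # Asking for a response before the request was fully sent is rejected too.
    try:
        conn.getresponse()
    except http.client.ResponseNotReady as exc:
        seen.append(type(exc).__name__)

    # close() returns the connection object to the idle state, after which
    # a new request can be started again.
    conn.close()
    conn.putrequest("GET", "/third")
    conn.close()
    return seen
```

If an earlier call on a shared connection is aborted halfway, every later 
caller that reuses the same connection object can hit one of these two errors 
until the connection is closed or reset.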

Original comment by Nikolaus@rath.org on 15 Oct 2013 at 4:11

GoogleCodeExporter commented 9 years ago
This bucket uses both encryption and compression. No, the HMAC error is 
happening on a server that has the same patch level as the server in Issue 425 
(which is also a different server). The server with this 
http.client.ResponseNotReady error is a vanilla 2.4, no patches. Interestingly, 
it is a bucket-specific error. The server has five buckets mounted, and this 
only happens on one (same S3 region as the others). 
I made sure no other job was running when this bombed, to rule out a bandwidth 
issue. The server is on a 100mb line, so that shouldn't be an issue anyhow. 
Not sure what to look for :-(

Thanks for the help
Balazs

Original comment by czv...@gmail.com on 16 Oct 2013 at 10:36

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 6b219b840e80.

Original comment by Nikolaus@rath.org on 19 Oct 2013 at 5:26

GoogleCodeExporter commented 9 years ago
The above revision will fix the problem with S3QL crashing when it receives 
malformed data. I am, however, still at a loss as to what might be causing this 
corruption for you in the first place. 

If this file system really uses compression and encryption, but you're getting 
an error when decompressing, this means that the HMAC of the compressed data 
was successfully verified. Therefore, at least this case of corruption cannot 
result from problems with the storage service. This leaves either a bug in 
S3QL, or problems with the local hardware.

On the other hand, the HMAC error that you reported on the list implies that in 
that case the *encrypted* data was corrupted. This would mean that there are at 
least two data corruption bugs in S3QL, or that you have faulty hardware.
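
The distinction between the two failure modes above can be illustrated with a 
minimal sketch. It assumes, as described above, that the HMAC is computed over 
the already compressed data. It is not S3QL's actual code: encryption is left 
out entirely, and the key, the blob layout (32-byte tag followed by payload) 
and the function names are made up for illustration.

```python
import hashlib
import hmac
import zlib

KEY = b"hypothetical-key"  # placeholder, not an S3QL key
TAG_LEN = 32               # length of an HMAC-SHA256 digest


def pack(plaintext, corrupt_before_mac=False):
    """Compress, optionally damage the data locally, then append it to its HMAC."""
    compressed = bytearray(zlib.compress(plaintext))
    if corrupt_before_mac:
        # Simulates a local fault (software bug or bad hardware) that hits
        # the data before the HMAC is computed: damage the zlib checksum.
        compressed[-1] ^= 0xFF
    payload = bytes(compressed)
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()
    return tag + payload


def unpack(blob):
    """Verify the HMAC first, then try to decompress. Returns a status string."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return "hmac mismatch"
    try:
        zlib.decompress(payload)
    except zlib.error:
        return "invalid compressed stream"
    return "ok"


def corrupt_in_storage(blob):
    """Simulates corruption after upload: flip one bit inside the payload."""
    damaged = bytearray(blob)
    damaged[TAG_LEN + 2] ^= 0x01
    return bytes(damaged)
```

In this sketch, `unpack(corrupt_in_storage(pack(data)))` fails at the HMAC 
check, while `unpack(pack(data, corrupt_before_mac=True))` passes the HMAC 
check and only fails on decompression. That is why a decompression error on an 
encrypted bucket points away from the storage service.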

But then, you also reported this happening on two different systems, which 
makes it rather hard to blame this on a hardware problem as well.

In other words, I currently do not have a clear plan forward. Maybe the best 
strategy is to upgrade all your servers to the newest (soon to be released) 
S3QL and collect some more data. Hopefully that will show some sort of pattern 
in either corruption type, affected computers, or affected buckets.

Original comment by Nikolaus@rath.org on 19 Oct 2013 at 9:22

GoogleCodeExporter commented 9 years ago
Ok, that sounds like a plan. I am upgrading all involved servers today, and 
will rerun everything. I will report back. Thanks very much for releasing a new 
version, I was starting to lose track of the right order of the patches :-)

Original comment by czv...@gmail.com on 21 Oct 2013 at 3:35