
Automatically exported from code.google.com/p/s3ql

1.13.2 no longer retries on internal S3 error #393

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

We tried several times to copy a ~30 GB folder from an Ubuntu 12.04-based system to an s3ql file system backed by S3.

This is what we got in mount.log:

2013-03-13 20:00:33.205 [2064] MainThread: [mount] Encountered exception, trying to clean up...
2013-03-13 20:00:33.287 [2064] MainThread: [mount] Unmounting file system...
2013-03-13 20:03:54.429 [2064] MainThread: [root] Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==1.13.2', 'console_scripts', 'mount.s3ql')()
  File "/usr/lib/s3ql/s3ql/mount.py", line 52, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 286, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 305, in _do_upload
    obj_size = backend.perform_write(do_write, 's3ql_data_%d' % obj_id).get_obj_size()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 263, in perform_write
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 701, in __exit__
    self.close()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 694, in close
    self.fh.close()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 819, in close
    self.fh.close()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 608, in close
    headers=self.headers, body=self.fh)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 399, in _do_request
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
S3Error: InternalError: We encountered an internal error. Please try again.

The error shown in the user interface is:

Error splicing file: Transport endpoint is not connected

We expect to upload more files of similar size. Is there any reason to believe the file system isn't stable enough for this?

Your assistance with this would be appreciated.

PS: after this, the file system appears to have been unmounted uncleanly and asks for fsck.

Original issue reported on code.google.com by netflix....@gmail.com on 13 Mar 2013 at 8:18

GoogleCodeExporter commented 9 years ago
I'm seeing the same issue with version 1.13.2, but not with 1.11.1 (Ubuntu Precise).

Original comment by lionel.l...@gmail.com on 16 Mar 2013 at 10:01

GoogleCodeExporter commented 9 years ago
On 1.11.1, s3ql retries after the InternalError:

Mar 16 03:46:32 web6 s3ql: Encountered InternalError exception (InternalError: We encountered an internal error. Please try again.), retrying call to ObjectW.close...
Mar 16 05:23:57 web6 s3ql: Encountered InternalError exception (InternalError: We encountered an internal error. Please try again.), retrying call to ObjectW.close...

Not in 1.13.2.

Original comment by lionel.l...@gmail.com on 16 Mar 2013 at 10:04
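
For context, the `wrapped` frames in the tracebacks come from a decorator that wraps backend calls in s3ql's backends/common.py. The retry behavior described in the comment above is the standard retry-with-backoff pattern; here is a minimal sketch of that pattern (hypothetical names, not s3ql's actual code):

```python
import time
from functools import wraps


class TemporaryServerError(Exception):
    """Stand-in for a transient backend error such as S3's InternalError."""


def retry(fn, max_tries=8, initial_delay=0.2):
    """Retry *fn* with exponential backoff on transient server errors.

    Generic sketch of the retry-on-InternalError behavior that 1.11.1
    showed and 1.13.2 apparently lost; not s3ql's actual implementation.
    """
    @wraps(fn)
    def wrapped(*args, **kwargs):
        delay = initial_delay
        for attempt in range(max_tries):
            try:
                return fn(*args, **kwargs)
            except TemporaryServerError:
                if attempt == max_tries - 1:
                    raise  # out of retries, let the caller see the error
                time.sleep(delay)
                delay = min(2 * delay, 60)  # back off, capped at 60 s
    return wrapped
```

Under this pattern, a failing `ObjectW.close` is retried (as in the 1.11.1 syslog lines above) rather than crashing the upload thread, but only for exceptions the wrapper recognizes as temporary, which is where 1.13.2 appears to have regressed.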

GoogleCodeExporter commented 9 years ago
Thanks for the report, will try to look into this quickly.

Original comment by Nikolaus@rath.org on 16 Mar 2013 at 10:15

GoogleCodeExporter commented 9 years ago
This issue was closed by revision a41c65e42b23.

Original comment by Nikolaus@rath.org on 18 Mar 2013 at 4:08
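
Conceptually, both the regression and the fix hinge on whether S3's InternalError response is classified as a temporary failure that the retry logic should absorb. A hedged sketch of such a classification (hypothetical helper and list, not the actual patch; see revision a41c65e42b23 for the real change):

```python
# S3 error codes that signal a transient server-side problem; retrying
# is safe for s3ql's whole-object PUT/GET requests. Hypothetical list
# for illustration only.
TRANSIENT_S3_ERROR_CODES = frozenset([
    'InternalError',
    'ServiceUnavailable',
    'SlowDown',
    'RequestTimeout',
])


def is_temp_failure(exc):
    """Return True if *exc* looks like an S3 error worth retrying."""
    return getattr(exc, 'code', None) in TRANSIENT_S3_ERROR_CODES
```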

GoogleCodeExporter commented 9 years ago
Lionel, could you confirm that the patch fixes this issue for you? You can 
apply it with `patch -p1 < s3ql.diff` in the S3QL source directory.

Thanks,
-Nikolaus

Original comment by Nikolaus@rath.org on 18 Mar 2013 at 4:11

Attachments: s3ql.diff

GoogleCodeExporter commented 9 years ago
Applied the patch. Works for me. Thanks for the quick reply.

Original comment by lionel.l...@gmail.com on 18 Mar 2013 at 8:20

GoogleCodeExporter commented 9 years ago
I'm seeing the same behavior Lionel described. It has happened several times, with the following output in the log; the lines appear to be the same.

Should I apply the patch, or wait for the next release? We are already in production.

Thanks.

2013-04-08 13:17:50.521 [30686] MainThread: [fs] forget([(1, 1), (6, 2), (8, 1), (9, 2), (10, 1), (13, 2), (14, 2), (15, 2), (16, 2), (17, 2), (18, 2), (19, 2), (20, 2$
2013-04-08 13:17:55.274 [30686] MainThread: [mount] Waiting for background threads...
2013-04-08 13:17:55.275 [30686] MainThread: [BlockCache] destroy(): clearing cache...
2013-04-08 13:17:55.275 [30686] MainThread: [BlockCache] clear: start
2013-04-08 13:17:55.275 [30686] MainThread: [BlockCache] expire: start
2013-04-08 13:17:55.308 [30686] CommitThread: [mount] CommitThread: end
2013-04-08 13:17:55.327 [30686] Metadata-Upload-Thread: [mount] MetadataUploadThread: end
2013-04-08 13:17:55.347 [30686] MainThread: [BlockCache] expire: end
2013-04-08 13:17:55.347 [30686] MainThread: [BlockCache] clear: end
2013-04-08 13:17:55.349 [30686] MainThread: [BlockCache] destroy(): waiting for upload threads...
2013-04-08 13:17:55.349 [30686] MainThread: [BlockCache] destroy(): waiting for removal threads...
2013-04-08 13:17:55.378 [30686] MainThread: [mount] All background threads terminated.
2013-04-08 13:17:55.378 [30686] MainThread: [root] Uncaught top-level exception:
2013-04-08 13:17:55.378 [30686] MainThread: [root] Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==1.13.2', 'console_scripts', 'mount.s3ql')()
  File "/usr/lib/s3ql/s3ql/mount.py", line 139, in main
    llfuse.main(options.single)
  File "fuse_api.pxi", line 213, in llfuse.main (src/llfuse.c:18034)
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:6832)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:6776)
  File "/usr/lib/s3ql/s3ql/fs.py", line 974, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 1006, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 250, in perform_read
    with self.open_read(key) as fh:
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 494, in open_read
    fh = self.backend.open_read(key)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 274, in open_read
    resp = self._do_request('GET', '/%s%s' % (self.prefix, key))
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 399, in _do_request
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
  File "/usr/lib/s3ql/s3ql/fs.py", line 974, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 1006, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 250, in perform_read
    with self.open_read(key) as fh:
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 494, in open_read
    fh = self.backend.open_read(key)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 63, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 274, in open_read
    resp = self._do_request('GET', '/%s%s' % (self.prefix, key))
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 399, in _do_request
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
S3Error: InternalError: We encountered an internal error. Please try again.

Original comment by netflix....@gmail.com on 8 Apr 2013 at 5:01