Adds a new "simple" method for collecting oplogs needed to construct
a consistent backup. It implements an algorithm similar to what mongodump
--oplog already does, albeit for multiple shards.
Unlike the existing tailer, it does not begin tailing the oplogs of all
shards at the start of the backup. Instead, it runs mongodump for all
shards and waits until they have all finished.
Then it collects the delta between when each shard's dump ended and the time
when the last one finished.
The following stages, especially the Resolver, which brings all shards'
oplogs forward to a consistent point in time, are unchanged.
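The two-phase flow above can be sketched as follows. This is an illustrative outline only: the helper names `run_mongodump()` and `dump_oplog_range()` are hypothetical stand-ins, not the project's actual API.

```python
def simple_oplog_backup(shards, run_mongodump, dump_oplog_range):
    """Sketch of the "simple" method: dump every shard, then fetch each
    shard's oplog delta up to the time the slowest dump finished."""
    # Phase 1: run mongodump for every shard and record when each ended.
    end_times = {}
    for shard in shards:
        end_times[shard] = run_mongodump(shard)  # returns end timestamp

    # Phase 2: the latest dump end time is the common target point.
    latest = max(end_times.values())

    # Phase 3: for each shard, collect the oplog entries between its own
    # dump end and the common target. The Resolver later rolls all
    # shards forward to one consistent point in time.
    deltas = {}
    for shard, ended in end_times.items():
        deltas[shard] = dump_oplog_range(shard, start=ended, stop=latest)
    return deltas
```

Note that no oplog I/O happens while the dumps are still running; the deltas are fetched in a single pass afterwards.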
Rationale for this addition:
With the existing "tailer" approach, our backups very often failed with the
error message "Tailer host changed". This appears to be a common problem
with oplog tailing in general, judging from what you can find on the internet.
It appears that for some reason the oplog tailing cursors get aborted by
mongod with an error stating "operation exceeded time limit", code: 50.
With this new simpler oplog fetching method, that apparently does not happen.
The most important difference/drawback compared to the current tailer is that
the simple approach fails if the oplog of one of the shards is so busy that
it has rolled over by the time the deltas are to be collected, so that the
needed operations are no longer available. This, however, will only be the
case on very busy systems, where one might argue the oplog size should be
increased anyway.
In general the simple method should be a little less resource intensive, because
there is no additional I/O while the mongodumps are running.
This change is backwards compatible for callers. To use the new method, a new
configuration parameter needs to be specified: --oplog.tailer.method simple.
The default value for this option is "tailer", which can also be explicitly
set to select the classic implementation.
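For illustration, the option could be wired up as below. The real project uses its own configuration layer, so the argparse setup and the dest name are assumptions, not the actual implementation; only the option string, its two values, and the default come from this change.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--oplog.tailer.method",
    dest="oplog_tailer_method",       # dotted option names need an explicit dest
    choices=["tailer", "simple"],
    default="tailer",                 # classic implementation remains the default
    help="oplog collection method: 'tailer' (default) or 'simple'",
)

# Selecting the new simple method:
args = parser.parse_args(["--oplog.tailer.method", "simple"])
```

Omitting the option, or passing "tailer" explicitly, selects the classic tailer.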
Implementation Notes:
Common functionality between the original Tailer and the new simple
implementation was extracted into a new common base class "OplogTask".
In a few places some variables were extracted or renamed to (hopefully)
make the code a little more readable, despite the additions.
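The extraction could look roughly like the following skeleton. The class names other than OplogTask, and all method bodies, are illustrative placeholders rather than the actual code.

```python
class OplogTask(object):
    """Common base for oplog-collecting tasks: holds the per-shard state
    shared by both collection strategies."""

    def __init__(self, shard):
        self.shard = shard
        self.oplog = []

    def run(self):
        # Each strategy implements its own collection logic.
        raise NotImplementedError


class Tailer(OplogTask):
    """Classic approach: tail the shard's oplog for the whole backup."""

    def run(self):
        self.oplog.append("tailed from backup start")


class SimpleOplogGetter(OplogTask):
    """New approach: fetch only the post-dump delta in a single pass."""

    def run(self):
        self.oplog.append("delta after mongodump finished")
```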
In the Resolver class, the thread pool's join() method is now called to fix
spurious (but harmless) error messages like the following when finishing:
Process PoolWorker-8:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 380, in get
rrelease()
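The fix follows the standard pool shutdown pattern: after all work is submitted, close() the pool and join() it so the workers exit before the parent does. A minimal sketch, shown with ThreadPool for portability; the same close()/join() sequence applies to the multiprocessing.Pool whose PoolWorker teardown produces the traceback above.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

def run_pool(values):
    pool = ThreadPool(processes=2)
    try:
        results = pool.map(square, values)
    finally:
        pool.close()  # no further tasks may be submitted
        pool.join()   # wait for workers to exit cleanly
    return results
```

Without the join(), the interpreter may begin tearing down shared state while workers are still blocked in get(), which is what triggers the spurious traceback.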