Percona-Lab / mongodb_consistent_backup

A tool for performing consistent backups of MongoDB Clusters or Replica Sets
https://www.percona.com
Apache License 2.0
276 stars 81 forks

New Bugfix Release #279

Closed corey-hammerton closed 5 years ago

corey-hammerton commented 5 years ago

Is there any ETA on the publishing of a new release? Either a bug-fix or a minor version increase?

We are affected by #277, among others, and need to implement a fix. Please include an RPM for CentOS 7.

arturoera commented 5 years ago

We have also been affected by a scenario similar to the one @corey-hammerton describes. In our case we have sharded instances with up to 64 shards and a heavy write load on them, which we think is causing the oplog tailing to simply die. I created a small patch for consistent_backup that adds a couple of extra flags to disable oplog tailing (at mongodump). With this in place we have had a better success rate, but not 100%. The problem, of course, is that the backup is only valid as of the point it started, because without the oplog we can't guarantee a consistent backup up to the point the dump finished, which for our biggest instances can take up to 16 hours. https://github.com/arturoera/mongodb_consistent_backup/commits/disable_mongodump_oplog
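For illustration, a rough sketch (not the actual patch in the branch above) of what such a flag boils down to: conditionally dropping mongodump's `--oplog` option from the per-shard dump command. The option and function names below are hypothetical; only the mongodump flags themselves are real.

```python
# Rough sketch only: a hypothetical "use_oplog" option that drops
# mongodump's --oplog flag from the per-shard dump command.
import subprocess

def build_mongodump_cmd(host, port, out_dir, use_oplog=True):
    """Assemble a mongodump command, optionally without oplog tailing."""
    cmd = ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]
    if use_oplog:
        # --oplog captures the node's oplog during the dump so the tool's
        # resolver can later replay it to a consistent point in time.
        cmd.append("--oplog")
    return cmd

# Example: dump one shard member with oplog tailing disabled.
subprocess.check_call(
    build_mongodump_cmd("shard01.example.net", 27018, "/backup/shard01",
                        use_oplog=False))
```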

dbmurphy commented 5 years ago

Arturo,

Disabling the oplog defeats the purpose of this tool and you will not have a consistent backup. Technically speaking, the backup is not even valid as of its starting point: there are many mongos commands that can be run while the balancer is off that would cause issues with this process, and since not all shards hit the same shard key ranges at the same time, there is a chance a manual moveChunk, shardCollection, or other command will result in missing data.
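Purely as an illustration of the kind of manual sharding operations meant here, these can still be issued through a mongos while the balancer is stopped (host, namespace, and shard names are hypothetical):

```python
# Illustrative only: manual sharding commands that bypass the balancer
# and would not be reflected in per-shard dumps taken without the oplog.
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")

# Shard a collection mid-backup...
mongos.admin.command("shardCollection", "mydb.events", key={"user_id": 1})

# ...or move a chunk between shards by hand.
mongos.admin.command("moveChunk", "mydb.events",
                     find={"user_id": 12345}, to="shard0002")
```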

In your case, you should replace mongodump completely with the “createBackup” command. The challenge is that this is a good deal of work in this tool: you need to run that command after preparing a local folder to store the backup in, and then you need to SCP/rsync the backups into the normal backup location. Maybe this works for your needs, but it seems like this would be suboptimal for people using the flag without fully understanding the implications.
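A minimal sketch of that approach, assuming Percona Server for MongoDB's `createBackup` hot-backup admin command and pymongo; the host and paths are hypothetical:

```python
# Minimal sketch, assuming Percona Server for MongoDB's createBackup command.
from pymongo import MongoClient

member = MongoClient("mongodb://shard01-node1.example.net:27018")

# Ask this mongod to write a hot backup of its dbpath to a local directory.
member.admin.command("createBackup", 1, backupDir="/backup/shard01/hot")

# The resulting directory still has to be rsync'd/SCP'd into the usual
# backup layout, which is the extra plumbing this tool would need to grow.
```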

Cheers David

corey-hammerton commented 5 years ago

I built an interim RPM in-house with the new code, using the same process as the Makefile, and deployed it to our setup. Unfortunately the updates to the threads have no effect on processes that remain stuck after the oplog resolver completes. I will update #277 with the relevant information.

timvaillancourt commented 5 years ago

Hi @corey-hammerton, I've cut a new 1.4.0 release here: https://github.com/Percona-Lab/mongodb_consistent_backup/releases/tag/1.4.0.

There are some PIP/PEX-related build issues with the debian8 binary, so it has been excluded for now.