Percona-Lab / mongodb_consistent_backup

A tool for performing consistent backups of MongoDB Clusters or Replica Sets
https://www.percona.com
Apache License 2.0

cannot dump with oplog if admin.system.version is modified #284

Closed: zygis closed this issue 5 years ago.

zygis commented 5 years ago

I've had the same problem for a few days in a row: MCB (mongodb_consistent_backup) stopped working, and every run fails with the same errors. Any ideas how to fix this?

```
[2018-11-04 07:03:16,271] [INFO] [TailThread-6] [TailThread:status:87] Oplog tailer c1-s3/production-mongodb-c1-s3-n2:27017 status: 1475116 oplog changes, ts: Timestamp(1541307837, 115)
[2018-11-04 07:03:16,326] [INFO] [TailThread-10] [TailThread:status:87] Oplog tailer c1-s2/production-mongodb-c1-s2-n2:27017 status: 2026403 oplog changes, ts: Timestamp(1541307837, 121)
[2018-11-04 07:03:17,028] [INFO] [TailThread-8] [TailThread:status:87] Oplog tailer c1-s4/production-mongodb-c1-s4-n2:27017 status: 1728395 oplog changes, ts: Timestamp(1541307838, 102)
[2018-11-04 07:03:17,172] [INFO] [TailThread-7] [TailThread:status:87] Oplog tailer c1-s5/production-mongodb-c1-s5-n1:27017 status: 1519438 oplog changes, ts: Timestamp(1541307838, 117)
[2018-11-04 07:03:17,864] [INFO] [MongodumpThread-17] [MongodumpThread:wait:130] c1-s1/production-mongodb-c1-s1-n2:27017:   [#.......................]  .oplog  208367/4937823  (4.2%)
[2018-11-04 07:03:19,498] [INFO] [MongodumpThread-17] [MongodumpThread:wait:130] c1-s1/production-mongodb-c1-s1-n2:27017:   [#.......................]  .oplog  236353/4937823  (4.8%)
[2018-11-04 07:03:19,499] [INFO] [MongodumpThread-17] [MongodumpThread:wait:130] c1-s1/production-mongodb-c1-s1-n2:27017:   Failed: error dumping oplog: error writing data for collection `.oplog` to disk: cannot dump with oplog if admin.system.version is modified
[2018-11-04 07:03:23,179] [ERROR] [MainProcess] [Stage:run:99] Stage mongodb_consistent_backup.Backup did not complete!
[2018-11-04 07:03:23,180] [CRITICAL] [MainProcess] [Main:exception:207] Problem performing backup! Error: Stage mongodb_consistent_backup.Backup did not complete!
[2018-11-04 07:03:23,180] [INFO] [MainProcess] [Main:cleanup_and_exit:163] Starting cleanup procedure! Stopping running threads
[2018-11-04 07:03:23,184] [INFO] [MainProcess] [Sharding:restore_balancer_state:142] Restoring balancer state to: True
[2018-11-04 07:03:23,191] [INFO] [MainProcess] [Mongodump:close:184] Stopping all mongodump threads
[2018-11-04 07:03:23,192] [INFO] [MainProcess] [Mongodump:close:192] Stopped all mongodump threads
[2018-11-04 07:03:26,198] [ERROR] [MainProcess] [Rsync:close:144] Stopping Rsync upload threads
[2018-11-04 07:03:26,300] [INFO] [MainProcess] [Main:cleanup_and_exit:193] Cleanup complete, exiting
[2018-11-04 07:03:26,301] [INFO] [MainProcess] [Logger:rotate:104] Running rotation of log files
[2018-11-04 07:03:26,301] [INFO] [MainProcess] [Logger:compress:83] Compressing log file: /var/log/mcb/backup.cluster1.20181103_0300.log
```
zygis commented 5 years ago

Today the backup completed successfully, but I'm still curious what could cause this.

timvaillancourt commented 5 years ago

Hi @zygis, this error is returned by `mongodump`, which mongodb_consistent_backup calls but which is a separate project: https://github.com/mongodb/mongo-tools.
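To make the split of responsibilities concrete, here is a minimal sketch of how a wrapper might invoke `mongodump` with `--oplog` and recognise this particular failure in its output. `--host`, `--port`, `--oplog` and `--out` are standard mongodump options, but `build_mongodump_cmd` and `is_system_version_error` are hypothetical helpers for illustration; they are not the flags or code that mongodb_consistent_backup actually uses.

```python
from typing import List

# Error text copied verbatim from the mongodump failure in the log above.
SYSTEM_VERSION_ERROR = "cannot dump with oplog if admin.system.version is modified"


def build_mongodump_cmd(host: str, port: int, out_dir: str) -> List[str]:
    """Hypothetical helper: assemble a mongodump argv that also captures the oplog.

    The exact arguments mongodb_consistent_backup passes are not shown here.
    """
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--oplog",  # also dump oplog entries written while the dump runs
        "--out", out_dir,
    ]


def is_system_version_error(output: str) -> bool:
    """Return True if mongodump's output contains the admin.system.version failure."""
    return SYSTEM_VERSION_ERROR in output
```

In practice the argv would be passed to `subprocess.run(cmd, capture_output=True, text=True)` and its `stderr` checked with `is_system_version_error`, for example to decide whether a retry is worthwhile.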

As we do not maintain `mongodump`, there is nothing we can do about this error. There must be a good reason mongodump throws it and fails the backup; I'm not sure what that reason is, but I believe it is done to ensure consistency.
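Judging only by the error message, mongodump refuses an oplog window that contains writes to the `admin.system.version` namespace. The sketch below shows how one might check a window for such writes before blaming anything else. It assumes the oplog entries have already been read (for example with pymongo from `local.oplog.rs` between the dump's start and end timestamps) into plain dicts with an `"ns"` field; `system_version_writes` is a hypothetical helper, not mongodump's actual implementation.

```python
from typing import Any, Iterable, List, Mapping

# Namespace named in the mongodump error message.
SYSTEM_VERSION_NS = "admin.system.version"


def system_version_writes(
    oplog_entries: Iterable[Mapping[str, Any]],
) -> List[Mapping[str, Any]]:
    """Return the oplog entries whose namespace is admin.system.version.

    Each entry is assumed to be shaped like a local.oplog.rs document, with
    "ns" holding the namespace and "op" the operation type ("i", "u", "d", ...).
    """
    return [entry for entry in oplog_entries if entry.get("ns") == SYSTEM_VERSION_NS]
```

A non-empty result for the time range of a failed run would be consistent with the error; an empty one would suggest looking elsewhere.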

Closing (unfortunately) as wontfix.

zygis commented 5 years ago

It looks like the balancer is the problem. Now, with 15 shards, almost every run ends with the same error. I disabled the balancer before the backup and everything worked as expected. Is there a way to "sleep" for some time after disabling the balancer?
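Rather than a fixed "sleep", one option is to poll until the balancer reports that no balancing round is in progress, with an upper bound on the wait. The sketch below assumes `get_status` returns a document like the one from the `balancerStatus` admin command run against a mongos (MongoDB 3.4+), which includes an `inBalancerRound` boolean; with pymongo that could be `lambda: client.admin.command("balancerStatus")`. `wait_for_balancer_idle` is a hypothetical helper and is not an existing mongodb_consistent_backup option.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_balancer_idle(
    get_status: Callable[[], Mapping[str, Any]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no balancer round is running, or until `timeout` seconds pass.

    Returns True once the balancer reports idle, False if the timeout expires.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if not get_status().get("inBalancerRound", False):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Polling bounds the wait by what the cluster is actually doing instead of guessing a delay; on timeout the caller can skip or postpone the backup rather than start it while a round is still running.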