apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0
14.48k stars 1.31k forks source link

BackupCorrectnessClean test failure on 6.3 #4716

Closed sfc-gh-jfu closed 2 years ago

sfc-gh-jfu commented 3 years ago

This test runs into a timeout after spamming stdout/stderr with many messages that seem to be reporting the status of the backups. The tags backup1, backup2, and backup3 all appear several times.

Details: LogBytes written - 0 RangeBytes written - 0 Last complete log version and timestamp - 119205798, 1970/01/01.00:02:59+0000 Last complete snapshot version and timestamp - N/A, N/A Current Snapshot start version and timestamp - 119305798, 1970/01/01.00:02:59+0000 Expected snapshot end version and timestamp - 119305798, 1970/01/01.00:02:59+0000 Backup supposed to stop at next snapshot completion - Yes

{"SchemaVersion":"1.0.0","BackupAgentsPaused":false,"Tag":"backup2","UID":"fdcfb626c691e1d729f7d90a4396144d","Status":{"Name":"Running","Description":"has been started","Completed":false,"Running":true},"Restorable":false,"DestinationURL":"file://simfdb/backups/backup-1970-01-01-00-02-08.768193","StopAfterSnapshot":true,"SnapshotIntervalSeconds":87,"LogBytesWritten":0,"RangeBytesWritten":0,"LatestLogEnd":{"Version":119205798,"EpochSeconds":179,"Timestamp":"1970/01/01.00:02:59+0000"},"CurrentSnapshot":{"Begin":{"Version":119305798,"EpochSeconds":179,"Timestamp":"1970/01/01.00:02:59+0000"},"EndTarget":{"Version":119305798,"EpochSeconds":179,"Timestamp":"1970/01/01.00:02:59+0000"},"IntervalSeconds":0,"ExpectedProgress":100,"LastDispatch":{}},"Errors":[]}

The backup on tag `backup3' is in progress to file://simfdb/backups/backup-1970-01-01-00-02-28.342118.

BackupUID: 3e97e46a9b6a9d6b24d5871e9c1d1370 BackupURL: file://simfdb/backups/backup-1970-01-01-00-02-28.342118 Snapshot interval is 84 seconds. The initial snapshot is still running.

Eventually, the spam changes to messages of this nature instead:

Tag: backup2_0 UID: 6c6c70d39d7fbbaf8c12d3d88203f850 State: starting Blocks: 0/0 BlocksInProgress: 0 Files: 0 BytesWritten: 0 CurrentVersion: -1 ApplyVersionLag: 0 LastError: ''Backup key ranges doesn't overlap with key ranges filter' on 'restore_start'' 19s ago. 1794.674363 dfa3f4afbb14a4c00e5752e6e74e2285 Wait # 260 for 1 tasks to end

The timestamps and lack of any bytes written make me suspicious with the backup submission and the TaskBucket processing.

Ensemble 20210423-044228-nightly_correctness_release-6.3-0daa283027479e7c SHA https://github.com/apple/foundationdb/commit/2217c8904d18927f32ea7d80f93463e5608d90c6

sfc-gh-satherton commented 3 years ago

The problem here is the restore was started with invalid arguments by the the backup correctness workload. This causes this repeating error: 'Backup key ranges doesn't overlap with key ranges filter' during restore startup.

This was fixed recently by #4684 but will need to be cherry-picked to release-6.3.