recreate "Remote file referenced as ..." error after interrupted backup then compact #5023


ts678 commented 1 year ago

Description

I can reliably (in about 10 minutes with an interrupt test script) get an error like

Remote file referenced as dblock.zip by dindex.zip, but not found in list, registering a missing remote file

The error here is on recreate, but the recreate that runs inside some restores can probably hit it too.

Remote file referenced ... but not found in list, registering a missing remote file #4586 is possibly the same problem, but I have a better repro and analysis in this issue.

It looks like an interrupt after the dindex put during a backup can leave the DB thinking there's a missing dindex even though the dindex put worked. The next backup fixes this, but uses a different dindex name. This is different from the repair case with just a lost dindex; possibly here it doesn't know the name because IndexBlockLink didn't get filled out. Regardless, the extra dindex file is harmless until the dblock and the new dindex from the fix get deleted by a compact. After that, there's an old dindex pointing to a dblock that's no longer there. Before that, maybe recreate was happy because it found two dindex files, but both pointed to the same dblock and had the same information, so no harm done. Just a guess.

Seems like a good time to stop for some feedback and questions based on data so far.

Here's a high-level log which is now embellished with some output from an SQL query:

SELECT "Name" FROM "Remotevolume" WHERE "Type" = "Index" AND ( "State" = "Verified" OR "State" = "Uploaded" ) AND "ID" NOT IN (SELECT "IndexVolumeID" FROM "IndexBlockLink")'

Running backup at 2023-09-10 08:11:30.415663
Exit code 0
Running recreate at 2023-09-10 08:11:38.921437
Running repair at 2023-09-10 08:11:38.921437
Exit code 0
Statistics: started: 18 timeout: 3 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:11:58.611870
Timed out after 3 seconds
Statistics: started: 19 timeout: 4 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:12:06.945352
Timed out after 4 seconds
Statistics: started: 20 timeout: 5 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:12:16.464174
Timed out after 10 seconds
Statistics: started: 21 timeout: 6 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:12:31.811753
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:12:39.262241
Running repair at 2023-09-10 08:12:39.262241
Exit code 0
Statistics: started: 22 timeout: 6 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:13:01.969719
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:13:09.442398
Running repair at 2023-09-10 08:13:09.442398
Exit code 0
Statistics: started: 23 timeout: 6 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:13:31.168298
Timed out after 2 seconds
Statistics: started: 24 timeout: 7 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:13:38.449298
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:13:47.858636
Running repair at 2023-09-10 08:13:47.858636
Exit code 0
Statistics: started: 25 timeout: 7 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:14:11.596258
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:14:26.134430
Running repair at 2023-09-10 08:14:26.134430
Exit code 0
Statistics: started: 26 timeout: 7 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:14:52.249091
Timed out after 6 seconds
Statistics: started: 27 timeout: 8 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:15:03.624671
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:15:12.084502
Running repair at 2023-09-10 08:15:12.084502
Exit code 0
Statistics: started: 28 timeout: 8 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:15:37.768793
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:15:46.206783
Running repair at 2023-09-10 08:15:46.206783
Exit code 0
Statistics: started: 29 timeout: 8 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:16:09.923359
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:16:18.360990
Running repair at 2023-09-10 08:16:18.360990
Exit code 0
Statistics: started: 30 timeout: 8 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:16:45.544337
Timed out after 16 seconds
Statistics: started: 31 timeout: 9 missing: 0 extra: 0 marked: 0

Running backup at 2023-09-10 08:17:06.831278
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:17:13.263109
Running repair at 2023-09-10 08:17:13.263109
Exit code 0
Remote file referenced as duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip by duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip, but not found in list, registering a missing remote file
Found 1 missing volumes; attempting to replace blocks from existing volumes
Found 1 missing volumes; attempting to replace blocks from existing volumes
Statistics: started: 32 timeout: 9 missing: 0 extra: 0 marked: 0

So let's try a glogg search of the profiling log, starting with those two files. Add in the original dindex, and "Compacting", so we can also get that context:

2023-09-10 08:11:35 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:12:09 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (38, "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-09-10 08:12:09 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (38, "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-09-10 08:12:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:12:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:12:10 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (38, "duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-09-10 08:12:10 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (38, "duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-09-10 08:12:10 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip (18.09 KB)
2023-09-10 08:12:10 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip (18.09 KB)
2023-09-10 08:12:20 -04 - [Information-Duplicati.Library.Main.Operation.Backup.RecreateMissingIndexFiles-RecreateMissingIndexFile]: Re-creating missing index file for duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip
2023-09-10 08:12:20 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (39, "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-09-10 08:12:20 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (39, "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-09-10 08:12:20 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:12:20 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:12:25 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:12:36 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:12:37 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip (18.09 KB)
2023-09-10 08:12:37 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip (18.09 KB)
2023-09-10 08:12:37 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip"
2023-09-10 08:12:37 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip" took 0:00:00:00.222
2023-09-10 08:13:06 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 1 fully deletable volume(s)
2023-09-10 08:13:44 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:14:16 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 34.15% wasted space and the limit is 25%
2023-09-10 08:14:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:14:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:14:24 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip"
2023-09-10 08:14:25 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip" took 0:00:00:00.184
2023-09-10 08:14:25 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:14:25 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:14:25 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip"
2023-09-10 08:14:25 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "VerificationCount" = MAX(1, CASE WHEN "VerificationCount" <= 0 THEN (SELECT MAX("VerificationCount") FROM "RemoteVolume") ELSE "VerificationCount" + 1 END) WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip" took 0:00:00:00.069
2023-09-10 08:14:57 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:15:08 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:15:43 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:16:14 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-09-10 08:16:49 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 28.32% wasted space and the limit is 25%
2023-09-10 08:16:50 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:16:50 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)
2023-09-10 08:16:52 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1694355412 WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip" 
2023-09-10 08:16:52 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1694355412 WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip"  took 0:00:00:00.000
2023-09-10 08:16:52 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1694355412 WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip" 
2023-09-10 08:16:52 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1694355412 WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip"  took 0:00:00:00.000
2023-09-10 08:16:52 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip"
2023-09-10 08:16:52 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip" took 0:00:00:00.000
2023-09-10 08:16:52 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip"
2023-09-10 08:16:52 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip" took 0:00:00:00.000
2023-09-10 08:17:11 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required

Notable moments:

2023-09-10 08:12:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:12:10 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip (18.09 KB)
2023-09-10 08:12:20 -04 - [Information-Duplicati.Library.Main.Operation.Backup.RecreateMissingIndexFiles-RecreateMissingIndexFile]: Re-creating missing index file for duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip
2023-09-10 08:12:20 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)

Running backup at 2023-09-10 08:12:31.811753
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip

2023-09-10 08:16:49 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 28.32% wasted space and the limit is 25%
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip (4.99 MB)
2023-09-10 08:16:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-i0e19c49bd46b47fdafbb5bc0e2df893a.dindex.zip (2.29 KB)

Running backup at 2023-09-10 08:17:06.831278
Exit code 0
duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip
Running recreate at 2023-09-10 08:17:13.263109
Running repair at 2023-09-10 08:17:13.263109
Exit code 0
Remote file referenced as duplicati-b6587050dc3b146fa9cfcb1493aaf9751.dblock.zip by duplicati-icebfdb6777d24dda8b75fed3787c71ac.dindex.zip, but not found in list, registering a missing remote file

Steps to reproduce

  1. Well, there's a still-changing test script timeout5.py for Windows that I can provide and explain if someone is interested; a minimal sketch of the kill-loop idea follows.
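
For illustration, the heart of such a harness is just a subprocess with a random kill deadline. A minimal sketch of that loop (the Duplicati path, arguments, and interval here are illustrative, not the actual timeout5.py):

import random
import subprocess

DUPLICATI = r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe"  # assumed path

def run_with_kill(args, max_seconds):
    # Start Duplicati, and hard-kill it if it outlives a random deadline.
    proc = subprocess.Popen([DUPLICATI] + args)
    try:
        proc.wait(timeout=random.randint(1, max_seconds))
        return proc.returncode  # run completed on its own
    except subprocess.TimeoutExpired:
        proc.kill()  # simulated interrupt
        return None

while True:
    if run_with_kill(["backup", "file://dest", r"C:\data"], 30) == 0:
        # After a clean backup, check that the database can still be rebuilt.
        subprocess.run([DUPLICATI, "repair", "file://dest"], check=False)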


gpatel-fr commented 1 year ago

Hello

Could you attach the current version of your script?

ts678 commented 1 year ago

timeout5.zip

To get it going the first time, uncomment the --create line to get a random file. After that, the existing file just gets modified. random_file3.py is posted in the forum.

This script is a recent offshoot of timeout4.py, which looked mainly at backup but (most recently -- things change sometimes) recreated the database after each run. The current config here leaves it to build up whatever damage it gets, and the database recreate is done only to test that it can still be done. The recreate has some oddities, such as giving a clean exit code despite problems. To maximize the chances of seeing some oddity, the approach here filters normal output, showing "extras".

gpatel-fr commented 1 year ago

Hello

After happily spending 2 more hours of my life understanding the program(s) and translating them to Linux, it's running with the latest trunk. I see timeouts, but as I understand it, that is as intended. To follow up later...

gpatel-fr commented 1 year ago

I'm now at 483 tests, 49 timeouts and 0 problems. That's not much to go on. Edit: now that I think of it, in the GitHub automatic tests the interrupt problems always happen on the Windows instance, never on the platforms running Mono (Ubuntu and Mac).

ts678 commented 1 year ago

I'm now at 483 tests, 49 timeouts and 0 problems.

FWIW the timeout might need some tuning depending on system speed. I'm testing on an old CPU and a hard drive, i.e. slow. I'm not sure what to say about Windows versus non-Windows. I'd have expected similar results (after timing tuning), but I use Windows.

0 problems

Does this build have the fix for Missing, which is generally the most common? Extra is next. This one might be harder to hit. Do you have a debugger? I think my "Steps to reproduce" for Extra used one, and if need be maybe I can set steps up for this...

I'm still trying to find some way to hit these tight timings programmatically (and not with random timing) using a test tool. The current thought is maybe to use the rclone backend with a test wrapper around rclone; I've done such things before for random errors. A sketch of that idea follows.
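
A sketch of that wrapper idea (purely hypothetical: the wrapper sits first in PATH under the name rclone so the backend calls it, and a control file decides when to fail a call):

import os
import subprocess
import sys

REAL_RCLONE = "/usr/bin/rclone"              # assumed location of the real binary
CONTROL = os.path.expanduser("~/fail_next")  # hypothetical control file

if os.path.exists(CONTROL):
    os.remove(CONTROL)  # one-shot failure injection
    sys.exit(1)         # report the transfer as failed to Duplicati
sys.exit(subprocess.call([REAL_RCLONE] + sys.argv[1:]))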

gpatel-fr commented 1 year ago

After 800 iterations I finally got 2 missings. Success! To be continued...

ts678 commented 1 year ago

Try reducing the timeout. You were timing out on about 1 out of 10 runs. One of my prolific runs was doing 1 out of 3. Stats here.

gpatel-fr commented 1 year ago

I tried it again this morning. The problem is that I am running this on an SSD and the backup is a lot faster, so I added a few variables to make the backup take longer (raising the amount of data changed in the source file so that the compression takes longer). Attaching the ported Linux version to this post.

timeout.zip

I still have to investigate what happens, but now at least I understand what the code is doing.

ts678 commented 1 year ago

Thanks for the portability changes. I'm also thinking the first line might be better off naming python3 rather than python.

For something slower than an SSD, you could try using an old USB flash drive. Some of those look as slow as a hard drive.

gpatel-fr commented 1 year ago

Here is my last run result:

(...)
duplicati-id3193852be8f41a193bb127e1844a5e2.dindex.zip
duplicati-i814abe9698e246a8ae4afd83b1a103a0.dindex.zip
Running test recreate at 2023-09-29 13:09:04.216747
Statistics: started: 1468 timeout: 734 missing: 0 extra: 4 marked: 0

Running backup at 2023-09-29 13:09:09.212235
Timed out after 2 seconds
Statistics: started: 1469 timeout: 735 missing: 0 extra: 4 marked: 0

So I had 4 results of the type

Running backup at 2023-09-29 13:01:44.946118
backup error code 100
stderr was:

ErrorID: ExtraRemoteFiles
Found 1 remote files that are not recorded in local storage, please run repair

Running repair at 2023-09-29 13:01:47.180415

but these were fixed by the repair. Not a single occurrence of "Remote file referenced as..." in almost 3 hours and more than 700 interrupts. Maybe the difference is that I am running the test on trunk (it has warwickmm's fix). If you are interested I could make a build of current trunk for you to use instead of the last canary.

What is not fixed by the repair is this kind of unpleasantness (as output by a test of the resulting database):

duplicati-ic10487878b9b4122878d2708ee2560c2.dindex.zip: 70 errors
Extra: +XD4zyC74Mh2AQVIDwU22KaQ4aG4rfbKpeG4SOFr54Q=
Extra: /+7t4N+xvZ2cYgwnSd87KZdBSYGGkV2VaH0LE5wYivs=
Extra: 1dt28QUbzAZMNnFCy04sBPJmV+tZr/m56qeY+ySGaZc=
Extra: 1vlHJtS80uX54YBDHVugMCJem4YnNO30fy8B/aYsfBE=
Extra: 1w/9XbiXKupCcuELN+xoM2MVFI7NWuhtcyg+jMuPtJI=
Extra: 4AcRDaO3U/QRpiJhSFPMWibL1uiPkZEddptmu87hmVA=
Extra: 4CJUpnMOT72ak73bGHDI4XPY0raz3yukeyGYiVtur3M=
Extra: 5fgQlLcDjF9Sxn9xgCqyU6BnrZCJwxuN3BrmXTRvl00=
Extra: 5jnu2ZXEuwVI4IwrCwcKCfGBCW8LVyhtnEITgPQ5AIw=
Extra: 9vyzpFcvQ4I7jfOUL5mBT2pKZys2D6wq5G5zveABY7Q=
... and 60 more

I know that @jojo-1000 has a fix for that, but I am not keen on it, as it's broad and complex. The main problem I have with it is that it would take so much time to evaluate it and be sure that it doesn't break other things.

On the one hand, it's very annoying to have a DB inconsistency that can't be fixed by the repair. On the other hand, AFAICT it's not a breaking problem, that is, it doesn't block restoring data.

ts678 commented 1 year ago

ErrorID: ExtraRemoteFiles

After interrupted backup, next backup failed with remote files that are not recorded in local storage #4485 might be that. There are steps to reproduce, but they call for using a breakpoint -- because I don't have a good enough testing tool yet.

If you can set breakpoints, that might help reproduce this one as well, because possibly the timing is hard to hit reliably...

OR

Maybe the difference is that I am running the test on trunk (it has warwickmm fix).

Although it's less sure due to other differences, it certainly seemed to drive the Missing failure away (except for test Extra). I think I've been living with those for a long time, they appear to be harmless, and we're starting to understand the cause. Especially since it seems there's a lot to it, I'd be happy with letting them go in a later round of fixes. Maybe this issue too.

Sorry it's so hard to cause. If you have an old retired hard drive gathering dust, those can also be attached using USB; or "linux: how to simulate hard disk latency? I want to increase iowait value without using a cpu power" offers a software path.

I paused poking deeper when you began, but I can either take a test build to see if it solves it, or try gathering more data. The test script already has a little bit of ability to keep a database rotation, and I might be able to put this together better.

ts678 commented 1 year ago

On the one hand, it's very annoying to have a DB inconsistency that can't be fixed by the repair. On the other hand, AFAICT it's not a breaking problem, that is, it doesn't block restoring data.

I'm not sure it blocks anything, however I'd change my mind fast if an impact is found beyond some space waste and test noise. Because I have a recreated database from my other hypersensitive-to-Backblaze-errors test, I looked to see if DB recreate stops test noise. Answer is no, however it did move the ten duplicate blocks to the DuplicateBlock table. Here are the comparisons:

Recreated database showing DuplicateBlock table now put to use:

image

Output from test:

image

Output from checker12.py:

Duplicate block r3pM2JRfPVw3tQmMFciQSDTjT73FgqGQEHlGyy4lXh4=
Duplicate block HIt91bfHXnwKVgBWVIlsaA3c3m1eXmDQNRH8x4MSwHM=
Duplicate block h8skx4xWsRIdiAaEsYZM7fSwqhpumbJxAdDssIm27FQ=
Duplicate block ctwK16rxslY8CN98I6Dk7X5v3r10IpsV0vVIGs9pe6w=
Duplicate block VmA3Ot6j6iYs1OyIsJAQ5YHwPIIg4h8oVxx6pb5oypM=
Duplicate block jNvDGAQ36s2mzKRrgNGRvopxEoBn/ST69C7Ge/jP2DU=
Duplicate block cDIujNWplCJDpwvkXxYU87mAR3JTfCdXhCbO3nwWNek=
Duplicate block 9alEVhYd/Ac1eGLIOUcVUEA9EnzBNjOWiSmLF/yn7bQ=
Duplicate block lfMF0Gq3owd98uXLsN9SsgaQPpO6gMNjHi+hFbYbINU=
Duplicate block GavOCeqATk4NVX7YnPCwPyyadBMX8542sxsmzkNgKEA=
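
For context, a check of this sort can be approximated with a single query against the local database. A sketch, assuming the usual "Block" table layout with "Hash" and "Size" columns (checker12.py itself may work differently):

import sqlite3

connection = sqlite3.connect("backup.sqlite")  # assumed database filename
query = ('SELECT "Hash" FROM "Block" '
         'GROUP BY "Hash", "Size" HAVING COUNT(*) > 1')
for (block_hash,) in connection.execute(query):
    print("Duplicate block", block_hash)
connection.close()
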
gpatel-fr commented 1 year ago

I have run the adapted (attached, now really portable) code on Windows and Linux now, and despite running for hours I have never seen a rebuild failure, even when using a remote backend. I think that I will try the breakpoint trick, and if it does not show anything new, I'll give up on this problem as not reproducible.

timeout.zip

gpatel-fr commented 1 year ago

Here is a build of current trunk:

https://github.com/duplicati/duplicati/actions/runs/6390728398#artifacts

ts678 commented 1 year ago

Thanks. Running. So far the Missing situation looks good (as expected), got an Extra, but haven't hit the issue written here, except on first run when possibly there was some leftover from prior error. I fresh-started on DB and destination (saved it at switch time, in case I need to revert things to continue with old setup), and will test and switch things until some pattern arises. Will take time.

ts678 commented 1 year ago

Got one. Interesting stats here. This isn't all of them, because it's a restart after I stopped the first run to post the above update. The first few tries were also a bit of a mess, because the timeout seemingly clipped Duplicati's initial DB setup. The next run got unhappy seeing some tables already there; looking manually, more than half (but not all) were there, and all of them were empty. I solved the problem by temporarily setting a long timeout until the database got going. Then Windows kept pausing activity to run virus scans, repeatedly over multiple runs, which caused timeouts. Now that things have sort of settled down (I hope), here's the ending of the new run:

Running backup at 2023-10-03 08:49:20.286532
Exit code 0
duplicati-i75779c3c72e845e59325844b67818a12.dindex.zip
Running recreate at 2023-10-03 08:49:27.535066
Running repair at 2023-10-03 08:49:27.536162
Exit code 0
Statistics: started: 8 timeout: 1 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 08:49:49.257841
Timed out after 7 seconds
Statistics: started: 9 timeout: 2 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 08:50:01.670172
Timed out after 28 seconds
Statistics: started: 10 timeout: 3 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 08:50:48.594584
Timed out after 1 seconds
Statistics: started: 11 timeout: 4 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 08:50:55.462649
Timed out after 6 seconds
Statistics: started: 12 timeout: 5 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 08:51:06.941900
Exit code 0
duplicati-i75779c3c72e845e59325844b67818a12.dindex.zip
Running recreate at 2023-10-03 08:51:23.277162
Running repair at 2023-10-03 08:51:23.277162
Exit code 0
Remote file referenced as duplicati-b3d6c2957d5cd44f4808c4670cb7d1a3f.dblock.zip by duplicati-i75779c3c72e845e59325844b67818a12.dindex.zip, but not found in list, registering a missing remote file
Found 1 missing volumes; attempting to replace blocks from existing volumes
Found 1 missing volumes; attempting to replace blocks from existing volumes
Statistics: started: 13 timeout: 5 missing: 0 extra: 0 marked: 0
gpatel-fr commented 1 year ago

In my version of your code I have indeed disabled the timeout on the first run, because if the interrupt happens then the database is not correct and the next run fails immediately. That's not a great way to test interruptions. Also, I completely reset the situation before running:

rm *.sqlite -rf
rm for_kill_test_1.log -rf
rm dest/* -rf
python timeout.py -i=18 -p=60 -s=80000 /shared/duplicati/Duplicati/GUI/Duplicati.GUI.TrayIcon/bin/Debug

or for a Windows test:

del /Q *.sqlite
del for_kill_test_1.log
del /Q dest\*.*
python timeout.py -i 12 -s 70 -s 90000 "c:\program files\duplicati 2"

If you don't do something like that, your tests will not be reliable.

ts678 commented 1 year ago

I'm not seeing such -i, -p and -s options on the Python man page or in the Windows built-in --help, so I hope that part doesn't matter. On the previous run, I did clean up the destination and database, but I also did a clean kill (during the sleep at the end), then a script restart.

Regardless, it was easy to do an even cleaner test (except for the options). To help the database along, the first run had a long timeout, then I put the timeout back and started again. After 20 minutes of that continuous second run, the issue's error came up, and it looked like:

Running backup at 2023-10-03 11:03:56.233179
Exit code 0
duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip
Running recreate at 2023-10-03 11:04:03.471799
Running repair at 2023-10-03 11:04:03.472799
Exit code 0
Statistics: started: 48 timeout: 11 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-03 11:04:24.999199
Timed out after 3 seconds
Statistics: started: 49 timeout: 12 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-03 11:04:33.285586
Timed out after 6 seconds
Statistics: started: 50 timeout: 13 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-03 11:04:44.916977
Exit code 0
duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip
Running recreate at 2023-10-03 11:05:00.281617
Running repair at 2023-10-03 11:05:00.282617
Exit code 0
Remote file referenced as duplicati-b944738c59962417693cee644ee518ee3.dblock.zip by duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip, but not found in list, registering a missing remote file
Found 1 missing volumes; attempting to replace blocks from existing volumes
Found 1 missing volumes; attempting to replace blocks from existing volumes
Statistics: started: 51 timeout: 13 missing: 0 extra: 1 marked: 0

I enhanced the repair to make a profiling log, and used the same one as the backup uses in order to get a more integrated view.

In the glogg log viewer, I used a filter on the two files of interest, plus a couple of other key markers that might be of some interest:

duplicati-b944738c59962417693cee644ee518ee3.dblock.zip|duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip|MissingFileDetected|Compacting
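
For anyone without glogg, a minimal Python equivalent of that filter (using the log filename this test setup writes):

import re

pattern = re.compile(
    r"duplicati-b944738c59962417693cee644ee518ee3\.dblock\.zip"
    r"|duplicati-i56b579aff85d41338c9e9311f236ab8c\.dindex\.zip"
    r"|MissingFileDetected|Compacting")
with open("for_kill_test_1.log", encoding="utf-8") as log:
    for line in log:
        if pattern.search(line):
            print(line, end="")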

An interleaved annotated commentary:

Running backup at 2023-10-03 10:59:13.773314
Timed out after 4 seconds
Statistics: started: 38 timeout: 11 missing: 0 extra: 0 marked: 0

2023-10-03 10:59:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b944738c59962417693cee644ee518ee3.dblock.zip (4.91 MB)
2023-10-03 10:59:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip (18.09 KB)

Running backup at 2023-10-03 10:59:23.146377
Exit code 100
stderr was:

ErrorID: ExtraRemoteFiles
Found 1 remote files that are not recorded in local storage, please run repair

Running repair at 2023-10-03 10:59:26.343708
Exit code 0
duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip
Running recreate at 2023-10-03 10:59:30.574240
Running repair at 2023-10-03 10:59:30.574863
Exit code 0
Statistics: started: 39 timeout: 11 missing: 0 extra: 1 marked: 0

2023-10-03 10:59:29 -04 - [Information-Duplicati.Library.Main.Operation.RepairHandler-AcceptNewIndexFile]: Accepting new index file duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip

Running backup at 2023-10-03 11:04:44.916977
Exit code 0
duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip
Running recreate at 2023-10-03 11:05:00.281617
Running repair at 2023-10-03 11:05:00.282617
Exit code 0
Remote file referenced as duplicati-b944738c59962417693cee644ee518ee3.dblock.zip by duplicati-i56b579aff85d41338c9e9311f236ab8c.dindex.zip, but not found in list, registering a missing remote file
Found 1 missing volumes; attempting to replace blocks from existing volumes
Found 1 missing volumes; attempting to replace blocks from existing volumes
Statistics: started: 51 timeout: 13 missing: 0 extra: 1 marked: 0

2023-10-03 11:04:49 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 29.03% wasted space and the limit is 25%
2023-10-03 11:04:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b944738c59962417693cee644ee518ee3.dblock.zip (4.91 MB)
2023-10-03 11:04:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b944738c59962417693cee644ee518ee3.dblock.zip (4.91 MB)
2023-10-03 11:04:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-b944738c59962417693cee644ee518ee3.dblock.zip (4.91 MB)
2023-10-03 11:04:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-b944738c59962417693cee644ee518ee3.dblock.zip (4.91 MB)

Source assist:

https://github.com/duplicati/duplicati/blob/f6440a8e706b3c30ac430205853b21d50796d6cd/Duplicati/Library/Main/Operation/RepairHandler.cs#L238-L241

From an early look, it seems it began by doing the usual dblock and dindex put, lost track of the dindex (DB rollback?), accepted the dindex back in (did it do that right?), deleted the original paired dblock, and failed because the original paired dindex was still there. The databases are really messy here:

- The one-back database at 11:04:38 has Verified files of 19 dblock and 19 dindex (good), but IndexBlockLink has 21 rows, with a -1 as the dindex ID for duplicati-b944738c59962417693cee644ee518ee3.dblock.zip.
- The final working database at 11:04:59 has Verified files of 15 dblock and 16 dindex, with IndexBlockLink having 15 rows.
- The testrecreate database at 11:05:14 has Verified files of 15 dblock and 16 dindex, with IndexBlockLink having 16 rows.
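
Those -1 IDs suggest dangling link rows. A query along these lines would surface them; a sketch reusing the table names above, and assuming the companion column is named "BlockVolumeID":

import sqlite3

connection = sqlite3.connect("one-back.sqlite")  # assumed database filename
query = ('SELECT "IndexVolumeID", "BlockVolumeID" FROM "IndexBlockLink" '
         'WHERE "IndexVolumeID" NOT IN (SELECT "ID" FROM "Remotevolume")')
for index_id, block_id in connection.execute(query):
    print("dangling link: index volume", index_id, "-> block volume", block_id)
connection.close()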

gpatel-fr commented 1 year ago

such -i -p and -s options

I have modified your code in the version I attached here: https://github.com/duplicati/duplicati/issues/5023#issuecomment-1743874894

The basic problem is, I can't reproduce it in at least 3 runs 10 times bigger than yours, on Linux AND Windows, with a lot more interruptions. It's not possible to debug that.

ts678 commented 1 year ago

I have modified your code in the version I attached here

I know you changed it quite a bit. I downloaded your file before, looked at diffs in WinMerge, but I still can't spot those options, although I don't know all the ways Python may pass them in. I generally use argparse, but I think one can rummage in sys.argv. Nothing like either is in the script that I can see, and single-character options are hard to guess, e.g. -p=60 or -s 70 (a typo?).

Possibly the version I picked up isn't the same as your latest? Regardless, is anything wrong with the method I had just used? Requiring a full Duplicati backup reset after every Duplicati backup is not the usual use scenario, and I'm being pretty careful to be nice to the backup except for the timeout kills that the script is doing. Sorry you can't cause this. Are all tests still on an SSD?

Regardless, until a developer can reproduce this and dig, I guess it's on me to try to look at the situation as much as I'm able to. There is plenty more that could be done to get a better history, e.g. at the intermediate points of interest along the way.

There is also still the hope that someday there will be a better tester that forces specific scenarios to occur reliably, to run as a repro. A breakpoint was used in the other issue on Extra, but it's awkward. I'd like something nicer, even if some manual control is used.

Possibly I can write something in Python that will be enough for an operator to say something like do it, fail it, or kill Duplicati... Previous test efforts used either random backend failure or (I think) a file read that would say whether to fail the request or not.

ts678 commented 1 year ago

Possibly the version I picked up isn't the same as your latest

I see my problem. You posted two versions, and I was still looking at the first one. I'll see if the new options seem interesting.

EDIT 1:

They look noncritical, so I might just stick with the tunings I have (at least for now), as they seem pretty good at getting this.

timeout.py --help
usage: timeout.py [-h] [-i INTERRUPT_INTERVAL] [-p PERCENT_CHANGE] [-s SIZE] [-u DESTINATION_URI] path_to_duplicati

Duplicati interrupts test app

positional arguments:
  path_to_duplicati     path to duplicati executable directory

options:
  -h, --help            show this help message and exit
  -i INTERRUPT_INTERVAL, --interrupt-interval INTERRUPT_INTERVAL
                        max interrupt interval in seconds
  -p PERCENT_CHANGE, --percent-change PERCENT_CHANGE
                        percent of source file to change
  -s SIZE, --size SIZE  size of data to change in a given block (must be < to block size)
  -u DESTINATION_URI, --destination-uri DESTINATION_URI
                        path to file containing a custom uri (default is 'file' uri with destination = 'dest' subdirectory)
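
For reference, the help above corresponds to a small argparse setup roughly like this (a sketch reconstructed from the usage text, not the actual script):

import argparse

parser = argparse.ArgumentParser(description="Duplicati interrupts test app")
parser.add_argument("path_to_duplicati",
                    help="path to duplicati executable directory")
parser.add_argument("-i", "--interrupt-interval", type=int,
                    help="max interrupt interval in seconds")
parser.add_argument("-p", "--percent-change", type=int,
                    help="percent of source file to change")
parser.add_argument("-s", "--size", type=int,
                    help="size of data to change in a given block")
parser.add_argument("-u", "--destination-uri",
                    help="path to file containing a custom uri")
args = parser.parse_args()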

EDIT 2:

The portability improvements and the use of the current directory for things (unlike the old scatter-everywhere plan) were tempting. Thanks. It did seem like it still needs some manual editing to create its file the first time, but at least it sets a long timeout on the first loop.

ts678 commented 1 year ago

Running your latest version at defaults, as somewhat described above, produced the following interesting points in the initial run:

Running backup at 2023-10-03 18:23:22.541726
Running test recreate at 2023-10-03 18:23:30.796273
Statistics: started: 14 timeout: 2 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:23:52.326412
Timed out after 6 seconds
Statistics: started: 15 timeout: 3 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:24:03.495774
Timed out after 18 seconds
Statistics: started: 16 timeout: 4 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:24:26.676681
backup terminated successfully but database is inconsistent.
duplicati-id165dc4df6e24cff8bc3c3b207f1e93c.dindex.zip
Running test recreate at 2023-10-03 18:24:33.906013
Statistics: started: 17 timeout: 4 missing: 0 extra: 0 marked: 0
Running backup at 2023-10-03 18:33:01.546970
backup terminated successfully but database is inconsistent.
duplicati-id165dc4df6e24cff8bc3c3b207f1e93c.dindex.zip
Running test recreate at 2023-10-03 18:33:10.829733
Statistics: started: 37 timeout: 11 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:33:46.084920
Timed out after 23 seconds
Statistics: started: 38 timeout: 12 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:34:14.257995
Timed out after 8 seconds
Statistics: started: 39 timeout: 13 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:34:27.526412
Timed out after 1 seconds
Statistics: started: 40 timeout: 14 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 18:34:33.645955
backup error code 100
stderr was:

Duplicati.Library.Interface.RemoteListVerificationException: Found 1 remote files that are not recorded in local storage, please run repair
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, IEnumerable`1 protectedFiles)
   at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify(BackendManager backend, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at CoCoL.ChannelExtensions.WaitForTaskOrThrow(Task task)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass14_0.<Backup>b__0(BackupResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
   at Duplicati.CommandLine.Commands.Backup(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

Statistics: started: 41 timeout: 14 missing: 0 extra: 0 marked: 0

Source assist:

                ls = []
                for row in cursor.execute('SELECT "Name" FROM "Remotevolume" WHERE "Type" = "Index" AND ( "State" = "Verified" OR "State" = "Uploaded" ) AND "ID" NOT IN (SELECT "IndexVolumeID" FROM "IndexBlockLink")'):
                    ls.append(row[0])
                connection.close()
                if ls != []:
                    print(r'backup terminated successfully but database is inconsistent.')

If you still have only SSDs and aren't willing to try a slow device on USB, the throttle options do work on file backends, and if you have an Internet destination to use, those are somewhat inherently slow anyway (at least on a single connection, compared to a local SSD).

gpatel-fr commented 1 year ago

Yes, I forgot there was a bug in this version :-( Fixed version: timeout.zip

ts678 commented 1 year ago

It might be necessary to switch the delay to a float instead of using random.randint, which possibly fits my system timing fairly consistently and misses yours equally consistently. Possibly random.uniform could be used instead, to ensure all points get hit; see the sketch below.
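
A minimal sketch of that change (max_interval is an illustrative bound):

import random

max_interval = 30  # illustrative upper bound on the kill delay

delay = random.randint(1, max_interval)    # whole seconds only: 1, 2, ..., 30
delay = random.uniform(0.5, max_interval)  # any float in range, e.g. 7.3418,
                                           # so sub-second windows get hit too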

One consistent-looking profiling log correlation with an upcoming problem is where CommitAfterUpload starts but doesn't finish. Windows' older tools are pretty crude, but a search led me to "Grep, the PowerShell way", and I've been looking at results from various recent logs, including my script and yours posted yesterday and today. The script from today just noticed its first anomaly:

Running backup at 2023-10-03 20:29:09.222903
Running test recreate at 2023-10-03 20:29:26.568806
Statistics: started: 131 timeout: 33 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 20:29:46.029967
Timed out after 1 seconds
Statistics: started: 132 timeout: 34 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 20:29:52.182181
Timed out after 4 seconds
Statistics: started: 133 timeout: 35 missing: 0 extra: 0 marked: 0

Running backup at 2023-10-03 20:30:01.415965
backup terminated successfully but database is inconsistent.
duplicati-i65e9d8f98c4541fc92f2e39b0b748f8a.dindex.zip
duplicati-ic808036500e7476586d52fb556bb48f9.dindex.zip
Running test recreate at 2023-10-03 20:30:10.674854
Statistics: started: 134 timeout: 35 missing: 0 extra: 0 marked: 0

sls "The operation Backup has started" for_kill_test_1.log -Context 1,0 gives one line of context before, so look for:

  for_kill_test_1.log:241848:2023-10-03 20:29:56 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.Common.DatabaseCommon-CommitTransactionAsync]: Starting - CommitAfterUpload
> for_kill_test_1.log:241849:2023-10-03 20:30:02 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started

but this was already known as a sensitive spot. The problem is hitting it at the right time, so a random float might help. An alternative to a breakpoint could be a custom build that just slows down around there, to create a larger window for a timeout.

gpatel-fr commented 1 year ago

FYI I have done a few long tests with remote backends (SSH and AFTP) and the result was even more stable than with the file backend (according to the counters). Well, in my last run log I have 4 occurrences of 'Starting - CommitAfterUpload' before 'Backup has started'.

Now for thinking a bit differently: since I can't repro and you can so easily, there must be something different that explains it. First of all, I have assumed that Windows is Windows. This is not necessarily the case: I have used Windows Server for my tests, you are using Windows 10. I have a Win 10 VM somewhere; I will try the tests again with it. Also, on Windows my file tests were done on a NAS, that is, a network drive (to be slower, because it has classic hard drives while the server has an SSD), but it may be that the system handles networked files differently. For example, you mentioned antivirus. Did you disable the Windows real-time antivirus scanning on your backend?

ts678 commented 1 year ago

For example, you mentioned antivirus. Did you disable the Windows real time antivirus scanning on your backend ?

No, I didn't disable it, but the virus scans I mentioned looked more like executable scanning, e.g. in the Windows Defender event log:

Microsoft Defender Antivirus has uploaded a file for further analysis. Filename: C:\tmp\duplicati-2.0.7.3_canary_2023-10-03\Duplicati.Library.Backend.Rclone.dll Sha256: 6b031aa91395c28fd264fe7b1dad07436edaaa41bff8c9a66f37b30f98f7b4c4

Well, in my last run log I have 4 occurrences of 'Starting - CommitAfterUpload' before 'Backup has started'.

Are you seeing any anomalies at all (I don't mean all the way to recreate failure, just anything odd) when they happen? This seems to be a multi-step failure (which if true probably also reduces its chances of happening, even if I can hit it...).

there must be something different that explains it.

What do you think of my timing hypothesis (although admittedly your ability to kill during commit weakens that idea)? Especially on a fast system with probably more predictable timing (no hard drive), an integer timeout may miss the weak spot.

What I could probably try is to test my version of the script, and see if trying non-integer timeouts weakens that repro. Puzzlingly, your version doesn't repro this recreate fail, even after running overnight. I don't know if any logic changed. Superficially the new tuning flag defaults look like the same ones I had hardcoded. Both seem to make dblocks of 5 MB containing about 50 blocks, which is right for a 10% change on a 500 block data file. It also means that hitting a dindex CommitAfterUpload (be careful which ones you look at, and in what context) is a one-try-per-backup thing with unknown timings, so it's hard to target the weak spot.

I'm still not sure how to force the timings I want. Guided methods include hooking in at the backend API in an rclone wrapper, monitoring the Duplicati log or console output in real time, or using a filesystem watcher to notice when a dindex file shows up.

The Duplicati output watcher is probably the most portable scheme, however it might interfere with the nice easy timed killing...
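
A sketch of that output-watcher scheme (assuming console output echoes the backend events at Information level, as the logs above show, and that the marker string matches one of those lines):

import subprocess

MARKER = "Put - Completed: duplicati-i"  # dindex upload completion, per the logs above

proc = subprocess.Popen(
    ["Duplicati.CommandLine.exe", "backup", "file://dest", "source",
     "--console-log-level=Information"],
    stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line, end="")
    if MARKER in line:
        proc.kill()  # interrupt right at the suspected weak spot
        break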

ts678 commented 1 year ago

since I can't repro and you can so easily, there must be something different that explains it.

One significant change in even the 9/29 posted script is the total reliance on the recreate exit code:

if exit_code == 0: return

I began a search by looking inside the most ancient extra dindex and seeing a bad dblock reference. Running a manual recreate, its log file at error level got lots of error messages, yet the exit code was zero. The early return also prevents the display of that extra noise, followed by the stop of the loop.

My logic does not know exactly what Duplicati will report as an error, so it prints "surprise" lines for review.
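
Roughly this idea, as a sketch (the expected-line list here is illustrative; the real script's differs):

import subprocess

result = subprocess.run(
    ["Duplicati.CommandLine.exe", "repair", "file://dest"],
    capture_output=True, text=True)
expected = ("Listing remote folder",)  # illustrative; the real clean-run lines differ
for line in (result.stdout + result.stderr).splitlines():
    if line and not line.startswith(expected):
        print("surprise:", line)  # review these even when the exit code is 0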

But despite the insensitivity to anomalies, the latest script did finally succumb. Its final output:

Running backup at 2023-10-04 12:54:15.073233
backup error code 100
Running repair at 2023-10-04 12:54:19.294806
backup terminated successfully but database is inconsistent.
duplicati-i65e9d8f98c4541fc92f2e39b0b748f8a.dindex.zip
duplicati-ic808036500e7476586d52fb556bb48f9.dindex.zip
duplicati-i9428040616ed442fbe1e701e47ae3002.dindex.zip
duplicati-i106e008c1c0444f6aefb4570306dba48.dindex.zip
duplicati-i5483d1786fa14435955f988721c67ddc.dindex.zip
duplicati-i27149bd4b3b04e7393845d817c53065a.dindex.zip
duplicati-i13deab4634f84df09a4d85839d31b0eb.dindex.zip
duplicati-i2adba8b5e7ce4337bd14859defde4b5c.dindex.zip
duplicati-i203faaa3bcd6456d8997015a1dfef455.dindex.zip
duplicati-i354046602ff54251a7083cf38ac8eec1.dindex.zip
duplicati-icd23d251c4e14544b739696970029008.dindex.zip
duplicati-ibb705861afdc4042b1591ca8567fa1df.dindex.zip
duplicati-i8a09537d6baa473aa6b8c1d12163d783.dindex.zip
duplicati-i203010e424974f1aac44c9f876daec3a.dindex.zip
duplicati-iad2b7127e05643c49d627b9ef31f79b8.dindex.zip
duplicati-ib76b542136cf48329d6e9a861d210441.dindex.zip
duplicati-i499f9f90f2b0407e8ece32d071610a59.dindex.zip
duplicati-i44899f9068fb4d08bdca00689d164dc5.dindex.zip
duplicati-ia856a188b8214fceb6a71ea3ff82b1a4.dindex.zip
duplicati-i6c4659f7a4b84c7ea6f5508f5a5f7ed9.dindex.zip
duplicati-i0262e2b953ad476695256fd0c3e6f40c.dindex.zip
duplicati-if6ebae8dfcf7402a9abfe7e0ad9d3764.dindex.zip
duplicati-i2a85d4154f8b447485aded0682f3a941.dindex.zip
duplicati-i840cf1b45bc6480891b411f7a707c67a.dindex.zip
duplicati-i649955627b424c8b8fa81ca08a46fe3d.dindex.zip
duplicati-i655ca7f2f4d04328982d97fc27140d4f.dindex.zip
duplicati-i3ec831d6969a49a6968e50e011eddc09.dindex.zip
duplicati-i24030aa33bab4f76a24e641e49bbf260.dindex.zip
duplicati-i07f67d77b65844be9bca59f18d82fb69.dindex.zip
duplicati-i948b4e87b2524743ad37b6b950b981ff.dindex.zip
duplicati-i81382298a4754ded82b230eeb5f5f103.dindex.zip
duplicati-i5eb6444ad4da44cdabc4c8de30a77099.dindex.zip
duplicati-ifacddde412224d85a299cab0c355ddf9.dindex.zip
Running test recreate at 2023-10-04 12:54:22.596760
Statistics: started: 2189 timeout: 762 missing: 0 extra: 54 marked: 0

Running backup at 2023-10-04 12:54:58.320715
backup terminated successfully but database is inconsistent.
duplicati-i65e9d8f98c4541fc92f2e39b0b748f8a.dindex.zip
duplicati-ic808036500e7476586d52fb556bb48f9.dindex.zip
duplicati-i9428040616ed442fbe1e701e47ae3002.dindex.zip
duplicati-i106e008c1c0444f6aefb4570306dba48.dindex.zip
duplicati-i5483d1786fa14435955f988721c67ddc.dindex.zip
duplicati-i27149bd4b3b04e7393845d817c53065a.dindex.zip
duplicati-i13deab4634f84df09a4d85839d31b0eb.dindex.zip
duplicati-i2adba8b5e7ce4337bd14859defde4b5c.dindex.zip
duplicati-i203faaa3bcd6456d8997015a1dfef455.dindex.zip
duplicati-i354046602ff54251a7083cf38ac8eec1.dindex.zip
duplicati-icd23d251c4e14544b739696970029008.dindex.zip
duplicati-ibb705861afdc4042b1591ca8567fa1df.dindex.zip
duplicati-i8a09537d6baa473aa6b8c1d12163d783.dindex.zip
duplicati-i203010e424974f1aac44c9f876daec3a.dindex.zip
duplicati-iad2b7127e05643c49d627b9ef31f79b8.dindex.zip
duplicati-ib76b542136cf48329d6e9a861d210441.dindex.zip
duplicati-i499f9f90f2b0407e8ece32d071610a59.dindex.zip
duplicati-i44899f9068fb4d08bdca00689d164dc5.dindex.zip
duplicati-ia856a188b8214fceb6a71ea3ff82b1a4.dindex.zip
duplicati-i6c4659f7a4b84c7ea6f5508f5a5f7ed9.dindex.zip
duplicati-i0262e2b953ad476695256fd0c3e6f40c.dindex.zip
duplicati-if6ebae8dfcf7402a9abfe7e0ad9d3764.dindex.zip
duplicati-i2a85d4154f8b447485aded0682f3a941.dindex.zip
duplicati-i840cf1b45bc6480891b411f7a707c67a.dindex.zip
duplicati-i649955627b424c8b8fa81ca08a46fe3d.dindex.zip
duplicati-i655ca7f2f4d04328982d97fc27140d4f.dindex.zip
duplicati-i3ec831d6969a49a6968e50e011eddc09.dindex.zip
duplicati-i24030aa33bab4f76a24e641e49bbf260.dindex.zip
duplicati-i07f67d77b65844be9bca59f18d82fb69.dindex.zip
duplicati-i948b4e87b2524743ad37b6b950b981ff.dindex.zip
duplicati-i81382298a4754ded82b230eeb5f5f103.dindex.zip
duplicati-i5eb6444ad4da44cdabc4c8de30a77099.dindex.zip
duplicati-ifacddde412224d85a299cab0c355ddf9.dindex.zip
Running test recreate at 2023-10-04 12:55:10.650327
Statistics: started: 2190 timeout: 762 missing: 0 extra: 54 marked: 0

Running backup at 2023-10-04 12:55:43.922786
backup error code 100
Running repair at 2023-10-04 12:55:48.154066
Repair error code 100
stderr was:

ErrorID: MissingDblockFiles
The backup storage destination is missing data files. You can either enable `--rebuild-missing-dblock-files` or run the purge command to remove these files. The following files are missing: duplicati-b11b3eb27a9c34d9eb24ae0d8a03b4504.dblock.zip, duplicati-b2af422f23319450ea18cc744d350b71a.dblock.zip, duplicati-b87f5ff26a44c45fcafb50a812c3306ee.dblock.zip

error running repair / recreate, exiting...
Statistics: started: 2191 timeout: 762 missing: 1 extra: 54 marked: 0

EDIT:

Interestingly, I can't find the problem the last message was complaining about. I see signs that some dblocks were once there (examples being the backup profiling log and the database), but the dblock I sampled might have been deleted by something.

Regardless, the database now noisily recreates, and didn't have to read any dblock files. A GUI Recreate popup says 33 errors. I think there might still be some loose ends on exact sequences, but this starts to explain why the revised script is saying little.

gpatel-fr commented 1 year ago

One significant change in even the script from https://github.com/duplicati/duplicati/issues/5023#issuecomment-1740464433 is the total reliance on the recreate exit code

You are right! I missed that your script was stopping even when the repair exit code was 0 (success). This is the reason. From my point of view, if the repair succeeds, Duplicati is still working, so it successfully withstood a crash -- I did not understand the stop = true if there were error lines in the repair output. Is it an anomaly that the backend has troubles when the application is stopped without any care at all? Especially after more than 700 crashes. I'd say that Duplicati is incredibly resilient, at least in this use case. Has anyone subjected the backup market leaders to such trashing, I wonder.

ts678 commented 1 year ago

Especially after more than 700 crashes.

and probably 54 scripted repairs to fix the Extra problem. It'd have been better if Repair weren't needed. The missing-files type of error (and Repair) is looking pretty good except for the last run, so the fix seems to be holding up -- and test still has Extra blocks...

All in all, looking not too bad for a tough situation (although I haven't yet tried a restore). I'm still looking for a way to repro some uglier breakage (e.g. pass 3 of dblock downloads, or total recreate failure even after that), but haven't found a path to disaster yet.

With this one, there's also a question of whether the user will perceive a disaster when Duplicati pops up with a big red error box. Presumably, the database recreate inside a "Direct restore" could also hit this, and make them wonder if the restore worked right. There are reports around on this message, and it's usually a struggle for support to handle, or to direct the user to fixing the error safely.

The results from my test script seemed inexplicably worse (even after accounting for the missing messages), so I might still poke it further.

Jojo-1000 commented 1 year ago

I was able to reproduce the original issue in a test case:

It looks like interrupt after dindex put during backup can leave DB thinking there's a missing dindex even though the dindex put worked.

Your assumptions about that seem to be correct. Not sure how that can be prevented, though. A recreate between the extra index creation and the compact fixes it, because the link table is recreated from scratch, and the compact will then delete both index files. Why is the extra index not detected as an extra file to require repair? Or we need to make the upload transaction more resilient, so it does not "forget" about the file.

I also got the missing dblock error once in tests, however that might have been the first backup, so the database might not have been properly created.

gpatel-fr commented 1 year ago

I have finally added back the repair messages in my version - also added a --create option to automatically recreate the source file.

timeout.zip

All considered, the root cause seems pretty obvious to me: Duplicati basically wraps all its operations in the database, then waits for the upload to finish. As the data files (dblock) are big compared to the index files, when a crash occurs, the index files are safely on the backend, but not the data files. While one could consider all kinds of schemes to make this better, it all comes from system failures. Expecting any complex system to withstand multiple and repeated failures without some care is a difficult problem. Trying to solve it leads to more complex systems, needing more care. There is currently no nuclear plant without a large maintenance team.

For the next canary/beta, I'd prefer to add workarounds to better handle these failures (and others):

With these changes, users having a successful backup will be sure that the database will rebuild - and if a warning appears, they will know that they need to do some repairing right now (not learn of it when the system disk fails), and do a database repair without this repair taking weeks. They could of course have missing data - but how could one expect otherwise if the system is crashing (for reasons unrelated to the backup software) during a backup?

So I will leave this issue for now. It will be handled later, at least when Duplicati gets out of beta :-)

gpatel-fr commented 1 year ago

@ts678 Don't take my previous post as meaning that you are wrong to search for the root cause of backends being damaged badly enough to need a scan of the whole backend to repair them. It's just that this particular example does not demonstrate it, so it's not a priority to me for next version.

ts678 commented 1 year ago

Why is the extra index not detected as an extra file to require repair?

It was in my run last night, but I think this time the fatal error suppressed the script's message output. The backup check also isn't hyper-sensitive enough to stop the test whenever anything unexpected comes out. Perhaps it could be, but false stops can be a nuisance.

If the script sees ErrorID: ExtraRemoteFiles, it just goes to repair without detailing the exact error messages seen. The log shows:

2023-10-04 21:39:15 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip
2023-10-04 21:39:15 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraRemoteFiles]: Found 1 remote files that are not recorded in local storage, please run repair
2023-10-04 21:39:15 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Operation.BackupHandler-PreBackupVerify]: PreBackupVerify took 0:00:00:00.070
2023-10-04 21:39:15 -04 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
Duplicati.Library.Interface.RemoteListVerificationException: Found 1 remote files that are not recorded in local storage, please run repair

In output to terminal, one can see a 100 exit code and the repair, and the extra count up by 1 the next time stats come out.

Running backup at 2023-10-04 21:39:05.267049
Timed out after 3 seconds
Statistics: started: 31 timeout: 12 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-04 21:39:13.419801
backup error code 100
Running repair at 2023-10-04 21:39:16.599331
Repair error code 0
backup terminated successfully but database is inconsistent.
duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip
Running test recreate at 2023-10-04 21:39:20.806282
Repair error code 0
Statistics: started: 32 timeout: 12 missing: 0 extra: 2 marked: 0

The version run here is tweaked to give a report even if repair has 0 exit code, so the message grammar looks a little strange. The inconsistency report on dindex file is courtesy of the database check being done (which my simpler original script lacked).

So something's gone wrong, but it's not captured in depth right here beyond what one can infer from the messages produced.

Matching the immediately earlier terminal output against a CommitAfterUpload that never finished (the next backup is seen starting) gives:

  for_kill_test_1.log:115633:2023-10-04 21:38:59 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.Common.DatabaseCommon-CommitTransactionAsync]: Starting - CommitAfterUpload
> for_kill_test_1.log:115634:2023-10-04 21:39:06 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
Running backup at 2023-10-04 21:38:56.090424
Timed out after 4 seconds
Statistics: started: 30 timeout: 11 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-04 21:39:05.267049
Timed out after 3 seconds
Statistics: started: 31 timeout: 12 missing: 0 extra: 1 marked: 0

As the data files (dblock) are big compared to the index files, when a crash occurs, the index files are safely on the backend, but not the data files.

I'm not seeing this; however, some things are still murky. Here's the rest of the terminal output after the first snippet posted:

Running backup at 2023-10-04 21:39:45.273145
backup terminated successfully but database is inconsistent.
duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip
Running test recreate at 2023-10-04 21:39:55.555880
Repair error code 0
Statistics: started: 33 timeout: 12 missing: 0 extra: 2 marked: 0

Running backup at 2023-10-04 21:40:19.231601
backup terminated successfully but database is inconsistent.
duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip
Running test recreate at 2023-10-04 21:40:35.589630
Repair error code 0
Remote file referenced as duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip by duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip, but not found in list, registering a missing remote file
Found 1 missing volumes; attempting to replace blocks from existing volumes
Found 1 missing volumes; attempting to replace blocks from existing volumes
error running repair / recreate, exiting...
Statistics: started: 34 timeout: 12 missing: 0 extra: 2 marked: 0

and the nature of the recreate message is that the dindex file is present, but is somehow missing the dblock that it references.
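
As an aside, this condition can be checked offline without running a recreate, under the assumption (mine, not confirmed in this thread) that an unencrypted dindex zip carries a vol/ folder with one entry named after each dblock it references:

```python
import os
import zipfile

def orphaned_dindex_refs(backend_dir):
    # For each dindex on the (unencrypted) test backend, list the dblock
    # names it references via its vol/ entries, and yield any reference
    # whose dblock file is no longer present.
    present = set(os.listdir(backend_dir))
    for name in sorted(present):
        if ".dindex." not in name:
            continue
        with zipfile.ZipFile(os.path.join(backend_dir, name)) as z:
            for entry in z.namelist():
                if entry.startswith("vol/"):
                    dblock = entry[len("vol/"):]
                    if dblock not in present:
                        yield name, dblock
```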

The search filter in the glogg log view, run on a profiling log where I also put the repair output (not sure if it should be in a separate log), was:

BackendEvent.*(duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip|duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip)|StartingOperation|Compacting

2023-10-04 21:38:57 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-10-04 21:38:59 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:38:59 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:38:59 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:38:59 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:39:06 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-10-04 21:39:14 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-10-04 21:39:17 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2023-10-04 21:39:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip ()
2023-10-04 21:39:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:39:22 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2023-10-04 21:39:35 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:39:35 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:39:46 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-10-04 21:39:53 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-10-04 21:39:56 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2023-10-04 21:40:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:40:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:40:20 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-10-04 21:40:24 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 28.53% wasted space and the limit is 25%
2023-10-04 21:40:28 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:40:28 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:40:30 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:40:30 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-bdd08553ca5e54bee9c8af4fd574639ca.dblock.zip (4.91 MB)
2023-10-04 21:40:36 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2023-10-04 21:40:47 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)
2023-10-04 21:40:47 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-ifbb64758acf14f1ea4dda02a215fafa7.dindex.zip (18.09 KB)

which seems to show very well that both the dblock and its dindex file were there until the dblock got deleted in a compact that didn't delete its dindex at the same time, probably because IndexBlockLink lacked the link. I don't have midway database details.

it's not a priority to me for next version.

There are more than enough things to fix, only one of you, and several others with perhaps more time to see what they can set up. Generally it's a progressive learning process where someone gets a hint of a possible problem, then tries to get a good test case. Here I tried to set up a database similar to what your database checker suggested, but haven't found the right recipe for that ending.

So that's where it sits at this moment. Please feel free to work on your priorities, and maybe someone else can get more on this. Even if it gets figured out, it's not a disaster, just kind of painful and tough to clean up for a usual not-deeply-technical end user.

We have sort of an improvement pipeline running, where issues are at different stages of progression on their way to a commit. Shipping the most important and least risky (also meaning very well understood) commits faster makes lots of sense; however, a strict priority order risks starving progress on low-risk changes that might lead to a better customer experience overall. Developer experience also gets better if pull requests get in, and keeping developers here matters. We'll find a balance someday.

ts678 commented 1 year ago

Thanks for the test case. It's good to know there are other ways to cause these. I tested more, using a better-instrumented script. I modified a timeout.py to copy the DB after "inconsistent" is found, then stop after the test recreate. The restart after a look removed the --create option and added --interrupt-interval=1000000, to give a smoother ride to the expected test recreate failure.

Seemingly what happened is a dblock was uploaded, its dindex was uploaded, but the timeout killed the process before CommitAfterUpload ended. Another backup ran, noticed a dblock without a dindex, and uploaded one -- with a different random name, so now both are around; however, Duplicati only knows one is attached to the dblock. No problem for a while. Test recreate picks up both. The problem comes when compact deletes the dblock and the only dindex it knows about, leaving the first dindex without any associated dblock, so recreate errors. This one looks a bit different from the others, as the dblock delete deleted one of its two dindex files.

Running backup at 2023-10-06 11:14:21.483832
Timed out after 4 seconds
Statistics: started: 26 timeout: 8 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-06 11:14:30.656253
backup terminated successfully but database is inconsistent.
duplicati-id27afbb111aa4677a6592fceb58b363b.dindex.zip
Running test recreate at 2023-10-06 11:14:39.931609
test db recreate: error code 0
Statistics: started: 27 timeout: 8 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-06 19:16:14.439978
backup terminated successfully but database is inconsistent.
duplicati-id27afbb111aa4677a6592fceb58b363b.dindex.zip
Running test recreate at 2023-10-06 19:16:28.799711
test db recreate: error code 0
===========> Remote file referenced as duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip by duplicati-id27afbb111aa4677a6592fceb58b363b.dindex.zip, but not found in list, registering a missing remote file
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
Statistics: started: 6 timeout: 0 missing: 0 extra: 0 marked: 0

Excerpts:
2023-10-06 11:14:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 11:14:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 11:14:25 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-id27afbb111aa4677a6592fceb58b363b.dindex.zip (18.09 KB)
2023-10-06 11:14:25 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-id27afbb111aa4677a6592fceb58b363b.dindex.zip (18.09 KB)
Adjacent:
2023-10-06 11:14:25 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.Common.DatabaseCommon-CommitTransactionAsync]: Starting - CommitAfterUpload
2023-10-06 11:14:31 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
Excerpts:
2023-10-06 11:14:34 -04 - [Information-Duplicati.Library.Main.Operation.Backup.RecreateMissingIndexFiles-RecreateMissingIndexFile]: Re-creating missing index file for duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip
2023-10-06 11:14:34 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i4a0c052890fa4efcbba2fa745ec6fa75.dindex.zip (18.09 KB)
2023-10-06 11:14:34 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i4a0c052890fa4efcbba2fa745ec6fa75.dindex.zip (18.09 KB)
2023-10-06 19:16:18 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 27.00% wasted space and the limit is 25%
2023-10-06 19:16:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 19:16:20 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 19:16:21 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 19:16:21 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-be725ea9f1f7f4c74b7cab148edc22bec.dblock.zip (4.91 MB)
2023-10-06 19:16:21 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-i4a0c052890fa4efcbba2fa745ec6fa75.dindex.zip (18.09 KB)
2023-10-06 19:16:21 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-i4a0c052890fa4efcbba2fa745ec6fa75.dindex.zip (18.09 KB)

Second test tried to get rid of the manual stop and restart, just letting the timeouts continue until the unwanted recreate error. Messiness ensued, and I'll hold off that writeup for now. Next script tweak and third test disables the timeout on a DB problem, also copying the database. It also modifies the consistency test to detect a dblock where IndexBlockLink has -1 IndexVolumeID.
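
For concreteness, the added consistency checks amount to queries like these (a sketch against my reading of the schema, where IndexBlockLink pairs an IndexVolumeID with a BlockVolumeID; the database path is a placeholder):

```python
import sqlite3

con = sqlite3.connect("backup.sqlite")  # placeholder job-database path

# dblock volumes that no IndexBlockLink row claims at all
unlinked = con.execute("""
    SELECT "Name" FROM "Remotevolume"
    WHERE "Type" = 'Blocks'
      AND "State" IN ('Verified', 'Uploaded')
      AND "ID" NOT IN (SELECT "BlockVolumeID" FROM "IndexBlockLink")
""").fetchall()

# dblock volumes whose IndexBlockLink row points at the bogus -1 dindex ID
dangling = con.execute("""
    SELECT bv."Name"
    FROM "IndexBlockLink" ibl
    JOIN "Remotevolume" bv ON bv."ID" = ibl."BlockVolumeID"
    WHERE ibl."IndexVolumeID" = -1
""").fetchall()
```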

Duplicati ordinarily recreates missing dindex files, but it doesn't seem to look carefully enough for a valid (not -1) ID value here:

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Database/LocalBackupDatabase.cs#L807-L813

Anyway, the run looks like it uploaded the dblock, had its dindex clipped by the timeout in CommitAfterUpload, so the dindex isn't seen in IndexBlockLink; this isn't a problem until the compact deletes the dblock but not the dindex, at which point the test recreate has a dindex without its dblock.

I now have the repair/recreate profiling log in a different file so that it doesn't clutter up the backup profiling log, which is filtered below:

2023-10-07 17:50:30 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)
2023-10-07 17:50:30 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)
2023-10-07 17:50:31 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip (18.09 KB)
2023-10-07 17:50:31 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip (18.09 KB)
2023-10-07 17:50:51 -04 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-SchedulingMissingFileForDelete]: scheduling missing file for deletion, currently listed as Uploading: duplicati-20231007T215024Z.dlist.zip
2023-10-07 17:50:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip
2023-10-07 17:50:51 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraRemoteFiles]: Found 1 remote files that are not recorded in local storage, please run repair
2023-10-07 17:51:35 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2023-10-07 17:52:16 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 26.87% wasted space and the limit is 25%
2023-10-07 17:52:23 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)
2023-10-07 17:52:23 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)
2023-10-07 17:52:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)
2023-10-07 17:52:24 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip (4.91 MB)

Repair profiling log has:

2023-10-07 17:50:55 -04 - [Information-Duplicati.Library.Main.Operation.RepairHandler-AcceptNewIndexFile]: Accepting new index file duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip
2023-10-07 17:50:55 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (24, "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip", "Index", "Uploading", 18529, 0, 0); SELECT last_insert_rowid();
2023-10-07 17:50:55 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (24, "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip", "Index", "Uploading", 18529, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.003

Although Duplicati doesn't log IndexBlockLink work, accepting this dindex file didn't go as far as connecting it up to its dblock.

Running backup at 2023-10-07 17:50:49.124132
backup: error code 100
Running repair at 2023-10-07 17:50:52.366612
repair: error code 0
backup terminated successfully but database is inconsistent.
duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip
Running test recreate at 2023-10-07 17:50:56.604028
test db recreate: error code 0
Statistics: started: 23 timeout: 10 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-07 17:51:22.065450
backup terminated successfully but database is inconsistent.
duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip
Running test recreate at 2023-10-07 17:51:41.628169
test db recreate: error code 0
Statistics: started: 24 timeout: 10 missing: 0 extra: 1 marked: 0

Running backup at 2023-10-07 17:52:07.212061
Running test recreate at 2023-10-07 17:52:31.434965
test db recreate: error code 0
===========> Remote file referenced as duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip by duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip, but not found in list, registering a missing remote file
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
error running test db recreate, exiting...
Statistics: started: 25 timeout: 10 missing: 0 extra: 1 marked: 0

The backup log filter, derived from the error message (then seeing what the timeout cut short, and whether any other files were involved), was:

BackendEvent.*(duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip|duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip)|Missing|Extra|Compacting

EDIT 1:

While adding the enhanced IndexBlockLink dblock check, I inadvertently left out a similar check for dindex, but one was enough.

Due to the limited database snapshots, I don't have a super early one, but I have one from before and one from after a repair. Before, duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip is absent from Remotevolume, and possibly it was always absent. After, duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip is Remotevolume ID 104. The recreate error says that this dindex references duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip, which is ID 103. It did not have an IndexBlockLink entry before the repair. After the repair there's a new IndexBlockLink row, paired with an IndexVolumeID of -1, which isn't going to be very useful for finding the dindex.

The profiling log for the repair shows a suspicious sequence where it looks in Remotevolume for the dindex ID (before the row exists), and only later inserts the dindex into Remotevolume:

2023-10-07 17:50:55 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip"
2023-10-07 17:50:55 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip" took 0:00:00:00.000
2023-10-07 17:50:55 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip"
2023-10-07 17:50:55 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-b6ce5d31cff0141c78802f638118bcb62.dblock.zip" took 0:00:00:00.000
2023-10-07 17:50:55 -04 - [Information-Duplicati.Library.Main.Operation.RepairHandler-AcceptNewIndexFile]: Accepting new index file duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip
2023-10-07 17:50:55 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (24, "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip", "Index", "Uploading", 18529, 0, 0); SELECT last_insert_rowid();
2023-10-07 17:50:55 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (24, "duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip", "Index", "Uploading", 18529, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.003

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Operation/RepairHandler.cs#L233-L241

The -1 might be the default value when GetRemoteVolumeID is given a filename that it can't find. That then ends up in IndexVolumeID.
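
If that reading is right, it's a plain read-before-write: the ID lookup runs before the row exists, the not-found sentinel -1 is captured, and only afterwards is the row inserted. A toy reproduction of the pattern (hypothetical table and names, not Duplicati's code):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "Remotevolume" ("ID" INTEGER PRIMARY KEY, "Name" TEXT)')

def get_remote_volume_id(name):
    # Presumed GetRemoteVolumeID behavior: -1 when the name isn't found.
    row = con.execute('SELECT "ID" FROM "Remotevolume" WHERE "Name" = ?', (name,)).fetchone()
    return row[0] if row else -1

index_id = get_remote_volume_id("some.dindex.zip")  # -1: row not inserted yet
con.execute('INSERT INTO "Remotevolume" ("Name") VALUES (?)', ("some.dindex.zip",))
# index_id is still -1, and that stale value is what lands in IndexBlockLink.
```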

Jojo-1000 commented 1 year ago

Seemingly what happened is a dblock was uploaded, its dindex was uploaded, but the timeout killed the process before CommitAfterUpload ended. Another backup ran, noticed a dblock without a dindex, and uploaded one -- with a different random name, so now both are around; however, Duplicati only knows one is attached to the dblock.

The way I see it, the pre-backup verify should catch this as unknown extra files on the remote and ask the user to repair. Is there not a similar check for block and list files? The repair should delete the extra index, and no problems can occur at a later stage. I would say it is not preventable with the current database transactions that repairs sometimes become necessary after interrupted backups.

Everything else would require more complex logic to resolve this (e.g. check for extra indexes explicitly and try to re-establish the link).

ts678 commented 1 year ago

Backup log shows the detection:

2023-10-07 17:50:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip
2023-10-07 17:50:51 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraRemoteFiles]: Found 1 remote files that are not recorded in local storage, please run repair

Repair scripted run puts file back in Remotevolume but not IndexBlockLink. It matched it to its dblock but didn't get dindex ID:

2023-10-07 17:50:55 -04 - [Information-Duplicati.Library.Main.Operation.RepairHandler-AcceptNewIndexFile]: Accepting new index file duplicati-ib726af21d5cd4e4f9eee8d3b01ae706e.dindex.zip

Meanwhile I'm trying to figure out how it got all the way through the upload without a Remotevolume entry -- or was it deleted? I rigged a test with --throttle-upload=1KB for slow dblock uploads that I interrupted with Control-C. Those were Uploading in Remotevolume. The temporary transaction wrapper logic has me a little worried, because I think it means a commit might not commit just then.

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Database/LocalDatabase.cs#L542-L563

ts678 commented 1 year ago

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Enums.cs#L54-L90

seems like it should give some info to RemoteListAnalysis, but then that'd have to do the right thing. Anyway, where's my info?

Jojo-1000 commented 1 year ago

I am stepping through the test case in debug mode, and it does look like the index file is in the database with state Uploading (at least with my kind of interruption by exception; probably an external kill is different in that regard). That is why there is no error about a missing file for me. It is then put into state Verified when the next backup finishes normally, but another index file is created anyway. Probably the link should already be created when the index is inserted as Uploading. If the upload is not finished by the next run, the normal RemoveRemoteVolumes function is called, which will remove that link again.

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Operation/Backup/BackendUploader.cs#L274-L287

I moved the index block link from after index upload to after block upload (line 284 to after 280). After that, no extra file is created after the interrupted run in the test.

That leaves the other case (that you have): "Accepting new index file" needs to check which block file the index refers to and add that link as well. From the code, it looks like it is supposed to do that:

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Operation/RepairHandler.cs#L233-L241

I am going to add a test to check that this works as intended.

EDIT: It seems there was a bug in that code, which put ID -1 in the link table (because the volume is not in the database yet). I fixed both bugs and added an extra test for repair.

ts678 commented 1 year ago

probably an external kill is different in that regard

If it's relevant, my impression is that Duplicati doesn't catch things like SIGTERM (Linux). I'm even less clear on Windows behavior, except that Control-C tends to kill the foreground process and leave Duplicati's actual work child still running.

os.environ['AUTOUPDATER_Duplicati_SKIP_UPDATE'] = 'true'

prevents that, speeds things up, maybe hammers the update server less, and won't accidentally start installed updates.

I suppose we could look to see if there's a hot journal lying around that might trigger a transaction rollback when we look. Something like that would look fine when stepping through things, but a hard-kill rollback would remove the work that had been done.

I'm tempted to have the script look around and maybe collect the database and any journals after every timeout, to aid the search.
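
That collection step is cheap to script; a sketch (paths are placeholders for wherever the job database lives):

```python
import glob
import os
import shutil
import time

def snapshot_database(dbpath="backup.sqlite", dest="snapshots"):
    # Copy the database plus any rollback journal / WAL file left behind
    # by the kill, timestamped so successive timeouts don't collide.
    os.makedirs(dest, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    for path in glob.glob(dbpath + "*"):  # backup.sqlite, -journal, -wal, -shm
        shutil.copy2(path, os.path.join(dest, stamp + "-" + os.path.basename(path)))
```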

I have to say it'd be nice if there were good documentation on the transaction design, to avoid having to read the code as needed to see if perhaps the maybe-commit didn't really commit in my case. But then why do dblocks get to Uploading?

It seems there was a bug in that code, which put ID -1 in the link table (because the volume is not in the database yet).

That sounds like the one I described earlier, in a post ending with

The -1 might be the default value when GetRemoteVolumeID is given a filename that it can't find. That then gets in IndexVolumeID.

similar to a read-before-write error. If that's what you fixed, thanks; however, make sure you like the dindex checking.

ts678 commented 1 year ago

This one wasn't anything special except that it's a deeper look. Two differences: I check both dblock and dindex, and I grab the database and any database -journal whenever a timeout gets done.

It's also clear that a sensitive spot was at a 4 second timeout. Other systems will differ. An optimal timeout plan is still TBD.

Running backup at 2023-10-08 17:36:30.138230
Timed out after 4 seconds
Statistics: started: 21 timeout: 6 missing: 0 extra: 0 marked: 0

  for_kill_test_1.log:34179:2023-10-08 17:36:33 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.Common.DatabaseCommon-CommitTransactionAsync]: Starting - CommitAfterUpload
> for_kill_test_1.log:34180:2023-10-08 17:36:40 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started

2023-10-08 17:36:32 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (20, "duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-10-08 17:36:32 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (20, "duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-10-08 17:36:33 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)
2023-10-08 17:36:33 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)
2023-10-08 17:36:33 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (20, "duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-10-08 17:36:33 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (20, "duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-10-08 17:36:33 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip (18.09 KB)
2023-10-08 17:36:33 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip (18.09 KB)
2023-10-08 17:36:41 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip

I have the database and journal from 17:36:34. Neither shows the dindex, even looking with a hex editor (which shows many other filenames). In a DB browser, applying the journal or not applying it looks the same. I suspect the journal is cold, but the header doesn't exactly match the doc: https://www.sqlite.org/fileformat.html#the_rollback_journal

duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip shows Uploaded

Running backup at 2023-10-08 17:36:39.426748
backup: error code 100
Running repair at 2023-10-08 17:36:42.627191
repair: error code 0
backup terminated successfully but database is inconsistent.
duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip
duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip
Running test recreate at 2023-10-08 17:36:45.866666
test db recreate: error code 0
Statistics: started: 22 timeout: 6 missing: 0 extra: 1 marked: 0

From backup log:
2023-10-08 17:36:41 -04 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-SchedulingMissingFileForDelete]: scheduling missing file for deletion, currently listed as Uploading: duplicati-20231008T213631Z.dlist.zip
2023-10-08 17:36:41 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip
2023-10-08 17:36:41 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraRemoteFiles]: Found 1 remote files that are not recorded in local storage, please run repair

From repair log:
2023-10-08 17:36:45 -04 - [Information-Duplicati.Library.Main.Operation.RepairHandler-AcceptNewIndexFile]: Accepting new index file duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip

I have database from before and after the repair.
New duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip is ID 125.
Old duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip is ID 124.
IndexBlockLink associates dblock 124 with dindex -1.
IndexBlockLink associates dindex 125 with nothing.
So repair broke it both ways, and below I show my test for both ways:

Running backup at 2023-10-08 17:37:11.430993
backup terminated successfully but database is inconsistent.
duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip
duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip
Running test recreate at 2023-10-08 17:37:19.719943
test db recreate: error code 0
Statistics: started: 23 timeout: 6 missing: 0 extra: 1 marked: 0

(omit four like that)

Running backup at 2023-10-08 17:39:59.715255
backup terminated successfully but database is inconsistent.
duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip
Running test recreate at 2023-10-08 17:40:15.054063
test db recreate: error code 0
===========> Remote file referenced as duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip by duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip, but not found in list, registering a missing remote file
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
===========> Found 1 missing volumes; attempting to replace blocks from existing volumes
error running test db recreate, exiting...
Statistics: started: 28 timeout: 6 missing: 0 extra: 1 marked: 0

From backup log:
2023-10-08 17:40:05 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.55% wasted space and the limit is 25%
2023-10-08 17:40:07 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)
2023-10-08 17:40:07 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)
2023-10-08 17:40:08 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)
2023-10-08 17:40:08 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip (4.91 MB)

From repair log:
2023-10-08 17:40:27 -04 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-MissingFileDetected]: Remote file referenced as duplicati-b5aa184afac73458ea4f1ab799200d70b.dblock.zip by duplicati-i410010182cbe4d27875b6be1f43176b6.dindex.zip, but not found in list, registering a missing remote file
2023-10-08 17:40:27 -04 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes
2023-10-08 17:40:27 -04 - [Verbose-Duplicati.Library.Main.Database.LocalRecreateDatabase-ReplacedMissingVolumes]: Replaced blocks for 1 missing volumes; there are now 1 missing volumes
2023-10-08 17:40:30 -04 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes
2023-10-08 17:40:30 -04 - [Verbose-Duplicati.Library.Main.Database.LocalRecreateDatabase-ReplacedMissingVolumes]: Replaced blocks for 1 missing volumes; there are now 1 missing volumes
Jojo-1000 commented 1 year ago

Except for the questions about the missing transaction, it looks like the problems are the same.

If you want to try my PR, I can trigger a build. In there I also added a check to see that the block volume in an index file exists, so other cases of this broken reference will already error with a repair instead of remaining undetected until a recreate.

ts678 commented 1 year ago

If you want to try my PR, I can trigger a build.

Might as well. Today I was just tweaking the timing, based on the idea that 4 seconds is bad. 1 - 3 seconds held up well. Still testing 5+. The goal would be to use sub-second timing variations to zoom in on the sensitive spot, to see if more bug variations show up.

Jojo-1000 commented 1 year ago

Here is the build: https://github.com/Jojo-1000/duplicati/actions/runs/6470994290

I would expect that no more "Remote file referenced as" errors occur. I'm not sure if it eliminates the extra index files given your lack of database entries (it depends on whether missing indexes are created before or after checking existing ones). But if extra indexes are created, they should not cause any problems.

ts678 commented 1 year ago

I would expect that no more "Remote file referenced as" errors occur.

Got this. Great!

I'm not sure if it eliminates the extra index files

Did not get this.

2023-10-10 11:38:09 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i28bae137e3fc43ef9e41b782c56979c5.dindex.zip
2023-10-10 11:42:40 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ifafc32a3f96f4dcba9f33cf3bd40dfa9.dindex.zip
2023-10-10 12:01:07 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iaea9d0b259834ccf8fb24cec02c83b4d.dindex.zip
2023-10-10 12:10:49 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i7d32b66133764e3fa18da161554a9c27.dindex.zip
2023-10-10 12:22:22 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b826f637d18ac4040b42c66900e60c774.dblock.zip
2023-10-10 12:30:54 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-baf87c3d354c447ffb961a8d52e7bc2e1.dblock.zip
2023-10-10 12:34:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b0587bdacb78e4f46971a514e35f87f89.dblock.zip
2023-10-10 13:04:54 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iec8a9884cead431b96a4d0818f2b5853.dindex.zip
2023-10-10 14:07:00 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i2fbfe468c99b40eba2393a23404becd1.dindex.zip
2023-10-10 14:34:19 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b9ae58575346f42969e02c2779319b3c4.dblock.zip
2023-10-10 15:15:11 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b9f5fd23e6e124baf84ae8ae9b90fb39f.dblock.zip
2023-10-10 16:09:47 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b543106108c3f48cc8cbfa5f47c603ca7.dblock.zip
2023-10-10 16:13:05 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ba5e91d0859b146dcb6fc71ee64ab167f.dblock.zip
2023-10-10 16:14:15 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-bd7207d99c12a46daa41ccfc4db2fec54.dblock.zip
2023-10-10 16:18:09 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-beb032aa66ddf4d49bea916201bb7be6b.dblock.zip
2023-10-10 16:22:32 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b8f5bc02205f841da9ecc0c39b25bbcb2.dblock.zip
2023-10-10 16:26:31 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i44d83f0dc47c46a4a3146fe70ee1270f.dindex.zip
2023-10-10 16:26:31 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i86039229df5a49b3a3cf2a4e3e60b6c4.dindex.zip
2023-10-10 16:45:10 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-be0ce1c8104ae45dbba339a3377e5a2af.dblock.zip
2023-10-10 16:48:29 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i3a824ac11c9b40bfb7ff3d41c9a530c5.dindex.zip
2023-10-10 17:31:10 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b199bf69195644a34af3867e64e9349cd.dblock.zip
2023-10-10 17:33:24 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ba4272a703a2d43f49443bcf0ca989909.dblock.zip
2023-10-10 18:12:57 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-b074f89872f7d4b05a9f4cf62730f7b2f.dblock.zip
2023-10-10 18:43:16 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-bf88da6ef71044d8e96bb9aa7890c4c42.dblock.zip
2023-10-10 19:31:48 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4f05e9031bba467eb7886980db3d5730.dindex.zip
2023-10-10 19:34:57 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-bbbb05e68cd334d599d76db3513ef8fc8.dblock.zip
2023-10-10 20:04:04 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip

I was hoping this would keep going all night, but it hit another error, which I think I found in the source but haven't studied much:

Running backup at 2023-10-10 20:02:44.376612
Running test recreate at 2023-10-10 20:03:07.075982
test db recreate: error code 0
Statistics: started: 1151 timeout: 456 missing: 0 extra: 25 marked: 0

Running backup at 2023-10-10 20:03:42.186494
Timed out after 4 seconds
Statistics: started: 1152 timeout: 457 missing: 0 extra: 25 marked: 0

Running backup at 2023-10-10 20:03:51.411113
Timed out after 5 seconds
Statistics: started: 1153 timeout: 458 missing: 0 extra: 25 marked: 0

Running backup at 2023-10-10 20:04:01.895068
backup: error code 100
Running repair at 2023-10-10 20:04:06.158596
repair: error code 0
===========> Failed to accept new index file: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip, message: Too many source blocklist entries in VUhVDl4dEzv0363fyzuMW+50b3TN2nj+YWwN3xZ2Hkc= => Too many source blocklist entries in VUhVDl4dEzv0363fyzuMW+50b3TN2nj+YWwN3xZ2Hkc=
error running repair, exiting...
Statistics: started: 1154 timeout: 458 missing: 0 extra: 26 marked: 0

https://github.com/duplicati/duplicati/blob/706c64a89954d70d6a0977228cbe150ba6f0d745/Duplicati/Library/Main/Database/LocalRepairDatabase.cs#L565-L577

EDIT 1, with some preliminary log mentions of the dindex in question, which unfortunately doesn't seem to be around any more:

From backup log:

2023-10-10 20:03:55 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1181, "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-10-10 20:03:55 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1181, "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-10-10 20:03:56 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip (18.09 KB)
2023-10-10 20:03:56 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip (18.09 KB)
2023-10-10 20:04:04 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip

From repair log:

2023-10-10 20:04:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip ()
2023-10-10 20:04:09 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip (18.09 KB)
2023-10-10 20:04:10 -04 - [Error-Duplicati.Library.Main.Operation.RepairHandler-FailedNewIndexFile]: Failed to accept new index file: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip, message: Too many source blocklist entries in VUhVDl4dEzv0363fyzuMW+50b3TN2nj+YWwN3xZ2Hkc=
2023-10-10 20:04:10 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1185, "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip", "Index", "Deleting", 18525, 0, 0); SELECT last_insert_rowid();
2023-10-10 20:04:10 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1185, "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip", "Index", "Deleting", 18525, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.001
2023-10-10 20:04:10 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip (18.09 KB)
2023-10-10 20:04:10 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip (18.09 KB)
2023-10-10 20:04:11 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1696989851 WHERE "Name" = "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip" 
2023-10-10 20:04:12 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "DeleteGraceTime" = 1696989851 WHERE "Name" = "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip"  took 0:00:00:00.248
2023-10-10 20:04:12 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip"
2023-10-10 20:04:12 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i075d92e3e50540cab1cc3ebe6212606a.dindex.zip" took 0:00:00:00.000

EDIT 2: The good news is the destination recreates nicely into a database, and test is OK except for the typical Extra blocks issue.

ts678 commented 1 year ago

I ran an experiment overnight to see if DataBlockProcessor.cs uploads behaved differently from SpillCollectorProcess.cs uploads, which happen at the end of a backup -- which is probably where the previous test was hitting. Unlike with the previous set of Extra files, dblocks now seem exempt from this problem, possibly because they're well taken care of in terms of updating State and doing the Commit.

This test raises the change rate from 10% to 50%, adds an upload throttle of 1MB/second to slow things down, disables parallel uploads to drag the work out, and lowers the remote volume size to 1MB for more uploads, with timeouts drawn from a uniform distribution of 10 to 15 seconds.
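
Concretely, that setup corresponds to an invocation along these lines (a sketch: --throttle-upload and --dblock-size are options seen in this thread or the manual; the name of the parallel-upload limit option is from memory and worth double-checking against the built-in help):

```python
import random
import subprocess

# Placeholder command, backend URL, and source path.
cmd = [
    "Duplicati.CommandLine.exe", "backup", "file://C:/backend", "C:/source",
    "--dblock-size=1MB",                         # smaller volumes => more uploads
    "--throttle-upload=1MB",                     # slow the uploads down
    "--asynchronous-concurrent-upload-limit=1",  # from memory: disables parallel uploads
]
try:
    subprocess.run(cmd, timeout=random.uniform(10, 15))
except subprocess.TimeoutExpired:
    pass  # the kill-by-timeout is the point of the test
```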

Although this test adds a bit of new information, I'm still not sure of the exact transaction design of a backup. I'm attempting to combine reading the code with experiments, to see whether commits that appear to run (via TemporaryTransactionWrapper) might not actually commit. CommitTransactionAsync also has a restart flag, and both of these variabilities require a pretty broad look to predict a result.

Running backup at 2023-10-12 10:19:09.613573
backup: error code 100
Running repair at 2023-10-12 10:19:19.212540
repair: error code 0
Running test recreate at 2023-10-12 10:19:29.809147
test db recreate: error code 0
Statistics: started: 412 timeout: 342 missing: 0 extra: 69 marked: 0

Traceback (most recent call last):
  File "C:\tmp\timeout.20231005\timeout.modified.py", line 256, in <module>
    time.sleep(5)
KeyboardInterrupt
^C
2023-10-11 21:05:17 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i3b8dad03de724958835362e2876c6c7d.dindex.zip
2023-10-11 21:10:48 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i97070360825442e8a7e8b7513f9d867d.dindex.zip
2023-10-11 21:13:15 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iae14fdcb0fc64d708550c2a01e4496b6.dindex.zip
2023-10-11 21:21:20 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ic3b3cd1ffb4648caa67050a847541551.dindex.zip
2023-10-11 21:26:00 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i98f8c7ba8b8a4d7d9bf52c9ef5fbdec7.dindex.zip
2023-10-11 21:31:46 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ifdb8b765e98649cd8963967bed1e685f.dindex.zip
2023-10-11 21:39:46 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i5af4200f16e8430bae4353cc355f6402.dindex.zip
2023-10-11 21:44:38 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4fec749948c24ae18cc28b49567040ce.dindex.zip
2023-10-11 21:49:54 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-if43b8ff3fe4843cfbcbcd24276964503.dindex.zip
2023-10-11 21:54:49 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ie4507a40f3cb45c7ad889f291146e576.dindex.zip
2023-10-11 21:59:34 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4a04aa8287c94b8793d6a8f6af69cb49.dindex.zip
2023-10-11 22:04:35 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i9ddc0bc050564871abe6d302e6b7eabd.dindex.zip
2023-10-11 22:10:07 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-id683d0451ab3451aa2b8135b65b2fc35.dindex.zip
2023-10-11 22:15:10 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i6a72c7798cb44f748c417557de691b18.dindex.zip
2023-10-11 22:20:17 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i9bdc5ab19f5e4c21be18f564c8d32d05.dindex.zip
2023-10-11 22:27:09 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i2fbda8976e14419f938a77c7641f74dd.dindex.zip
2023-10-11 22:33:01 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-icd33b050b99341f183094f7a72ba0d9c.dindex.zip
2023-10-11 22:38:26 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib67c9903b0a443f5bb3adb04f143037d.dindex.zip
2023-10-11 22:47:45 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ifb447ab99fbd434888b0e481503ce6a2.dindex.zip
2023-10-11 22:54:26 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i497e3368239f4bfe87c18bfc63e8f7e7.dindex.zip
2023-10-11 23:05:45 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i467b9cffe0ca4ef3b30f1add0bc29797.dindex.zip
2023-10-11 23:15:25 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i3d3cfc68b11a4d559ebe6add727bc84c.dindex.zip
2023-10-11 23:23:00 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ica4a8162df9c43e29ce604b2d5afdfb1.dindex.zip
2023-10-11 23:30:41 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i118256115c3e42c49b3cd891e3904334.dindex.zip
2023-10-11 23:38:34 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ie9866f0d39f84b6a81964274d824ba98.dindex.zip
2023-10-11 23:50:03 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i8dbbb973233d4c4eb06394ed7c2053be.dindex.zip
2023-10-11 23:59:27 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4e860d22a88642b0b8895c648983a8dc.dindex.zip
2023-10-12 00:09:50 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i7d29fbcada694663b068c7bea5096720.dindex.zip
2023-10-12 00:20:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-id654c485dfb34c8c9c74710fc5dba1ae.dindex.zip
2023-10-12 00:34:17 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ibe3b48b4fa86401eb2423b839c7e11ae.dindex.zip
2023-10-12 00:47:07 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i33d84e0afb184cca9f9e1971b318d2fb.dindex.zip
2023-10-12 00:58:56 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i1e9686e68ba14be5980955df15894d1b.dindex.zip
2023-10-12 01:10:16 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i980ad0cc928e41c6986daa8329b4bc0c.dindex.zip
2023-10-12 01:20:14 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i03ae336f19fb49039e04a2053c57baa3.dindex.zip
2023-10-12 01:32:03 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i5bb2ec913fbf4ba29b75ad42fc401f44.dindex.zip
2023-10-12 01:43:16 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib54bde53097f489bb27489e0e81801b3.dindex.zip
2023-10-12 01:54:34 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iabcc681c2d944f3d8dbc060751c4bdb0.dindex.zip
2023-10-12 02:05:17 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib638032558da4392bc680f5468f1c00b.dindex.zip
2023-10-12 02:16:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i85ca01fabe24437bbeeba8d830e671a3.dindex.zip
2023-10-12 02:27:33 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i5703096fd47f43f781c93dac8957987d.dindex.zip
2023-10-12 02:40:21 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iaba35a4aae53493b976b22ccc4c1631d.dindex.zip
2023-10-12 02:51:29 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-id55817479b45433a955f7b46466479fa.dindex.zip
2023-10-12 03:04:23 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i111fe07a5b4e47018553c911419176f1.dindex.zip
2023-10-12 03:19:38 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-icebedf20c78d4ba5a805146287e965ba.dindex.zip
2023-10-12 03:37:45 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i5ac429863f924c33b49fb5a78fd0b689.dindex.zip
2023-10-12 03:57:05 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ibe469229a47e4713bdcba83f357195af.dindex.zip
2023-10-12 04:17:26 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-id49a8c60fd4547ad82fb0f850270cd45.dindex.zip
2023-10-12 04:34:01 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i960f5b7ffed544fcbe579926b5c7d1e6.dindex.zip
2023-10-12 04:50:44 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-if375de2de882473f999eea5dc60d3e05.dindex.zip
2023-10-12 05:08:32 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iddd1a070c69a4a35bfd5624ee19787ca.dindex.zip
2023-10-12 05:21:04 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-iac2769f412e74dab9c4e93796f4d6ec5.dindex.zip
2023-10-12 05:33:25 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i0a62336cf37a4867bd230fecb0d8701f.dindex.zip
2023-10-12 05:46:51 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i8b5ad64a5a384647b46da2077b1bc2b7.dindex.zip
2023-10-12 05:58:43 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ied17af4566f14a2ca5d128adc55d0ad7.dindex.zip
2023-10-12 06:11:17 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ibce6f2dc41a445a1895625b4f5f66ccd.dindex.zip
2023-10-12 06:24:15 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i77a675a8db4b46bb9234fbe5ad28d906.dindex.zip
2023-10-12 06:38:01 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i1516bea086524f6ebe0dc6e6f529976e.dindex.zip
2023-10-12 06:51:08 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4d2861c7becb44c1930cc94eb8ca781d.dindex.zip
2023-10-12 07:05:04 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ifbc6d959b7ce418da8795eda22be823e.dindex.zip
2023-10-12 07:22:58 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i1336015f69774d828df99465744fd572.dindex.zip
2023-10-12 07:37:55 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ie7f0ef5740a54a17a799f3373538e06d.dindex.zip
2023-10-12 08:00:15 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i952c3623515546db999526395aadaa4b.dindex.zip
2023-10-12 08:21:14 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i72da4ef2c13548e7bf481514cab3c7c1.dindex.zip
2023-10-12 08:38:44 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i7065448a1f7d4a3bae76ad31f57b0621.dindex.zip
2023-10-12 08:57:01 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i4d6ac4ac8cbc49faba034e3b606689a8.dindex.zip
2023-10-12 09:16:22 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ib45c544b042c45718b879fc83f3e43a3.dindex.zip
2023-10-12 09:40:02 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-i797954781be9470f8f7be8c642d8073a.dindex.zip
2023-10-12 10:00:25 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-if444b84f73a345f098104a2a3168e6e9.dindex.zip
2023-10-12 10:18:59 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ic5256caa4988495f935d5875c2fd97db.dindex.zip
2023-10-12 10:19:16 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraUnknownFile]: Extra unknown file: duplicati-ic5256caa4988495f935d5875c2fd97db.dindex.zip
duplicatibot commented 1 year ago

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/database-recreation-not-really-starting/16948/87

duplicatibot commented 1 month ago

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/backup-corrupted/19143/5