Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.86k stars 451 forks source link

Upgrader causes database errors #8162

Closed qstokkink closed 2 months ago

qstokkink commented 2 months ago

Our upgrade script uses iterdump to get the scheme and content of an existing database.

https://github.com/Tribler/tribler/blob/b8b2563898194b601ddc9d05f92da41a009e8168/src/tribler/upgrade_script.py#L104

Then, if an existing database is found, the lines of the dump are sequentially executed to inject the old data into the new database:

https://github.com/Tribler/tribler/blob/b8b2563898194b601ddc9d05f92da41a009e8168/src/tribler/upgrade_script.py#L110-L114

This does exactly that. However, the content that is included in this dump also includes ids, for example:

INSERT INTO "Peer" VALUES(1,X'...','2024-01-05 13:09:23.894782');
INSERT INTO "Peer" VALUES(2,X'...','2024-01-05 13:11:05.888214');
INSERT INTO "Peer" VALUES(3,X'...','2024-01-05 13:11:05.901082');
INSERT INTO "Peer" VALUES(4,X'...','2024-01-05 13:11:10.887803');

This is where the approach becomes less fine: if an id already exists, this imported data gets dropped. However, it gets worse: if another entry that points to this id does get imported in another table, it may point to a completely different object with the same id. I believe this causes errors like #8111.


In short, I think we need to change our approach and reconstruct objects from the existing data. Instead of copying the data, we would create actual new objects (in the orm) from the old data. The downside is that this is much slower than our blind iterdump approach.

qstokkink commented 2 months ago

Reconstruction SQL:

StatementOP (tribler.db)

SELECT SubjectResource.name, SubjectResource.type, ObjectResource.name, ObjectResource.type, Statement.added_count, Statement.removed_count, Statement.local_operation, Peer.public_key, Peer.added_at, StatementOp.operation, StatementOp.clock, StatementOp.signature, StatementOp.updated_at, StatementOp.auto_generated
FROM StatementOp
INNER JOIN Peer ON StatementOp.peer=Peer.id
INNER JOIN Statement ON StatementOp.statement=Statement.id
INNER JOIN Resource AS SubjectResource ON Statement.subject=SubjectResource.id
INNER JOIN Resource AS ObjectResource ON Statement.object=ObjectResource.id
;

ChannelNode (metadata.db)

SELECT ChannelNode.infohash, ChannelNode.size, ChannelNode.torrent_date, ChannelNode.tracker_info, ChannelNode.title, ChannelNode.tags, ChannelNode.metadata_type, ChannelNode.reserved_flags, ChannelNode.origin_id, ChannelNode.public_key, ChannelNode.id_, ChannelNode.timestamp, ChannelNode.signature, ChannelNode.added_on, ChannelNode.status, ChannelNode.xxx, ChannelNode.tag_processor_version, TorrentState.seeders, TorrentState.leechers, TorrentState.last_check, TorrentState.self_checked, TorrentState.has_data
FROM ChannelNode
INNER JOIN TorrentState ON ChannelNode.health=TorrentState.rowid
;

TrackerState (metadata.db)

Simple reinsert possible (the url field is UNIQUE so duplicates are automatically rejected).

TorrentState_TrackerState (metadata.db)

SELECT TorrentState.infohash, TrackerState.url
FROM TorrentState_TrackerState
INNER JOIN TorrentState ON TorrentState_TrackerState.torrentstate=TorrentState.rowid
INNER JOIN TrackerState ON TorrentState_TrackerState.trackerstate=TrackerState.rowid
;