funmaker / Hybooru

Hydrus-based booru-styled imageboard in React
https://booru.funmaker.moe/
MIT License
100 stars 17 forks source link

Unable to initialize DB: duplicate key value violates unique constraint "tag_postids_tagid_idx" #39

Closed Xunnamius closed 8 months ago

Xunnamius commented 10 months ago

Hi there. First: thank you for your work on this project.

Unfortunately, I've been encountering an error when rebuilding the database this past week. Log output is below. Might be a consequence of excessive tag sibling/parent loops.

I addressed it by adding ON CONFLICT DO NOTHING to applyTagSiblings, which matches all of the other INSERT INTO ... SELECT ... FROM ... statements in tag-related functions in that file.

In a couple days, if this turns out to be a sufficient fix, I can make a PR if you'd like.

Log output ``` Posts (153553/153553) [####################] Done in 3.24s Posts (3299/3299) [####################] Done in 45ms Posts (2470/2470) [####################] Done in 61ms Posts (628/628) [####################] Done in 25ms Urls (601716/601716) [####################] Done in 2.02s Notes [####################] Done in 0ms Tags (39961003/39961003) [####################] Done in 2:04 Mappings (2031885282/2031885282)[####################] Done in 53:36 Mappings (1205433/1205433) [####################] Done in 19.84s Mappings (45176/45176) [####################] Done in 253ms Tag parents (29235/29235) [####################] Done in 156ms Tag siblings (302637/302637) [####################] Done in 1.59s Importing options... [####################] Done in 12ms Resolving file relations... [####################] Done in 103ms Normalizing tags (6/6) [####################] Done in 46:12:14 Warning! Detected 71 loops in tag siblings! Warning! Detected 46 loops in tag parents! Resolving tag parents... [####################] Done in 36:40:41 Indexing (24/24) [####################] Done in 3:25:06 Resolving tag siblings... [--------------------] Unable to initialize DB! error: duplicate key value violates unique constraint "tag_postids_tagid_idx" at Parser.parseErrorMessage (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:369:69) at Parser.handlePacket (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:188:21) at Parser.parse (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:103:30) at Socket. (/repos/hydribooru/node_modules/pg-protocol/src/index.ts:7:48) at Socket.emit (node:events:514:28) at addChunk (node:internal/streams/readable:376:12) at readableAddChunk (node:internal/streams/readable:349:9) at Socket.Readable.push (node:internal/streams/readable:286:10) at Pipe.onStreamRead (node:internal/stream_base_commons:190:23) { length: 215, severity: 'ERROR', code: '23505', detail: 'Key (tagid)=(X) already exists.', hint: undefined, position: undefined, internalPosition: undefined, internalQuery: undefined, where: undefined, schema: 'public', table: 'tag_postids', column: undefined, dataType: undefined, constraint: 'tag_postids_tagid_idx', file: 'nbtinsert.c', line: '663', routine: '_bt_check_unique' } ```
Xunnamius commented 10 months ago

Update: said fix solved the issue and the database now rebuilds successfully. Maybe I try my hand at speeding up the rebuild process next :)

Log output HyBooru started on port localhost:8080 Environment: production. Press Ctrl-C to terminate. Database update detected, rebuilding... Rebuilding Database! Posts (153553/153553) [####################] Done in 3.97s Posts (3299/3299) [####################] Done in 56ms Posts (2470/2470) [####################] Done in 72ms Posts (628/628) [####################] Done in 38ms Urls (601716/601716) [####################] Done in 2.55s Notes [####################] Done in 0ms Tags (39961003/39961003) [####################] Done in 2:26 Mappings (2031885282/2031885282)[####################] Done in 1:03:51 Mappings (1205433/1205433) [####################] Done in 27.63s Mappings (45176/45176) [####################] Done in 311ms Tag parents (29235/29235) [####################] Done in 166ms Tag siblings (302637/302637) [####################] Done in 1.53s Importing options... [####################] Done in 21ms Resolving file relations... [####################] Done in 206ms Normalizing tags (6/6) [####################] Done in 52:16:48 Warning! Detected 71 loops in tag siblings! Warning! Detected 46 loops in tag parents! Resolving tag parents... [####################] Done in 35:06:26 Indexing (24/24) [####################] Done in 3:08:08 Resolving tag siblings... [####################] Done in 1.24s Counting tags usage... [####################] Done in 28.26s Calculating statistics... [####################] Done in 9:18 Finalizing... [####################] Done in 357ms Database rebuild completed in 91:57:05
funmaker commented 10 months ago

Hi, thanks for your investigation.

However I can't accept this fix for now. This shouldn't happen and this fix only works around some other underlying issue. What applyTagSiblings does is it copies post lookup tables from better/preferred siblings to other siblings so they can be used for searching. The other siblings shouldn't have any lookup tables by this point because normalizeTagRelations is supposed to move all mappings to better siblings, leaving worse ones "empty". Maybe there is a possibility that a tag could have two better siblings, which I thought isn't allowed by hydrus but all kinds of shit happens there so it's better to improve hybooru tolerance to hydrus fuckery.

I will have to investigate this issue. It is possible that we might need to drop double-better siblings situations, or arguably, better, join lookup tables together on conflict. There is also possibility that the loop resolution in during tag normalization has some bug in it.

Could you try uploading your database(without images) for sake of testing? It could greatly help me debug this issue.

Xunnamius commented 10 months ago

Ah, gotcha. I'm not at liberty to upload the hydrus db (it's also ~100gb). But if Hybooru has a verbose debug mode or crash dump / log-generating option, I'm more than willing to rebuild the database on another drive and send you the outputs.

funmaker commented 10 months ago

Does that 100GB includes images or not?

As of hybooru, there is really nothing to log here, it just runs an SQL query and something goes wrong in the database. You can try to make a minimal reproduction for this bug or upload just the part of your database that is causing that issue, ie without caches, images and stuff. Otherwise, there is little I can do without any database to debug on.

If you need a place to upload to I can give you an account on my server so you can just scp/rsync/ftp into it.

Xunnamius commented 10 months ago

That 100GB (~89GB) database does not include images or thumbnails, it's just that massive. It ballooned to that size after downloading the PTR. I'll see if I can reproduce it just rebuilding off the PTR and, if so, I'll upload the result.

Update: I've been assuming you want the client.mappings.db (~60GB), client.master.db (~20GB), client.caches.db (~6GB), and client.db. If you only want client.db, that's only 2GB. Or if I'm misunderstanding entirely, please let me know :)

funmaker commented 10 months ago

I would need at least mappings and client.db, but all would be preferable. Otherwise, I will have to fake the rest of the database or something.

You can try with PTR but AFAIK there are other people using hybooru with large databases synced with PTR without this particular issue.

funmaker commented 8 months ago

Should be fixed as of v1.10.0 Let me know if you are still experiencing this issue.