9-FS / nhentai_archivist

downloads hentai from nhentai.net and converts to CBZ
MIT License

Bug - Tag Filtering sometimes not applying to downloads #47

Open WatToDoWad opened 4 days ago

WatToDoWad commented 4 days ago

Just posting this issue here first to check if anyone else has experienced the same, but it seems that recently, some titles have made it past the tag filtering and I'm not entirely sure why that is the case.

NHENTAI_TAGS = ['language:"english"', '-tag:"yaoi"', '-tag:"scat"', '-tag:"guro"', '-tag:"futanari"', '-tag:"smegma"', '-tag:"snuff"', '-tag:"mutilation"', '-tag:"cannibalism"', '-tag:"vore"']
CF_CLEARANCE = ""
CLEANUP_TEMPORARY_FILES = true
CSRFTOKEN = ""
DATABASE_URL = "./db/db.sqlite"
DOWNLOADME_FILEPATH = "./config/downloadme.txt"
LIBRARY_PATH = "./hentai/"
LIBRARY_SPLIT = 10000
SLEEP_INTERVAL = 50000
USER_AGENT = ""

image

Expected Behaviour

Well... it clearly shouldn't be downloading this title...

Actual behaviour

image

It did download the title. 😭 Now I have to prune it from the database manually and delete it from my folders.

9-FS commented 4 days ago

Hi, first of all thanks for writing a great bug report with all the information that I need to try to reproduce the error.

I pasted your configuration, and my resulting downloadme.txt has 92,427 results. A manual search via the nhentai.net search bar with "language:"english" -tag:"yaoi" -tag:"scat" -tag:"guro" -tag:"futanari" -tag:"smegma" -tag:"snuff" -tag:"mutilation" -tag:"cannibalism" -tag:"vore"" also yielded 92,427 results, so I assume that the code currently uses the API correctly. In my downloadme.txt, your example hentai 539129 did not show up. I randomly clicked through roughly 100 hentai on the website, and none of them had any of the excluded tags either. Maybe try another search. Does the problem persist? If not, I'll assume it's just another random nhentai.net API fuckywucky... They really do not have the most reliable API, and some things are just out of my hands.
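For anyone who wants to repeat this check, here is a minimal sketch. It assumes that the search string is simply the configured NHENTAI_TAGS entries joined by spaces (which matches the manual search above) and that downloadme.txt holds one numeric ID per line. The helper names `build_query` and `parse_downloadme` are made up for illustration and are not taken from the project.

```python
# NHENTAI_TAGS as given in the configuration from the report.
NHENTAI_TAGS = [
    'language:"english"', '-tag:"yaoi"', '-tag:"scat"', '-tag:"guro"',
    '-tag:"futanari"', '-tag:"smegma"', '-tag:"snuff"',
    '-tag:"mutilation"', '-tag:"cannibalism"', '-tag:"vore"',
]


def build_query(tags):
    """Join the search terms with spaces, as typed into the search bar."""
    return " ".join(tags)


def parse_downloadme(text):
    """Collect IDs from downloadme.txt content.

    Assumption: one numeric ID per line; anything else is skipped.
    """
    return {int(line) for line in text.splitlines() if line.strip().isdigit()}


if __name__ == "__main__":
    print(build_query(NHENTAI_TAGS))
    # Placeholder content; in practice read ./config/downloadme.txt instead.
    ids = parse_downloadme("539128\n539130\n")
    print(539129 in ids)
```

If the ID from the report is missing from a freshly generated downloadme.txt, the search itself is filtering correctly at that moment.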

WatToDoWad commented 4 days ago

How often does the API fuckywucky? Could the archiver check the tags itself rather than blindly trusting the API?

9-FS commented 4 days ago

This is the first time I've heard about this particular bug, but over the history of this project, unexpected behaviour due to unclean API design and straight up API fuckywuckys have been a regular occurrence...

So yeah... Basically all bugs I've had in version 3 (Rust rewrite) so far were because of the API.

It is possible to not rely on the API search. But that would require mass downloading all of the metadata first and then doing the hentai selection locally with the database. And to keep the database up to date, this would need to be done every time before downloading. That takes just way too long.

That approach would have 2 advantages: multiple searches connected via logical OR could finally be implemented, and we wouldn't need to rely on nhentai's flaky search API. But at the moment I don't see how it is practicable.

Another approach could be to use nhentai's search API for metadata download and then validate it locally. This would add another layer of unwanted complexity though and would make debugging way more difficult in the future as nhentai search results and downloadme's would not necessarily be the same any more.
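To make the local validation idea concrete, here is a rough sketch, not how the archivist works today. It assumes that every search term has the form `type:"name"` with an optional leading `-`, and that gallery metadata carries its tags as a list of objects with `type` and `name` fields. The names `parse_terms` and `passes_filter` are hypothetical.

```python
import re

# Matches terms such as language:"english" or -tag:"yaoi".
TERM = re.compile(r'^(-?)(\w+):"([^"]*)"$')


def parse_terms(tags):
    """Split search terms into required and excluded (type, name) pairs."""
    required, excluded = set(), set()
    for term in tags:
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported search term: {term!r}")
        negated, kind, name = match.groups()
        (excluded if negated else required).add((kind, name.lower()))
    return required, excluded


def passes_filter(gallery_tags, required, excluded):
    """True if all required pairs are present and no excluded pair is.

    Assumption: gallery_tags is a list of dicts with "type" and "name".
    """
    present = {(t["type"], t["name"].lower()) for t in gallery_tags}
    return required <= present and not (excluded & present)


if __name__ == "__main__":
    required, excluded = parse_terms(['language:"english"', '-tag:"vore"'])
    gallery = [
        {"type": "language", "name": "english"},
        {"type": "tag", "name": "vore"},
    ]
    print(passes_filter(gallery, required, excluded))  # False
```

A check like this would run after the metadata download and drop any gallery that fails, which is exactly where the downloadme.txt and the website's search results could start to differ.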

Please notify me though if this bug becomes a regular occurrence. Otherwise I'm currently inclined to not change anything. Until then, you could use some SQL to display all of the undesired hentai you have downloaded for more efficient deletion.
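A sketch of what such a query could look like. The table and column names (`hentai`, `tag`, `hentai_tag`) are hypothetical and only set up in memory here to show the shape of the query, so check the real schema in ./db/db.sqlite and adjust the names before running anything against it.

```python
import sqlite3

# Tag names excluded in the configuration from the report.
EXCLUDED = ["yaoi", "scat", "guro", "futanari", "smegma",
            "snuff", "mutilation", "cannibalism", "vore"]


def find_undesired(conn, excluded):
    """List (id, title) of downloaded entries carrying any excluded tag."""
    placeholders = ", ".join("?" for _ in excluded)
    sql = f"""
        SELECT DISTINCT h.id, h.title
        FROM hentai AS h
        JOIN hentai_tag AS ht ON ht.hentai_id = h.id
        JOIN tag AS t ON t.id = ht.tag_id
        WHERE t.type = 'tag' AND t.name IN ({placeholders})
        ORDER BY h.id
    """
    return conn.execute(sql, list(excluded)).fetchall()


# Hypothetical schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hentai (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE hentai_tag (hentai_id INTEGER, tag_id INTEGER);
    INSERT INTO hentai VALUES (1, 'kept'), (2, 'slipped through');
    INSERT INTO tag VALUES (10, 'glasses', 'tag'), (11, 'vore', 'tag');
    INSERT INTO hentai_tag VALUES (1, 10), (2, 10), (2, 11);
""")

if __name__ == "__main__":
    print(find_undesired(conn, EXCLUDED))  # [(2, 'slipped through')]
```

The query only lists candidates. Deleting the rows and the files on disk stays a separate, manual step.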