Improve HDoujin info.txt parsing

Difegue / LANraragi

Web application for archival and reading of manga/doujinshi. Lightweight and Docker-ready for NAS/servers.

https://lrr.tvc-16.science

MIT License

Improve HDoujin info.txt parsing #1053

Closed by HDoujinDownloader 2 months ago

HDoujinDownloader commented 3 months ago

Currently, tags are only extracted from the TAGS field for HDoujin's info.txt files. I've updated the plugin to extract tags from other fields as well (artist, series, language, parody, etc.), and namespace them accordingly.
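
As a rough illustration of the change described above, here is a minimal Python sketch (the actual plugin is not written in Python). It assumes an info.txt layout of `FIELD: value, value` lines; the field list, the `FIELD_NAMESPACES` mapping, and the `parse_info_txt` helper are hypothetical and not taken from the plugin.

```python
# Hypothetical mapping from info.txt field names to tag namespaces.
# An empty namespace means the values are emitted as plain tags.
FIELD_NAMESPACES = {
    "ARTIST": "artist",
    "SERIES": "series",
    "LANGUAGE": "language",
    "PARODY": "parody",
    "TAGS": "",
}


def parse_info_txt(text: str) -> list[str]:
    """Collect namespaced tags from the fields listed in FIELD_NAMESPACES."""
    tags = []
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        namespace = FIELD_NAMESPACES.get(key.strip().upper())
        if namespace is None:
            continue  # fields such as TITLE or URL are not turned into tags
        for item in value.split(","):
            item = item.strip()
            if item:
                tags.append(f"{namespace}:{item}" if namespace else item)
    return tags
```

With this sketch, a line such as `ARTIST: Alice, Bob` would yield `artist:Alice` and `artist:Bob`, while values from `TAGS` are passed through unchanged.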

HDoujinDownloader commented 3 months ago

Thank you for the feedback!

I updated the JSON parser to read the summary and make it more consistent with the output from the TXT file parser. It was adding all the fields as tags (including titles and URLs), but I've limited it to a more relevant subset. I also updated it to work with different JSON configurations (the outer manga_info may or may not be present based on user settings). The namespace-related issues should be resolved now as well.
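
The JSON handling described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the plugin's code: the field names (`artist`, `series`, `language`, `parody`, `tags`, `description`), the `TAG_FIELDS` mapping, and the `parse_info_json` helper are all hypothetical. Only the optional outer `manga_info` wrapper comes from the comment itself.

```python
import json

# Assumed subset of fields worth turning into tags; titles and URLs are left out.
# An empty namespace means the values are emitted as plain tags.
TAG_FIELDS = {
    "artist": "artist",
    "series": "series",
    "language": "language",
    "parody": "parody",
    "tags": "",
}


def parse_info_json(raw: str) -> dict:
    """Return tags and a summary from JSON metadata, with or without manga_info."""
    data = json.loads(raw)
    # The outer "manga_info" object may or may not be present, so unwrap it if it is.
    inner = data.get("manga_info")
    info = inner if isinstance(inner, dict) else data

    tags = []
    for field, namespace in TAG_FIELDS.items():
        value = info.get(field)
        if value is None:
            continue
        # Accept either a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        for item in values:
            item = str(item).strip()
            if item:
                tags.append(f"{namespace}:{item}" if namespace else item)

    return {"tags": tags, "summary": info.get("description", "")}
```

Unwrapping first and then reading from a single `info` dictionary keeps both JSON configurations on the same code path, so they produce the same output.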

> I think the tests should be updated to reflect the new summary/description parsing here

Correct me if I'm wrong, but it doesn't look like there are any tests for this format right now. I could possibly add some.

Difegue commented 2 months ago

Thanks! The JSON parser was a pretty old bit of code so I'm not surprised if it was worse than the txt version.
There are indeed no specific tests for the HDoujin plugin. Adding some with samples, like the other plugins have, would be welcome, but that's not blocking me from merging this in the meantime.

Boontato commented 2 months ago

> I updated the JSON parser to read the summary and make it more consistent with the output from the TXT file parser. It was adding all the fields as tags (including titles and URLs), but I've limited it to a more relevant subset. I also updated it to work with different JSON configurations (the outer manga_info may or may not be present based on user settings). The namespace-related issues should be resolved now as well.

Thanks squiddy for working on this. I actually liked that it pulled URLs: in Mihon/Tachiyomi I could search by nhentai code and it would resolve, because the URL was part of the tags. That was useful, at least for me.

When I saw this PR, I was hoping it would also let this plugin pull the title, because right now I'm using a secondary plugin just to pull title information from the metadata file.

HDoujinDownloader commented 2 months ago

@Boontato Oh! I didn't even notice plugins could specify a gallery title. I'll get that fixed and submit a new PR in a bit.

@Difegue What's your take on having URLs in the tags (e.g. url:https://nhentai.net/g/XXXXXX/)? If the use case is just being able to search by NHentai code, maybe there's a better way to do it.

Difegue commented 2 months ago

You should use source:nhentai.net/xxxx tags if you want to add URLs to the metadata, there's support for those in the browser extension and a few other spots.

Boontato commented 2 months ago

Yes, I have been using tag rules to convert the url namespace to the source namespace. Mihon also allowed specifying which namespace to use to pull the URL.
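
For illustration, the url-to-source conversion discussed above can be sketched in Python as follows. This is not LANraragi's tag rules feature or its API; `url_tag_to_source_tag` is a hypothetical helper, and dropping the scheme and the trailing slash is an assumption based on the `source:nhentai.net/xxxx` form mentioned in the thread.

```python
from urllib.parse import urlsplit


def url_tag_to_source_tag(tag: str) -> str:
    """Rewrite a 'url:<address>' tag as 'source:<host><path>'; leave other tags alone."""
    namespace, sep, value = tag.partition(":")
    if not sep or namespace.strip().lower() != "url":
        return tag
    value = value.strip()
    # urlsplit only recognises the host when a scheme or leading '//' is present.
    parts = urlsplit(value if "://" in value else "//" + value)
    if not parts.netloc:
        return tag
    # Assumption: source tags omit the scheme and any trailing slash.
    return "source:" + parts.netloc + parts.path.rstrip("/")
```

Applied to a tag list, this would be something like `[url_tag_to_source_tag(t) for t in tags]`, leaving tags in every other namespace untouched.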