Difegue / LANraragi

Web application for archival and reading of manga/doujinshi. Lightweight and Docker-ready for NAS/servers.
https://lrr.tvc-16.science
MIT License

The search result of E-Hentai Plugin is not correct. #569

Closed: PasserDreamer closed this issue 2 years ago

PasserDreamer commented 2 years ago

LRR Version and OS: Docker, version 0.8.3

Bug Details: Search results on sad panda get messed up when the keywords mix English and Japanese (e.g. every title containing "JK" gets the same result). After doing some research, I found that sad panda searches both titles and tags by default. To resolve this issue, use the advanced search options to restrict the search to titles. (https://sad🐼/?f_cats=0&f_search={Keyword Here}&advsearch=1&f_sname=on)

Difegue commented 2 years ago

As you can see here, the metadata plugin does use advanced search by default.

If you enable "debug mode" in server settings, the plugin logs will show the full search URLs used; I recommend you take a look at those and see how it differs from what you're expecting. We might be able to make some improvements then 👍

PasserDreamer commented 2 years ago

https://github.com/Difegue/LANraragi/blob/2465b56862a6e807529558c791ad86c249d5b4a5/lib/LANraragi/Plugin/Metadata/EHentai.pm#L136

Removing &f_stags=on should fix this, since tag search is not required.

[2021-12-24 08:21:22] [E-Hentai] [debug] Using URL https://exhentai.org?advsearch=1&f_sname=on&f_stags=on&f_sdt2=on&f_spf=&f_spt=&f_sfu=on&f_sft=on&f_sfl=on&f_search=%22%5B%E3%81%84%E3%81%A1%E3%81%94%E3%82%AF%E3%83%AC%E3%83%BC%E3%83%97%E5%A4%A7%E7%9B%9B%E7%B5%84%20%28%E6%A8%AA%E5%8D%81%E8%BC%94%29%5D%20%E6%8F%B4%E4%BA%A4JK%E3%81%AB%E7%B5%B6%E5%AF%BE%E4%B8%AD%E5%87%BA%E3%81%97%E3%81%99%E3%82%8B%E3%83%9E%E3%83%B3%EF%BC%81%20%E3%80%8C%E3%81%8A%E3%81%A3%E3%81%A8%E3%82%8A%E3%81%8A%E3%83%90%E3%82%AB%E7%B3%BB%E3%81%BD%E3%81%A1%E3%82%83%E5%A8%98%E7%B7%A8%E3%80%8D%22&f_sh=on+language%3Ajapanese (archive title)
[2021-12-24 08:21:24] [E-Hentai] [debug] EH API Tokens are 1826976 / 4ca07a70c7
[2021-12-24 08:21:25] [E-Hentai] [debug] E-H API returned this JSON: {"gmetadata":[{"gid":1826976,"token":"4ca07a70c7","archiver_key":"455648--4f54b2c884a897d43771c8a80f5b0cae55efa7f5","title":"ARTIST JK","title_jpn":"","category":"Western","thumb":"https:\/\/ehgt.org\/09\/c3\/09c3b195bd15fcfd9069be2c4ade315e89c8c5b8-574607-900-841-jpg_l.jpg","uploader":"ranma-chan","posted":"1633193205","filecount":"816","filesize":268997262,"expunged":false,"rating":"2.88","torrentcount":"0","torrents":[],"tags":["male:shotacon","female:lolicon","mixed:incest","other:western imageset"]}]}
[2021-12-24 08:21:25] [E-Hentai] [info] Sending the following tags to LRR: male:shotacon, female:lolicon, mixed:incest, other:western imageset, category:western
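To compare a logged search URL against what you expect, it can help to decode its query string into individual flags. The Python sketch below does that with the standard library only; `search_flags` is a hypothetical helper name, and the sample URL is an abbreviated stand-in for the one in the debug log above (same host and flag names, shortened keyword).

```python
from urllib.parse import urlsplit, parse_qsl


def search_flags(url):
    """Return the query parameters of a search URL as a dict.

    Blank values (e.g. f_spf=) are kept so that every flag present
    in the URL shows up in the result.
    """
    query = urlsplit(url).query
    return dict(parse_qsl(query, keep_blank_values=True))


if __name__ == "__main__":
    # Abbreviated stand-in for the logged URL (assumption: shortened keyword)
    sample = ("https://exhentai.org?advsearch=1&f_sname=on&f_stags=on"
              "&f_spf=&f_search=%22JK%22&f_sh=on+language%3Ajapanese")
    for name, value in search_flags(sample).items():
        print(f"{name} = {value!r}")
```

Printing the decoded flags makes it easy to see at a glance whether `f_stags` is set and where the language filter ended up.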

polak14 commented 2 years ago

Tag search is required when searching by language.

Anyway, since this is about the EH plugin, a suggestion I would like to make is stripping gallery IDs from the title. It's a problem when you download galleries via H@H. For example: "[Hardcore Zayaku Souten (Hirayan)] 45XY45 (Girls' Frontline) [English] [MegaFagget] [Digital] [1543221]" will return no results, but "[Hardcore Zayaku Souten (Hirayan)] 45XY45 (Girls' Frontline) [English] [MegaFagget] [Digital]" will.

$title =~ s/ \[[0-9]+\]//g;
I just copied this part from the fakku plugin and adjusted it for my needs; it works fine for me.
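For readers who want to try the substitution outside of Perl, here is an equivalent minimal sketch in Python. `strip_gallery_id` is a hypothetical helper name; the pattern is the same as in the Perl one-liner above, so it removes every space-prefixed bracket group that contains only digits, not just a trailing one.

```python
import re

# Same pattern as the Perl substitution: a space, then [digits]
GALLERY_ID = re.compile(r" \[[0-9]+\]")


def strip_gallery_id(title):
    """Remove bracketed numeric IDs such as ' [1543221]' from a title."""
    return GALLERY_ID.sub("", title)


if __name__ == "__main__":
    title = ("[Hardcore Zayaku Souten (Hirayan)] 45XY45 (Girls' Frontline) "
             "[English] [MegaFagget] [Digital] [1543221]")
    print(strip_gallery_id(title))
```

Bracket groups that contain anything other than digits, such as `[English]` or `[Digital]`, are left untouched.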

Also, there appears to be no delay between the thumbnail search and the title search, and I get banned fairly often because of it.

Difegue commented 2 years ago

It's annoying that h@h only gives the gallery ID, since if we had the gallery token alongside it we could just hit the API directly and skip the thumbnail/title search.
I don't think there's a way to get the token, but someone might know more about this than I do.

As for delays, there is normally one, but it depends on whether you receive the excessive request warning first:
https://github.com/Difegue/LANraragi/blob/959b79455b0a736e18c1b7273506690117f9f41b/lib/LANraragi/Plugin/Metadata/EHentai.pm#L215
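One way to guarantee a pause between consecutive searches, regardless of any warning from the site, is a small minimum-interval throttle. The Python sketch below is only an illustration of that idea and is not how the plugin behaves; the class name and the two-second interval are assumptions, and the clock and sleep functions are injectable so the logic can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous request: pause for the rest
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = MinIntervalThrottle(2.0)  # hypothetical interval
    for step in ("thumbnail search", "title search"):
        throttle.wait()
        print("running", step)
```

Calling `wait()` before each request keeps the thumbnail and title searches at least the chosen interval apart.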

FeudalNoodle commented 2 years ago

Regarding the H@H issue, I'm currently looking into bastardizing one of the JSON plugins. Each gallery downloaded via H@H comes with a galleryinfo.txt containing a snapshot of all relevant metadata at the time of download.

The .txt files look like this:

Title: xxxxxxxxxx
Upload Time: 2020-11-11 00:00
Uploaded By: Katlan
Downloaded: 2021-12-26 22:09
Tags: language:english, language:translated, artist:yyyy, male:dark skin, female:fox girl, [...]

Uploader's Comments:

[...]

Downloaded from E-Hentai Galleries by the Hentai@Home Downloader <3

I haven't really dabbled with regex or Perl before, so it may take a solid amount of fiddling.

Difegue commented 2 years ago

Not really; looking at the format, it's likely you can just hack on the HDoujin plugin, which has a similar format.

Simply editing this line: https://github.com/Difegue/LANraragi/blob/959b79455b0a736e18c1b7273506690117f9f41b/lib/LANraragi/Plugin/Metadata/Hdoujin.pm#L75
to look for galleryinfo.txt, and this one: https://github.com/Difegue/LANraragi/blob/959b79455b0a736e18c1b7273506690117f9f41b/lib/LANraragi/Plugin/Metadata/Hdoujin.pm#L88 to use Tags instead of TAGS, should be enough to get this working in a basic way.
You can then likely extend it to add support for Title and Upload Time.
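As a rough illustration of the parsing involved, here is a minimal Python sketch that reads the galleryinfo.txt format quoted earlier in the thread. It assumes each field sits on its own `Key: value` line and that everything from `Uploader's Comments:` onward can be ignored; `parse_galleryinfo` is a hypothetical name, and this is not the HDoujin plugin's code.

```python
def parse_galleryinfo(text):
    """Extract title, upload time and tags from galleryinfo.txt content."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Uploader's Comments:"):
            break  # free-form comments follow; stop reading fields
        key, sep, value = line.partition(": ")
        if sep:  # only keep lines shaped like "Key: value"
            fields[key] = value.strip()
    tags = [t.strip() for t in fields.get("Tags", "").split(",") if t.strip()]
    return {
        "title": fields.get("Title", ""),
        "upload_time": fields.get("Upload Time", ""),
        "tags": tags,
    }


if __name__ == "__main__":
    sample = (
        "Title: xxxxxxxxxx\n"
        "Upload Time: 2020-11-11 00:00\n"
        "Tags: language:english, artist:yyyy, female:fox girl\n"
        "\n"
        "Uploader's Comments:\n"
    )
    print(parse_galleryinfo(sample))
```

Splitting on the first `": "` keeps namespaced tags such as `language:english` intact, because they contain a colon without a following space.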

Bundling dedicated plugins for all possible variants of .txt/.json files isn't viable long-term, as a different one seems to pop up every 6 months. This is a case where having a real plugin repository would certainly help.

Difegue commented 2 years ago

This issue is turning into a dumping ground for EH issues, so I'm converting it to a discussion -- as far as the original issue goes, I've added an adjustment to disable f_stags unless the default language option is toggled.