Open maltbeverage opened 1 week ago
Forgot to add, I'm on the latest commit, f30ff59.
Found it I think: https://github.com/RicterZ/nhentai/blob/f30ff59b2ba62338bcd3281cd957d1f9b89c705f/nhentai/parser.py#L156
If I'm reading it right, this explains the parsing of the invalid webp url. In this case it's a broken thumbnail url, but if nhentai ever used a different extension for the thumbnail than for the actual doujin image, that might bork the downloader.
I suppose the alternative would be to follow each url on the gallery page and extract the image url one at a time, but that sounds a bit expensive. Maybe error checking for a 404 code when downloading and failing with an error might be a good way to go. I'd rather have a failure on rare edge cases than an archive with missing images.
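The 404 check suggested above could look roughly like the sketch below. This is not the project's downloader code: `DownloadError` and `check_image_response` are hypothetical names, and the function takes the status code and `Content-Type` header as plain values so it can be called from whatever HTTP client is in use.

```python
class DownloadError(Exception):
    """Raised when a response should not be saved as an image (hypothetical helper)."""


def check_image_response(url, status_code, content_type):
    """Raise DownloadError unless the response looks like a successful image download."""
    # Any non-200 status (404, 403, ...) means there is no image to save.
    if status_code != 200:
        raise DownloadError(f"{url} returned HTTP {status_code}")
    # An error page served with status 200 would still be HTML, not image/*.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if not media_type.startswith("image/"):
        raise DownloadError(
            f"{url} returned {media_type or 'no content type'}, expected an image"
        )
```

Failing loudly here means a broken gallery stops the run instead of producing an archive with HTML error pages saved as images.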
I'll check it out
Need more sample doujinshi. https://t5.nhentai.net/galleries/3115455/1t.jpg.webp returns 403.
After some investigation, I found that:
https://t5.nhentai.net/galleries/3115455/1t.jpg.webp returns 404
https://i.nhentai.net/galleries/3115455/1.jpg seems like the uploader's mistake or a bug of the nhentai website
https://i3.nhentai.net/galleries/3115455/136.webp works fine
Need to determine whether it is an isolated case or the norm.
Here are all of the codes with bad thumbnail image urls. I've scraped everything released recently to check.
538005 538006 538020 538028 538045
Thanks for taking a look.
I noticed some more galleries with issues:
538053 538058 538063 538087 538088 538090 538098 538148 538159
Looks like this'll be an issue until it's fixed on the nhentai side.
Here is a quick and dirty workaround if anyone needs to get this working:
https://github.com/maltbeverage/nhentai/commit/ea52cff2ad5eaad6de8aa71acdb52317dc78cd02
This should string split the two extensions and then use the first one.... probably introducing new edge cases with this, but it works for now.
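As a rough illustration of the split-and-take-the-first-extension idea (a sketch only, not the contents of the linked commit; `first_extension` is a hypothetical helper name):

```python
import posixpath
from urllib.parse import urlsplit


def first_extension(url):
    """Return the first extension in the URL's file name, e.g. 'jpg' for '1t.jpg.webp'."""
    filename = posixpath.basename(urlsplit(url).path)
    parts = filename.split(".")
    # parts[0] is the file stem; parts[1] is the first extension, if there is one.
    return parts[1] if len(parts) > 1 else ""


print(first_extension("https://t5.nhentai.net/galleries/3115455/1t.jpg.webp"))  # jpg
print(first_extension("https://i3.nhentai.net/galleries/3115455/136.webp"))     # webp
```

A file name whose stem itself contains a dot would be misread by this approach, which is one of the new edge cases to watch for.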
Not sure if my problem is related; if not, please ignore and I will open a new issue. Since the downtime of nhentai earlier this week I can't download certain doujinshi. I only updated the script earlier today to try to fix it, but it keeps failing in the same way.
[11:56:00] doujinshi_parser: Fetching doujinshi information of id 538003
[11:56:01] doujinshi_parser: Tried yo get image id failed
It stays there for some time and just dies. I use the favorites method, but even when trying to download just that one, I get the same result. I'm not knowledgeable enough in either Python or the script's interaction with the site to know what might cause it, so I'm not sure if the same workaround would work for me.
Not the same issue as this one. nhentai started using webp images, which the parser did not support until https://github.com/RicterZ/nhentai/commit/f30ff59b2ba62338bcd3281cd957d1f9b89c705f was committed a couple of days ago.
If you git clone and install from source, it should work. I'm not sure if this fix has been pushed out to any other install methods.
I'm using the nhentaiGUI, so I just edited the couple of lines in the files I have, works now, thanks for the info and fix. Solved on my part.
I have the same problem. It seems to happen with any doujin uploaded recently, e.g. 538703, but when I run the parser module as a standalone Python file, the code seems to run normally:
print(doujinshi_parser("538703"))
After webp images started showing up, I noticed a few galleries were pulling in broken webp images. On closer inspection, the downloaded files contain HTML showing a 404 error.
Example 538028, the first thumbnail is referencing an invalid url:
Looks like an issue with nhentai. I can remove the .webp extension from the thumbnail url
and the thumbnail image will load in.
The broken thumbnail links to page 1 of the doujin and does indeed have a working image:
So this broken thumbnail might be messing up the parsing somehow. When coming across these broken thumbnails, I think it attempts to download
which does not exist, the actual file is
and this successfully saves as a .webp file, but the contents are the html of the 404 error.
I'm thinking this might be a parsing logic issue if the thumbnail url is somehow used to determine the file extension of the downloaded image file.
This only affects around 5 galleries at the moment.
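Galleries that were already downloaded can be checked for the symptom described above, where a file is saved with an image extension but contains the HTML of a 404 page. The sketch below compares the first bytes of each file with the standard WebP, JPEG, PNG and GIF signatures. `looks_like_image` and `find_broken_images` are hypothetical helper names, not part of the nhentai tool.

```python
from pathlib import Path


def looks_like_image(data):
    """Check the leading bytes against well-known image file signatures."""
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return True
    # JPEG, PNG and GIF signatures.
    return data.startswith((b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a"))


def find_broken_images(directory):
    """Return files under `directory` with an image extension but non-image content."""
    extensions = {".webp", ".jpg", ".jpeg", ".png", ".gif"}
    broken = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            with path.open("rb") as handle:
                header = handle.read(12)
            if not looks_like_image(header):
                broken.append(path)
    return broken
```

Calling `find_broken_images` on a download folder returns the paths that need to be fetched again once the url parsing is sorted out.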