ccloli / E-Hentai-Downloader

Download E-Hentai archive as zip file
GNU General Public License v3.0

[Feature request] Save the URL of each image to a text file by making only one HTTP request (or two if the gallery has two pages, or three if it has three pages, etc) #303

Open a84r7a3rga76fg opened 2 weeks ago

a84r7a3rga76fg commented 2 weeks ago

Edit: Instead of saving the URL, it'd be even better if it could save the page number, image token and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

ccloli commented 2 weeks ago

If you want the page links, just make sure you've set Settings -> Advanced -> "Record and save gallery info as" to "File info.txt" and checked "Page Links" under "includes", which is the default setting.

Saving the image links is not available, since each link is only valid for a short time; after that you'll see an error.

If you want to rename the files, that's not possible for now. But if you just want the image token, I guess you can compute the SHA-1 hash of the image file and take its first 10 characters.
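A minimal Python sketch of that suggestion, assuming (as guessed above, not confirmed) that the image token is the first 10 hex characters of the SHA-1 of the original image file. The function name `image_token` is made up for illustration.

```python
import hashlib


def image_token(path, length=10):
    """Return the first `length` hex characters of the file's SHA-1.

    Assumption: this prefix equals the token in the page link, which is
    only a guess from the thread and holds only for unmodified originals.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return sha1.hexdigest()[:length]
```

A resampled or re-encoded copy of the image would hash differently, so the prefix would not match the token in that case.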

a84r7a3rga76fg commented 2 weeks ago

I don't want the image URL. I forgot to add that I want it to perform the action without wasting any GP, credits or hath, and without making too many page requests. I think it should just make one page request if all of the image URLs are in one page.

ccloli commented 2 weeks ago

So do you mean you want the page links or the image links without costing GP or downloading images?

a84r7a3rga76fg commented 2 weeks ago

Preferably none of those. Does my edit not show up? What I'm really looking for is saving the page number, image token (SHA-1) and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

If you're wondering why, it's because the latest restriction has made it impossible for anyone without GP, credit or hath to download the original images. Most people don't have enough GP, credit or hath to download a single gallery. Our only option is to use torrents, and these torrents often have unsorted image files.

ccloli commented 2 weeks ago

Preferably none of those. Does my edit not show up?

I did see that but didn't understand it, probably because it's nearly 6 AM in my timezone and I need some sleep. 😴

What I'm really looking for is saving the page number, image token (SHA-1) and the extension of every image file, e.g. https://exhentai.org/s/d1b07750bd/3042776-1 will be saved as 0001_d1b07750bd.jpg.

So do you want to rename the downloaded files? For now it's not possible, but Soon™.

If you just want to extract them from the page links and get a plain-text list with that naming (d1b07750bd/3042776-1 -> 0001_d1b07750bd.jpg), you can probably try the first option's code to get the page links, and ask GPT to write an automated script for you.

However, the page link only contains the page number and the image token; the file extension is not available. You'd need to extract the file extension from the thumbnail URL, but that URL doesn't include the page number, and the script doesn't actually grab thumbnail URLs, so you'd need to DIY.
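As a rough sketch of such a script in Python: the URL pattern `/s/<token>/<gallery id>-<page number>` is inferred only from the example link in this thread, and the names `parse_page_link` and `base_name` are hypothetical. The extension is deliberately left out, because the page link does not carry it.

```python
import re

# Assumed page-link shape, inferred from the single example in this thread:
#   https://exhentai.org/s/<10-hex token>/<gallery id>-<page number>
PAGE_LINK_RE = re.compile(r"/s/([0-9a-f]{10})/(\d+)-(\d+)")


def parse_page_link(url):
    """Return (page_number, image_token) extracted from a page link."""
    match = PAGE_LINK_RE.search(url)
    if match is None:
        raise ValueError(f"not a recognized page link: {url!r}")
    token, _gallery_id, page = match.groups()
    return int(page), token


def base_name(url, width=4):
    """Build the '<zero-padded page>_<token>' part of the file name."""
    page, token = parse_page_link(url)
    return f"{page:0{width}d}_{token}"


if __name__ == "__main__":
    print(base_name("https://exhentai.org/s/d1b07750bd/3042776-1"))
    # prints 0001_d1b07750bd
```

Writing one `base_name(...)` per line to a text file would give the plain-text list; the extension would have to be appended from some other source.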

An example of the image grid's page source code:

[image]

it's because the latest restriction has made it impossible for anyone without GP, credit or hath to download the original images.

It wasn't restricted in the latest update, but last year. The latest change just hides the image limits for normal users; all the other rules are still the same as in the previous 2023-08 update (the latest galleries can be grabbed with image limits only, except when peak hours and/or old-gallery rules apply).

Our only option is to use torrents, and these torrents often have unsorted image files.

If what you mean is to download with torrents, then calculate the hash of each file and compare it with the image link, then that would make sense, but in that case you probably only need the page number and page token. I still don't quite understand why you need the file extension, since you've already got the files from the torrent. To order them you'll almost certainly need to write a script anyway, and that script can just extract the file extension from the file name.


Time to sleep. If you have anything to update, I'll reply in ~10 hours, sorry for making you wait. 😴
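The workflow described above could look roughly like this in Python. It is a sketch under two assumptions that the thread does not confirm: that the token in a page link is the first 10 hex characters of the file's SHA-1, and that page links have the shape `/s/<token>/<gallery id>-<page number>`. The names `sha1_prefix` and `order_files` are invented for illustration.

```python
import hashlib
import os
import re

# Assumed page-link shape: /s/<10-hex token>/<gallery id>-<page number>
PAGE_LINK_RE = re.compile(r"/s/([0-9a-f]{10})/\d+-(\d+)")


def sha1_prefix(path, length=10):
    """First `length` hex characters of the file's SHA-1 (assumed token)."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return sha1.hexdigest()[:length]


def order_files(directory, page_links):
    """Return sorted (page_number, file_name, extension) for matched files."""
    token_to_page = {}
    for url in page_links:
        match = PAGE_LINK_RE.search(url)
        if match:
            # A gallery with duplicate images would reuse a token; the last
            # page wins here, which is a simplification.
            token_to_page[match.group(1)] = int(match.group(2))

    ordered = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        page = token_to_page.get(sha1_prefix(path))
        if page is not None:
            # The extension comes from the torrent's file name, as suggested.
            ordered.append((page, name, os.path.splitext(name)[1]))
    ordered.sort()
    return ordered
```

Files whose hash prefix matches no page link are skipped silently, which would be the case for any file the torrent creator re-encoded.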

a84r7a3rga76fg commented 2 weeks ago

So do you want to rename the download file?

No, I want the page number, the image token and the extension of every image saved to a text file.

I'm still quite not understand why you need the file extension since you've already got the file from torrents

Sometimes whoever creates the torrent likes to change the extension from png to jpg or jpg to jpeg. You'd be surprised how often they do it.

ccloli commented 2 weeks ago

Sometimes whoever creates the torrent likes to change the extension from png to jpg or jpg to jpeg. You'd be surprised how often they do it.

Then I'm afraid it's not available. To avoid exactly the case you describe (see #2, which is just your case), the script extracts the filename from the HTTP response header of the image file request, so that it uses the original file name and the correct file extension, and that definitely costs image limits or GP.

Since the script is focused on downloading files, I'm not going to add a feature to extract the file extension from the thumbnail URL (and the thumbnail URL is only available when you switch to the large thumbnail grid layout…).
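To illustrate the general idea of taking a file name from a response header, here is a small Python sketch. The comment above does not say which header the script reads; `Content-Disposition` is an assumption made here, and the helper name is hypothetical. This is not the script's own code.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Assumption: the server sends the original name in Content-Disposition;
    the thread only says "response HTTP header" without naming it.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


if __name__ == "__main__":
    name = filename_from_content_disposition('attachment; filename="0001.png"')
    print(name, os.path.splitext(name)[1])
```

Because the header only arrives with the image response itself, reading it means requesting the image, which is why this route costs image limits or GP.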

a84r7a3rga76fg commented 2 weeks ago

It can be a separate script.

ccloli commented 2 weeks ago

It can be a separate script.

Then I'd say do it yourself, since it's not related to the script's function. I really do need some sleep, truly sorry about that. 🥲