Nandaka / DanbooruDownloader

*booru image downloader
http://nandaka.devnull.zone/

After downloading 1000 images from Sankaku, I can only download in batches of 20 at a time before it starts skipping #225

Open sugnimmingus opened 3 years ago

sugnimmingus commented 3 years ago

I've already added my username and password hash to the cookie login. When I try to start a new batch job from, say, page 53, it downloads 20 images, pauses for a sec, and then just skips the rest. I can then make a new batch job at page 54 for example, and it will correctly download the next 20 images from that page, before skipping the rest again.

What's going on here? Why is it skipping everything? There was no skipping for the first 1000 images (50 pages) and as soon as it arrived at page 51, the skipping began. It's worth noting that I have a premium account, and NON-premium users aren't able to browse past page 50. So it seems related to that. It's almost as if it can recognize that I'm premium for the first page, and then it stops registering that I'm premium after that page. Very weird.

How do I fix this?

sugnimmingus commented 3 years ago

I went into the logs to get a better look at what was going on, and what I found is weird. After downloading the 20 images of a page, when it (correctly) increments to the next page, even though the URL is correct for the next page number, it somehow finds and tries to download the 20 images of page 1. It will continue to do this for each successive page. And since the images from page 1 already exist in my directory, it just skips them, and the cycle repeats. I don't understand why it would be finding the images from page 1 when the URL is correctly labeled as page 54, for example.

Nandaka commented 3 years ago

Looks like Sankaku doesn't use the standard API; it depends on an additional &next= parameter (you can see this from the browser if you disable auto paging).
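
Roughly, the difference looks something like this sketch; the gallery URL and parameter names here are assumptions for illustration, not the downloader's actual code:

```python
# Sketch only: the gallery URL and parameter names are assumptions based on
# this thread, not DanbooruDownloader's real code.
from typing import Optional
from urllib.parse import urlencode

GALLERY = "https://chan.sankakucomplex.com/"  # assumed listing endpoint

def danbooru_style_url(tags: str, page: int) -> str:
    # Standard Danbooru-style pagination: the page number alone is enough.
    return f"{GALLERY}?{urlencode({'tags': tags, 'page': page})}"

def sankaku_style_url(tags: str, page: int, next_token: Optional[str]) -> str:
    # Sankaku apparently also wants a 'next' cursor taken from the previous
    # page; without it the server keeps returning the first page of results
    # no matter what 'page' says.
    params = {"tags": tags, "page": page}
    if next_token is not None:
        params["next"] = next_token
    return f"{GALLERY}?{urlencode(params)}"

print(danbooru_style_url("example_tag", 54))
print(sankaku_style_url("example_tag", 54, "27084337"))
```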

Nandaka commented 3 years ago

Try the latest release from https://github.com/Nandaka/DanbooruDownloader/releases and make sure the cookies are updated.

sugnimmingus commented 3 years ago

Strange. Since updating, I'm getting this error when I try to run the same search query I was running last night. [screenshot of the error]

The search query is: sex -3d -animated order:quality height:512..1000
This query was not giving me any issues last night. Any ideas what's going on? I did some testing and it seems order:quality is the problem. Searching order:quality is returning that error, but it wasn't on the previous version I was using.

Nandaka commented 3 years ago

Looks like it failed to find a match when resolving the next parameter.

Nandaka commented 3 years ago

Yeah, they changed the pattern again. Last time it was the next post ID; now it looks like it has a decimal in it. [screenshot of the new pattern]
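
In other words, a cursor matcher written for plain integer post IDs stops matching once the value has a decimal point in it; a minimal sketch with an assumed HTML snippet and assumed patterns:

```python
# Illustrative only: the HTML snippet and both regexes are assumptions meant
# to show how an integer-only matcher breaks when the cursor gains a decimal.
import re

old_pattern = re.compile(r'next=(\d+)"')            # expects an integer cursor
new_pattern = re.compile(r'next=(\d+(?:\.\d+)?)"')  # also accepts a decimal part

sample = 'href="?tags=example&page=2&next=27084337.5"'
print(old_pattern.search(sample))           # None -> "failed to find a match"
print(new_pattern.search(sample).group(1))  # 27084337.5
```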

sugnimmingus commented 3 years ago

What am I supposed to do with this information? And why would that only apply to order:quality?

Nandaka commented 3 years ago

No idea; Sankaku is not using the standard Danbooru API. You can try https://bionus.github.io/imgbrd-grabber/ as it is more mature/maintained.

Nandaka commented 3 years ago

Updated exe, just replace the old one: DanbooruDownloader3.exe.zip

yami-no-tusbas commented 3 years ago

BTW, Grabber by Bionus doesn't support Sankaku Complex AFAIK; the Sankaku admins always do their best to block grabber programs...

See: https://github.com/Bionus/imgbrd-grabber/issues/2128#issuecomment-697037426

sugnimmingus commented 3 years ago

Looks like your update fixed my problem! Thanks for the help! Will let you know if I have any future errors.

sugnimmingus commented 3 years ago

Never mind, I take it back; it's still looping the same posts after downloading 500 images for some reason, which is less than the original 1000. Any ideas? It seems to have begun this time after reaching page 26. However, interestingly enough, resuming from page 26 works, and then it begins skipping at page 27. This is the same thing that was happening before, except for page 51+. I wonder what's going on...

What's strange is I had tested it on page 51 and it successfully downloaded from page 52 etc. Yet now it's broken after downloading everything from pages 1-25. Any ideas?

Nandaka commented 3 years ago

It's because of the way the server calculates the query and returns the posts. I think for the first 25 pages it doesn't require the &next= parameter if you search from the 1st page, as it also sends &commit=Search.

You can check the behavior from the browser by disabling the auto paging feature first (because the program doesn't use it).

Anyway, their site is non-standard, so this program cannot use the API and depends on page scraping based on the webpage returned by their server.
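
Put differently, the scraper has to carry that cursor forward from page to page. A rough sketch of the idea, where the endpoint, parameter names, and cursor regex are assumptions rather than the program's real code:

```python
# Rough sketch of the scraping flow described above, not the program's code.
# The endpoint, parameter names, and the cursor regex are assumptions.
import re
from typing import Iterator, Optional, Tuple

import requests

GALLERY = "https://chan.sankakucomplex.com/"   # assumed listing endpoint
NEXT_RE = re.compile(r'next=(\d+(?:\.\d+)?)')  # assumed cursor pattern

def fetch_listing(tags: str, page: int, next_token: Optional[str],
                  cookies: dict) -> str:
    params = {"tags": tags, "page": page}
    if next_token is None:
        # The first request behaves like a fresh search from the browser.
        params["commit"] = "Search"
    else:
        # Later pages only return the right posts when the cursor scraped
        # from the previous page is passed along.
        params["next"] = next_token
    resp = requests.get(GALLERY, params=params, cookies=cookies, timeout=30)
    resp.raise_for_status()
    return resp.text

def crawl(tags: str, pages: int, cookies: dict) -> Iterator[Tuple[int, str]]:
    next_token = None
    for page in range(1, pages + 1):
        html = fetch_listing(tags, page, next_token, cookies)
        yield page, html
        match = NEXT_RE.search(html)
        # If the pattern no longer matches, the cursor is lost and every
        # following request falls back to page 1 -- the symptom in this issue.
        next_token = match.group(1) if match else None
```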

sugnimmingus commented 3 years ago

I'm not sure what you're recommending I do. What's strange is I successfully tested page 51, and it correctly proceeded to page 52 without skipping images, but after downloading pages 1-25, it no longer works. I understand that the API is unusual, but if I manually visit the website, I can correctly load results from any page. So how do I get the downloader to do this? I don't believe it's impossible.

sugnimmingus commented 3 years ago

Okay, so I've done some more research. It has to do with the fact that, on Sankaku, you can't edit the page number in the URL if the URL contains ?next. Is there any way for you to code it in such a way that it drops the ?next part and just inputs the page number for each page?
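
Something like this is what I have in mind, just as an illustration (the URL below is hypothetical, and whether the server honours a bare page number past the cutoff is exactly the open question):

```python
# Sketch of "drop the ?next part and just set the page number".
# The example URL is hypothetical.
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def force_page(url: str, page: int) -> str:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query.pop("next", None)       # discard the cursor
    query["page"] = [str(page)]   # overwrite the page number
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

print(force_page("https://chan.sankakucomplex.com/?tags=example&page=2&next=27084337.5", 54))
```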

sugnimmingus commented 3 years ago

One person at Sankaku recommended using the new beta API at https://capi-v2.sankakucomplex.com. Is this possible?

Also, this is what he said about rewriting the code: [screenshot of his reply]
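
For reference, a rough sketch of what querying that beta API might look like, assuming it exposes a /posts endpoint that takes tags, page, and limit and returns JSON (the exact parameters, authentication, and response shape are unverified):

```python
# Unverified sketch of the suggested capi-v2 beta API. The /posts path, the
# tags/page/limit parameters, and the bearer-token auth are all assumptions.
from typing import Optional

import requests

API = "https://capi-v2.sankakucomplex.com/posts"

def list_posts(tags: str, page: int, limit: int = 20,
               token: Optional[str] = None):
    headers = {}
    if token:
        # Premium/authenticated access would presumably need some kind of
        # auth header; the exact scheme is not confirmed here.
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(API,
                        params={"tags": tags, "page": page, "limit": limit},
                        headers=headers, timeout=30)
    resp.raise_for_status()
    # Assumed to be a JSON list of post objects containing file URLs.
    return resp.json()
```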

sugnimmingus commented 3 years ago

Any chance you'd be able to address this?

ghost commented 3 years ago

I'm having the same issue. Is there any way to make the program ignore the "&next" part of the URL? Thanks!