Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.
https://www.bionus.org/imgbrd-grabber/
Apache License 2.0
2.55k stars 216 forks

Issue with Sankaku sources #997

Closed urzuse7en closed 5 years ago

urzuse7en commented 7 years ago

chan.sankakucomplex.com and idol.sankakucomplex.com are giving "no result" when searching, due to a link redirection.

In the log it states: "Redirecting page https://chan.sankakucomplex.com/post/index?limit=20&tags=&page=1 to https://chan.sankakucomplex.com/500.html"

Pasting the above link into a browser gives "A technical problem has prevented the page from loading." However, if I open the link without limit=20, it works fine.

The problem is, I can't disable or exclude the image limit from the link (I tried putting 0 in the "Images per page" option). I'm not sure whether they changed something or it's really just a "technical problem", since it stopped working even before I updated.

I'm using the latest version on Windows 7 (x86).

light29 commented 7 years ago

Seems Cloudflare security has banned your IP like it did mine.

urzuse7en commented 7 years ago

I tried a lot of proxies, a VPN, the Tor browser, even my friend's computer. It just doesn't work. I'll try some more proxies then; hopefully I can find a working one.

butt-fli commented 7 years ago

I found that if you delete the "limit" key, you can visit the page as usual. But I don't know how to edit this, so I sent an email to the developer.
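The delete-the-"limit"-key workaround can be scripted. A minimal sketch using only the Python standard library (the URL is the one from the log above):

```python
# Sketch: strip a single query parameter (here "limit") from a URL,
# using only the standard library.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def drop_query_param(url: str, param: str) -> str:
    """Return `url` with every occurrence of `param` removed from its query."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    return urlunsplit(parts._replace(query=urlencode(query)))

url = "https://chan.sankakucomplex.com/post/index?limit=20&tags=&page=1"
print(drop_query_param(url, "limit"))
# → https://chan.sankakucomplex.com/post/index?tags=&page=1
```

`keep_blank_values=True` matters here: the empty `tags=` parameter would otherwise be silently dropped, changing the request.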

chnoHolic commented 7 years ago

[Screenshot: Cloudflare error 1006]

Sankaku Complex will unban your IP after an hour or so. They see Grabber as a DDoS attempt, considering what an e621 dev said in another thread: #995

> As the developer on e621.net I am trying to narrow down why this tool is making one request to /post/show for every image requested for some users and not for others. The tool seems very capable of pulling the information from the XML or JSON copies and, when behaving normally, it appears to do this. When it malfunctions it makes hundreds of thousands of redundant requests without a rate limit and seems to go totally wild, downloading everything and anything, often repeatedly.

> Grabber bases itself a lot on tag types (general, artist, copyright, species, etc.), and those are not provided by the API endpoint. However, I agree that it is still possible to get this information by means other than one request per downloaded image, for example with a local database of tag types. But that leaves the case of tag type changes and new tags. Since this logic was implemented almost 5 years ago, I simply went with the simplest solution at the time, but I'd gladly change it if necessary.
>
> Whether Grabber does these queries depends on the download filename format (more info here: https://github.com/Bionus/imgbrd-grabber/wiki/Filename). If the information needed to generate the filename is already available through the API (e.g. the default %md5%.%ext% filename), this second call won't be made.

I've had my IP blocked twice, and I've attempted to use Grabber from a different IP, which ended with similar results. So from what I've observed, this issue is reproducible.
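The "local database of tag types" idea from the quoted discussion could look something like this (names are invented for illustration; the TTL on entries is one way to handle the tag-change and new-tag cases mentioned above):

```python
# Sketch: cache tag -> type lookups so that only stale or unknown tags cost
# a network request. `fetch` stands in for whatever API call returns a tag's
# type; it is a hypothetical callable, not a real Grabber function.
import time

class TagTypeCache:
    def __init__(self, fetch, ttl_seconds=7 * 24 * 3600, clock=time.monotonic):
        self.fetch = fetch          # callable: tag name -> type string
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}             # tag -> (type, fetched_at)

    def get(self, tag):
        """Return the tag's type, refreshing entries older than the TTL."""
        hit = self.store.get(tag)
        if hit is not None and self.clock() - hit[1] < self.ttl:
            return hit[0]
        kind = self.fetch(tag)      # one request refreshes a stale entry
        self.store[tag] = (kind, self.clock())
        return kind
```

The trade-off is exactly the one described in the quote: a cached type can be wrong for up to `ttl_seconds` after a tag is retyped, in exchange for eliminating the per-image request.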

urzuse7en commented 7 years ago

So they did change something after all, because my IP isn't banned. I'll wait for the new release then and see if that fixes it. Thanks for the info!

Bionus commented 7 years ago

Hello.

Sankaku is indeed purposely blocking Grabber requests based on the presence of the limit parameter. The only temporary fix would be to edit model.xml. However, these requests are being blocked for a reason: Sankaku's owner wants to completely block Grabber.

It also seems like Sankaku disabled its JSON API endpoints.

haren123fuj commented 7 years ago

Hey.

Are you planning to adjust the next version of Grabber to work with Sankaku? I really love this program, so I'm sad about the recent problems with downloading from Sankaku Channel.

trollsasha1 commented 7 years ago

Please make a working version for Sankaku.

camilox3 commented 7 years ago

I thought I was the only one. I love Grabber, we hope to have an update soon.

MasterPetrik commented 7 years ago

> Sankaku is indeed purposely blocking Grabber requests based on the presence of the limit parameter.

Does it block all requests, or only batch downloads? Heh, anyway, in my country Sankaku is blocked by the government.

kiranoot commented 7 years ago

As users you are approaching this from an angle that will become a fight. Asking the author of Grabber to bypass intentional limitations placed on the site will be seen as malicious, instead of "making things work again."

A dialog needs to happen between the users and the site owners, making it clear that you would like to use this software with the site, that you are willing to compromise on server load and download speeds, and that the creator of the tool is open to dialog and to making changes to ensure things work without excessive resource usage.

If the owner of the site is unwilling to compromise, then it isn't in Grabber's best interest to bypass those limitations, as unfortunate as that is for you, the users. Escalation in these matters can lead to undesirable consequences for other users, including site owners choosing to require registration or payment to access content, and then banning users who violate their terms of service. Most site owners don't want to take these actions, or devote their time to solving problems like this. However, if they feel pressured and forced into a corner, they will.

MasterPetrik commented 7 years ago

Indeed. Sankaku gets no benefit from cooperating with Grabber. Moreover, to them, Grabber traffic is parasitic traffic. Users who visit Sankaku load one image per page and view ads: low server load, big profit. Users who use Grabber DON'T visit Sankaku, load a bunch of images at once, and view no ads: very high server load and NO profit.

For example, they would rather serve 200 users on the site, who will view 200 images and see ads 200 times, than one parasitic Grabber user who wants to download 200 images.

So I see only one suitable solution here: a very big delay for batch downloads (like 1 image per 30 seconds or 1 minute) and a size limit on batch downloads, like 500 images a day. I guess better something than nothing.

Anyway, using boorus the way Grabber does is like piracy. So we are the bad guys here, not Sankaku.
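The proposal above (a fixed delay between downloads plus a daily cap) is easy to sketch client-side; the class, names, and numbers here are illustrative, not anything Grabber implements:

```python
# Sketch of a client-side throttle: a fixed delay between downloads and a
# hard cap on downloads per session/day.
import time

class DownloadThrottle:
    def __init__(self, delay_seconds=30.0, daily_cap=500, clock=time.monotonic):
        self.delay = delay_seconds
        self.cap = daily_cap
        self.clock = clock
        self.count = 0
        self.last = None            # timestamp of the last download

    def wait_time(self):
        """Seconds to sleep before the next download, or None if capped."""
        if self.count >= self.cap:
            return None
        if self.last is None:
            return 0.0
        elapsed = self.clock() - self.last
        return max(0.0, self.delay - elapsed)

    def record(self):
        """Call after each completed download."""
        self.last = self.clock()
        self.count += 1
```

A download loop would call `wait_time()`, `time.sleep()` for that long (stopping when it returns `None`), then `record()` after each image.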

MasterPetrik commented 7 years ago

[Screenshot: 2017-08-11_065248, Win7]

As you can see, Sankaku disabled its API not only for downloaders, but even for IQDB search...

trollsasha1 commented 7 years ago

Assholes.

MasterPetrik commented 7 years ago

Indeed, Sankaku is purposely blocking downloaders.

https://chan.sankakucomplex.com/forum/show/16175 Their mod says that they are not blocking anything, but just updated their API.

Bionus commented 7 years ago

Sankaku has a long history of using many methods to block downloaders (UA filtering, rate limiting, blocking some API usages such as "limit", etc.). Working around it this time would go directly against the developer's wishes. I know that because Artifact (Sankaku's developer) contacted me last month asking me to stop supporting Sankaku in Grabber. 😞

I truly believe that with the changes provided by issue #995 (notably the tag database system) and if Sankaku provided a proper API, both users and the developer could find common ground.

But, in the current state of things, the one to convince isn't me. 😕

What I know is that Sankaku is working on improvements to provide a proper API, which I really wish to see happening.

maskkulin commented 7 years ago

Very sad about this... I always liked Sankaku Complex more, because they have way more pics than any other booru.

MasterPetrik commented 7 years ago

Me too, but hey, it can still be used manually!

fighuass commented 7 years ago

Is Sankaku still broken on Grabber? I'm trying to download some groups and I really, really don't want to do that manually.

MasterPetrik commented 7 years ago

> Is Sankaku still broken on Grabber?

If this issue remains open, it means the problem is still relevant, and there's no need to duplicate it in new topics like you did (https://github.com/Bionus/imgbrd-grabber/issues/1045).

fighuass commented 7 years ago

It seems to work better if you make the intervals in site options longer (I set them all except thumbnail to 4s), although I still got banned.

Setting max simultaneous downloads to 1 also helps a little.

GiRaFa-SM commented 7 years ago

Hey guys, look at this: https://github.com/Nandaka/DanbooruDownloader/issues/134 It looks like Sankaku is working in the latest version.

Also, I posted an issue with all Sankaku tags in a file; someone suggested this above. If the Grabber dev does some tweaking, it may work.

Also, can someone post their Sankaku settings.ini file? Thanks.

NavinF commented 7 years ago

I spent some time playing around with this and here's what I found:

  1. If I just run v5.5.0 without any changes, I immediately see a ButtFlare ban on the "www." subdomain for an hour.
  2. If I remove the "limit=" URL parameter with this patch, the "capi." JSON endpoints work for a while:

```diff
diff --git a/release/sites/Sankaku/model.xml b/release/sites/Sankaku/model.xml
index 4c336de4..5cc7e843 100644
--- a/release/sites/Sankaku/model.xml
+++ b/release/sites/Sankaku/model.xml
@@ -5,18 +5,18 @@
 	<ImageReplaces>/preview/->/&amp;([^s]).sankakucomplex->\1s.sankakucomplex</ImageReplaces>
 	<Login>login={pseudo}&amp;password_hash={password}&amp;appkey={appkey}&amp;</Login>
 	<Xml>
-		<Tags>/post/index.xml?{login}limit={limit}&amp;page={page}&amp;tags={tags}</Tags>
-		<Pools>/post/index.xml?{login}limit={limit}&amp;page={page}&amp;tags=pool:{pool} {tags}</Pools>
+		<Tags>/post/index.xml?{login}page={page}&amp;tags={tags}</Tags>
+		<Pools>/post/index.xml?{login}page={page}&amp;tags=pool:{pool} {tags}</Pools>
 		<NeedAuth>true</NeedAuth>
 	</Xml>
 	<Json>
-		<Tags>/post/index.json?{login}limit={limit}&amp;page={page}&amp;tags={tags}</Tags>
-		<Pools>/post/index.json?{login}limit={limit}&amp;page={page}&amp;tags=pool:{pool} {tags}</Pools>
+		<Tags>/post/index.json?{login}page={page}&amp;tags={tags}</Tags>
+		<Pools>/post/index.json?{login}page={page}&amp;tags=pool:{pool} {tags}</Pools>
 		<NeedAuth>true</NeedAuth>
 	</Json>
 	<Html>
-		<Tags>/post/index?{login}limit={limit}&amp;tags={tags}&amp;{pagepart}{altpage}</Tags>
-		<Pools>/post/index?{login}limit={limit}&amp;tags=pool:{pool} {tags}&amp;{pagepart}{altpage}</Pools>
+		<Tags>/post/index?{login}tags={tags}&amp;{pagepart}{altpage}</Tags>
+		<Pools>/post/index?{login}tags=pool:{pool} {tags}&amp;{pagepart}{altpage}</Pools>
 		<Post>/post/show/{id}</Post>
 		<Limit>20</Limit>
 		<MaxLimit>200</MaxLimit>
```
  3. After making a couple of requests, "capi." stops accepting TCP connections from my IP. I can still open connections from other IPs.
  4. Sometimes "cs." (the subdomain that serves images) stops accepting any TCP connections on port 443, but "chan." still works in the browser. I guess they're just blocking anyone that makes any API request or downloads images.
  5. If I spoof a browser user agent by editing lib/src/models/site.cpp, I get HTTP 403 Forbidden on all API requests.

At this point I wonder if it would be easier to forget about the API and just scrape the main site with Selenium driving a full browser. That's what I'm working on now.
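Whatever fetches the page (Selenium or anything else), the scraping side mostly reduces to pulling /post/show/&lt;id&gt; links out of the rendered HTML. A standard-library sketch; the link shape is an assumption about typical booru markup, not verified against the live site:

```python
# Sketch: collect post links from a listing page's HTML using only the
# standard library. The "/post/show/" prefix is an assumed URL shape.
from html.parser import HTMLParser

class PostLinkParser(HTMLParser):
    """Collect hrefs that look like /post/show/<id> links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        if href.startswith("/post/show/"):
            self.links.append(href)

def extract_post_links(html: str) -> list:
    parser = PostLinkParser()
    parser.feed(html)
    return parser.links
```

In a Selenium-based setup, `driver.page_source` (after the page has rendered) would be the input to `extract_post_links`; a plain parser like this just avoids hand-rolling regexes over HTML.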

baraa272 commented 7 years ago

Please highlight this comment. Temporary fix for Sankaku WITH JDOWNLOADER2: at least Grabber can still extract the image links. In Grabber, use "Copy links to clipboard", then run JDownloader2; it will analyze all the links. Right-click and choose "Start all downloads". Voila! Please share this until Grabber is fixed :)

Mad-onanist commented 5 years ago

> Anyway, using boorus the way Grabber does is like piracy. So we are the bad guys here, not Sankaku.

Why? Sankaku is just a host of stolen pictures. They have no rights to them, so you're wrong.

Birb26 commented 2 years ago

If anyone is interested, I only got this "A technical problem has prevented the page from loading" message after opening 20+ tabs of my favorites to remove them. Since I can't find a "deactivate account" option, I wanted to remove all my favorites, log out, and delete my password.