bibanon / BASC-py4chan

Python wrapper for 4chan API. The BA's vastly improved fork of Edgeworth's original.
http://basc-py4chan.readthedocs.org/en/latest/index.html
Do What The F*ck You Want To Public License
55 stars 13 forks

4chan giving up 4cdn, part 1: files and thumbs #25

Closed mkody closed 7 years ago

mkody commented 7 years ago

Hiro wants to cut some costs and remove the 4cdn.org domain (and CDN?). He already started with the files and thumbs.

Some users are now getting errors from i.4cdn.org, so we should update the code.

antonizoon commented 7 years ago

Does the static s.4cdn.org need to be updated?

mkody commented 7 years ago

I didn't find a new domain for it; the website is still using a.4cdn.org and s.4cdn.org for now (thus the "part 1").

antonizoon commented 7 years ago

Do you happen to have the URL of the thread where this was announced? It could be good to check out as well.

antonizoon commented 7 years ago

It looks like at least is.4chan.org works, so I will merge this for now.

mkody commented 7 years ago

He never announced it, but has already started making the move (4chanX did some fixes for it some weeks ago). It was one of his options: http://archived.moe/qa/thread/706294/

And this is what some people are starting to get, only on the CDN: 1479320704

r3c0d3x commented 7 years ago

Hopefully we can get some confirmation on this from Hiro soon. Thanks for all the info!

antonizoon commented 7 years ago

The a.4cdn.org link might be on the block for replacement. The URL was originally api.4chan.org, and that still works fine:

http://api.4chan.org/a/threads.json

Same thing for s.4cdn.org, it used to be static.4chan.org.

The reason moot changed them to 4cdn.org was that it saved a lot of bandwidth, given the length of the img.4chan.org domain. But I guess that's not as significant if you use gzip, I dunno.
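To put rough numbers on the domain-length point, here is a small sketch that compares the two host names on a made-up page fragment. The URL count and file names are hypothetical, and `zlib` (DEFLATE, the same compression gzip uses) stands in for whatever the servers actually do.

```python
import zlib

OLD_HOST = "img.4chan.org"  # host name as given in the comment above
NEW_HOST = "i.4cdn.org"
URL_COUNT = 200  # hypothetical number of image URLs on one page


def page_fragment(host: str) -> bytes:
    """Join URL_COUNT made-up image URLs on `host` into one byte string."""
    urls = ["//{}/a/{}.jpg".format(host, 1479320704000 + i) for i in range(URL_COUNT)]
    return "\n".join(urls).encode("ascii")


# Bytes saved by the shorter host name, before and after compression.
raw_saving = len(page_fragment(OLD_HOST)) - len(page_fragment(NEW_HOST))
compressed_saving = (
    len(zlib.compress(page_fragment(OLD_HOST)))
    - len(zlib.compress(page_fragment(NEW_HOST)))
)

print("uncompressed saving:", raw_saving, "bytes")
print("compressed saving:", compressed_saving, "bytes")
```

Uncompressed, the shorter name saves 3 bytes per URL. Once the text is compressed, the repeated host name is mostly encoded as back-references, so the difference should shrink considerably.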

antonizoon commented 7 years ago

I've talked to the fuuka admins and they have seen a few more timeouts than usual on 4cdn.org, but nothing significant.

However, for them is.4chan.org has significantly reduced performance: well, the images themselves load quickly, but the requests take longer. Maybe it's rate limiting? dunno.

But basically: are you absolutely sure that i.4cdn.org needs to be switched over? They haven't yet switched on 4chan.org itself. Maybe they planned a switch and have rolled back? Dunno, it's weird.

mkody commented 7 years ago

Yup, that's weird. Sometimes, the website uses is.4chan.org, and then i.4cdn.org, and then back to is.4chan.org... and it can happen multiple times in the same day.

If you search in /qa/ archives, you can see a lot of people are complaining that is.4chan.org is slower too but it has all the links working (which, for some people, isn't the case on i.4cdn.org).

4chan is quite unstable now and is going back and forth. Since many of my users have reported that 4cdn.org wasn't working, I tried the 4chan.org domain and... well, it has worked fine for the last 3 weeks.

If needed, we can roll this back if they finally give up. For static, I didn't see any changes to static.4chan.org. Let's hope Hiro and the admins will share some news soon.
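One way to make a switch like this easy to roll back is to keep the file hosts in an ordered list and try them in turn. This is only a sketch, not how py4chan does it: `FILE_HOSTS`, `file_url`, `http_head_ok` and `first_reachable` are hypothetical names, and the `https` scheme and the host order are assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Candidate file hosts in order of preference (assumption: taken from this thread).
FILE_HOSTS = ("is.4chan.org", "i.4cdn.org")


def file_url(host: str, board: str, filename: str) -> str:
    """Build the URL of a file on the given host."""
    return "https://{}/{}/{}".format(host, board, filename)


def http_head_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors, DNS failures and timeouts all count as "not reachable".
        return False


def first_reachable(
    board: str,
    filename: str,
    hosts: Iterable[str] = FILE_HOSTS,
    probe: Callable[[str], bool] = http_head_ok,
) -> Optional[str]:
    """Return the first candidate URL that `probe` accepts, or None if none does."""
    for host in hosts:
        url = file_url(host, board, filename)
        if probe(url):
            return url
    return None
```

Calling `first_reachable("a", "example.jpg")` (a made-up file name) would probe is.4chan.org first and fall back to i.4cdn.org. Passing a different `hosts` tuple is all a rollback would take, and `probe` can be swapped for a fake in tests so nothing hits the network.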

mkody commented 7 years ago

Looking at the source code, extension.js is using both (randomly): https://github.com/Floens/ChanTracking/blob/master/javascripts/extension.js#L523-L528

  if (data.no % 3 === 0 && board !== 'f') {
    imgDir = '//is.4chan.org/' + board;
  }
  else {
    imgDir = '//i.4cdn.org/' + board;
  }
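For reference, a Python rendering of the same rule. It is a sketch with a hypothetical function name, and it shows that the choice is not really random: it depends only on the post number and the board.

```python
def image_host(post_no: int, board: str) -> str:
    """Mirror the JavaScript above: every third post number, except on /f/, uses is.4chan.org."""
    if post_no % 3 == 0 and board != "f":
        return "is.4chan.org"
    return "i.4cdn.org"


# Post numbers 1..9 on a hypothetical board "a": 3, 6 and 9 go to is.4chan.org.
hosts = [image_host(no, "a") for no in range(1, 10)]
print(hosts.count("is.4chan.org"), "of", len(hosts))  # prints "3 of 9"
```

So roughly a third of the images on every board other than /f/ would be served from is.4chan.org under this rule.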

irlittz commented 7 years ago

This doesn't bode well for the future. I almost expect Hiro to remove or disable access to the 4chan API sooner or later, in part due to his strange and false (confirmed by various, although not entirely unbiased, sources) claims of current and future costs of hosting 4chan. Well, we will see.

r3c0d3x commented 7 years ago

Fair point - although I really hope that Hiro wouldn't make that irrational of a decision. I'd expect him to see that having the API reduces the strain on the servers (and, consequently, their cost) because, if it was removed, people would just go back to scraping the HTML.

irlittz commented 7 years ago

CloudFlare and the various reCAPTCHA mechanisms (image group etc.) are pretty good at preventing that. I fear if he removes access completely, scraping 4chan reliably is over and done with -- unless you are willing to pay money and deal with all the problems that come with using unreliable captcha-solver services.

But this seems to be what it's coming down to anyways for Hiro: paying money, specifically to him. From what I've read, the random "your IP range is blocked" messages are not unknown to members of 2ch(?), which he also owns. Apparently these are transient bans, simply trying to force users to buy VIP passes to bypass them.

So that's what I expect to happen to the 4chan API access sooner or later: pay up or shut up.

mkody commented 7 years ago

To give a bit of an update: only /f/ is using i.4cdn.org now, and everything else is on is2.4chan.org only... Or I don't know how math works and this modulo can go above 2.

  if (board !== 'f') {
    if (data.no % 3 > 2) {
      imgDir = '//is.4chan.org/' + board;
    }
    else {
      imgDir = '//is2.4chan.org/' + board;
    }
  }
  else {
    imgDir = '//i.4cdn.org/' + board;
  }

Source

I didn't notice anything wrong by using is.4chan.org everywhere for now, tho.
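A quick check of the modulo question, as a Python sketch of the snippet above (the function name is hypothetical). For positive post numbers `no % 3` is always 0, 1 or 2, so the `> 2` branch can never be taken.

```python
def image_host_updated(post_no: int, board: str) -> str:
    """Mirror the updated JavaScript rule quoted above."""
    if board != "f":
        if post_no % 3 > 2:  # never true: the remainder is 0, 1 or 2
            return "is.4chan.org"
        return "is2.4chan.org"
    return "i.4cdn.org"


# Try the first 1000 post numbers on a hypothetical board "a".
remainders = {no % 3 for no in range(1, 1001)}
hosts = {image_host_updated(no, "a") for no in range(1, 1001)}
print(sorted(remainders))  # prints [0, 1, 2]
print(hosts)               # prints {'is2.4chan.org'}
```

The is.4chan.org branch is effectively dead code in that version, which matches everything outside /f/ landing on is2.4chan.org.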