Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

503 error when accessing a protected page #132

Closed NajibAdan closed 6 years ago

NajibAdan commented 6 years ago

Hey. I've received a status code 503 when accessing http://kissmanga.com/.

Here is the HTML source of the protected page: https://pastebin.com/05ANq21d

Thanks.

Anorov commented 6 years ago

Are you using the latest version of cloudflare-scrape?

Xonshiz commented 6 years ago

I can confirm the same issue, and yes, I'm also on the latest version of cfscrape. It seems to be random: sometimes it connects and sometimes it doesn't. It fails completely for crunchyroll.com and works only some of the time for readcomiconline.to.

Xonshiz commented 6 years ago

I got one of my scripts that uses cfscrape to generate a connection log:

DEBUG: Starting new HTTP connection (1): readcomiconline.to
DEBUG: http://readcomiconline.to:80 "GET /Comic/Final-Crisis-Sketchbook HTTP/1.1" 503 None
DEBUG: Resetting dropped connection: readcomiconline.to
DEBUG: http://readcomiconline.to:80 "GET /cdn-cgi/l/chk_jschl?jschl_answer=155&jschl_vc=a6edd657f0fbf78449cd7a1287eaf382&pass=1516505208.081-j3ShPBqMDN HTTP/1.1" 302 165
DEBUG: http://readcomiconline.to:80 "GET /Comic/Final-Crisis-Sketchbook HTTP/1.1" 200 None

NajibAdan commented 6 years ago

@Anorov Yes, I'm using the latest version of cfscrape.

Anorov commented 6 years ago

I am unable to reproduce this issue with http://kissmanga.com/MangaList, http://readcomiconline.to/Comic/Final-Crisis-Sketchbook, or http://crunchyroll.com/.

@Xonshiz, that connection log appears to be a successful attempt to me, not a failed one. I can see the 503, the challenge answer submission (/cdn-cgi/l/chk_jschl), and the successful redirect to the intended URL, which returned a 200 status. Could you elaborate?

And could one of you please submit a PCAP of a failed scrape attempt? Thanks.
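The sequence described above (a 503, then a request to /cdn-cgi/l/chk_jschl, then a 200 for the original URL) can be checked mechanically. The following is only a rough sketch, written for Python 3: the `classify` helper and its labels are hypothetical names, not part of cfscrape, and it assumes the log lines have the same shape as the ones Xonshiz posted.

```python
import re

# Matches request lines such as:
# DEBUG: http://host:80 "GET /path HTTP/1.1" 503 None
REQUEST_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

# Challenge-answer path as it appears in the log posted in this thread.
CHALLENGE_PATH = "/cdn-cgi/l/chk_jschl"


def classify(log_lines):
    """Label one scrape attempt as 'solved', 'failed', or 'other'."""
    seen = []  # (path, status) for every request line found
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if match:
            seen.append((match.group("path"), int(match.group("status"))))
    if not seen:
        return "other"
    submitted = any(path.startswith(CHALLENGE_PATH) for path, _ in seen)
    final_status = seen[-1][1]
    if submitted and final_status == 200:
        return "solved"  # challenge answered, original URL then returned 200
    if final_status == 503:
        return "failed"  # still stuck on the challenge page
    return "other"
```

Run against the five log lines posted earlier, this labels the attempt as solved, which matches the reading that the log shows a successful scrape.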

Xonshiz commented 6 years ago

I got Crunchyroll working just fine now. I'll try again with readcomiconline and share further details.

Xonshiz commented 6 years ago

RCO is working as well. Everything is fine. Thanks!

NajibAdan commented 6 years ago

@Xonshiz how did you get RCO to work? I'm getting 503 on RCO and Kissmanga.

@Anorov here is a link to the PCAP of the failed scrape attempt

Xonshiz commented 6 years ago

When I read @Anorov's reply, I dug into RCO and it turned out cfscrape was working. The problem was somewhere else in my scraping. RCO temp-bans you if you hit their site too often. It then asks you to solve a captcha, which you have to do in your browser.
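If the temp ban is triggered by request rate, spacing requests out may help. Below is a minimal sketch for Python 3, assuming a fixed pause between requests is enough; the 5-second default is a guess, not a documented RCO limit, and `fetch_statuses` is a hypothetical helper rather than anything cfscrape provides.

```python
import time


def fetch_statuses(urls, get, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `get` is any callable that returns an object with a `status_code`
    attribute, for example `cfscrape.create_scraper().get`.
    """
    statuses = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        statuses[url] = get(url).status_code
    return statuses
```

Passing the scraper's `get` method in as an argument keeps the helper independent of cfscrape, so it can be tried with a stub before pointing it at a real site.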

NajibAdan commented 6 years ago

RCO temp bans you if you hit their site too often.

Weird. I've only visited their site once, to see whether I could access RCO with a scraper, and got a 503.

Xonshiz commented 6 years ago

Mine is working just fine. There was some issue with my code in parsing the page, that's all. Everything else is working just fine. Maybe share your code and I can test on my side?

NajibAdan commented 6 years ago

The code is as simple as it gets:

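import cfscrape

# create_scraper() returns a requests.Session subclass that handles the Cloudflare challenge
scraper = cfscrape.create_scraper()
urls = ["http://kissmanga.com/MangaList",
        "http://readcomiconline.to/Comic/Final-Crisis-Sketchbook",
        "http://chruncyroll.com"]
for url in urls:
    # print() with a single argument works on both Python 2 and Python 3
    print(scraper.get(url).status_code)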

which prints this on both my personal computer and my VPS:

503
503
200

Xonshiz commented 6 years ago

Just ran the code and mine is giving 200 for all these. Try updating cfscrape again. Or maybe uninstall it and then install it again.
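Before reinstalling, it can help to confirm which version is actually installed in the environment the script runs in. A small sketch, assuming Python 3.8 or newer for `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Example: installed_version("cfscrape")
```

Comparing that value against the latest release on PyPI shows whether an upgrade or reinstall is likely to change anything.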

NajibAdan commented 6 years ago

Huh. It finally gave me 200 after reinstalling it. Thanks for the tip.

Xonshiz commented 6 years ago

Yeah, that happens sometimes because of caching.