MontFerret / ferret

Declarative web scraping
https://www.montferret.dev/
Apache License 2.0

Bypass cloudflare #785

Open AgentNemo00 opened 1 year ago

AgentNemo00 commented 1 year ago

How can I bypass Cloudflare? I get a 403.

itspasindu commented 1 year ago

There are a few ways to bypass Cloudflare and get around the 403 error.

Method 1: Use a proxy server. A proxy server is an intermediary between your computer and the website you are trying to access: your request goes to the proxy, the proxy forwards it to the website, and the response comes back the same way. The website sees the proxy's IP address instead of yours, which can help when the 403 is caused by an IP-based restriction.

Method 2: Use a VPN. A VPN, or virtual private network, is a service that encrypts your traffic and routes it through a server in another location. Like a proxy, it changes the IP address the website sees, which can help when the 403 is tied to your IP.

Method 3: Use a bot. A bot is a software program that can automate tasks on the internet. There are a number of bots that can be used to bypass Cloudflare's security measures.

Method 4: Use a scraper. A scraper is a software program that can be used to extract data from websites. There are a number of scrapers that can be used to bypass Cloudflare's security measures.

Method 5: Use a debugger. A debugger is a software program that can be used to step through the code of a website. This can be helpful in identifying and bypassing Cloudflare's security measures.

It is important to note that using any of these methods to bypass Cloudflare's security measures may violate the website's terms of service.

Chheung commented 1 year ago

  1. If your IP is restricted, try using a proxy or VPN.
  2. Make the headless browser mimic a real browser as closely as possible through the navigator properties (e.g. user agent, webdriver, languages, plugins, notifications, etc.).
  3. If they use some kind of fingerprinting, try to find a way to bypass it. (Ref: link)

benjiro29 commented 1 year ago

The issue with Cloudflare is that it relies heavily on TLS fingerprinting.

The way around it is to make sure your TLS fingerprint is randomized. You can use, for instance:

https://github.com/Danny-Dasilva/CycleTLS

This gets past the TLS fingerprinting that Cloudflare does. You can then run goquery on the result, but that only works on static web pages.