dcts / opensea-scraper

Scrapes NFT floor prices and additional information from OpenSea. Used for https://nftfloorprice.info
MIT License
184 stars 73 forks

Not working when deployed (Google Cloud). TimeoutError: waiting for selector `.cf-browser-verification` to be hidden failed: timeout 30000ms exceeded #40

Open mlarcher opened 2 years ago

mlarcher commented 2 years ago

This is what I get running on GCP using `offersByScrolling`:

TimeoutError: waiting for selector `.cf-browser-verification` to be hidden failed: timeout 30000ms exceeded

It seems to work sometimes and fail with this error at other times. Any idea what's happening there?

dcts commented 2 years ago

Waiting for `.cf-browser-verification` to be hidden means that you are on the Cloudflare page (cf = Cloudflare) and are not being redirected to the actual OpenSea page within 30 seconds. I think most likely OpenSea is detecting that you run the scraper from a Google Cloud IP, and the Cloudflare loop kicks in: the page refreshes endlessly, asking you to wait for a check that never resolves.
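
For illustration, the wait that produces this error can be sketched roughly as below. This is only a sketch under assumptions: `page` is taken to be a Puppeteer `Page`, and the helper name `waitForChallengeToClear` is hypothetical, not part of opensea-scraper. `page.waitForSelector` with `hidden: true` resolves once the element is hidden or absent, and rejects with a `TimeoutError` if that does not happen in time.

```javascript
// Hypothetical helper, not part of opensea-scraper.
// `page` is assumed to be a Puppeteer Page (or any object with a
// compatible waitForSelector method).
async function waitForChallengeToClear(page, timeoutMs = 30000) {
  try {
    // Resolves when the Cloudflare element is hidden or not in the DOM.
    await page.waitForSelector('.cf-browser-verification', {
      hidden: true,
      timeout: timeoutMs,
    });
    return true; // challenge cleared, the real page should be loaded
  } catch (err) {
    // Assumption: the timeout error carries the name "TimeoutError",
    // as in the message quoted in this issue.
    if (err && err.name === 'TimeoutError') {
      return false; // still stuck on the Cloudflare page
    }
    throw err; // anything else is unexpected, so surface it
  }
}
```

Returning `false` instead of throwing lets the caller decide whether to retry, switch IP, or give up.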

I have no way around that currently. Deploying scrapers on cloud infrastructure is difficult.

If you (or someone else) have ideas, please share; it's a very common problem.

One solution that might work, but is costly, is using a service like Bright Data (proxy with unblocker API).

mlarcher commented 2 years ago

UPDATE: When running on GCP, the error `TimeoutError: waiting for selector .cf-browser-verification to be hidden failed: timeout 30000ms exceeded` is now less frequent, but when we don't get the error we end up with an empty offers list and empty stats, i.e.:

offers: []
stats: {}
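
Since the failure is intermittent, one possible workaround is to treat an empty result as a failed attempt and retry. The following is a minimal sketch, assuming the result has the `{ offers, stats }` shape shown above; `isEmptyResult`, `scrapeWithRetry`, and the `scrape` callback are hypothetical names, not part of the library. Retrying will not help if the IP is blocked outright.

```javascript
// Hypothetical helpers, not part of opensea-scraper.
// The { offers, stats } shape is taken from the output shown above.
function isEmptyResult(result) {
  if (!result) return true;
  const offers = Array.isArray(result.offers) ? result.offers : [];
  const stats =
    result.stats && typeof result.stats === 'object' ? result.stats : {};
  return offers.length === 0 && Object.keys(stats).length === 0;
}

// `scrape` is any async function returning { offers, stats }.
async function scrapeWithRetry(scrape, attempts = 3, delayMs = 5000) {
  let last;
  for (let i = 0; i < attempts; i++) {
    last = await scrape();
    if (!isEmptyResult(last)) return last; // got real data, stop retrying
    if (i < attempts - 1) {
      // Wait before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return last; // still empty after all attempts
}
```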

I hope this will be fixed by v7's new approach 🤞

dcts commented 2 years ago

REPORT FROM @mlarcher:

I dug a bit into the code and set up a test case... It seems that on GCP I'm stuck on a page that says:

Checking your browser before accessing opensea.io.
This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds…
DDoS protection by [Cloudflare](https://www.cloudflare.com/5xx-error-landing/)

:(
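
To tell this interstitial apart from a real but empty page, the HTML could be checked for the text quoted above. This is a rough sketch only: the function name `looksLikeCloudflareChallenge` is hypothetical, and the markers are assumptions taken from this thread, so they may stop matching whenever Cloudflare changes its markup.

```javascript
// Hypothetical helper, not part of opensea-scraper.
// Markers come from the selector and page text quoted in this issue.
const CHALLENGE_MARKERS = [
  'cf-browser-verification',
  'Checking your browser before accessing',
];

function looksLikeCloudflareChallenge(html) {
  if (typeof html !== 'string') return false;
  return CHALLENGE_MARKERS.some((marker) => html.includes(marker));
}
```

With Puppeteer, the input could come from `await page.content()`, which returns the page's current HTML as a string.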

From what I gathered :

All in all, this doesn't look too good, but it is not directly related to the current library. Let me know if you have expertise on the matter and know some other way to tackle the problem, though :)

dcts commented 2 years ago

Bypassing Cloudflare is definitely not my expertise. I have tried to solve this problem for some time now, and it is definitely possible, but as you mentioned, it's an arms race. I tried these packages: