dcts / opensea-scraper

Scrapes nft floor prices and additional information from opensea. Used for https://nftfloorprice.info
MIT License
184 stars, 73 forks

Can't fetch anymore collections (Opensea cloudflare update) #31

Closed: SKreutz closed this issue 2 years ago

SKreutz commented 2 years ago

Hey everybody,

I noticed that opensea now runs its browser check more often when I open the site. I then noticed that the script can't fetch any more collections.

✅ === OpenseaScraper.rankings() ===
=== OpenseaScraper.rankings() ===
...fetching 1 pages (= top 100 collections)
...opening url: https://opensea.io/rankings?sortBy=one_day_volume
...🚧 waiting for cloudflare to resolve
...exposing helper functions through script tag
...scrolling to bottom and fetching collections.
...🥳 DONE. Total Collections fetched: 0
scraped 0 collections:

Is this the same for you guys? Does anyone have an idea on how to fix this?

Thanks

khalilsiu commented 2 years ago

Same here... trying to look for solutions.

dcts commented 2 years ago

I can confirm the issue. Cloudflare is taking a very long time to resolve. I tested and could observe the following behavior:

Deploying a quickfix soon that works with extended execution time of up to 2 mins (not optimal though). If anyone has ideas how to resolve cloudflare please share.

I found this repo, will test and report soon: https://github.com/JimmyLaurent/cloudflare-scraper

dcts commented 2 years ago

OK, the cloudflare-scraper repo I tested does not work. It broke when cloudflare started to reload the page (which we also experience): https://github.com/JimmyLaurent/cloudflare-scraper/issues/39#issuecomment-908615480

Unfortunately, waiting for 2 mins also does not work consistently. I can't figure out why for now, and I'm not sure what else to do.

I have to say that cloudflare sometimes gets stricter temporarily, so chances are opensea is experiencing a ton of traffic and that's why cloudflare is stricter right now. We'll have to see if this issue persists long term.

Sorry, there's no fix for now :( If anyone has ideas, please share!

khalilsiu commented 2 years ago

Would disabling the timeout on waitForSelector work? The error I'm getting is: waiting for selector `.cf-browser-verification` to be hidden failed: timeout 30000ms exceeded. I guess they extended the wait by a variable amount. Also, perhaps we should wait for the #__next tag to show?

dcts commented 2 years ago

waitForSelector has a default timeout of 30000ms (30 secs). I tried extending it to 2 mins, but for me it still did not work. I also tried waiting for the opensea page to appear, but with no luck. You can try it yourself:

// REPLACE
await page.waitForSelector('.cf-browser-verification', {hidden: true});
// WITH
await page.waitForSelector('#__next', {timeout: 120000});

hidden: true means puppeteer will wait for the selector to disappear. This was the logic before: wait until the cloudflare class disappears. We could also just wait for the opensea selector to appear instead. But as said above, for me it did not work consistently.
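For anyone who wants to experiment with the second variant, here is a minimal sketch, not the code that is in the repo. It assumes `page` is a puppeteer Page and that the opensea app root is `#__next`, as mentioned above. The helper name `waitForOpenSea` is made up for illustration. It relies on puppeteer rejecting with an error named `TimeoutError` when the selector never appears, and on `timeout: 0` meaning "no timeout".

```javascript
// Sketch: wait for the opensea app root to appear instead of waiting for
// the cloudflare element to disappear. `page` is assumed to be a puppeteer Page.
async function waitForOpenSea(page, timeoutMs = 120000) {
  try {
    // timeout is given in milliseconds; puppeteer treats 0 as "never time out"
    await page.waitForSelector('#__next', { timeout: timeoutMs });
    return true;
  } catch (err) {
    // report a timeout as `false` so the caller can decide to retry or give up
    if (err && err.name === 'TimeoutError') return false;
    // anything else (closed browser, crashed page, ...) is a real error
    throw err;
  }
}
```

Returning `false` instead of throwing makes it easy to wrap the call in a retry loop, which might help if cloudflare only resolves on some attempts.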

SKreutz commented 2 years ago

> OK, the cloudflare-scraper repo I tested does not work. It broke when cloudflare started to reload the page (which we also experience): JimmyLaurent/cloudflare-scraper#39 (comment)
>
> Unfortunately, waiting for 2 mins also does not work consistently. I can't figure out why for now, and I'm not sure what else to do.
>
> I have to say that cloudflare sometimes gets stricter temporarily, so chances are opensea is experiencing a ton of traffic and that's why cloudflare is stricter right now. We'll have to see if this issue persists long term.
>
> Sorry, there's no fix for now :( If anyone has ideas, please share!

Thank you for your quick responses. It seems that it works again without me changing anything. I think you were right with the assumption that cloudflare changed something because opensea had a lot of traffic. I'll keep this issue open a few more days and report back if something changes again.

dcts commented 2 years ago

Ah great, thanks for the report, it's good to know that this issue can happen. I know there is a way to bypass cloudflare, but it's not that simple to do. They also change stuff a lot, so it's always a cat-and-mouse game to keep up with the changes.

It's good that it works again, and please do report back after you test over the next few days!

khalilsiu commented 2 years ago

The problem is still there when I use a cloud service to run puppeteer. I see that we are already using the stealth plugin... :( Cloudflare keeps waiting, like, forever.

SKreutz commented 2 years ago

Did you try using a proxy? If you ran this on a cloud service, its IP is probably just blacklisted by opensea.

I didn't run into any more problems on Mac. If I try to run the script on Linux (a computer in the same network), it stops right here without any errors, like it's frozen.

✅ === OpenseaScraper.rankings() ===
=== OpenseaScraper.rankings() ===
...fetching 1 pages (= top 100 collections)
...opening url: https://opensea.io/rankings?sortBy=one_day_volume
...🚧 waiting for cloudflare to resolve
...exposing helper functions through script tag
...scrolling to bottom and fetching collections.

I use the same code 1:1 on Mac, and it worked on Linux a week ago. I'm clueless.
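In case it helps with the proxy idea, here is a rough sketch of how puppeteer can be pointed at a proxy. It is only an assumption of how one might wire it up, not something the scraper does today. `buildLaunchOptions` is a hypothetical helper and the proxy address in the usage comment is a placeholder. `--proxy-server` is a Chromium command-line switch, and `page.authenticate` is puppeteer's way to pass credentials.

```javascript
// Hypothetical helper: builds puppeteer launch options that route all
// browser traffic through a proxy. Without an argument, no proxy is used.
function buildLaunchOptions(proxyServer) {
  const args = [];
  if (proxyServer) {
    // --proxy-server is a Chromium command-line switch
    args.push(`--proxy-server=${proxyServer}`);
  }
  return { headless: true, args };
}

// Usage (needs puppeteer installed, so it is left commented out;
// the address and credentials below are placeholders):
// const browser = await puppeteer.launch(buildLaunchOptions('http://127.0.0.1:8080'));
// const page = await browser.newPage();
// await page.authenticate({ username: 'user', password: 'pass' }); // only if the proxy needs credentials
```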

dcts commented 2 years ago

@khalilsiu

Which cloud service did you use? I am using firebase functions (which is basically google cloud) and it works for me. If you want, I can share my setup.

dcts commented 2 years ago

@SKreutz can you open another issue for the linux bug?

khalilsiu commented 2 years ago

> @khalilsiu
>
> Which cloud service did you use? I am using firebase functions (which is basically google cloud) and it works for me. If you want, I can share my setup.

I'm using compute engine for that, so perhaps I should try firebase. I haven't tried using a proxy yet. It would be great if you could share your setup with me @dcts

khalilsiu commented 2 years ago

> Did you try using a proxy? If you ran this on a cloud service, its IP is probably just blacklisted by opensea.
>
> I didn't run into any more problems on Mac. If I try to run the script on Linux (a computer in the same network), it stops right here without any errors, like it's frozen.
>
> ✅ === OpenseaScraper.rankings() ===
> === OpenseaScraper.rankings() ===
> ...fetching 1 pages (= top 100 collections)
> ...opening url: https://opensea.io/rankings?sortBy=one_day_volume
> ...🚧 waiting for cloudflare to resolve
> ...exposing helper functions through script tag
> ...scrolling to bottom and fetching collections.
>
> I use the same code 1:1 on Mac, and it worked on Linux a week ago. I'm clueless.

Exactly. I took screenshots during the freeze, and it turns out the page keeps getting redirected back to the waiting page, so it is stuck in that waiting room forever.
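For anyone who wants to reproduce this kind of debugging, here is a small sketch of taking screenshots while a wait is pending. It assumes `page` is a puppeteer Page and `task` is any promise, for example the one returned by `page.waitForSelector(...)`. The helper name `screenshotWhile` and the file name prefix are made up for illustration.

```javascript
// Sketch: take a screenshot every `intervalMs` while `task` is pending,
// to see what the page shows while the scraper appears frozen.
async function screenshotWhile(page, task, intervalMs = 5000, prefix = 'debug') {
  let count = 0;
  let busy = false;
  const timer = setInterval(async () => {
    if (busy) return; // previous screenshot still running, skip this tick
    busy = true;
    try {
      count += 1;
      await page.screenshot({ path: `${prefix}-${count}.png` });
    } catch (err) {
      // a reload can interrupt a screenshot; ignore it and try again next tick
    } finally {
      busy = false;
    }
  }, intervalMs);
  try {
    return await task; // resolves or rejects exactly like the wrapped promise
  } finally {
    clearInterval(timer); // stop taking screenshots either way
  }
}

// Usage (inside an async function, with a real puppeteer page):
// await screenshotWhile(page, page.waitForSelector('#__next', { timeout: 120000 }));
```

If the screenshots keep showing the cloudflare waiting page, the redirect loop described above is the likely culprit rather than a timeout that is simply too short.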

SKreutz commented 2 years ago

I'm closing this issue since the cloudflare problem is solved and opening a new one