abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020

Problem with exam project #43

Closed by Stinth 3 years ago

Stinth commented 3 years ago

Hello,

We have slowly begun working on our exam project and have hit a snag. We are scraping data from https://www.gsmarena.com. So far we have been careful not to stress the servers more than necessary and have made only around 40-50 requests over a 2-3 hour period. We are now getting HTTP status code 429 (Too Many Requests) and a page saying that we have sent too many requests. On top of that, it appears that my IP address has been blocked from accessing their website or receiving any response to GET requests.

We have contacted their support email and hope to get a response in a timely manner. Is there anything else we can do at this point, apart from using a VPN? We don't think a VPN is fair if they don't want us scraping their site.
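For later readers hitting the same 429 response, here is a minimal sketch of spacing out requests and honoring the `Retry-After` header when the server sends one. It would not lift an existing IP block. The names `parse_retry_after` and `polite_get`, the 5-second pause, and the 60-second fallback are assumptions made for illustration, not anything prescribed by the course or by the site.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=60.0):
    """Return the number of seconds to wait, given a Retry-After header value.

    The header may hold an integer number of seconds or an HTTP date.
    Falls back to `default` (an assumed value) when missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def polite_get(get, url, min_delay=5.0, max_retries=3, sleep=time.sleep):
    """Fetch `url` with the callable `get`, pausing before every request.

    On a 429 response, wait as long as the server asks before retrying.
    After `max_retries` retries the last response is returned unchanged.
    """
    for attempt in range(max_retries + 1):
        sleep(min_delay)  # fixed pause so requests are never sent back to back
        response = get(url)
        if response.status_code != 429:
            return response
        if attempt < max_retries:
            sleep(parse_retry_after(response.headers.get("Retry-After")))
    return response
```

With the requests library this could be called as `polite_get(requests.get, url)`. Passing the fetch function in as an argument keeps the sketch independent of whether requests or another client is used.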

jsr-p commented 3 years ago

Hi @Stinth, are you using Selenium or requests? Remember to be transparent whenever you scrape a site. I will not recommend any workaround for a block that a website has put in place because it does not want to be scraped. There is probably not much you can do. Maybe you should consider scraping another website?
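As a rough illustration of what being transparent can look like in code, the sketch below builds request headers that identify the scraper and give a contact address, and checks a URL against a site's robots.txt rules before fetching it. The function names, the project name, the email address, and the robots.txt text are all hypothetical placeholders. The rules shown are not the actual rules of any particular site.

```python
from urllib.robotparser import RobotFileParser


def transparent_headers(project, contact_email):
    """Build headers that say who is scraping and how to reach them."""
    return {
        "User-Agent": f"{project} (student research project; contact: {contact_email})",
        "From": contact_email,  # standard HTTP header for a contact address
    }


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for this example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

headers = transparent_headers("isds-exam-project", "student@example.com")
print(headers["User-Agent"])
print(allowed_by_robots(EXAMPLE_ROBOTS, "isds-exam-project", "https://example.com/phones"))
print(allowed_by_robots(EXAMPLE_ROBOTS, "isds-exam-project", "https://example.com/private/data"))
```

With requests, the headers could be passed as `requests.get(url, headers=headers)`. A robots.txt check does not replace reading the site's terms of use, and a site may still decline to be scraped.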