amarvin / fantasy-football-bot

Automate playing Yahoo Fantasy Football
MIT License

Expedite scraper #58

Closed amarvin closed 1 year ago

amarvin commented 1 year ago

It looks like Yahoo adjusted their web scraper throttling, so ffbot.scrape() is pretty slow and sometimes stalls. I found that raising the time between requests from 0.6 sec to 1 sec works, but I'm looking for a better fix.
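The delay change described above can be sketched as a simple per-request throttle. This is an illustration only: `REQUEST_DELAY` and `throttled` are hypothetical names, not part of the ffbot API.

```python
import time

# Hypothetical sketch of a per-request throttle like the one discussed
# above (raising the inter-request delay from 0.6 s to 1 s).
REQUEST_DELAY = 1.0  # seconds between consecutive requests

def throttled(fetch, urls, delay=REQUEST_DELAY):
    """Call fetch(url) for each URL, sleeping `delay` seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause before every request after the first
        results.append(fetch(url))
    return results
```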

amarvin commented 1 year ago

Scraping now requires very long pauses (>1 min) to succeed. I've been looking for entirely different scraping approaches that include features to avoid bot detection/bans (e.g. Scrapy: https://stackoverflow.com/a/35133929/6068036).
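For reference, Scrapy's main built-in mitigation is its AutoThrottle extension, which adapts the download delay to observed server latency. The setting names below are real Scrapy options, but the values are illustrative guesses, not anything tuned for Yahoo.

```python
# Illustrative Scrapy settings for polite crawling. Setting names are real
# Scrapy options; the values are example guesses, not tuned for Yahoo.
SCRAPY_SETTINGS = {
    "AUTOTHROTTLE_ENABLED": True,      # back off automatically based on latency
    "AUTOTHROTTLE_START_DELAY": 1.0,   # initial download delay (seconds)
    "AUTOTHROTTLE_MAX_DELAY": 60.0,    # cap on the adaptive delay
    "DOWNLOAD_DELAY": 1.0,             # base delay between requests
    "ROBOTSTXT_OBEY": True,
}
```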

amarvin commented 1 year ago

Hmm, Yahoo seems to have relaxed their bot detection and I don't see issues with ffbot.scrape().

amarvin commented 1 year ago

Yahoo blocked me again today, but not when I added the headers in #63. The scraper was expedited from 10 min down to 5 min.
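The exact diff in #63 isn't shown here, but based on the headers named later in this thread, attaching browser-like headers to each request would look roughly like this (using the stdlib `urllib`; `build_request` is a hypothetical helper, not ffbot's actual code):

```python
from urllib.request import Request

# Headers named in this thread as the ones added in #63.
HEADERS = {
    "Accept": "text/html",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
}

def build_request(url):
    """Build a request with the browser-like headers attached."""
    return Request(url, headers=HEADERS)
```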

theaprilhare commented 1 year ago

The addition of the Accept: text/html header seems to break the scraper for me.

For example, with that header, I get the following error when calling ffbot.current_week():

Traceback (most recent call last):  
  File "<stdin>", line 1, in <module>  
  File "/home/raghav/.local/lib/python3.10/site-packages/ffbot/scraper.py", line 183, in current_week  
    week = span.text.split()[1]  
AttributeError: 'NoneType' object has no attribute 'text'

Checking the contents of span shows that it is, in fact, None.
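A guard like the following would at least turn the AttributeError into a clearer failure. This is only a sketch reconstructed from the traceback: the selector that produces `span` isn't shown in this thread, and `parse_week` is a hypothetical name.

```python
# Defensive check around the failing line from the traceback above.
# `span` would come from a find() call in ffbot/scraper.py; find()
# returns None when no matching element exists on the page.
def parse_week(span):
    if span is None:
        # No match: Yahoo served an unexpected page layout, possibly
        # because of the Accept: text/html header discussed above.
        raise ValueError(
            "Could not locate the week <span>; Yahoo may have served "
            "an unexpected page (check the request headers)."
        )
    return span.text.split()[1]
```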

The other two added headers (Accept-Encoding: gzip, deflate, br and Accept-Language: en-US) don't break the scraper and result in span having the expected contents, but they do lead to rate limiting in the form of a long pause every 70-100 requests or so, making a full scrape take about 1.75 hours. I've been able to reduce that to 1.5 hours by adding a 3-second delay between requests, but obviously the prospect of avoiding throttling entirely is attractive...

amarvin commented 1 year ago

@theaprilhare, thanks for reporting this. Can you open a separate issue for it and mention this one? I ran the scraper today without issue, so maybe there's something different about our setups.

ramGoli commented 1 year ago

I get the same error as @theaprilhare.