Leagify / prospect-scraper-dt2021

Scraping draft prospects for fun and charts.

htmlagilitypack - Operation times out in GitPod #46

Open zo0o0ot opened 3 years ago

zo0o0ot commented 3 years ago

Unhandled exception. System.Net.WebException: The operation has timed out.
   at System.Net.HttpWebRequest.GetResponse()
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds)
   at HtmlAgilityPack.HtmlWeb.Load(Uri uri, String method)
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method)
   at HtmlAgilityPack.HtmlWeb.Load(String url)
   at prospectScraper.ProspectScraper.RunTheMockDraft(Boolean parseDate) in /workspace/prospect-scraper-dt2021/src/prospectScraper/ProspectScraper.cs:line 89
   at prospectScraper.Program.Main(String[] args) in /workspace/prospect-scraper-dt2021/src/prospectScraper/Program.cs:line 42

Attempted to run a scrape for a new mock draft, but both the big board and the mock draft scrapes hit this timeout tonight. I hope it resolves itself, but if not, I'll need to figure out why requests would suddenly start timing out when the web page loads fine in a browser.

zo0o0ot commented 3 years ago

Looks like it was a false alarm. I got it working a bit later and have a PR up: #47.

zo0o0ot commented 3 years ago

The issue has returned. I'm not sure what is causing these intermittent timeouts.
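
For intermittent failures like this, one option is to retry the request with a growing delay. The following is only a minimal sketch, not the scraper's actual code: it swaps `HtmlWeb.Load` for `HttpClient`, and `RetryingFetch`, `WithRetriesAsync`, `FetchHtmlAsync`, the attempt count, and the delays are all hypothetical names and values chosen for illustration. Retrying would not help if the requests are being blocked outright.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryingFetch
{
    // Hypothetical helper: calls fetch up to maxAttempts times and
    // doubles the wait after each failed attempt.
    public static async Task<string> WithRetriesAsync(
        Func<Task<string>> fetch, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await fetch();
            }
            // HttpClient surfaces network errors as HttpRequestException and
            // its own timeout as TaskCanceledException. On the last attempt
            // the filter is false, so the exception propagates to the caller.
            catch (Exception ex) when (
                (ex is HttpRequestException || ex is TaskCanceledException)
                && attempt < maxAttempts)
            {
                await Task.Delay(delay);
                delay = delay + delay;
            }
        }
    }

    // Assumed defaults: 3 attempts, starting with a 2 second wait.
    public static Task<string> FetchHtmlAsync(HttpClient client, string url) =>
        WithRetriesAsync(() => client.GetStringAsync(url),
            maxAttempts: 3, initialDelay: TimeSpan.FromSeconds(2));
}
```

The returned HTML string could then be handed to HtmlAgilityPack through `HtmlDocument.LoadHtml` instead of loading the URL with `HtmlWeb` directly.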

zo0o0ot commented 3 years ago

I suspect my GitPod IP may have been blocked. I downloaded the code and ran it locally without an issue.

zo0o0ot commented 3 years ago

Potential solution: Google cache?

Example: http://webcache.googleusercontent.com/search?q=cache:https://www.drafttek.com/2021-NFL-Draft-Big-Board/Top-NFL-Draft-Prospects-2021-Page-1.asp

Text only version: http://webcache.googleusercontent.com/search?q=cache:https://www.drafttek.com/2021-NFL-Draft-Big-Board/Top-NFL-Draft-Prospects-2021-Page-1.asp&strip=1&vwsrc=0
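
If the cache route were pursued, the two URL shapes above could be generated from the original page address. This is a small sketch under that assumption; `GoogleCacheUrl` and `For` are hypothetical names, and the query parameters are copied from the examples above rather than from any documented API.

```csharp
using System;

public static class GoogleCacheUrl
{
    // Prefix taken from the example URLs in this thread.
    private const string Prefix =
        "http://webcache.googleusercontent.com/search?q=cache:";

    // Hypothetical helper: builds the cache URL for pageUrl, optionally
    // appending the text-only parameters seen in the second example.
    public static string For(string pageUrl, bool textOnly = false)
    {
        if (string.IsNullOrWhiteSpace(pageUrl))
            throw new ArgumentException("A page URL is required.", nameof(pageUrl));

        string url = Prefix + pageUrl;
        return textOnly ? url + "&strip=1&vwsrc=0" : url;
    }
}
```

Calling `GoogleCacheUrl.For(pageUrl, textOnly: true)` with the big board address reproduces the text-only link above.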

zo0o0ot commented 3 years ago

Potentially relevant Stack Overflow questions:

https://stackoverflow.com/questions/48851890/how-to-use-a-proxy-with-in-htmlagilitypack/48853171

https://stackoverflow.com/questions/12099538/using-a-proxy-with-htmlagilitypack
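
As a rough illustration of the proxy idea, the sketch below routes requests through a proxy using `HttpClient` from the base class library rather than HtmlAgilityPack's own loading calls, so it is a substitute technique and not what the linked answers show. `ProxiedClient`, `CreateHandler`, `Create`, and the 30 second timeout are hypothetical, and the proxy host and port would need to come from a proxy that is actually available.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ProxiedClient
{
    // Hypothetical helper: builds a handler that sends every request
    // through the given proxy host and port.
    public static HttpClientHandler CreateHandler(string proxyHost, int proxyPort)
    {
        return new HttpClientHandler
        {
            Proxy = new WebProxy(proxyHost, proxyPort),
            UseProxy = true
        };
    }

    // Assumed timeout of 30 seconds; tune for the target site.
    public static HttpClient Create(string proxyHost, int proxyPort)
    {
        return new HttpClient(CreateHandler(proxyHost, proxyPort))
        {
            Timeout = TimeSpan.FromSeconds(30)
        };
    }
}
```

HTML fetched with such a client can be parsed by passing the string to `HtmlDocument.LoadHtml`.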

zo0o0ot commented 3 years ago

Opened a discussion with GitPod: https://community.gitpod.io/t/scrapysharp-htmlagility-web-call-times-out-in-gitpod/2480/4