Closed by dsc03 5 years ago
Hmmmm, can you show me an example of an error where you were blocked? If they were sending their 999 response code, my guess would be that they block IP address ranges from cloud computing providers like AWS. How aggressive were the scrapes that you were running?
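For anyone trying to tell a block apart from an ordinary failure, here is a minimal sketch of how a status code could be classified. It only assumes what is mentioned above (the non-standard 999 code) plus the standard HTTP 429 "Too Many Requests" code; the function name `classify_response` and the labels it returns are hypothetical and not part of any library.

```python
# Hypothetical helper: classify an HTTP status code seen while scraping.
LINKEDIN_BLOCK_STATUS = 999  # non-standard code mentioned in this thread
TOO_MANY_REQUESTS = 429      # standard HTTP "Too Many Requests"


def classify_response(status_code: int) -> str:
    """Return a coarse label for a status code (labels are illustrative)."""
    if status_code == LINKEDIN_BLOCK_STATUS:
        return "blocked"
    if status_code == TOO_MANY_REQUESTS:
        return "rate_limited"
    if 200 <= status_code < 300:
        return "ok"
    return "other"


if __name__ == "__main__":
    for code in (200, 429, 999, 503):
        print(code, classify_response(code))
```

If the "blocked" label only ever shows up when running from a cloud host and never from a home connection, that would be consistent with the IP-range guess, though it would not prove it.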
So the error is this:
ValueError: Took too long to load company. Common problems/solutions:
1. Invalid LI_AT value: ensure that yours is correct (they
update frequently)
2. Slow Internet: increase the timeout parameter in the Scraper constructor
I was doing very low volume, no more than 50 companies a day.
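To rule out the "Slow Internet" cause from the error message, one option is to retry with progressively larger timeouts. The sketch below is only an illustration: `scrape_with_growing_timeout` and the `scrape` callable are hypothetical names, and the callable is assumed to build the scraper with the given timeout and raise `ValueError` when the page takes too long, as in the message above.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def scrape_with_growing_timeout(
    scrape: Callable[[int], T],
    timeouts: Sequence[int] = (10, 20, 40),
) -> T:
    """Call scrape(timeout) with each timeout in turn until one succeeds.

    `scrape` is a hypothetical callable that raises ValueError when the
    page takes too long to load. The last ValueError is re-raised if
    every attempt fails.
    """
    if not timeouts:
        raise ValueError("at least one timeout is required")
    last_error: Optional[ValueError] = None
    for timeout in timeouts:
        try:
            return scrape(timeout)
        except ValueError as err:
            last_error = err  # remember the failure and try a longer timeout
    assert last_error is not None
    raise last_error
```

If every attempt still fails with a generous timeout, slow internet is probably not the cause, which points back to the cookie or to blocking.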
Are you sure that the LI_AT value didn't just expire? It has happened to me from time to time.
So I just refreshed the LI_AT cookie and tried running it, but I'm still getting blocked.
What's strange is that it's not actually refreshing my cookie on LinkedIn.
In fact, I realize it's not an issue with the cookie, because I stopped using a headless driver. I opened the Chrome console while Selenium was running and saw the following errors when it got to the LinkedIn company page.
I searched for the second error on Stack Overflow, and it seems to be a CORS issue.
Can you show me the code that produces this error? A CORS error would indicate to me that this is a LinkedIn problem, but it's also possible that the error in the console is not related to the error on the page.
I was just replying to you haha. So I'm not sure what changed, but I'm not experiencing the same issues anymore (even using the same cookie). I plan to do some more digging to figure out what was going on, but looking back, I don't think those errors were actually related. I'll message you once I figure it out if I think it'll help others in the future!
Thanks for all your help. Really appreciate it.
No problem. If you like my package and find it useful, please give it a star! I'm going to close this issue for now; feel free to re-open it if the problem comes back.
Hey Austin,
Thanks for responding to my previous comment. I was able to get the scraper to work on a remote server. However, once I started running it remotely, LinkedIn caught on and started blocking me.
I was wondering if you had any best practices for either bypassing this or preventing it?
Currently, I'm running this in headless Chrome with the following options:
Let me know if you have tips or suggestions. Anything would be appreciated.
-Daniel