austinoboyle / scrape-linkedin-selenium

`scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
MIT License
454 stars, 162 forks

Firefox detected that the page redirected in a way that would never complete #13

Closed. bepetersn closed this issue 6 years ago.

bepetersn commented 6 years ago

I really appreciate your work. After using this code successfully for about a week, I started getting an error message like this:

    File "/var/www/project/app/scripts/env/src/scrape-linkedin/scrape_linkedin/ProfileScraper.py", line 21, in scrape
      self.load_profile_page(url, user)
    File "/var/www/project/app/scripts/env/src/scrape-linkedin/scrape_linkedin/ProfileScraper.py", line 37, in load_profile_page
      self.driver.get(url)
    File "/var/www/project/app/scripts/env/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 326, in get
      self.execute(Command.GET, {'url': url})
    File "/var/www/project/app/scripts/env/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
      self.error_handler.check_response(response)
    File "/var/www/project/app/scripts/env/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
      raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=redirectLoop&u=https%3A//www.linkedin.com/in/ablekh&c=UTF-8&f=regular&d=Firefox%20has%20detected%20that%20the%20server%20is%20redirecting%20the%20request%20for%20this%20address%20in%20a%20way%20that%20will%20never%20complete.
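The key detail is the `about:neterror` URL at the end of the message: `e=redirectLoop` names the failure, and the percent-encoded `u=` parameter holds the page that looped. Here is a minimal sketch for extracting those two fields. It assumes the message keeps the format shown in the trace. `parse_neterror` is a hypothetical helper, not part of `scrape_linkedin` or Selenium.

```python
import re
from urllib.parse import parse_qs

# Firefox reports the error page as "about:neterror?<query>" inside the
# WebDriverException message (format assumed from the trace above).
_NETERROR_RE = re.compile(r"about:neterror\?(\S+)")


def parse_neterror(message):
    """Return {'error': ..., 'url': ...} from a neterror message, or None."""
    match = _NETERROR_RE.search(message)
    if match is None:
        return None
    # parse_qs splits on "&" and percent-decodes, so "https%3A//" -> "https://"
    params = parse_qs(match.group(1))
    return {
        "error": params.get("e", [None])[0],
        "url": params.get("u", [None])[0],
    }


message = (
    "Message: Reached error page: about:neterror?e=redirectLoop"
    "&u=https%3A//www.linkedin.com/in/ablekh&c=UTF-8&f=regular"
    "&d=Firefox%20has%20detected%20that%20the%20server%20is%20redirecting"
)
print(parse_neterror(message))
# -> {'error': 'redirectLoop', 'url': 'https://www.linkedin.com/in/ablekh'}
```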

I tried a lot of things to fix this error, suspecting that LinkedIn had blacklisted my IP address, and I still think that's what happened. I'm not sure exactly which combination of steps fixed it, but I did solve the issue. Would you be interested in seeing my solution? I would first have to separate my app-specific code from the scraping-specific code.

austinoboyle commented 6 years ago

I haven't experienced this myself, but I'd love to hear more details about the problem and how to reproduce and fix it. Are you sure this wasn't just an invalid LI_AT cookie? On Chrome, an expired session cookie produces an error saying the page 'redirected you too many times'.
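If an expired session cookie is the cause, the scraper could say so instead of surfacing Firefox's generic message. Here is a minimal sketch. It assumes the caller passes in a page-loading function such as a Selenium driver's `get` method, and it matches only the Firefox `e=redirectLoop` marker seen in the trace above. `load_page` and `ExpiredCookieError` are hypothetical names, not part of `scrape_linkedin`. A redirect loop suggests, but does not prove, an invalid `LI_AT` cookie.

```python
# Firefox's error code for a redirect loop, as seen in the trace above.
REDIRECT_LOOP_MARKER = "e=redirectLoop"


class ExpiredCookieError(RuntimeError):
    """Hypothetical error: the page looped, most likely due to a stale LI_AT cookie."""


def load_page(get, url):
    """Call get(url), e.g. a Selenium driver's get method.

    A redirect-loop failure is re-raised as ExpiredCookieError with a clearer
    hint; any other exception propagates unchanged.
    """
    try:
        get(url)
    except Exception as exc:  # Selenium raises WebDriverException here
        if REDIRECT_LOOP_MARKER in str(exc):
            raise ExpiredCookieError(
                f"{url} redirected in a loop; the LI_AT session cookie "
                "may be invalid or expired"
            ) from exc
        raise


# Usage with a stand-in for driver.get that fails the way Firefox did:
def fake_get(url):
    raise Exception("Reached error page: about:neterror?e=redirectLoop&u=" + url)


try:
    load_page(fake_get, "https://www.linkedin.com/in/ablekh")
except ExpiredCookieError as err:
    print(err)
```

Chaining with `from exc` keeps the original Selenium exception available as `__cause__`, so the full browser message is not lost when the clearer error is raised.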

bepetersn commented 6 years ago

Oh, wow. I did try refreshing the session cookie when I first got the error, but I remember still being unable to scrape a page. I guess I basically got the scraper working without knowing the cookie ahead of time...

(Sent by email in reply to austinoboyle's comment of Mon, Jun 18, 2018 at 8:37 AM.)

-- Brian Peterson