austinoboyle / scrape-linkedin-selenium

`scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
MIT License
462 stars, 164 forks

Profile Scraper Failing to Scroll Down #89

Closed hp2500 closed 3 years ago

hp2500 commented 3 years ago

Hi there,

I have been using the profile scraper, and it worked well on two different machines until a few days ago. Since then, every call has failed because chromedriver does not scroll down, and I get the usual error message:

Took too long to load profile. Common problems/solutions:

  1. Invalid LI_AT value: ensure that yours is correct (they update frequently)
  2. Slow Internet: increase the time out parameter in the Scraper constructor
  3. Invalid e-mail address (or user does not allow e-mail scrapes) on scrape_by_email call

The scraper still navigates to the correct profiles, but since it doesn't scroll to the bottom of the page, no information can be retrieved. Is there an easy fix for this? Maybe something changed in the LinkedIn page structure? Guidance would be much appreciated.

senihucar commented 3 years ago

Try increasing the timeout.

timeout {float}: default time to wait for async content to load (default: 10)

    self.driver.get(url)
    # Wait for page to load dynamically via javascript
    try:
        myElem = WebDriverWait(self.driver, self.timeout).until(AnyEC(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, self.MAIN_SELECTOR)),
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, self.ERROR_SELECTOR))
        ))
    except TimeoutException as e:
        raise ValueError(
            """Took too long to load profile.  Common problems/solutions:
            1. Invalid LI_AT value: ensure that yours is correct (they
               update frequently)
            2. Slow Internet: increase the time out parameter in the Scraper
               constructor
            3. Invalid e-mail address (or user does not allow e-mail scrapes) on scrape_by_email call
            """)

vmesel commented 3 years ago

@senihucar this is not a timeout error, the scraper isn't scrolling down at all!

I increased mine to 1000 s and still get the same error here.

hp2500 commented 3 years ago

I can confirm this. I tried different parameter configurations on different computers and with different internet connections. The scraper still fails to scroll down.
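To illustrate why a larger timeout cannot help here, below is a minimal sketch of a "wait until any condition holds" loop, similar in spirit to the `WebDriverWait(...).until(AnyEC(...))` call quoted above. It is a simplified stand-in, not Selenium's or the library's implementation; `wait_until_any`, `present`, `PAGE`, and the selector names are all hypothetical. If none of the selectors being waited on exists on the page, the loop times out no matter how long it waits.

```python
import time

# Hypothetical stand-in for the set of selectors that exist on the loaded page.
PAGE = {".new-main"}


def present(selector):
    """Return a zero-argument condition that checks whether selector is on PAGE."""
    return lambda: selector in PAGE


def wait_until_any(conditions, timeout, poll_interval=0.05):
    """Poll until any condition returns a truthy value, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        for condition in conditions:
            result = condition()
            if result:
                return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no condition became true within %s s" % timeout)
        time.sleep(poll_interval)


if __name__ == "__main__":
    # A stale selector never matches, so a longer timeout only delays the error.
    for timeout in (0.1, 0.5):
        try:
            wait_until_any([present(".old-main"), present(".error-page")], timeout)
        except TimeoutError as exc:
            print("timed out:", exc)
    # A selector that exists on the page returns on the first poll.
    print(wait_until_any([present(".new-main"), present(".error-page")], 0.1))
```

If that is what is happening, the selector being waited on is the thing to check, not the timeout value.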

aabid-konverge commented 3 years ago

I'm facing the same issue. The scraper cannot retrieve any of the profile data. I don't know the exact cause, but the CSS classes/selectors on the LinkedIn website may have changed.

aabid-konverge commented 3 years ago

The fix for the issue above is to change `MAIN_SELECTOR` in ProfileScraper.py (line 19) from `MAIN_SELECTOR = '.core-rail'` to `MAIN_SELECTOR = '.scaffold-layout__main'`.
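If you would rather not edit the installed file, the same change can probably be made by overriding the attribute in a subclass. The sketch below uses a simplified stand-in class instead of the real `ProfileScraper`, and assumes `MAIN_SELECTOR` is a plain class attribute (as the `self.MAIN_SELECTOR` usage suggests); `PatchedProfileScraper` and `wait_selector` are hypothetical names.

```python
class ProfileScraper:
    """Simplified stand-in for the library class, for illustration only."""

    MAIN_SELECTOR = '.core-rail'  # old selector that no longer matches

    def wait_selector(self):
        # The real scraper passes self.MAIN_SELECTOR to its page-load wait;
        # this hypothetical method just returns it so the lookup is visible.
        return self.MAIN_SELECTOR


class PatchedProfileScraper(ProfileScraper):
    # Override only the selector; everything else is inherited unchanged.
    MAIN_SELECTOR = '.scaffold-layout__main'


if __name__ == "__main__":
    print(ProfileScraper().wait_selector())
    print(PatchedProfileScraper().wait_selector())
```

With the real package, the equivalent would be to subclass the imported `ProfileScraper` the same way, assuming nothing else in the class hard-codes the old selector.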

senihucar commented 3 years ago

@aabid-konverge First of all, thanks! It seems like the CSS selector has been replaced. I adjusted the code, but I still run into the same issue.

Did it fix the issue for you @vmesel?

The profile scraper works smoothly after the update. If you set it up with Anaconda, please check the file directory and update the .py files. The company scrapers are still not working.
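One way to find which directory an environment (Anaconda or otherwise) actually imports the package from is to ask Python's import machinery. This is a minimal sketch using only the standard library; `package_dir` is a hypothetical helper name, and it should be run with the same interpreter that runs the scraper.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        # Not installed in this environment, or no single file backs it.
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints the folder containing the installed .py files, or None.
    print(package_dir("scrape_linkedin"))
```

The printed directory is where the package's .py files live for that interpreter, which may differ from a copy cloned elsewhere on disk.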

austinoboyle commented 3 years ago

I merged a PR into master that should fix the issue for ProfileScraper, so the live version should work for profiles now. I have yet to look into the CompanyScraper, but I suspect the fix should be quite similar, if not identical.

austinoboyle commented 3 years ago

Actually, it looks like the Company page structure has changed fairly substantially away from what the Scraper currently supports. Filing a new issue for that. I won't have much time to work on it, but pull requests are always very welcome.