federicohaag / LinkedInScraping

Scraping of LinkedIn Profiles: Creates an Excel file containing the personal data and the last job position of all the provided LinkedIn profiles.

Error #8

Closed Nandit12 closed 4 years ago

Nandit12 commented 4 years ago

(screenshot: error)

Nandit12 commented 4 years ago

It shows this error for some LinkedIn profiles. I'm not able to pinpoint what the problem is. Here's an example profile: https://www.linkedin.com/in/yunki/

rangoski commented 4 years ago

I am getting the same problem.

gtaybro commented 4 years ago

Same! Not sure what the fix is, but it seems related to the driver not closing because of something going on with the company URLs. I'd like to be able to ignore this error and just continue through the profiles in my profiles file.
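For anyone who just wants the "skip and continue" behavior, a minimal sketch of that idea looks like this. Note that `scrape_profile` and the `profiles` list here are hypothetical stand-ins for whatever the actual script uses; only the try/except-and-continue pattern is the point (in the real script you'd catch `WebDriverException` rather than a bare `Exception`):

```python
def scrape_all(profiles, scrape_profile):
    """Scrape each profile URL, skipping any that raise instead of aborting the run."""
    results = []
    for url in profiles:
        try:
            results.append(scrape_profile(url))
        except Exception as exc:  # in the real script: except WebDriverException
            # Log and move on to the next profile instead of crashing.
            print(f"Skipping {url}: {exc}")
    return results
```

This doesn't fix the underlying empty-URL problem, but it keeps one bad profile from killing the whole batch.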

gtaybro commented 4 years ago

Okay, I looked into it more. The issue is that the jobs list sometimes contains an inner list of all empty strings, and an empty string passed as a URL throws an error. I'm figuring out how to fix it now, but it should involve adding a "fake" URL to the jobs list of lists to replace the "missing" company URLs.

gtaybro commented 4 years ago

Alright, I fixed that particular issue by doing something like the following in the scraper.py file. I think this should actually be incorporated into the script.

```python
# Requires at the top of scraper.py:
#   from selenium.common.exceptions import WebDriverException
def scrape_jobs(self):

    try:
        jobs = self.browser.execute_script(
            "return (function(){ var jobs = []; var els = document.getElementById("
            "'experience-section').getElementsByTagName('ul')[0].getElementsByTagName('li'); for (var i=0; "
            "i<els.length; i++){   if(els[i].className!='pv-entity__position-group-role-item-fading-timeline'){   "
            "  if(els[i].getElementsByClassName('pv-entity__position-group-role-item-fading-timeline').length>0){ "
            "     } else {       try {         position = els[i].getElementsByClassName("
            "'pv-entity__summary-info')[0].getElementsByTagName('h3')[0].innerText;       }       catch(err) { "
            "position = ''; }        try {         company_name = els[i].getElementsByClassName("
            "'pv-entity__summary-info')[0].getElementsByClassName('pv-entity__secondary-title')[0].innerText;     "
            "  } catch (err) { company_name = ''; }        try{         date_ranges = els["
            "i].getElementsByClassName('pv-entity__summary-info')[0].getElementsByClassName("
            "'pv-entity__date-range')[0].getElementsByTagName('span')[1].innerText;       } catch (err) {"
            "date_ranges = ''; }        try{         job_location = els[i].getElementsByClassName("
            "'pv-entity__summary-info')[0].getElementsByClassName('pv-entity__location')[0].getElementsByTagName("
            "'span')[1].innerText;       } catch (err) {job_location = ''; }        try{         company_url = "
            "els[i].getElementsByTagName('a')[0].href;       } catch (err) {company_url = ''; }        jobs.push("
            "[position, company_name, company_url, date_ranges, job_location]);     }   } } return jobs; })();")
    except WebDriverException:
        jobs = []

    # Replace any job whose company URL is empty with a "fake" entry, so an
    # empty string is never passed on as a URL.
    clean_jobs = []
    for job in jobs:
        if job[2] != '':
            clean_jobs.append(job)
        else:
            clean_jobs.append(['Fake', 'Fake', '[URL OF YOUR PREFERRED COMPANY]/', '', ''])

    parsed_jobs = []

    for job in clean_jobs:
        company_industry, company_employees = self.scrape_company_details(str(job[2]))

        parsed_jobs.append(
            Job(
                position=job[0],
                company=Company(
                    name=job[1],
                    industry=company_industry,
                    employees=company_employees,
                ),
                location=Location(job[4]),
                date_range=job[3]
            )
        )

    return parsed_jobs
```

Nandit12 commented 4 years ago

@gtaybro Applied these changes, but I'm still getting an error.

gtaybro commented 4 years ago

Hey @Nandit12, that's because some people don't have ANY jobs listed on their profile. I fixed this in my code like this:

```python
    clean_jobs = []

    # Some profiles have no jobs listed at all; insert a single "fake" entry
    # so the rest of the pipeline still has a company URL to work with.
    if len(jobs) == 0:
        clean_jobs.append(['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', ''])
    else:
        for job in jobs:
            if job[2] != '':
                clean_jobs.append(job)
            else:
                clean_jobs.append(['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', ''])
```

Nandit12 commented 4 years ago

@gtaybro Thanks mate, it's working now. Here are the changes made to the Scraper.py file: code