Closed Nandit12 closed 4 years ago
It shows this error for some LinkedIn profiles. I'm not able to pinpoint what the problem is. Here's an example profile: https://www.linkedin.com/in/yunki/
I am getting the same problem.
Same! Not sure what the fix is, but it seems to be related to the driver not closing because of something going on with the company URLs. I'd like to be able to ignore this error and just continue through the profiles in my profiles file.
Okay, I looked into it more. This issue is due to the jobs list sometimes containing a sub-list of all empty strings, and an empty string passed as a URL will throw an error. I'm figuring out how to fix it now, but it should involve adding a "fake" URL to the jobs list of lists in order to replace the "missing" company URLs.
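The idea above can be sketched independently of Selenium. This is a minimal sketch, not code from the repo: `patch_missing_urls` and `PLACEHOLDER_JOB` are hypothetical names, and the placeholder URL is an assumption (any valid company page would do):

```python
# Placeholder row used when a scraped job has no company URL.
# The URL here is an assumption; substitute any valid company page.
PLACEHOLDER_JOB = ['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', '']

def patch_missing_urls(jobs):
    """Replace any scraped job row whose company URL (index 2) is an
    empty string with a placeholder row, so an empty string is never
    passed on as a URL."""
    return [job if job[2] != '' else list(PLACEHOLDER_JOB) for job in jobs]
```

Applying this to the raw `jobs` list before visiting any company URL avoids the crash while keeping the row count intact.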
Alright, I fixed that particular issue by doing something like this in the scraper.py file. I think this should actually be incorporated into the script.
```python
# Note: this method needs `from selenium.common.exceptions import WebDriverException`.
def scrape_jobs(self):
    try:
        jobs = self.browser.execute_script(
            "return (function(){"
            " var jobs = [];"
            " var els = document.getElementById('experience-section')"
            ".getElementsByTagName('ul')[0].getElementsByTagName('li');"
            " for (var i = 0; i < els.length; i++) {"
            "  if (els[i].className != 'pv-entity__position-group-role-item-fading-timeline') {"
            "   if (els[i].getElementsByClassName('pv-entity__position-group-role-item-fading-timeline').length > 0) {"
            "   } else {"
            "    try { position = els[i].getElementsByClassName('pv-entity__summary-info')[0]"
            ".getElementsByTagName('h3')[0].innerText; } catch (err) { position = ''; }"
            "    try { company_name = els[i].getElementsByClassName('pv-entity__summary-info')[0]"
            ".getElementsByClassName('pv-entity__secondary-title')[0].innerText; } catch (err) { company_name = ''; }"
            "    try { date_ranges = els[i].getElementsByClassName('pv-entity__summary-info')[0]"
            ".getElementsByClassName('pv-entity__date-range')[0].getElementsByTagName('span')[1].innerText; }"
            " catch (err) { date_ranges = ''; }"
            "    try { job_location = els[i].getElementsByClassName('pv-entity__summary-info')[0]"
            ".getElementsByClassName('pv-entity__location')[0].getElementsByTagName('span')[1].innerText; }"
            " catch (err) { job_location = ''; }"
            "    try { company_url = els[i].getElementsByTagName('a')[0].href; } catch (err) { company_url = ''; }"
            "    jobs.push([position, company_name, company_url, date_ranges, job_location]);"
            "   }"
            "  }"
            " }"
            " return jobs;"
            "})();")
    except WebDriverException:
        jobs = []

    # Replace rows with an empty company URL (index 2) by a placeholder,
    # so scrape_company_details never receives an empty string.
    clean_jobs = []
    for job in jobs:
        if job[2] != '':
            clean_jobs.append(job)
        else:
            clean_jobs.append(['Fake', 'Fake', '[URL OF YOUR PREFERRED COMPANY]/', '', ''])

    parsed_jobs = []
    for job in clean_jobs:
        company_industry, company_employees = self.scrape_company_details(str(job[2]))
        parsed_jobs.append(
            Job(
                position=job[0],
                company=Company(
                    name=job[1],
                    industry=company_industry,
                    employees=company_employees,
                ),
                location=Location(job[4]),
                date_range=job[3],
            )
        )
    return parsed_jobs
```
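One caveat with the placeholder approach: the "Fake" rows end up in `parsed_jobs`. If you'd rather drop them after parsing, a filter like this works; `drop_placeholder_jobs` is a hypothetical helper, and it assumes `Job` exposes the `position` value set above as an attribute:

```python
def drop_placeholder_jobs(parsed_jobs):
    """Remove placeholder entries that were inserted only to keep the
    scraper from crashing on missing company URLs."""
    return [job for job in parsed_jobs if job.position != 'Fake']
```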
@gtaybro Applied these changes but I'm still getting the error.
Hey @Nandit12, that is due to the fact that some people don't have ANY jobs listed on their profile. I fixed this in my code like this:
```python
# Inside scrape_jobs, after the execute_script call:
clean_jobs = []
if len(jobs) == 0:
    # Profile has no jobs listed at all: insert a single placeholder row.
    clean_jobs.append(['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', ''])
else:
    for job in jobs:
        if job[2] != '':
            clean_jobs.append(job)
        else:
            clean_jobs.append(['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', ''])
```
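Both guards (empty company URLs and a completely empty jobs list) can be folded into one small helper. This is a sketch under the same placeholder-URL assumption, with a hypothetical function name, not code from the repository:

```python
# Fallback row reused for both missing-URL and no-jobs cases.
FALLBACK = ['Fake', 'Fake', 'https://www.linkedin.com/company/linkedin/', '', '']

def clean_job_rows(jobs):
    """Return a job list that is never empty and never contains an empty
    company URL (index 2), so downstream company scraping cannot fail."""
    if not jobs:
        return [list(FALLBACK)]
    return [job if job[2] != '' else list(FALLBACK) for job in jobs]
```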
@gtaybro Thanks mate, it's working now. Here are the changes made to the scraper.py file: