PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License

Scraping glassdoor is too slow #18

Closed by PaulMcInnis 5 years ago

PaulMcInnis commented 5 years ago

Perhaps we should default the configuration to a shallower scrape?

PaulMcInnis commented 5 years ago

It appears to be scraping one job at a time, and the full run takes maybe 10 minutes?

For example, scraping this single listing took ~0.5 seconds: https://www.glassdoor.ca/partner/jobListing.htm?pos=215&ao=4120&s=58&guid=0000016bd3fac9e29da79e3a343d760d&src=GD_JOB_AD&t=SR&extid=1&exst=OL&ist=&ast=OL&vt=w&slr=true&cs=1_ede2ce4c&cb=1562629557735&jobListingId=2829412852
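If each listing costs roughly half a second and they are fetched one after another, the total grows linearly with the number of jobs. A minimal sketch of one possible workaround is to overlap the requests with a small thread pool. This is not JobFunnel's actual code: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` only sleeps to imitate network latency, so a real request function would have to be swapped in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch every URL with a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP request; sleeps to imitate latency."""
    time.sleep(0.05)
    return f"blurb for {url}"


if __name__ == "__main__":
    # Placeholder URLs, not real job listings.
    urls = [f"https://example.invalid/job/{i}" for i in range(16)]
    start = time.perf_counter()
    blurbs = fetch_all(urls, fake_fetch)
    elapsed = time.perf_counter() - start
    print(f"fetched {len(blurbs)} pages in {elapsed:.2f} s")
```

A modest `max_workers` keeps the request rate polite; firing too many parallel requests at a job site may get the scraper throttled or blocked.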

studentbrad commented 5 years ago

I have looked at alternative methods of doing this. It seems that once you change 'class="jl"' to 'class="jl selected"', a specific part of the job postings page reloads and loads the blurb. I am looking into where those blurbs come from and whether a new method of requesting that data could improve speed.
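As a rough illustration of the 'jl' versus 'jl selected' distinction, the sketch below finds listing elements by class token, so it matches both states. Only the two class names come from this thread; the sample markup, the `data-id` attribute, and the `JobListingFinder` name are assumptions made up for the example, and it does not show where the blurbs are actually loaded from.

```python
from html.parser import HTMLParser


class JobListingFinder(HTMLParser):
    """Collect the attributes of every tag whose class list contains 'jl'."""

    def __init__(self):
        super().__init__()
        self.listings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Split on whitespace so 'jl' and 'jl selected' both match.
        classes = (attr_map.get("class") or "").split()
        if "jl" in classes:
            self.listings.append(attr_map)


# Hypothetical markup: only the class names are taken from the thread.
SAMPLE = """
<ul>
  <li class="jl" data-id="1">first</li>
  <li class="jl selected" data-id="2">second</li>
  <li class="other" data-id="3">third</li>
</ul>
"""

finder = JobListingFinder()
finder.feed(SAMPLE)
listing_ids = [item["data-id"] for item in finder.listings]
print(listing_ids)
```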