Improve LinkedIn scraper robustness

Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter

https://usejobspy.com

MIT License


Improve LinkedIn scraper robustness #144

Closed: lluissalord closed this issue 3 weeks ago

lluissalord commented 2 months ago

While reading some posts about scraping LinkedIn Jobs, I found that the data shown at https://www.linkedin.com/jobs/view/<JOB_ID> comes from https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/<JOB_ID>.

This makes me think that requesting the second URL directly might make the scraper less likely to be blocked by LinkedIn. I haven't tried it, but it could be worth running a stress test to see whether it behaves better.
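A minimal Python sketch of the idea, assuming only the two URL patterns quoted above: it builds both URLs from a job ID and fetches one of them, returning the HTTP status code instead of raising, so a 429 shows up as data. The helper names `job_urls` and `fetch` are hypothetical and not part of JobSpy's API, and the `User-Agent` value is an assumption, not something LinkedIn is known to require.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# URL patterns quoted in the issue; <JOB_ID> is filled in by job_urls().
VIEW_URL = "https://www.linkedin.com/jobs/view/{job_id}"
GUEST_API_URL = "https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/{job_id}"


def job_urls(job_id) -> dict:
    """Hypothetical helper: return the job page URL and the guest API URL for one job ID."""
    job_id = str(job_id).strip()
    return {
        "view": VIEW_URL.format(job_id=job_id),
        "guest_api": GUEST_API_URL.format(job_id=job_id),
    }


def fetch(url: str, timeout: float = 10.0) -> tuple:
    """Hypothetical helper: GET a URL and return (status, body).

    HTTP error responses such as 429 are returned rather than raised;
    network-level failures (urllib.error.URLError) still propagate.
    """
    # Assumed browser-like User-Agent; adjust to whatever the scraper already sends.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

For example, `fetch(job_urls(job_id)["guest_api"])` would return the guest endpoint's status and raw body, which the scraper could then parse in place of the `/jobs/view/` page.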

cullenwatson commented 1 month ago

It should be easy to test: just call each endpoint x times and see which one gets a 429 (Too Many Requests) first. However, we already get 429s quickly from the jobs search page, and it uses the same API format as that endpoint.
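The test described above could be sketched as follows, assuming each endpoint is wrapped in a zero-argument callable that performs one request and returns its HTTP status code. The names `requests_until_429` and `compare_endpoints` are hypothetical, and `max_requests` plays the role of "x" in the comment. Injecting the fetcher keeps the counting logic separate from the network code, so it can be checked with fake status sequences before sending any real traffic.

```python
import time
from typing import Callable, Dict, Optional


def requests_until_429(
    fetch_status: Callable[[], int], max_requests: int, delay: float = 0.0
) -> Optional[int]:
    """Call fetch_status() up to max_requests times.

    Returns the 1-based index of the first request that got a 429,
    or None if the endpoint never throttled within the budget.
    """
    for i in range(1, max_requests + 1):
        if fetch_status() == 429:
            return i
        if delay:
            time.sleep(delay)  # optional pause between requests
    return None


def compare_endpoints(
    fetchers: Dict[str, Callable[[], int]], max_requests: int, delay: float = 0.0
) -> Dict[str, Optional[int]]:
    """Give every endpoint the same request budget; a lower number means it throttled sooner."""
    return {
        name: requests_until_429(fetch_status, max_requests, delay)
        for name, fetch_status in fetchers.items()
    }
```

One design caveat for a live run: if both endpoints are hit from the same IP one after the other, throttling triggered by the first may carry over to the second, especially if, as noted above, they share the same API format and possibly the same rate limits. Whether LinkedIn shares limits this way is not established here, so it may help to alternate the order between runs or wait between them.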