Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
https://usejobspy.com
MIT License
555 stars 109 forks

rate limiter #112

Closed troy-conte closed 3 months ago

troy-conte commented 4 months ago

Can you make the rate limiter on request polling adjustable, so we can tune it to try to avoid being blocked? I also don't see rate limiters on Indeed and Glassdoor.

cullenwatson commented 4 months ago

For Glassdoor, the problem is that the cookies need to be generated when scraping is initialized; currently it's hardcoded. Indeed has no rate limiting, since we have the API key. But it's a good addition for LinkedIn and ZipRecruiter in particular, and could be used for all the modules. How do you envision the interface for the user?

troy-conte commented 4 months ago

OK, interesting. I'll dig into Glassdoor and see if there's another way to do it. I'm getting blocked by the API on Indeed for too many requests, even after I've tried changing IP, but there really is no point if they know exactly which key is causing the commotion lol. LinkedIn usually works, and I don't find ZipRecruiter helpful, so it might not be worth it for just LinkedIn. Also, it seems you randomly choose a variable rate limit within a frequency band, so honestly that is already the best solution right now.
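For reference, the "random delay within a band" idea with a user-adjustable range could look roughly like the sketch below. This is only an illustration: `JitteredDelay`, `min_delay`, and `max_delay` are hypothetical names, not JobSpy's actual code or parameters, and the default values are arbitrary.

```python
import random
import time


class JitteredDelay:
    """Pause for a random interval in [min_delay, max_delay] seconds.

    Hypothetical helper for illustration; not part of JobSpy.
    """

    def __init__(self, min_delay=3.0, max_delay=7.0, sleep=time.sleep, rng=None):
        if min_delay < 0 or max_delay < min_delay:
            raise ValueError("require 0 <= min_delay <= max_delay")
        self.min_delay = min_delay
        self.max_delay = max_delay
        # sleep and rng are injectable so the class can be tested without waiting
        self._sleep = sleep
        self._rng = rng or random.Random()

    def next_delay(self):
        # Uniform draw inside the configured band
        return self._rng.uniform(self.min_delay, self.max_delay)

    def wait(self):
        # Sleep for one randomly chosen delay and report how long it was
        delay = self.next_delay()
        self._sleep(delay)
        return delay
```

A scraper loop would create one instance, e.g. `limiter = JitteredDelay(min_delay=5, max_delay=12)`, and call `limiter.wait()` before each request, so widening the band is the only thing a user needs to change.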

cullenwatson commented 4 months ago

Yeah, I meant to say there is rate limiting on Indeed when doing the searches, as I haven't transitioned the repo to use the API for the job search. But fetching the descriptions, where we use the API key, doesn't have rate limits.

If y'all don't care about the easy apply filter and the time range filter for Indeed, we can switch to the API.

troy-conte commented 4 months ago

I personally don't use easy apply; I'm more interested in jobs people can't easily apply for. What are the limits of the API? If everyone is using the same key, won't they block or limit search requests?

troy-conte commented 4 months ago

> For Glassdoor, the problem is that the cookies need to be generated when scraping is initialized; currently it's hardcoded. Indeed has no rate limiting, since we have the API key. But it's a good addition for LinkedIn and ZipRecruiter in particular, and could be used for all the modules. How do you envision the interface for the user?

I think you mean ZipRecruiter; that was the only one I found that had cookies hardcoded. Looking into generating new ones...

ZacharyHampton commented 4 months ago

I believe the Indeed API key is the global public key used by everyone on the site, could be wrong here though, haven't checked in a minute.

troy-conte commented 4 months ago

> I believe the Indeed API key is the global public key used by everyone on the site, could be wrong here though, haven't checked in a minute.

"on the site" meaning on Indeed or JobSpy?

ZacharyHampton commented 4 months ago

> > I believe the Indeed API key is the global public key used by everyone on the site, could be wrong here though, haven't checked in a minute.
>
> "on the site" meaning on Indeed or JobSpy?

Indeed

cullenwatson commented 3 months ago

Closing, as I only see issues with LinkedIn, and increasing the delay still results in being blocked.