dashee87 / jobbR

R wrapper for the Indeed API
MIT License

jobSearch does not return all job posts, despite all=TRUE #4

Open kradolfer opened 6 years ago

kradolfer commented 6 years ago

Thanks for this great tool, I have been using it a lot. I am having trouble retrieving all jobs for large countries: the searches seem to return a maximum of 1025 unique jobs, despite the all=TRUE option. (The search also returns many duplicated job posts; I filtered them manually, keeping only unique jobkeys.)

Below is an example of the searches I ran. I got 1025 unique jobs both for the whole US and for California only.

Thanks a lot for any suggestions on this issue!

Code:

# search for all US data science jobs (17'100 on 2/8/2017)
ds_us <- jobSearch(publisher=key, "data+scientist", country = 'us', all=TRUE)

# filter for unique jobs (1025 jobs left)
ds_us_unique <- ds_us[!duplicated(ds_us$results.jobkey), ]

# search for all data science jobs in California (3'975 jobs on 2/8/2017)
ds_ca <- jobSearch(location = 'CA', publisher=key, "data+scientist", country = 'us', all=TRUE)

# filter for unique jobs (again 1025 jobs left)
ds_ca_unique <- ds_ca[!duplicated(ds_ca$results.jobkey), ]

dashee87 commented 6 years ago

Hi David,

Thanks for pointing this out.

I'm getting the same issue as you. Here's the query I ran:

jobSearch(publisher = "mypublisherkey", "data", country = "uk", location = "london", all = TRUE)

It returns a data frame of 34825 rows (the totalResults column has a value of 34811). So far so good. The problem seems to start at pageNumber 40. Once the search hits that page, the same results simply repeat for the remainder of the data frame. That's why there are only 1025 unique jobs: 41 pages of 25 results each (41 x 25).
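To make that arithmetic concrete, here is a minimal sketch on synthetic data (no real API call). It assumes 25 results per page, pages counted from 0, and a result data frame with pageNumber and results.jobkey columns. The names n_pages, capped and slot are hypothetical and used only for illustration.

```r
# Synthetic stand-in for a jobSearch() result: 25 results per page,
# where every page after page 40 repeats page 40 (pages counted from 0).
per_page <- 25
n_pages  <- 100                          # hypothetical number of pages requested
page     <- rep(0:(n_pages - 1), each = per_page)
capped   <- pmin(page, 40)               # assumed behaviour: results stop advancing at page 40
slot     <- rep(seq_len(per_page), times = n_pages)

res <- data.frame(
  pageNumber       = page,
  results.jobkey   = paste0("job_", capped * per_page + slot),
  stringsAsFactors = FALSE
)

# Number of unique jobs: 41 pages (0 to 40) of 25 results each.
n_unique <- sum(!duplicated(res$results.jobkey))

# Count the new jobkeys contributed by each page, then find the
# first page that contributes none.
new_per_page     <- tapply(!duplicated(res$results.jobkey), res$pageNumber, sum)
first_stale_page <- as.integer(names(new_per_page)[which(new_per_page == 0)[1]])
```

Running the same two checks on a real result would show where the repetition begins, provided the data frame really does carry those two columns.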

If I switch to Indeed's web interface, visiting this page should take me to the 2000th result for my Indeed search. Instead, it just shows results 990-1000 of 40874. So it appears to be a built-in limitation of the Indeed API. I'll send them an email and keep this issue open.