Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
https://usejobspy.com
MIT License
555 stars, 109 forks

Indeed scraper - not fetching salary #125

Open muzaT opened 3 months ago

muzaT commented 3 months ago

I tried to scrape all the latest jobs from Indeed for a specific country and state, but it does not seem to be scraping the latest jobs. I am sure recent jobs exist, because I built a scraper using Selenium that does get them, but it is a bit slow. I also tried Premium Static Residential proxies. Here is the code:

import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed"],
    location="Dubai",
    results_wanted=50,
    hours_old=24,
    linkedin_fetch_description=True,
    country_indeed='United Arab Emirates'
)

print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False) # to_xlsx

Am I missing something? I also tried the LinkedIn scraper. It went well at first, but it crashes with error 429 after scraping a couple of pages.
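The 429 mentioned above is the HTTP "Too Many Requests" status, so one common workaround is to retry with exponential backoff. The following is a minimal sketch, not part of JobSpy: `retry_with_backoff` and its parameters are hypothetical names, and the check for "429" in the exception message is an assumption, since the thread does not say which exception the library raises.

```python
import time


def retry_with_backoff(
    fn,
    max_attempts=4,
    base_delay=2.0,
    # Assumption: the rate-limit error mentions "429" in its message.
    is_retryable=lambda exc: "429" in str(exc),
    sleep=time.sleep,
):
    """Call fn() and retry on retryable errors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage with the call from the post above:
# jobs = retry_with_backoff(
#     lambda: scrape_jobs(site_name=["linkedin"], location="Dubai", results_wanted=50)
# )
```

Backoff only spaces out the requests. If the site keeps returning 429, lowering `results_wanted` or rotating proxies may still be needed.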

cullenwatson commented 3 months ago

Stay tuned for a release later tonight that will fix this.

cullenwatson commented 3 months ago

pip install -U python-jobspy

should be good

muzaT commented 3 months ago

Hi, @cullenwatson

It seems the issue still persists. If I scrape the past 24 hours, it returns 1 job. For 48 hours it returns 2 jobs, and for 72 hours it returns 3 jobs. It seems to be scraping only the first job of each page, or something similar.

Please try with location "Dubai" and country "United Arab Emirates". When I tried with the US it worked fine, so I guess the issue depends on the location.

Output for past 72 hours:

2024-03-11 10:11:51,320 - JobSpy - INFO - Indeed search page: 1
2024-03-11 10:11:51,679 - JobSpy - INFO - Indeed search page: 2
2024-03-11 10:11:52,010 - JobSpy - INFO - Indeed found no jobs on page: 2
2024-03-11 10:11:52,010 - JobSpy - INFO - Indeed finished scraping
Found 3 jobs
     site  ...                                      ceo_photo_url
0  indeed  ...                                                NaN
1  indeed  ...                                                NaN
2  indeed  ...  https://d2q79iu7y748jz.cloudfront.net/s/_ceoph...

[3 rows x 26 columns]

Process finished with exit code 0

I have the latest version of the package (1.1.48).

cullenwatson commented 3 months ago

Fixed. Try version 1.1.50. I'm now getting 1k jobs for 72 hours in Dubai.

muzaT commented 3 months ago

Thanks, @cullenwatson! It's working now and it's so fast. I really appreciate the work you have done!

There is one more issue: it is not scraping the salary from Indeed, although the salary is shown on the website. The previous scraper used to scrape it (min and max amount). Please try with "Dubai" as the location and "United Arab Emirates" as the country. Once again, thank you for the prompt responses and fixes! :)

cullenwatson commented 2 months ago

For some reason the backend API doesn't give salary as often as the frontend method did.
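To put a number on "as often", one option is to measure what share of the scraped rows carry any salary value. This is a rough sketch that reads the `jobs.csv` written by the script earlier in the thread. The helper names are hypothetical, and the column names `min_amount` and `max_amount` are an assumption based on the "Min and Max amount" wording above, so adjust them to match the actual CSV header.

```python
import csv


def salary_coverage(rows, columns=("min_amount", "max_amount")):
    """Return the fraction of rows with a non-empty value in any salary column."""
    rows = list(rows)
    if not rows:
        return 0.0
    with_salary = sum(
        1
        for row in rows
        # Missing keys, None and empty strings all count as "no salary".
        if any(str(row.get(col) or "").strip() for col in columns)
    )
    return with_salary / len(rows)


def salary_coverage_from_csv(path):
    """Compute salary coverage for a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return salary_coverage(csv.DictReader(f))


# Hypothetical usage:
# print(f"{salary_coverage_from_csv('jobs.csv'):.0%} of jobs include a salary")
```

Running this on output from two package versions would show whether the drop in salary data is consistent or specific to certain searches.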