Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
https://usejobspy.com
MIT License
767 stars, 142 forks

Long running time when deploying #93

Closed phhoang98a closed 8 months ago

phhoang98a commented 8 months ago

I want to create an API to scrape jobs. It works well in my local environment, but it runs for a long time or does not return anything at all when I deploy the API to cloud platforms like Render or Azure. I also tried Flask + Celery locally, but the scrape_jobs function does not return any response.

pippinmole commented 8 months ago

It may help to see what the code is that you wrote.

Also, when you say:

or even did not return anything

...is it returning a data frame with 0 items in it, is the data frame null, or does an error occur?
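One way to tell those three cases apart is to wrap the call and log which outcome you get. This is only a minimal sketch: `classify_scrape` is a hypothetical helper name (not part of JobSpy), and it takes the scrape function as an argument so it can be tried with any callable. It assumes the result supports `len()`, as a pandas DataFrame does. It will not help if the call hangs and never returns.

```python
import traceback


def classify_scrape(scrape, **kwargs):
    """Call scrape(**kwargs) and report which outcome occurred.

    Returns a (status, payload) tuple where status is one of
    "error", "none", "empty" or "ok". Hypothetical helper for debugging.
    """
    try:
        result = scrape(**kwargs)
    except Exception:
        # An exception was raised: return the traceback text for logging
        return "error", traceback.format_exc()
    if result is None:
        # The function returned nothing at all
        return "none", None
    if len(result) == 0:
        # A result came back but contains no rows
        return "empty", result
    return "ok", result
```

In the deployed API this would be called as `status, jobs = classify_scrape(scrape_jobs, site_name=["indeed"], ...)`, logging `status` before building the response.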

phhoang98a commented 8 months ago

    import pandas as pd
    from flask import Flask, request
    from jobspy import scrape_jobs

    app = Flask(__name__)

    @app.route("/job", methods=['POST'])
    def job():
        # Read the search parameters from the JSON request body
        job_title = request.json['job_title']
        country = request.json['country']
        location = request.json['location']

        # Scrape up to 10 Indeed postings for the given search
        jobs: pd.DataFrame = scrape_jobs(
            site_name=["indeed"],
            search_term=job_title,
            location=location,
            results_wanted=10,
            country_indeed=country
        )

        # Return each column of interest as a plain list
        return {
            "job_url": jobs["job_url"].tolist(),
            "site": jobs["site"].tolist(),
            "title": jobs["title"].tolist(),
            "company": jobs["company"].tolist(),
            "location": jobs["location"].tolist(),
            "date_posted": jobs["date_posted"].tolist(),
        }, 200

This is my basic code. I'm sure the input parameters are correct because it works locally. You can try creating a Flask API and deploying it to see for yourself.

cullenwatson commented 8 months ago

I'm using this on my own site, usejobspy.com, as a FastAPI app on DigitalOcean with no issues. Indeed has banned a lot of the cloud providers' IPs. You need to put a proxy on it.
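A rough sketch of wiring a proxy into the call from the code above, reading it from the environment so the same code runs locally without one. The names here are assumptions: `SCRAPER_PROXY_URL` is a hypothetical environment variable, `build_scrape_kwargs` is a hypothetical helper, and the `proxy` keyword should be checked against the README of the installed JobSpy version, since the parameter name may differ between releases.

```python
import os


def build_scrape_kwargs(job_title, location, country, proxy_kwarg="proxy"):
    """Build keyword arguments for scrape_jobs, adding a proxy if configured.

    proxy_kwarg is the name of the proxy parameter (an assumption; verify it
    against the installed JobSpy version).
    """
    kwargs = {
        "site_name": ["indeed"],
        "search_term": job_title,
        "location": location,
        "results_wanted": 10,
        "country_indeed": country,
    }
    # Hypothetical environment variable, e.g. "http://user:pass@host:port"
    proxy_url = os.environ.get("SCRAPER_PROXY_URL")
    if proxy_url:
        kwargs[proxy_kwarg] = proxy_url
    return kwargs
```

The route would then call `scrape_jobs(**build_scrape_kwargs(job_title, location, country))`, with the proxy URL set in the cloud provider's environment settings rather than hard-coded.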