Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
https://usejobspy.com
MIT License
767 stars, 142 forks

improvement: Remove pandas dependency #86

Closed pippinmole closed 7 months ago

pippinmole commented 8 months ago

Hi,

I'd like to set up a serverless function which runs every 15 minutes to scrape jobs and put them into a database. However, many serverless providers (like DigitalOcean Functions) have package size restrictions.

It looks like pandas (which depends on NumPy) takes up at least 120 MB, all just to use pd.DataFrame.

I understand this is quite a big change, so I don't expect anything.

Thanks

cullenwatson commented 8 months ago

We use pandas because it makes exporting easy for the end user, but I can understand the size issue. Any reason for not using AWS Lambda, which has a 250 MB limit?

pippinmole commented 8 months ago

Yeah, I understand why you'd use it, but it's just a shame that it's such a big package when so few of its features are used. Currently I'm trying to get it working on DigitalOcean, which has a size limit (though, ironically, they say they install pandas by default on all their function builds).

Feel free to close this; it was just a suggestion, as I could see this package being used in a serverless manner.
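
For anyone landing here with the same size constraint: a minimal sketch of what a pandas-free export could look like, assuming the scraped results are available as plain dicts. This is not JobSpy's API or its output schema; the `jobs` records, their field names, and the `jobs_to_csv` helper are all hypothetical and only illustrate that the standard library can cover a simple CSV export.

```python
import csv
import io

# Hypothetical job records. The field names are assumptions for illustration,
# not JobSpy's actual output schema.
jobs = [
    {"site": "indeed", "title": "Backend Engineer", "company": "Acme", "location": "Remote"},
    {"site": "linkedin", "title": "Data Analyst", "company": "Globex", "location": "London"},
]


def jobs_to_csv(records):
    """Serialize a list of dicts to CSV text using only the standard library."""
    if not records:
        return ""
    buffer = io.StringIO()
    # Use the keys of the first record as the column order.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = jobs_to_csv(jobs)
print(csv_text)
```

A list of dicts like this can also be passed more or less directly to most database clients, which may be enough for the "scrape and insert" use case without needing a DataFrame at all.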