A program that scrapes and stores a continuous stream of LinkedIn job postings, along with dozens of attributes for each posting.
Download the polished dataset and view insights at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
The program consists of two main scripts that run in parallel:
python search_retriever.py
- discovers new job postings and inserts the most recent IDs and minimal attributes into the database
python details_retriever.py
- populates tables with complete job attributes
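To illustrate the producer side of this split, here is a minimal sketch of how search_retriever.py might record newly discovered postings so that details_retriever.py can fill them in later. The table and column names (`jobs`, `job_id`, `title`, `company`, `detailed`) are assumptions for illustration, not the project's actual schema; `INSERT OR IGNORE` keeps re-discovered IDs from creating duplicates.

```python
# Sketch only: assumed schema, not the repo's real one.
import sqlite3

def insert_minimal(conn, postings):
    """Insert job IDs with minimal attributes; silently skip IDs already seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, title TEXT, company TEXT, "
        "detailed INTEGER DEFAULT 0)"  # 0 until details_retriever fills it in
    )
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (job_id, title, company) VALUES (?, ?, ?)",
        postings,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
insert_minimal(conn, [("101", "Data Engineer", "Acme"), ("102", "Analyst", "Beta")])
# A second batch that re-discovers job 101 adds only the genuinely new row.
insert_minimal(conn, [("101", "Data Engineer", "Acme"), ("103", "ML Engineer", "Gamma")])
```

Because `job_id` is the primary key, repeated discovery runs are idempotent, which matters when the search loop revisits the same queries continuously.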
It's important to note that while search_retriever.py typically runs smoothly, even through your personal IP and a single account, details_retriever.py can be finicky: each search generates approximately 25-50 results, every one of which must be queried individually to obtain its full attributes. To improve its performance, I recommend the following strategies:
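One common mitigation for this kind of per-item querying is retrying with exponential backoff plus a small random jitter between attempts. The sketch below assumes a generic `fetch` callable standing in for whatever request function details_retriever.py actually uses; the flaky endpoint is simulated so the example runs offline.

```python
# Sketch: generic retry-with-backoff wrapper; fetch_details is a stand-in,
# not the project's real fetch function.
import random
import time

def with_backoff(fetch, job_id, retries=4, base_delay=0.01):
    """Call fetch(job_id); on failure sleep base_delay * 2**attempt (plus
    jitter) and retry, re-raising after the final attempt."""
    for attempt in range(retries):
        try:
            return fetch(job_id)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))

# Simulated flaky endpoint: fails twice (e.g. rate limited), then succeeds.
calls = {"n": 0}
def flaky_fetch(job_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return {"job_id": job_id}

result = with_backoff(flaky_fetch, "101")
```

In a real deployment `base_delay` would be on the order of seconds, not hundredths of a second; it is tiny here only so the example completes quickly.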
python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>
- creates a CSV file for each table in the database, with minimal preprocessing
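The export step can be sketched as follows: enumerate every table in the SQLite file and dump each one to its own CSV. This is a simplified stand-in for to_csv.py, written with only the standard library; the tiny throwaway database at the end exists just to make the example self-contained, and any preprocessing the real script applies is omitted.

```python
# Sketch of a to_csv-style export: one CSV per table in a SQLite database.
import csv
import os
import sqlite3
import tempfile

def export_tables(db_path, folder):
    """Write <folder>/<table>.csv for every table in the database."""
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    written = []
    for table in tables:
        cur = conn.execute(f"SELECT * FROM {table}")
        path = os.path.join(folder, f"{table}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
        written.append(path)
    conn.close()
    return written

# Build a tiny throwaway database, then export it.
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "linkedin_jobs.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE jobs (job_id TEXT, title TEXT)")
conn.execute("INSERT INTO jobs VALUES ('101', 'Data Engineer')")
conn.commit()
conn.close()
paths = export_tables(db, tmp)
```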