ArshKA / LinkedIn-Job-Scraper

LinkedIn scraper to retrieve and store a live stream of job postings

LinkedIn Job Scraper

A program that scrapes and stores a continuous stream of LinkedIn job postings, along with dozens of attributes for each posting.

Download the polished dataset and view insights at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

User Configurations

Required

Running

This program consists of two main scripts that run in parallel.

python search_retriever.py - discovers new job postings and inserts the most recent IDs and minimal attributes into the database

python details_retriever.py - populates tables with complete job attributes
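
If you would rather start both scripts from a single command than from two terminals, a small launcher is one option. The following is only a sketch and is not part of this repository: the helper names build_commands and run_in_parallel are hypothetical, and it assumes both scripts sit in the current working directory and take no arguments.

```python
import subprocess
import sys

# Script names come from this README; everything else is illustrative.
SCRIPTS = ["search_retriever.py", "details_retriever.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_in_parallel(commands):
    """Start every command without waiting, then block until all exit.

    Returns the exit codes in the same order as the commands.
    """
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]
```

Calling run_in_parallel(build_commands()) starts both retrievers and returns once both have exited. Because the retrievers are meant to run continuously, in practice this call blocks until you stop them.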

Note that while search_retriever.py typically runs smoothly, even from your personal IP address and a single account, details_retriever.py can be finicky. Each search returns approximately 25-50 results, and each result must be queried individually to obtain its attributes. To improve its performance, I recommend the following strategies:

Converting Database to CSV

python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>

Creates a CSV file for each database, with minimal preprocessing applied
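
For readers who want to see roughly what such an export involves, here is a minimal sketch. It is not the code in to_csv.py: it assumes the .db file is a SQLite database, writes one CSV per table, and skips the preprocessing step entirely. The function name export_tables is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables(database, folder):
    """Write every user table in a SQLite file to <folder>/<table>.csv."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(database)
    written = []
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        for table in tables:
            # Quote the table name so unusual identifiers still work.
            quoted = '"{}"'.format(table.replace('"', '""'))
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            path = out_dir / f"{table}.csv"
            with open(path, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                # Header row from the column names, then all data rows.
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            written.append(path)
    finally:
        conn.close()
    return written
```

For example, export_tables("linkedin_jobs.db", "csv_out") would return the list of CSV paths it wrote, one per table found in the database.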

Database Structure

You can find the structure of the database here