PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License


Automated tool for scraping job postings into a .csv file.

Benefits over job search sites:

masterlist.csv

Installation

JobFunnel requires Python 3.11 or later.

pip install git+https://github.com/PaulMcInnis/JobFunnel.git

Usage

By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.

Configure

You can search for jobs with YAML configuration files or by passing command arguments.

Download the demo settings.yaml by running the command below:

wget https://git.io/JUWeP -O my_settings.yaml
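If wget is not available (for example on a stock Windows install), the same download can be sketched with Python's standard library. The helper name download_settings is hypothetical and not part of JobFunnel; only the URL and the output filename come from the command above.

```python
import urllib.request
from pathlib import Path

# Short link and output name taken from the wget command above.
SETTINGS_URL = "https://git.io/JUWeP"


def download_settings(url: str = SETTINGS_URL, dest: str = "my_settings.yaml") -> Path:
    """Fetch the demo settings file and save it as `dest`.

    Hypothetical helper for illustration; JobFunnel does not ship it.
    urlopen follows HTTP redirects, which short links rely on.
    """
    target = Path(dest)
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target


# Usage: download_settings() writes my_settings.yaml to the current directory.
```

The file is written byte for byte, so the YAML content is the same as what wget would save.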

NOTE:

Scrape

Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:

funnel load -s my_settings.yaml
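Since regular scraping is the point, it can help to wrap the command above in a small script that a scheduler (cron, Task Scheduler, and so on) can call. The following is a minimal sketch, assuming funnel is on your PATH after installation; the function names build_funnel_command and run_scrape are hypothetical and not part of JobFunnel.

```python
import shutil
import subprocess
from typing import List


def build_funnel_command(settings_path: str) -> List[str]:
    """Return the argument list for the scrape command shown above."""
    return ["funnel", "load", "-s", settings_path]


def run_scrape(settings_path: str = "my_settings.yaml") -> int:
    """Run one scrape and return the process exit code (hypothetical helper)."""
    if shutil.which("funnel") is None:
        # Assumption: the pip install step puts a `funnel` executable on PATH.
        raise RuntimeError("funnel executable not found; is JobFunnel installed?")
    completed = subprocess.run(build_funnel_command(settings_path), check=False)
    return completed.returncode


# Usage: run_scrape("my_settings.yaml") performs a single scrape run.
```

Passing the arguments as a list avoids shell quoting problems when the settings path contains spaces.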

Review

Open the master CSV file and update the per-job status:
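To keep track of review progress, a short script can tally how many jobs sit in each status. This is only a sketch: it assumes the master CSV has a header row with a column named status, and the function name count_statuses is hypothetical. Adjust the column name to match your file.

```python
import csv
from collections import Counter
from typing import Iterable


def count_statuses(lines: Iterable[str], status_column: str = "status") -> Counter:
    """Count rows per status value from CSV lines (or an open file object).

    Assumption: the first line is a header containing `status_column`.
    Values are lower-cased and stripped; empty cells count as "(blank)".
    """
    reader = csv.DictReader(lines)
    counts: Counter = Counter()
    for row in reader:
        value = (row.get(status_column) or "").strip().lower()
        counts[value or "(blank)"] += 1
    return counts


# Usage (the file name here is an assumption; use the path from your settings):
# with open("master_list.csv", newline="", encoding="utf-8") as f:
#     print(count_statuses(f))
```

Because csv.DictReader accepts any iterable of strings, the same function works on an open file or on a list of lines.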

Advanced Usage

CAPTCHA

JobFunnel does not solve CAPTCHAs. If you receive an "Unable to extract jobs from initial search result page:" error while scraping, open that URL in your browser and solve the CAPTCHA manually.

Developer Guide

For contributors and developers who want to work on JobFunnel, this section will guide you through setting up the development environment and the tools we use to maintain code quality and consistency.

Developer Mode Installation

To get started, install JobFunnel in developer mode. This installs all necessary dependencies, including the development tools for testing, linting, and formatting. Run the following command from the root of the repository:

pip install -e '.[dev]'

This command installs the package in an editable state, together with the pre-commit package used for automatic code quality checks.

Pre-Commit Hooks

Pre-commit hooks (Black and isort for formatting, Flake8 for style checks) are configured to run automatically when you commit changes, so that the code follows consistent style and quality guidelines.

While the pre-commit package is installed when you run pip install -e '.[dev]', you still need to initialize the hooks by running the following command once:

pre-commit install

How Pre-Commit Hooks Work

The pre-commit hooks run automatically whenever you attempt to make a commit. Black and isort fix any formatting issues they find, while Flake8 warns you about style violations. This ensures that all committed code meets the project’s quality standards.

You can also manually run the pre-commit hooks at any time with:

pre-commit run --all-files

This is useful for checking the entire codebase before committing or as part of a larger code review. Please fix all style guide violations (or provide a reason to ignore them) before committing to the repository.

Running Tests

We use pytest to run tests and ensure that the code behaves as expected. Code coverage is automatically generated every time you run the tests.

To run all tests, use the following command:

pytest

This executes the full test suite and produces the code coverage report.

If you want to see a detailed code coverage report, you can run:

pytest --cov-report=term-missing

This prints the lines of code not covered by any test directly in your terminal output.