Automated tool for scraping job postings into a .csv file.
JobFunnel requires Python 3.11 or later.
pip install git+https://github.com/PaulMcInnis/JobFunnel.git
By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.
You can search for jobs with YAML configuration files or by passing command arguments.
Download the demo settings.yaml by running the command below:
wget https://git.io/JUWeP -O my_settings.yaml
NOTE:
It is recommended to provide as few search keywords as possible (e.g. Python, AI).
JobFunnel currently supports CANADA_ENGLISH, USA_ENGLISH, UK_ENGLISH, FRANCE_FRENCH, and GERMANY_GERMAN locales.
Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:
funnel load -s my_settings.yaml
Open the master CSV file and update the per-job status:
Set to interested, applied, interview or offer to reflect your progression on the job.
Set to archive, rejected or delete to remove a job from this search. You can review 'blocked' jobs within your block_list_file.
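As a rough illustration of the two status groups described above, the sketch below tallies the rows of a master CSV by group. It assumes the file has a column named status holding the values listed above; that column name and the helper summarize_statuses are hypothetical and are not taken from JobFunnel's code.

```python
import csv
import io
from collections import Counter

# Status values named in this README, grouped as the text above describes.
ACTIVE_STATUSES = {"interested", "applied", "interview", "offer"}
REMOVAL_STATUSES = {"archive", "rejected", "delete"}


def summarize_statuses(csv_file, status_column="status"):
    """Count rows per status group in an open CSV file.

    `status_column` is an assumed column name; adjust it to match your file.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        status = (row.get(status_column) or "").strip().lower()
        if status in ACTIVE_STATUSES:
            counts["active"] += 1
        elif status in REMOVAL_STATUSES:
            counts["to_remove"] += 1
        else:
            counts["unreviewed"] += 1
    return dict(counts)


# Small in-memory example standing in for a real master CSV.
sample = io.StringIO(
    "status,title,company\n"
    "applied,Data Engineer,Acme\n"
    "rejected,Web Developer,Initech\n"
    "new,ML Engineer,Globex\n"
)
print(summarize_statuses(sample))
```

To try this on a real file, pass an open file handle instead of the in-memory sample, for example open("master_list.csv", newline="").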
Automating Searches
JobFunnel can be easily automated to run nightly with crontab.
For more information, see the crontab document.
Writing your own Scrapers
If you have a job website you'd like to write a scraper for, you are welcome to implement it. Review the Base Scraper for implementation details.
Remote Work
Bypass a frustrating user experience looking for remote work by setting the search parameter remoteness to match your desired level, e.g. FULLY_REMOTE.
Adding Support for X Language / Job Website
JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings. Review the Base Scraper for further implementation details.
Blocking Companies
Filter undesired companies by adding them to your company_block_list in your YAML, or pass them on the command line with -cbl.
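Conceptually, a company block list drops every scraped job whose company appears in the list. The sketch below shows that idea only; the job dictionaries, the "company" key, the helper name and the case-insensitive exact match are all assumptions made for illustration, and JobFunnel's own matching rule may differ.

```python
def filter_blocked_companies(jobs, company_block_list):
    """Return the jobs whose company is not in the block list.

    Matching here is exact after trimming and lowercasing, which is an
    assumption for this sketch rather than JobFunnel's documented behavior.
    """
    blocked = {name.strip().lower() for name in company_block_list}
    return [
        job for job in jobs
        if job["company"].strip().lower() not in blocked
    ]


# Hypothetical scraped jobs, represented as plain dictionaries.
jobs = [
    {"title": "Python Developer", "company": "Acme"},
    {"title": "Data Analyst", "company": "Initech"},
]
kept = filter_blocked_companies(jobs, ["initech"])
print([job["company"] for job in kept])
```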
Job Age Filter
You can set the maximum age of scraped listings (in days) with max_listing_days.
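To make the age rule concrete, here is a minimal sketch of a day-based cutoff. The function name and the inclusive comparison (a listing exactly max_listing_days old is kept) are assumptions for illustration; JobFunnel's own boundary handling is not specified here.

```python
from datetime import date, timedelta


def is_within_max_age(post_date, max_listing_days, today=None):
    """Return True if a listing posted on `post_date` is recent enough.

    `today` can be supplied to make the check reproducible; it defaults
    to the current date.
    """
    today = today or date.today()
    return today - post_date <= timedelta(days=max_listing_days)


# A listing posted 10 days ago passes a 30-day limit but not a 7-day one.
posted = date.today() - timedelta(days=10)
print(is_within_max_age(posted, 30), is_within_max_age(posted, 7))
```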
Reviewing Jobs in Terminal
You can review the job list in the command line:
column -s, -t < master_list.csv | less -#2 -N -S
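The column command splits on every comma, so quoted fields that contain commas (such as some job titles or locations) can end up misaligned. As an alternative, the sketch below uses Python's csv module to print an aligned table. The helper name format_csv_table and the max_width truncation are illustrative choices, not part of JobFunnel.

```python
import csv
import io


def format_csv_table(csv_file, max_width=30):
    """Render an open CSV file as left-aligned, space-padded columns."""
    # Truncate long cells so wide descriptions do not dominate the view.
    rows = [[cell[:max_width] for cell in row] for row in csv.reader(csv_file)]
    if not rows:
        return ""
    # Pad short rows so every row has the same number of columns.
    n_cols = max(len(row) for row in rows)
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


# In-memory example; the quoted location contains a comma.
sample = io.StringIO('status,title,location\nnew,Developer,"Waterloo, ON"\n')
print(format_csv_table(sample))
```

For a real file, pass open("master_list.csv", newline="") and pipe the output through less -S to scroll horizontally.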
Respectful Delaying
Respectfully scrape your job posts with our built-in delaying algorithms.
To better understand how to configure delaying, check out this Jupyter Notebook which breaks down the algorithm step by step with code and visualizations.
Recovering Lost Data
JobFunnel can rebuild your master CSV from your cache_folder, where all the historic scrape data is located:
funnel --recover
Running by CLI
You can run JobFunnel from the CLI alone, without a settings YAML. Review the command structure via:
funnel inline -h
JobFunnel does not solve CAPTCHA. If, while scraping, you receive an "Unable to extract jobs from initial search result page:" error, open that URL in your browser and solve the CAPTCHA manually.
For contributors and developers who want to work on JobFunnel, this section will guide you through setting up the development environment and the tools we use to maintain code quality and consistency.
To get started, install JobFunnel in developer mode. This will install all necessary dependencies, including development tools such as testing, linting, and formatting utilities.
To install JobFunnel in developer mode, use the following command:
pip install -e '.[dev]'
This command installs the package in an editable state, along with the pre-commit package used for automatic code quality checks.
The following pre-commit hooks are configured to run automatically when you commit changes to ensure the code follows consistent style and quality guidelines:
Black: Automatically formats Python code to ensure consistency.
isort: Sorts and organizes imports according to the Black style.
Prettier: Formats non-Python files such as YAML and JSON.
Flake8: Checks Python code for style guide violations.
While the pre-commit package is installed when you run pip install -e '.[dev]', you still need to initialize the hooks by running the following command once:
pre-commit install
The pre-commit hooks will automatically run when you attempt to make a commit. If any formatting issues are found, the hooks will fix them (for Black and isort), or warn you about style violations (for Flake8). This ensures that all committed code meets the project’s quality standards.
You can also manually run the pre-commit hooks at any time with:
pre-commit run --all-files
This is useful to check the entire codebase before committing or as part of a larger code review. Please fix all style guide violations (or provide a reason to ignore) before committing to the repository.
We use pytest to run tests and ensure that the code behaves as expected. Code coverage is automatically generated every time you run the tests.
To run all tests, use the following command:
pytest
This will execute the test suite and automatically generate a code coverage report.
If you want to see a detailed code coverage report, you can run:
pytest --cov-report=term-missing
This will display which lines of code were missed in the test coverage directly in your terminal output.