Blurbs are still being retrieved for filtered out jobs

bunsenmurder commented 4 years ago

Description

Currently the scraper is still retrieving blurbs for jobs that have been filtered out by the _prefilter method.


Steps to Reproduce

Run JobFunnel with any query and make sure the results are saved to a directory without a master_list.csv or duplicate_list.csv file.
Run the scraper again and take note of the number of unique jobs found by the _prefilter, then count the number of individual jobs that are being scraped. You should notice that they don't match.

Expected behavior

The scraper should remove jobs identified by the _prefilter, and only obtain blurbs for the remaining jobs.

Actual behavior

The scraper retrieves blurbs for all jobs whether they were filtered out or not.

To fix the issue, the creation of the _scrapelist and the call to the _prefilter method would have to be swapped, so that the filter runs first. The screenshot below highlights the issue within the code and the debugger output:

Although this could have been fixed in a pull request, making this change would break the _datefilter called by the _prefilter method in the main JobFunnel class.
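
The reordering described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual JobFunnel code: the function names `prefilter` and `scrape`, the `fetch_blurb` callback, and the dictionary layout of a job are all assumptions made for the example, and the interaction with _datefilter is not covered.

```python
from typing import Callable, Dict, List, Set

# Hypothetical job record: a plain dict with at least a "link" key.
Job = Dict[str, str]


def prefilter(jobs: Dict[str, Job], known_ids: Set[str]) -> int:
    """Remove jobs whose id is already known; return how many were removed."""
    duplicates = [job_id for job_id in jobs if job_id in known_ids]
    for job_id in duplicates:
        del jobs[job_id]
    return len(duplicates)


def scrape(
    jobs: Dict[str, Job],
    known_ids: Set[str],
    fetch_blurb: Callable[[str], str],
) -> List[Job]:
    """Fetch blurbs only for jobs that survive the prefilter."""
    # Filter first, so the scrape list is built from the remaining jobs only.
    prefilter(jobs, known_ids)
    scrape_list = list(jobs.values())
    for job in scrape_list:
        job["blurb"] = fetch_blurb(job["link"])
    return scrape_list
```

If the scrape list were built before calling the filter, it would still hold references to the removed jobs, which would match the behaviour reported here.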

Environment

Build: Master 0a246cb71329e076f7301620000f952cb2867c47
Operating system and version: Arch Linux
[Linux] Desktop Environment and/or Window Manager: Gnome

PaulMcInnis commented 4 years ago

Thank you for the detailed write-up!

(looks like it's time to do some more thorough code review in the codebase)

PaulMcInnis commented 4 years ago

ah oops should have done this before I drafted a release just now. Need to fix this and some other behaviour issues and up the sub-rev.

bunsenmurder commented 4 years ago

Perfect timing actually, I was gonna make a pull request with some fixes I made.

PaulMcInnis commented 4 years ago

ah nice! glad to hear it!

Feel free to up the rev to 2.1.9
