Closed PaulMcInnis closed 3 years ago
FYI I've put this up before I've re-upped the coverage / fixed the pyenv to make it accessible. Fixing the coverage will take some time, but I don't anticipate making any further large changes to the structure of the codebase.
I reviewed as much as I could but ended up stopping partway, as this update seems to break debugging using PyCharm. I was able to run JF normally, but debugging would cause it to stall out indefinitely. The issue seems to stem from the use of properties within this new version; more details about the issue can be found within this thread on JetBrains' support forum.
Thanks for taking a look, guys. I'll be fixing the CLI issues tomorrow. I might need to add some functional testing as well, to make sure I've smoke-tested this a bit better (in lieu of complete unit testing).
Additionally, it seems that `pyenv sync` doesn't work with the jobfunnel dependency; not sure what's up with that yet, though.
It would also seem that the USA_ENGLISH locale is broken for the default settings.yaml; I need to look into this.
Merging #90 into master will decrease coverage by 21.50%. The diff coverage is 36.94%.
@@ Coverage Diff @@
## master #90 +/- ##
===========================================
- Coverage 58.34% 36.83% -21.51%
===========================================
Files 13 22 +9
Lines 1150 1341 +191
===========================================
- Hits 671 494 -177
- Misses 479 847 +368
Impacted Files | Coverage Δ | |
---|---|---|
jobfunnel/__main__.py | 0.00% <0.00%> (-35.90%) | :arrow_down: |
jobfunnel/backend/jobfunnel.py | 0.00% <0.00%> (ø) | |
jobfunnel/backend/tools/delay.py | 21.15% <21.15%> (ø) | |
jobfunnel/backend/tools/filters.py | 21.27% <21.27%> (ø) | |
jobfunnel/backend/job.py | 26.47% <26.47%> (ø) | |
jobfunnel/backend/scrapers/monster.py | 28.35% <28.35%> (ø) | |
jobfunnel/config/manager.py | 29.78% <29.78%> (ø) | |
jobfunnel/backend/tools/tools.py | 29.87% <29.87%> (ø) | |
jobfunnel/backend/scrapers/glassdoor.py | 30.14% <30.14%> (ø) | |
jobfunnel/backend/scrapers/indeed.py | 30.90% <30.90%> (ø) | |
... and 33 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5275820...cbbd917.
Verified that USA_ENGLISH functions for Indeed and Monster, and added a locale scrape for USA_ENGLISH to round things out a bit more on Travis.
Still having a bit of a time with the CLI vs YAML vs defaults. Made some progress just now; I just need to add the defaults injection.
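For readers following along, the layering problem described here can be sketched roughly as below. This is only an illustration, not the code in this branch: the precedence order (CLI over YAML over defaults), the `resolve_config` name, and the keys in `DEFAULTS` are all assumptions.

```python
from typing import Any, Dict, Optional

# Hypothetical default values; the real defaults live in the project's config.
DEFAULTS: Dict[str, Any] = {"locale": "USA_ENGLISH", "max_listing_days": 60}


def resolve_config(
    yaml_values: Dict[str, Any],
    cli_values: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings, assuming CLI beats YAML and YAML beats defaults."""
    merged = dict(DEFAULTS if defaults is None else defaults)
    for layer in (yaml_values, cli_values):
        # None means "not provided", so it never overrides a lower layer.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

Every extra layer multiplies the combinations that need a test, which is the pain point raised in this thread.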
I was having a hard time writing tests for the methods I made, so I've broken it down a bit further. Going to take another crack at it later.
I guess I can see now why most programs allow just YAML or just CLI
The fact that this is so hard to test indicates to me that perhaps we shouldn't let users mix YAML, CLI arguments, and default values. It gets bogged down in invalid cases and combinatorial cases...
Perhaps we can make the yaml and cli mutually exclusive?
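One way to make the two mutually exclusive is to give each input style its own subcommand, so a single invocation can only take one path. A minimal sketch with `argparse`; the program name `funnel`, the subcommand names `load` and `inline`, and every flag shown are hypothetical and not necessarily what this PR ends up with.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where YAML and CLI configuration cannot be mixed."""
    parser = argparse.ArgumentParser(prog="funnel")
    subparsers = parser.add_subparsers(dest="mode", required=True)

    # Path 1: every setting comes from a YAML file.
    load = subparsers.add_parser("load", help="read all settings from YAML")
    load.add_argument("-s", "--settings", required=True, help="path to YAML file")

    # Path 2: every setting comes from command-line flags (hypothetical flags).
    inline = subparsers.add_parser("inline", help="pass all settings as flags")
    inline.add_argument("-k", "--keywords", nargs="+", required=True)
    inline.add_argument("-l", "--locale", required=True)

    return parser
```

With this layout, passing a flag that belongs to the other subcommand is rejected by `argparse` itself, so the invalid combinations never reach the config code.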
OK, I'm just working on getting a few final things in, but separating the CLI out made things a lot easier. Finally moving past that mess, and I've added some simple tests to verify it actually works.
OK, I've tested this enough for now.
Master is pretty broken compared to this so I'm going to merge and fix bugs as they come in from now on.
Still TODO:
- [ ] Inter-scrape duplicates by TFIDF
- [ ] GlassDoor scraper (webdriven)
- [ ] more testing
Description
This is version 3.0 of JobFunnel with numerous improvements including:
- `Job`, `JobField` and `JobFilter`
- `get()` and `set()` style of API with configurable priority and delay
- `Remote` and `Wage` scraping
- `Cerberus` for Schema and validation of YAML configuration files

This will affect anyone currently developing off of the old branch, as the rebase will be untenable. I may need to squash this down a lot more.
If you are reading this, please give this branch a go. I find the easiest non-disruptive way is just to clone this repo as `ABCJobFunnel` and simply run

A good place to start is
Issues affected:
- #85: this resolves our existing improper ABC usage
- #83: added preemption for job get/set when an attribute fails a check.
- #79: added warnings and handling for empty search results with TFIDF filter.
- #60, #45, #37: added support for localised scrapers with custom URLs and customisation of scraping without needing to copy/re-write lots of code (via inheritance)
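To illustrate the inheritance idea for localised scrapers, here is a rough sketch rather than the actual classes in this branch. The class names, the `example` domain, and the URL layout are all hypothetical; only the `USA_ENGLISH` locale name comes from this thread.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseExampleScraper(ABC):
    """Shared search logic; subclasses supply only the locale-specific parts."""

    def __init__(self, keywords: List[str]) -> None:
        self.keywords = keywords

    @property
    @abstractmethod
    def locale(self) -> str:
        """Locale name this scraper serves, e.g. 'USA_ENGLISH'."""

    @property
    @abstractmethod
    def domain(self) -> str:
        """Top-level domain used to build search URLs."""

    def search_url(self) -> str:
        # Written once in the base class and inherited by every locale.
        query = "+".join(self.keywords)
        return f"https://www.example.{self.domain}/jobs?q={query}"


class ExampleScraperUSAEnglish(BaseExampleScraper):
    # Plain class attributes are enough to satisfy the abstract properties.
    locale = "USA_ENGLISH"
    domain = "com"


class ExampleScraperOtherLocale(BaseExampleScraper):
    # A second, made-up locale: only the differing values are restated.
    locale = "OTHER_LOCALE"
    domain = "ca"
```

Adding a locale then means adding a small subclass instead of copying a whole scraper, and the ABC refuses to instantiate a subclass that forgets one of the required attributes.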
Context of change
`GraphViz` generation script.
Type of change
I have updated all documentation.
Existing master CSV files can be ported by adding missing columns, but it is recommended just to start fresh. Existing cache files and block lists are not compatible, block lists could however be made compatible, this one might be worth pursuing.
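If someone does want to port an old master CSV, adding the missing columns could look something like the sketch below. It uses only the standard `csv` module; the function name is made up, and the caller has to supply the real list of required columns, since none are listed here.

```python
import csv
from typing import List


def port_master_csv(src_path: str, dst_path: str, required_columns: List[str]) -> List[str]:
    """Copy a CSV, adding any missing required columns with blank values.

    Returns the names of the columns that had to be added.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        existing = list(reader.fieldnames or [])
        rows = list(reader)

    added = [col for col in required_columns if col not in existing]

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # restval fills the newly added columns for every existing row.
        writer = csv.DictWriter(dst, fieldnames=existing + added, restval="")
        writer.writeheader()
        writer.writerows(rows)

    return added
```

Writing to a separate destination file keeps the original master CSV untouched in case the port goes wrong.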
How Has This Been Tested?
General monkey testing, but I need to up test coverage to be truly confident that the code quality is there. I would appreciate it if anyone reading this would just try running it and try breaking it. Respond here with any bugs you find.
Checklist:
Additional TBD: