PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License
1.78k stars, 210 forks

Add a scraper for German Indeed #136

Closed marchbnr closed 2 years ago

marchbnr commented 3 years ago

Add a scraper for German Indeed

Description

Implemented a scraper for the German Indeed website. The functionality includes a subset of the changes suggested in #132 by @Luckyz7, and the required changes are commented. A different locale name was used to align more closely with ISO codes.
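The locale-variant pattern described above can be sketched roughly as follows. This is an illustrative outline, not JobFunnel's actual class or method names: a shared base scraper holds the query-building and parsing logic, and each locale subclass overrides only the country-specific pieces such as the domain.

```python
# Hedged sketch of a locale-specific scraper subclass.
# All names here (BaseIndeedScraper, IndeedScraperGerman, build_search_url)
# are hypothetical, chosen to illustrate the pattern only.

class BaseIndeedScraper:
    """Shared Indeed scraping logic (query building, paging, parsing)."""

    @property
    def domain(self) -> str:
        raise NotImplementedError  # each locale supplies its own domain

    def build_search_url(self, query: str, location: str) -> str:
        # The shared logic stays identical across locales; only the
        # domain property changes the resulting URL.
        return f"https://{self.domain}/jobs?q={query}&l={location}"


class IndeedScraperGerman(BaseIndeedScraper):
    """German Indeed: same scraping logic, different top-level domain."""

    @property
    def domain(self) -> str:
        return "de.indeed.com"
```

With this shape, adding another locale is a matter of one small subclass plus a registry entry, which matches the size of the diff described in this PR.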

How Has This Been Tested?

Tested manually using a configuration in the demo directory.

thebigG commented 3 years ago

This is awesome, thanks so much for the contribution! But do you mind adding a test for your new demo file to https://github.com/PaulMcInnis/JobFunnel/blob/master/.github/workflows/ci.yml?

When you add it, GitHub Actions will automatically test your Germany scraper on every push :).

marchbnr commented 3 years ago

> This is awesome, thanks so much for the contribution! But do you mind adding a test for your new demo file to https://github.com/PaulMcInnis/JobFunnel/blob/master/.github/workflows/ci.yml?
>
> When you add it, github Actions will automatically test your Germany scraper every time there is a push :).

Thanks for the feedback. I have added a test run to the CI build, but the pipeline already fails due to pre-existing issues.

thebigG commented 3 years ago

Thanks for adding it to the CI. Yes, we have issues in the CI; in fact, @PaulMcInnis and I have been discussing this in #133. I thought I had fixed the issue with #134, but sadly it looks like it wasn't fixed completely :(. I'll try to look into it when I get time.

codecov-io commented 3 years ago

Codecov Report

Merging #136 (b818f58) into master (728849f) will increase coverage by 0.28%. The diff coverage is 25.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
+ Coverage   36.12%   36.41%   +0.28%     
==========================================
  Files          22       26       +4     
  Lines        1456     1494      +38     
==========================================
+ Hits          526      544      +18     
- Misses        930      950      +20     
Impacted Files Coverage Δ
jobfunnel/backend/jobfunnel.py 0.00% <0.00%> (ø)
jobfunnel/backend/scrapers/base.py 39.39% <0.00%> (+0.88%) ↑
jobfunnel/backend/scrapers/indeed.py 25.80% <ø> (-1.19%) ↓
jobfunnel/backend/scrapers/registry.py 100.00% <ø> (ø)
jobfunnel/resources/defaults.py 100.00% <ø> (ø)
jobfunnel/resources/enums.py 100.00% <100.00%> (ø)
jobfunnel/backend/tools/__init__.py 100.00% <0.00%> (ø)
jobfunnel/backend/__init__.py 100.00% <0.00%> (ø)
jobfunnel/__init__.py 100.00% <0.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 446e9e0...b818f58.

thebigG commented 3 years ago

I think at least now we won't have job_id issues anymore. What's left is deciding what to do about the error code when we don't find any jobs.

marchbnr commented 3 years ago

Cool, thank you for your changes! I think the status code should not depend on the number of results. One possibility is to track errors along the way and set the status code accordingly after execution. However, this is not related to the current feature, so I would do it in a separate pull request, if you agree.
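The error-tracking idea mentioned above could be sketched as follows. This is a hypothetical outline of the pattern, not JobFunnel's actual implementation: failures are recorded as scraping continues, and the exit code is derived once at the end, independent of how many jobs were found.

```python
# Hedged sketch: run_scrapers and its signature are illustrative names,
# not part of the JobFunnel codebase.
from typing import Callable, List


def run_scrapers(scrapers: List[Callable[[], list]]) -> int:
    """Run each scraper, collecting errors instead of aborting early.

    Returns a process exit code: nonzero only if a scraper actually
    failed. Zero results is treated as a valid (empty) outcome.
    """
    errors: List[str] = []
    jobs: list = []
    for scrape in scrapers:
        try:
            jobs.extend(scrape())
        except Exception as exc:  # record the failure and keep going
            errors.append(f"{getattr(scrape, '__name__', 'scraper')}: {exc}")
    # Exit status reflects errors encountered, not the job count.
    return 1 if errors else 0
```

The key design choice is that an empty result set and a failed scrape are distinguishable: only the latter flips the exit code, which is exactly the behavior being proposed in the comment above.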

PaulMcInnis commented 3 years ago

Also, once this goes in we should cut a new release, as I think some of our recent issues are resolved by the current master.

PaulMcInnis commented 3 years ago

OK, going to merge this and cut a release that removes brotli encoding for all the other scrapers as well, since it seems to be causing issues all around. 3.0.2 was a bit pre-emptive.
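The brotli removal mentioned above can be sketched like this. This is an assumption about the general shape of the change, not the exact JobFunnel diff: the request headers simply stop advertising `br` in `Accept-Encoding`, so servers fall back to gzip/deflate, which the HTTP stack decodes without an optional brotli dependency.

```python
# Hedged sketch: scraper_headers is a hypothetical helper name used
# for illustration, not a function from the JobFunnel codebase.

def scraper_headers() -> dict:
    """Build request headers that avoid brotli responses.

    By omitting 'br' from Accept-Encoding, the server never sends a
    brotli-compressed body, sidestepping decode errors on setups
    without brotli support.
    """
    return {
        "User-Agent": "Mozilla/5.0 (compatible; job-scraper)",
        "Accept-Encoding": "gzip, deflate",  # deliberately no 'br'
    }
```

Since `Accept-Encoding` is a client-side advertisement, this change needs no server cooperation, which is why dropping `br` across all scrapers is a safe, uniform fix.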