PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License

French local #111

Closed markkvdb closed 3 years ago

markkvdb commented 3 years ago

Add support for France

Description

JobFunnel 3.0 makes it easy to add new localisations. This pull request aims to add support for France (in French).

How Has This Been Tested?

The test suite has not been extended, as it is rather limited at this point in time.

markkvdb commented 3 years ago

This pull request is basically an extension of #106. The only supported provider so far is Indeed, but I'm working on adding Monster right now. Adding support for French does raise a few character-encoding issues, but I was positively surprised by how few changes are required to make the scraper work for a different language and country.

PaulMcInnis commented 3 years ago

Yeah, it's a lot easier now :) Making localisation happen was a big motivation for the ABC impl. Glad to see it paying off.

One thing: I merged #106, so feel free to rebase this onto master so that it only includes your own commits on top of that.

codecov-commenter commented 3 years ago

Codecov Report

Merging #111 into master will decrease coverage by 0.17%. The diff coverage is 35.00%.

@@            Coverage Diff             @@
##           master     #111      +/-   ##
==========================================
- Coverage   35.90%   35.72%   -0.18%     
==========================================
  Files          22       22              
  Lines        1412     1447      +35     
==========================================
+ Hits          507      517      +10     
- Misses        905      930      +25     
| Impacted Files | Coverage Δ |
| --- | --- |
| jobfunnel/__main__.py | 0.00% <ø> (ø) |
| jobfunnel/backend/jobfunnel.py | 0.00% <0.00%> (ø) |
| jobfunnel/backend/scrapers/registry.py | 100.00% <ø> (ø) |
| jobfunnel/resources/defaults.py | 100.00% <ø> (ø) |
| jobfunnel/backend/scrapers/indeed.py | 26.99% <17.39%> (-1.58%) :arrow_down: |
| jobfunnel/backend/scrapers/monster.py | 27.04% <28.57%> (+0.07%) :arrow_up: |
| jobfunnel/backend/scrapers/base.py | 39.49% <75.00%> (+0.92%) :arrow_up: |
| jobfunnel/backend/tools/tools.py | 29.87% <100.00%> (ø) |
| jobfunnel/resources/enums.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 27fb84c...dd057e0.

markkvdb commented 3 years ago

I believe something went wrong with the rebase... I don't have any experience with rebase. @PaulMcInnis do you have an idea what went wrong?

PaulMcInnis commented 3 years ago

@markkvdb yeah, it's OK, I can help.

What you should do is this:

  1. git checkout master (you should have no uncommitted changes; you can stash them if you do)
  2. git pull (get the latest master)
  3. git checkout french_local
  4. git rebase -i master
  5. In the editor it opens (likely nano), drop all the commits that are from lily, as these are already on master now. You can drop a commit by changing its pick to drop or d.
  6. git log, and check that you only see one set of lily's commits in the history.
  7. git push -f (force push to update the french_local branch with the changed history)

markkvdb commented 3 years ago

Thanks! Learned a few things about git today haha. Monster and Indeed both seem to work. I do have an issue with characters though. French has a bunch of special characters that are not properly handled during the parsing process. E.g., look at this output:

[Screenshot 2020-10-02 at 18 44 54]

PaulMcInnis commented 3 years ago

veryyy interesting. It would seem we need to change the CSV import/export. Probably the right spot is Job.as_row, which should respect the job's locale and use the appropriate encoding.

We may need a lookup based on locale. Ideally we could use a single encoding for the entire CSV, but this may not be realistic.

markkvdb commented 3 years ago

Never mind. JobFunnel correctly encodes the output using UTF-8. Basically, it's just Excel misreading the file. Any other spreadsheet software I open the CSV in handles UTF-8 correctly.

With this I would say that French is correctly supported. There might be a few loose ends with handling certain texts that are now in French instead of English, so I will keep looking into that. Furthermore, I will change the integration test for France to use fewer job listings.
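
For anyone who hits the same Excel behaviour, here is a minimal sketch of one common workaround, not something this pull request implements. It assumes the CSV is written with Python's standard csv module; the helper names (write_csv_for_excel, read_csv) and the sample rows are made up for illustration and are not JobFunnel's API. Writing with the utf-8-sig codec prepends a byte order mark, which Excel generally uses to recognise the file as UTF-8. The last line shows the kind of garbling you get when UTF-8 bytes are decoded as Windows-1252.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the real JobFunnel columns may differ.
rows = [
    ["title", "company", "location"],
    ["Développeur Python", "Société Exemple", "Île-de-France"],
]


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 with a leading byte order mark (BOM)."""
    # 'utf-8-sig' prepends the BOM; newline='' is what the csv module expects.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path):
    """Read the file back; 'utf-8-sig' strips the BOM if one is present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), "jobs_fr.csv")
write_csv_for_excel(path, rows)

# UTF-8 bytes misread as Windows-1252 turn accented letters into two characters.
mojibake = "é".encode("utf-8").decode("cp1252")
```

The trade-off is that some tools treat the BOM as part of the first header cell, so this would probably be best as an opt-in setting rather than the default.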

markkvdb commented 3 years ago

That's the department number, i.e., the equivalent of states/provinces in the US/Canada. You could also write the full name, but these can be rather long sometimes haha