PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License

Search term specification #42

Closed markkvdb closed 4 years ago

markkvdb commented 4 years ago

First off, great idea; this could definitely be useful for a lot of people!

Trying to use the application does raise a few questions for me, though. From the README and the demo, it is unclear to me what types of search terms I can use. The demo provides the province, city, domain and radius for the region search term.

To be concrete:

markkvdb commented 4 years ago

On closer inspection of the code, it seems that domain is the country code used in the site's domain, e.g., ca for Canada, fr for France, etc.

I don't know whether you tested your code with a different domain, e.g., fr, but the search specifications included in the search term differ slightly by country. Hard-coding a province attribute for Indeed therefore causes trouble, since the French website does not seem to support it.

Finding a solution requires further investigation into which "search specifications" each country allows, but hard-coding the province attribute will not work if we want to support different domains.

If you want, I could help find a solution.

PaulMcInnis commented 4 years ago

Very good observations, there seems to be more that we need to do to properly support different domains/internationalization.

I would be very grateful for any help with this. 👍

markkvdb commented 4 years ago

I’ll have some time later this week to work on it. I’ll update you on any progress.

markkvdb commented 4 years ago

Quick note: I looked into the search APIs of the different job providers.

It seems that Glassdoor will not pose major problems: the code requires relatively few adaptations to support many countries.

The Indeed API is also mostly fine; we just have to handle countries with explicit region attributes, such as the US with states and Canada with provinces.
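A minimal sketch of what "distinguishing countries with explicit region attributes" could look like when building an Indeed search URL. It assumes Indeed's q (keywords), l (location) and radius query parameters and a www.indeed.com / <domain>.indeed.com host scheme; the helper name indeed_search_url and the REGION_COUNTRIES table are hypothetical, not JobFunnel code.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical table: countries whose search expects "City, REGION".
# Illustrative assumption, not taken from JobFunnel or Indeed documentation.
REGION_COUNTRIES = {"ca", "us"}


def indeed_search_url(domain: str, keywords: List[str], city: str,
                      region: Optional[str] = None, radius: int = 25) -> str:
    """Build an Indeed search URL for a country domain (sketch)."""
    # Assumed host scheme: www.indeed.com for the US, <domain>.indeed.com elsewhere.
    host = "www.indeed.com" if domain == "us" else f"{domain}.indeed.com"
    location = city
    if domain in REGION_COUNTRIES:
        # Only countries with explicit regions get the ", REGION" suffix.
        if not region:
            raise ValueError(f"domain {domain!r} needs a state/province")
        location = f"{city}, {region}"
    # Assumed query parameters: q (keywords), l (location), radius.
    query = urlencode({"q": " ".join(keywords), "l": location, "radius": radius})
    return f"https://{host}/jobs?{query}"
```

For example, indeed_search_url("fr", ["data", "scientist"], "Paris") builds a URL without any province, while the "ca" and "us" domains refuse to build one until a region is given.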

However, Monster is rather monstrous... I'm from the Netherlands, and there Monster does not use the Monster.nl domain but instead uses Monsterboard, with a slightly different API. Similarly, the French website does not accept monster.fr/jobs/search/-style URLs; instead we need to use monster.fr/vacances/rechecher/. I suspect the same holds for other countries.

Therefore, I might first implement the code for Indeed and Glassdoor and handle Monster later. For example, we could exclude the Monster database when searching in countries whose native language is not English.

@PaulMcInnis I think we need a way to make settings.yaml more universal, so that we do not have to manually add or remove attributes depending on the region. Instead, we might have to add logic to parse_config based on the different locales. What are your thoughts on this?
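One way to move that logic under the hood is a per-locale table that a config parser consults, so the user only supplies a domain, a city and (where needed) a region. This is a hypothetical sketch: LocaleProfile, LOCALES and resolve_search are illustrative names, not parts of JobFunnel's parse_config, and the table simply encodes this thread's current plan (Monster only on English-language domains, explicit regions only for the US and Canada).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LocaleProfile:
    """Per-country search behaviour (hypothetical; values are illustrative)."""
    providers: Tuple[str, ...]
    needs_region: bool


# Illustrative table: Monster only for English-language domains for now,
# and an explicit state/province only for the US and Canada.
LOCALES: Dict[str, LocaleProfile] = {
    "ca": LocaleProfile(("indeed", "glassdoor", "monster"), needs_region=True),
    "us": LocaleProfile(("indeed", "glassdoor", "monster"), needs_region=True),
    "fr": LocaleProfile(("indeed", "glassdoor"), needs_region=False),
    "nl": LocaleProfile(("indeed", "glassdoor"), needs_region=False),
}


def resolve_search(settings: dict) -> dict:
    """Expand a minimal user config into a full search spec (hypothetical helper)."""
    domain = settings["domain"].lower()
    profile = LOCALES.get(domain)
    if profile is None:
        raise ValueError(f"unsupported domain: {domain!r}")
    if profile.needs_region and not settings.get("region"):
        raise ValueError(f"domain {domain!r} requires a region")
    return {
        "domain": domain,
        "providers": list(profile.providers),
        "city": settings["city"],
        # Drop the region for locales whose search does not use it.
        "region": settings.get("region") if profile.needs_region else None,
    }
```

The design choice is that adding a country becomes one table entry rather than a new set of user-facing settings.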

PaulMcInnis commented 4 years ago

I agree on doing things under the hood based on a locale. That is much better than providing a laundry list of settings.

Looks like Monster is a bit of a special case; agreed on leaving that for another day.

PaulMcInnis commented 4 years ago

Perhaps we can do a postal code lookup and calculate the rest?

markkvdb commented 4 years ago

I think we can cover a lot of countries with just a single ‘location’ field. For the US and Canada, we could add the state or province after a comma. This would probably work for Indeed and Glassdoor.
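A minimal sketch of splitting such a single location field, assuming the convention proposed above ("City, REGION" for the US and Canada, free text elsewhere). The function split_location and the REGION_DOMAINS set are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: only these domains take a trailing state/province code.
REGION_DOMAINS = {"us", "ca"}


def split_location(location: str, domain: str) -> Tuple[str, Optional[str]]:
    """Split a single 'location' field into (city, region) (sketch)."""
    if domain.lower() in REGION_DOMAINS:
        # Split on the last comma so city names containing commas stay intact.
        city, sep, region = location.rpartition(",")
        if sep and city.strip() and region.strip():
            return city.strip(), region.strip().upper()
        raise ValueError(f"expected 'City, REGION' for domain {domain!r}")
    # Elsewhere the whole string is passed through as the location.
    return location.strip(), None
```

For instance, split_location("Toronto, on", "ca") gives ("Toronto", "ON"), while split_location("Amsterdam", "nl") gives ("Amsterdam", None).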

A few last remarks before going to bed (it’s late enough here, haha):

PaulMcInnis commented 4 years ago

Closing, since I have added some internationalization support to the engine, with improved search specifications.