PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License
1.78k stars 210 forks

Cannot Use Common Search Operator "-" to Block Keywords or Phrases #122

FallenSpaces closed 2 years ago

FallenSpaces commented 3 years ago

Description

When using the Indeed.com search box, you can prefix a keyword or phrase with a dash to exclude it from the results.

For example: "Administrative Assistant" -"Master's Degree"

This will look for any Administrative Assistant jobs that do not include the words "Master's Degree" in the listing. This is very helpful to cut down on job listings to sift through.

However, when using JobFunnel, there is no option to do this, so I attempted to simply include it before the keywords as normal.

I received this error:

\lib\site-packages\yaml\scanner.py", line 258, in fetch_more_tokens
    raise ScannerError("while scanning for the next token", None,
yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
  in "settings.yaml", line 37, column 1

Environment

thebigG commented 3 years ago

That looks like an issue with JobFunnel parsing the YAML config file, so I recommend posting the entire YAML configuration file you are using in a comment on this thread. My gut tells me you have a tab somewhere in your configuration file: YAML does not allow tab characters for indentation, and the error points at a '\t' on line 37, column 1 of settings.yaml.
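A quick way to check for stray tabs before posting the file is a small standard-library script. This is only a sketch: `find_tab_lines` is a hypothetical helper, not part of JobFunnel or PyYAML, and it reports just the first tab on each line.

```python
def find_tab_lines(yaml_text):
    """Return (line_number, column) for the first tab on each line that has one.

    Hypothetical helper for illustration; line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(yaml_text.splitlines(), start=1):
        col = line.find("\t")
        if col != -1:
            hits.append((line_no, col + 1))
    return hits


# Made-up config fragment with a tab at the start of the third line.
sample = "search:\n  keywords:\n\t- Python\n"
print(find_tab_lines(sample))  # [(3, 1)]
```

To run it against a real config, pass in the file's contents, for example the text read from settings.yaml.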

As for the Indeed feature: typing keywords into a site's search box (Indeed, GlassDoor, etc.) is not the same thing as adding keywords to your config file. They achieve the same result, but the search box lives in the frontend, and who knows what goes on there. JobFunnel builds a link from the keywords you give it and sends a request, which is not the same as typing words into the search box. There might be some JS in there that does voodoo magic with a string that has the "-" operator attached. I can't confirm that because I've never used the feature, but that's my best guess.

This would be an interesting feature to implement, but I fear it would add unnecessary complexity to JobFunnel. I say this because I have a feeling this might be an Indeed-only feature, and it might not be worth the trouble. One thing we could do to implement a similar feature is pull down all the jobs, add some "exclude keywords" key to the YAML file, and filter all of the jobs against it. Not sure how feasible this is, just something to think about.
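Roughly, the post-scrape filter could look like the sketch below. Everything here is an assumption for illustration: `filter_jobs`, the `exclude_keywords` list, and the dict shape of a job are hypothetical and do not reflect JobFunnel's actual data model.

```python
def filter_jobs(jobs, exclude_keywords):
    """Keep only jobs whose title and description contain none of the excluded phrases.

    Matching is a case-insensitive substring check (an assumed, simple policy).
    """
    lowered = [kw.lower() for kw in exclude_keywords]
    kept = []
    for job in jobs:
        text = (job.get("title", "") + " " + job.get("description", "")).lower()
        if not any(kw in text for kw in lowered):
            kept.append(job)
    return kept


# Made-up listings, mirroring the example search from the issue description.
jobs = [
    {"title": "Administrative Assistant", "description": "Master's Degree required."},
    {"title": "Administrative Assistant", "description": "High school diploma."},
]
remaining = filter_jobs(jobs, ["Master's Degree"])
print([job["description"] for job in remaining])  # ['High school diploma.']
```

The upside of filtering after scraping is that it works the same for every site; the downside is that excluded listings still have to be downloaded first.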

Anyway hope this helps!

Cheers!

PaulMcInnis commented 3 years ago

We could implement this, similar to the flag we added earlier to enforce the presence of all keywords. It just requires constructing the query URL according to each website's encoding.

I think this would be worth doing, even if it puts the onus on us to maintain a bit more for it. I would use this feature.

FallenSpaces commented 3 years ago

I have reinstalled so that I could post the YAML config, which is here. This time I downloaded it with wget instead of using copy/paste, and only changed things slightly, so as not to disturb the YAML. Same error though, even with a normal search term. I'm also using VirtualEnv, if that makes a difference.

Also, I checked Google, Google Jobs, and Indeed, and they all use straightforward link-building when using the dash to filter, but Monster does not have the option to filter, even in its GUI. Here's an Indeed link with a couple of filters included as an example: https://www.indeed.com/jobs?q=admin -"Master's Degree" -Executive&vjk=467cc14b42dafd01
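For illustration, a link like the one above could be assembled with Python's standard library along these lines. This is a sketch under assumptions: `build_query_url` is a hypothetical name, only the `q` parameter seen in the link is used, and whether each site accepts this exact encoding would need to be verified per site.

```python
from urllib.parse import urlencode


def build_query_url(base_url, keywords, exclude):
    """Build a search URL whose query prefixes each excluded term with '-'.

    Hypothetical helper; multi-word phrases are wrapped in double quotes so
    that they are excluded as a unit, as in the example link.
    """
    terms = list(keywords)
    for phrase in exclude:
        if " " in phrase:
            terms.append('-"{}"'.format(phrase))
        else:
            terms.append("-" + phrase)
    # urlencode percent-encodes quotes and turns spaces into '+'.
    return base_url + "?" + urlencode({"q": " ".join(terms)})


url = build_query_url(
    "https://www.indeed.com/jobs",
    ["admin"],
    ["Master's Degree", "Executive"],
)
print(url)
```

Letting `urlencode` handle the escaping avoids hand-building strings with raw spaces and quotes in them.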

+1 for implementing, because I consider it a required feature when it comes to job scraping. It can really help save time when your normal search term brings up a lot of listings.

Thanks for the help! No pressure on this issue, as I'm getting a lot more hits on Google Jobs!

PaulMcInnis commented 2 years ago

Marking as a duplicate of #80.