PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License

Improved search keyword encoding with support for exact phrase #80

Open akifusenet opened 4 years ago

akifusenet commented 4 years ago


Description

On Indeed, for example, you search for an exact multi-word phrase by putting the phrase between double quotes.

When I try to use this feature in funnel, it strips the double quotes and returns the wrong results.

Steps to Reproduce

  1. Run funnel with a multi-word keyword enclosed in double quotes
  2. Example: -kw "Data Distribution Service"

Expected behavior

When you enter these keywords on the Indeed website, this is the URL it generates: https://www.indeed.com/jobs?q=%22data+distribution+service%22&l=Saratoga%2C+CA&radius=25
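
For reference, the encoded query in that URL is what Python's standard `urllib.parse.quote_plus` produces for the quoted phrase. The snippet below is only a minimal sketch of the encoding step, not JobFunnel's actual code; the variable names are made up for illustration.

```python
from urllib.parse import quote_plus

# The keyword exactly as typed on Indeed, double quotes included.
keyword = '"data distribution service"'
location = "Saratoga, CA"

# quote_plus turns spaces into '+' and percent-encodes reserved
# characters such as '"' (%22) and ',' (%2C).
encoded_keyword = quote_plus(keyword)
encoded_location = quote_plus(location)

print(encoded_keyword)   # %22data+distribution+service%22
print(encoded_location)  # Saratoga%2C+CA
```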

Actual behavior

Instead, funnel logs this URL: getting indeed page 0 : http://www.indeed.com/jobs?q=Data Distribution Service&l=Saratoga%2C+CA&radius=25&limit=50&filter=0&start=0

Environment

* Windows 10 Home

PaulMcInnis commented 4 years ago

Great suggestion for improved usability.

bunsenmurder commented 4 years ago

Hi, @akifusenet, I just added a commit that should fix this issue. Could you pull the latest commit and let us know if it fixed the problem?

PaulMcInnis commented 3 years ago

Assigning to myself because I need to port this fix to new master

PaulMcInnis commented 3 years ago

I am thinking it might be wiser if we provide a search config parameter such as --exact-match
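
One way such a switch could behave is sketched below. This is a hypothetical illustration only: `--exact-match` is the proposal from this comment and is not confirmed to exist, and `build_parser` and `keyword_query` are invented names rather than JobFunnel functions. The `-kw` option mirrors the one used in the report above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: -kw takes keywords, --exact-match is the proposed flag."""
    parser = argparse.ArgumentParser(prog="funnel-sketch")
    parser.add_argument("-kw", dest="keywords", nargs="+", default=[])
    parser.add_argument("--exact-match", action="store_true")
    return parser


def keyword_query(keywords: list, exact_match: bool) -> str:
    """Join the keywords and wrap them in double quotes when exact_match is set."""
    phrase = " ".join(keywords)
    return f'"{phrase}"' if exact_match else phrase


args = build_parser().parse_args(
    ["-kw", "Data", "Distribution", "Service", "--exact-match"]
)
query = keyword_query(args.keywords, args.exact_match)
print(query)  # "Data Distribution Service"
```

With an explicit flag, the user would not need to get literal double quotes past the shell, which is one place the quotes can be lost before the program ever sees them.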

markkvdb commented 3 years ago

I think the simplest way would be to split the search URL into two parts: stem_url and arguments. The stem URL would contain everything up to the arguments, e.g., https://monster.com/jobs/search/ while arguments contains the query string, such as ?q=%22data+distribution+service%22.

The latter can be simplified and clarified by using urllib.parse.urlencode, to which you pass the arguments as a dictionary. Strings are then automatically converted to the percent-encoding used for URLs.
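
A minimal sketch of this suggestion, assuming a hypothetical helper named `build_search_url` (not part of JobFunnel) and reusing the Indeed parameters from the original report:

```python
from urllib.parse import urlencode


def build_search_url(stem_url: str, arguments: dict) -> str:
    """Join a stem URL and a dict of query arguments into one search URL."""
    # urlencode applies quote_plus to every key and value, so the double
    # quotes become %22, spaces become '+', and the comma becomes %2C.
    return f"{stem_url}?{urlencode(arguments)}"


url = build_search_url(
    "https://www.indeed.com/jobs",
    {
        "q": '"data distribution service"',  # exact phrase, quotes kept
        "l": "Saratoga, CA",
        "radius": 25,
    },
)
print(url)
# https://www.indeed.com/jobs?q=%22data+distribution+service%22&l=Saratoga%2C+CA&radius=25
```

The result matches the URL that Indeed itself generates for the quoted phrase, and no manual string escaping is involved.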