[indeed] better page result parsing

PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

MIT License

1.85k stars 215 forks

[indeed] better page result parsing #49

Closed Maddin-619 closed 4 years ago

Maddin-619 commented 4 years ago

This PR checks whether there are no results, so the None exception is avoided. With the German domain the page result string is 'Seite 1 von 189 Jobs' and the r'f (\d+) ' regex does not match. I thought it would be better for international support to avoid a language-dependent regex.
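To illustrate the idea, here is a minimal sketch of a language-independent parser. It is not the code from this PR: the function name parse_total_results is hypothetical, and it assumes the total is always the last number in the result string and that thousands separators, if any, are ',' or '.'.

```python
import re
from typing import Optional

# Matches an integer with optional thousands separators, e.g. 189, 1,234 or 1.234.
_NUMBER = re.compile(r"\d+(?:[.,]\d{3})*")


def parse_total_results(count_text: Optional[str]) -> Optional[int]:
    """Return the total job count from a result string, or None if there is none.

    Hypothetical helper: assumes the total is the last number in the text,
    which avoids matching on language-specific words such as 'of' or 'von'.
    """
    if not count_text:
        # The element was missing or empty, i.e. the search had no results.
        return None
    numbers = _NUMBER.findall(count_text)
    if not numbers:
        return None
    # Drop thousands separators before converting the last number.
    return int(numbers[-1].replace(",", "").replace(".", ""))


print(parse_total_results("Seite 1 von 189 Jobs"))  # 189
print(parse_total_results("Page 1 of 1,234 jobs"))  # 1234
print(parse_total_results(None))                    # None
```

Returning None instead of raising lets the caller decide how to handle a search with no results.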

PaulMcInnis commented 4 years ago

Apologies - we merged some fixes recently - looks like there is a minor conflict now.

studentbrad commented 4 years ago

Something weird happened to this PR. The merge didn't occur as I would expect. Although changes have been made to master since, the PR should show only your changes and not anyone else's. The number of commits obscures the actual changes, so I cannot review them properly :disappointed:.

Maddin-619 commented 4 years ago

I rebased my feature branch onto the master branch to resolve the conflicts. For review you can just look at my last commit. Basically it is just another regex for page parsing. The merge should work fine; the commits that are not mine will be ignored. Maybe you can squash the commits. Sorry, next time I will use merge instead.

studentbrad commented 4 years ago

Merging these changes could be too risky for the reason stated above. This has happened to me a few times before.

What I have done in the past is merge master with my branch. Next, I used KDiff3 to find my actual changes, including the affected files and lines. Finally, I copied the changed files to a new branch created from master, so that the diff shows only my actual changes.