Closed. PaulMcInnis closed this issue 4 years ago
Hey @PaulMcInnis are you still getting this issue? I was unable to reproduce it.
I just checked on current master with the default settings.yaml and was unable to repro. I'll close this issue and cut a new release.
Description
Hey everyone, I was gonna cut us a new release, but I noticed an issue:
Currently you get blurbs like below for all jobs scraped from Monster with the GlassdoorStatic scraper:
Since Glassdoor runs first, most of the duplicates are in other job sites, and as a result most of the jobs I scrape now have blurbs like the one above.
I was just checking out GlassDoorDynamic and it seems to work well, but it misses the date and blurb fields for jobs. As a side note, watching the browser windows go by made me feel like I was in the Matrix 😎
Perhaps it is easier for us to purge some of the CSS from these blurbs in the GlassDoorStatic scraper with a regex for the longest string in the raw scrape? Open to suggestions.
Alternatively we could just:
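To make the regex idea above a bit more concrete, here is a rough sketch of what "longest string in the raw scrape" could look like. Everything in it is an assumption for illustration: the helper name longest_text_chunk, the CSS_RULE pattern, and the sample blurb are made up and are not taken from the scraper code. It also assumes the stray CSS shows up as flat selector{...} rules with no nesting.

```python
import re

# Hypothetical pattern, not part of the existing scrapers: a run of
# non-whitespace selector characters followed by a flat {...} block,
# e.g. ".css-1a{color:red;}". Nested blocks (such as @media) are not handled.
CSS_RULE = re.compile(r"[^\s{}]*\{[^{}]*\}")


def longest_text_chunk(raw_blurb: str) -> str:
    """Return the longest piece of text left between CSS-like rules."""
    # Splitting on the rules leaves only the text found between them.
    chunks = (chunk.strip() for chunk in CSS_RULE.split(raw_blurb))
    # Assume the real description is the longest surviving piece.
    return max(chunks, key=len, default="")


# Made-up sample of a blurb polluted with CSS.
sample = ".css-1a{color:red;} .css-2b{margin:0 auto;}We are hiring an engineer."
print(longest_text_chunk(sample))  # prints "We are hiring an engineer."
```

The catch with this approach is that it keeps only one chunk, so a description that is itself interrupted by CSS would lose everything but its longest piece.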
Steps to Reproduce
Easily replicable with the stock YAML on current master using the command funnel -kw Engineer; the resulting ./search/masterlist.csv will contain the aforementioned results.
Expected behavior
blurb should not contain CSS
Actual behavior
blurb contains CSS.
Environment
b30b28453a0f3528095166d0cbbe871726929b64