Bunsly / JobSpy

Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
https://usejobspy.com
MIT License

enh: show html tags in description #84

Closed: pippinmole closed this 4 months ago

pippinmole commented 5 months ago

This bit of code:

from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin"],
    search_term="software engineer",
    location="Texas, US",
    results_wanted=15,
    country_indeed='US'  # only needed for indeed / glassdoor
)

returns a description that contains no \n characters, so when it is rendered the text comes out as one big block.

cullenwatson commented 5 months ago

LinkedIn currently does not fetch the description. Did you mean another site?

pippinmole commented 5 months ago

Hi, yes I did, my apologies. I meant Indeed.

cullenwatson commented 4 months ago

We should re-add the HTML; it allows better integrations for users.

pippinmole commented 4 months ago

What do you mean by 're-add the HTML'?

cullenwatson commented 4 months ago

I mean keeping the HTML tags so people can embed the jobs in their own sites with the same structure. Right now the code strips the tags, so we need to re-add them.
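To illustrate why stripping the tags loses the structure this issue complains about, here is a minimal standard-library sketch. It is not JobSpy's actual code: the function names and the set of block-level tags are hypothetical. It only contrasts deleting every tag outright with emitting a line break at block boundaries.

```python
import re
from html.parser import HTMLParser

# Illustrative set of block-level tags whose closing tag ends a line.
# <br> is handled separately because it has no closing tag.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "h1", "h2", "h3", "h4"}


def strip_tags_naive(html: str) -> str:
    """Delete every tag; paragraphs and list items run together."""
    return re.sub(r"<[^>]+>", "", html)


class _TextWithBreaks(HTMLParser):
    """Collect text content, inserting newlines at block boundaries."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_keep_breaks(html: str) -> str:
    parser = _TextWithBreaks()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of newlines into one and trim the ends.
    return re.sub(r"\n{2,}", "\n", text).strip()


sample = "<p>Requirements:</p><ul><li>Python</li><li>SQL</li></ul>"
print(strip_tags_naive(sample))
print(strip_tags_keep_breaks(sample))
```

The naive version produces a single run of text, which matches the "one big block" symptom; keeping even minimal structure (newlines, or the tags themselves) avoids it.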

pippinmole commented 4 months ago

Hi,

I'd like to point out that while the HTML structure is brilliant and renders perfectly, HTML isn't ideal if users of this tool expect to merge the scraped data with job posts from other sources.

Currently, I store all job posts in a database, with the description column containing the scraped HTML. I'd also like to let companies submit their own job postings via a web form on my site. Unfortunately, as things stand they would have to know HTML, because the scraped jobs are HTML. I'd like them to be able to use a Markdown editor, which is much easier for non-technical people to understand.

Would it be possible to add a Markdown output option?
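The database setup described in this comment can be sketched with Python's built-in `sqlite3`. This is a hypothetical schema: the table name `job_posts` and the `markup` column are illustrative assumptions, not the commenter's actual database. Recording which markup each description uses lets scraped HTML and form-submitted Markdown share one table, but every renderer then has to branch on that column, which is the friction described above.

```python
import sqlite3

# Hypothetical schema: each row records which markup its description uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE job_posts (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,        -- e.g. 'indeed' or 'web_form'
        description TEXT NOT NULL,
        markup TEXT NOT NULL CHECK (markup IN ('html', 'markdown'))
    )
    """
)
conn.executemany(
    "INSERT INTO job_posts (source, description, markup) VALUES (?, ?, ?)",
    [
        ("indeed", "<p>Build <strong>APIs</strong></p>", "html"),  # scraped
        ("web_form", "Build **APIs**", "markdown"),  # submitted by a company
    ],
)
conn.commit()

# Any code that renders descriptions must check `markup` first.
for source, markup in conn.execute(
    "SELECT source, markup FROM job_posts ORDER BY id"
):
    print(source, markup)
```

Normalizing everything to one format on ingest would remove the branch, which is why a Markdown output option is being requested.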

ZacharyHampton commented 4 months ago

I think having us do the HTML-to-Markdown conversion is out of scope.
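If the conversion is left to users, it can stay small when only a few tags matter. The following is a minimal sketch using only Python's standard-library `html.parser`. The class and function names are hypothetical and it is not part of JobSpy. It handles only headings, paragraphs, bold/italic, line breaks and flat lists (ordered lists come out as bullets); everything else is reduced to its text, so a dedicated conversion library would be more robust for real job descriptions.

```python
import re
from html.parser import HTMLParser


class _MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter for a handful of common tags."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "li":
            self.out.append("\n- ")  # ordered and unordered both become bullets
        elif tag == "br":
            self.out.append("\n")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS or tag in ("p", "ul", "ol"):
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html: str) -> str:
    conv = _MarkdownConverter()
    conv.feed(html)
    conv.close()
    text = "".join(conv.out)
    # Collapse three or more newlines into one blank line, then trim.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


sample = (
    "<h2>About</h2><p>We build <strong>APIs</strong>.</p>"
    "<ul><li>Python</li><li>SQL</li></ul>"
)
print(html_to_markdown(sample))
```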

cullenwatson commented 4 months ago

Yeah, we probably should've just left it as Markdown by default. But since I added the HTML, I already had the code.