TheBlick / blick-scrapper

MIT License

Scrape data off job postings #1

Open Victor-Buendia opened 2 months ago

Victor-Buendia commented 2 months ago

What

The inputs are URLs:

https://boards.greenhouse.io/placerlabs/jobs/5889594003
https://boards.greenhouse.io/correlationone/jobs/5067265004
https://boards.greenhouse.io/nimblegravity/jobs/4365784005
https://boards.greenhouse.io/securityscorecard/jobs/2201043
https://boards.greenhouse.io/moovx/jobs/4366290005
https://boards.greenhouse.io/okta/jobs/5849882
https://boards.greenhouse.io/astropay/jobs/4147625007
https://boards.greenhouse.io/integraladscience/jobs/5328370
https://boards.greenhouse.io/yalochatinc/jobs/5773977003
https://boards.greenhouse.io/bitcoincom/jobs/4388949005
https://boards.greenhouse.io/launchpadtechnologiesinc/jobs/4336338006
https://boards.greenhouse.io/truelogic/jobs/7309055002
https://boards.greenhouse.io/truelogic/jobs/7307936002
https://boards.greenhouse.io/truelogic/jobs/7274188002
https://boards.greenhouse.io/truelogic/jobs/7297301002
https://boards.greenhouse.io/moovx/jobs/4034377005

Using:

Why

DoD

Victor-Buendia commented 2 months ago

job_postings

{'url': 'https://boards.greenhouse.io/launchpadtechnologiesinc/jobs/4336338006',
 'category': 'job_posting',
 'match': 'direct_match',
 'target_name': 'greenhouse',
 'company_name': 'launchpadtechnologiesinc',
 'job_id': '4336338006',
 'scraped_at': None,    # value not defined yet
 'processed_at': None,  # value not defined yet
 'status': None,        # value not defined yet

 # fields to be filled in from the scraped page
 'llm_generated_summary': None,
 'raw_html_content': None,
 'country': None,
 'location': None,
 'company_location': None,
 }
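
The `company_name` and `job_id` values in the record above match the path segments of the input URL, so a record skeleton could probably be derived from each URL directly. A minimal sketch, assuming every input follows the `boards.greenhouse.io/<company_name>/jobs/<job_id>` shape; the function name `build_job_posting` is hypothetical, and all fields without a defined value are left as `None`:

```python
from urllib.parse import urlparse


def build_job_posting(url: str) -> dict:
    """Build a job_postings record skeleton from a Greenhouse job URL."""
    # Tolerate stray whitespace or a trailing ';' from copy-pasting.
    cleaned = url.strip().rstrip(";")
    parsed = urlparse(cleaned)
    # Expected path shape: /<company_name>/jobs/<job_id>
    parts = [p for p in parsed.path.split("/") if p]
    if (
        parsed.netloc != "boards.greenhouse.io"
        or len(parts) != 3
        or parts[1] != "jobs"
    ):
        raise ValueError(f"not a Greenhouse job URL: {url!r}")
    company_name, _, job_id = parts
    return {
        "url": cleaned,
        "category": "job_posting",
        "match": "direct_match",
        "target_name": "greenhouse",
        "company_name": company_name,
        "job_id": job_id,
        # Not defined in the issue yet, so left empty here.
        "scraped_at": None,
        "processed_at": None,
        "status": None,
        "llm_generated_summary": None,
        "raw_html_content": None,
        "country": None,
        "location": None,
        "company_location": None,
    }
```

Calling `build_job_posting` on the `launchpadtechnologiesinc` URL should reproduce the identifying fields of the record above; anything that does not look like a Greenhouse job URL raises `ValueError` instead of producing a half-filled record.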

List the jobs and insert them into the database.

Query the database with something like "select * where processed_at >= last_checkpoint", extract the information from the jobs, and insert the updates into the database.
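
The two steps above could be sketched with the standard-library `sqlite3` module. This is only an illustration under several assumptions: the database engine is not named in the issue, `processed_at` is assumed to be an ISO 8601 UTC string stamped when a row is inserted, the `status` values `'new'` and `'extracted'` are made up, and the helper names (`open_db`, `insert_jobs`, `jobs_since`, `update_job`) are hypothetical.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a reduced job_postings table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS job_postings (
            url          TEXT PRIMARY KEY,
            company_name TEXT,
            job_id       TEXT,
            processed_at TEXT,  -- assumed: ISO 8601 UTC string
            status       TEXT,  -- assumed values: 'new', 'extracted'
            country      TEXT,
            location     TEXT
        )
        """
    )
    return conn


def insert_jobs(conn, jobs, processed_at):
    """Step 1: insert the listed jobs; URLs already stored are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO job_postings "
        "(url, company_name, job_id, processed_at, status) "
        "VALUES (?, ?, ?, ?, 'new')",
        [(j["url"], j["company_name"], j["job_id"], processed_at) for j in jobs],
    )
    conn.commit()


def jobs_since(conn, last_checkpoint):
    """Step 2a: return URLs of rows with processed_at >= last_checkpoint."""
    rows = conn.execute(
        "SELECT url FROM job_postings WHERE processed_at >= ? ORDER BY url",
        (last_checkpoint,),
    ).fetchall()
    return [row[0] for row in rows]


def update_job(conn, url, country, location):
    """Step 2b: write the extracted fields back and mark the row as done."""
    conn.execute(
        "UPDATE job_postings SET country = ?, location = ?, status = 'extracted' "
        "WHERE url = ?",
        (country, location, url),
    )
    conn.commit()
```

Comparing timestamps as text only works here because all values are assumed to share one ISO 8601 format and timezone; with mixed formats the `>=` comparison would need a real datetime column. The caller is expected to store and advance `last_checkpoint` after each run.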