TraffickStop / gather-the-children


Web Scrape Namus.gov #5

Closed: sfitzgerald125 closed this issue 3 years ago

sfitzgerald125 commented 3 years ago

Namus.gov has over 10,000 records nationwide. Utah has 118 records. I want to create a script that gathers all the data from Utah and stores the photos in an S3 bucket.

Initial research shows Selenium with a headless Chrome driver to be promising.
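A minimal sketch of the headless Chrome idea, assuming Selenium 4 and a local Chrome install. The helper names `make_headless_driver` and `fetch_page_source` are hypothetical and not part of this repo, and nothing here reflects the real page structure of the site.

```python
from typing import Any, Callable


def make_headless_driver() -> Any:
    """Create a Chrome driver that runs without a visible window.

    Assumes Selenium 4 and Chrome are installed. The import is done here
    so the rest of the module can be loaded without Selenium.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Older Chrome versions use plain "--headless" instead.
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,1024")
    return webdriver.Chrome(options=options)


def fetch_page_source(
    url: str, driver_factory: Callable[[], Any] = make_headless_driver
) -> str:
    """Load a page and return its rendered HTML, always closing the browser."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Passing the driver in through `driver_factory` keeps the fetch logic testable with a stand-in driver, so the browser is only needed for real runs.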

sfitzgerald125 commented 3 years ago

The robots.txt file limits robots to the about page and the contact page only. Not sure if there's an API. Either way, I'm thinking 118 records would be fine, though?
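One way to check a URL against robots.txt rules before scraping is Python's standard `urllib.robotparser`. The rules below are a hypothetical stand-in written to match the description above (only the about and contact pages allowed); they are not copied from the real file, and `example.org` is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not the site's actual robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /about
Allow: /contact
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_ROBOTS_TXT)
    for path in ("/about", "/contact", "/some/case/page"):
        print(path, is_allowed(parser, "https://example.org" + path))
```

To check against the live file instead, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads and parses the file in one step.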

sfitzgerald125 commented 3 years ago

GitHub repo of someone's scraping of missing persons sites: https://github.com/jcmack/missingpersons

sfitzgerald125 commented 3 years ago

Time Comment: