18F / g-dataservices

18F's Data Services Guild

Schedule 70 Data Scraper & API #1

Open mheadd opened 5 years ago

mheadd commented 5 years ago

Description of problem

There is no easy way to get access to bulk data on Schedule 70 vendors.

The GSA eLibrary schedules and contracts data on data.gov includes a lot more than just Schedule 70, and it does not identify which contractors are authorized to work with state and local governments. It's also not clear how current this data is, or how often it is updated.

The GSA eLibrary site does allow for data to be downloaded, ostensibly in Excel format. However, the downloaded file appears to be HTML, not .xls. Also, system errors are encountered periodically when attempting to download data in this way.
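One way to confirm what the download actually contains is to look at the first few bytes of the file instead of trusting the extension. The sketch below is only an illustration: `sniff_format` is a hypothetical helper name, and the HTML check is a rough heuristic rather than a full content-type detector. The OLE2 and ZIP signatures are the standard leading bytes for legacy .xls and for ZIP-based formats such as .xlsx.

```python
# Standard file signatures ("magic bytes").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP archive, used by .xlsx


def sniff_format(data: bytes) -> str:
    """Guess the real format of a downloaded file from its leading bytes.

    Hypothetical helper; returns "xls", "xlsx", "html", or "unknown".
    """
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        # Any ZIP archive matches here; .xlsx is assumed from context.
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace, then look for markup.
    head = data[:1024].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith((b"<!doctype html", b"<html")) or b"<table" in head:
        return "html"
    return "unknown"
```

For a saved download, something like `sniff_format(open("download.xls", "rb").read(1024))` would report `"html"` if the file is really an HTML table with an .xls name (the filename here is made up).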

Data sets or tools needed

This could probably be addressed with any standard DOM / scraping library in the programming language of choice.
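As a rough illustration of the scraping approach, here is a minimal sketch that pulls table rows out of an HTML string using only Python's standard-library `html.parser`. The class and function names (`TableRows`, `extract_rows`) and the sample markup are hypothetical; the real eLibrary pages are structured differently and would need their own handling, and nested tables are not dealt with here.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def _close_cell(self):
        # Store the current cell with runs of whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_rows(html):
    """Return every table row in the HTML string as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup, not real eLibrary output.
SAMPLE = (
    "<table><tr><th>Vendor</th><th>Contract</th></tr>"
    "<tr><td>Example  Co</td><td>GS-00X-0000X</td></tr></table>"
)
rows = extract_rows(SAMPLE)
```

The parser closes any open cell or row when the next one starts, so markup with missing closing tags still yields usable rows. A third-party DOM library would likely make this shorter, at the cost of an extra dependency.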

Other information

The structure of the HTML in the GSA eLibrary site is pretty gnarly. This might make developing a scraper more complicated.

lauraGgit commented 5 years ago

For anyone interested in this: I have written a scraper for Schedule 70 vendors. Feel free to ping me for more about it.