ShelterApp / AddResources

http://shelterapp.org/

add initial irs scraper script #9

Closed hkuffel closed 4 years ago

hkuffel commented 4 years ago

Relevant issue: 5

Hi everybody,

I've taken an initial stab at putting together a script to handle the IRS CSV files. So far I think this covers every requirement except checking for duplicates against the existing services collection.

The script, irs_scraper.py, first establishes a connection to MongoDB via pymongo. It then concatenates the IRS files, filters for the desired NTEE codes (or subcodes), and adds the service summary for those codes. There's also a function to add the data to MongoDB, but as noted above, we'll need to check for duplicates first. I'm submitting the PR now in case anyone has strong opinions about how we should perform that duplicate check, but I'm happy to take a stab at it myself in a future PR.
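To make the flow above concrete, here is a minimal, standard-library-only sketch of the concatenate, filter, and label steps, plus one possible duplicate check. It is not the contents of irs_scraper.py. The column names (`NAME`, `ZIP`, `NTEE_CD`), the code-to-summary mapping, the helper names, and the choice of name plus 5-digit ZIP as the duplicate key are all assumptions made for illustration. The MongoDB connection and insert are left out, so the existing services are stood in for by a plain set of keys.

```python
import csv
import io

# Hypothetical mapping from NTEE code prefixes to service summaries.
# The real list of codes and summaries is a project decision.
NTEE_SUMMARIES = {
    "L41": "Shelter",
    "K31": "Food",
}


def read_rows(files):
    """Concatenate rows from several CSV file objects that share a header."""
    for f in files:
        yield from csv.DictReader(f)


def filter_and_label(rows, summaries=NTEE_SUMMARIES):
    """Keep rows whose NTEE code starts with a wanted code (so subcodes
    such as 'K31Z' also match) and attach the matching service summary."""
    for row in rows:
        code = (row.get("NTEE_CD") or "").strip().upper()
        for prefix, summary in summaries.items():
            if code.startswith(prefix):
                yield {**row, "service_summary": summary}
                break


def dedupe_key(row):
    """Assumed duplicate key: lowercased name plus 5-digit ZIP."""
    return (row["NAME"].strip().lower(), row["ZIP"].strip()[:5])


def drop_existing(rows, existing_keys):
    """Skip rows already in the services collection or seen earlier in this run."""
    seen = set(existing_keys)
    for row in rows:
        key = dedupe_key(row)
        if key not in seen:
            seen.add(key)
            yield row


# Tiny in-memory stand-ins for two IRS files and the existing collection.
file_a = io.StringIO(
    "EIN,NAME,ZIP,NTEE_CD\n"
    "1,Hope House,98101-1234,L41\n"
    "2,Art Club,98102,A20\n"
)
file_b = io.StringIO(
    "EIN,NAME,ZIP,NTEE_CD\n"
    "3,City Pantry,98103,K31Z\n"
    "4,hope house,98101,L41\n"
)
existing = {("city pantry", "98103")}

new_rows = list(drop_existing(filter_and_label(read_rows([file_a, file_b])), existing))
print([r["NAME"] for r in new_rows])  # ['Hope House']
```

In this sample, "Art Club" is dropped by the NTEE filter, "City Pantry" is skipped because its key is already in the existing set, and the second "hope house" row is skipped as a repeat within the run. Matching on name and ZIP is only one option. Keying on EIN would be stricter, but it only works if the existing services collection stores EINs.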

Thanks!