Relevant issue: 5
Hi everybody,
I've taken an initial stab at putting together a script to handle the IRS CSV files. So far I think it covers every requirement except checking for duplicates in the existing services collection.
The script, irs_scraper.py, first establishes a connection to MongoDB via pymongo. It then concatenates the IRS files, filters the rows for the desired NTEE codes (or subcodes), and adds the service summary for each of those codes. There's also a function to insert the data into MongoDB, but as I said above, we'll need to check for duplicates first. I'm submitting the PR now in case anyone has strong opinions about how we should do that duplicate check, but I'm happy to take a stab at it myself in a future PR.
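For discussion, here is a minimal sketch of that flow in pandas, including one possible duplicate check. It assumes the IRS files are CSVs with `EIN` and `NTEE_CD` columns (as in the IRS Exempt Organizations Business Master File extracts) and that a service can be identified by its EIN. `SERVICE_SUMMARIES`, `summary_for`, `load_irs_files`, `filter_and_summarize`, and `drop_existing` are hypothetical names, not what irs_scraper.py actually uses, and the summaries are placeholders.

```python
import io

import pandas as pd

# Hypothetical NTEE prefix -> service summary mapping (placeholders only).
# Keys can be major groups ("K") or more specific subcodes ("P20").
SERVICE_SUMMARIES = {
    "K": "Food, agriculture and nutrition",
    "P": "Human services",
    "P20": "Placeholder subcode summary",
}


def summary_for(code, summaries):
    """Return the summary for the longest prefix matching an NTEE code, else None."""
    matches = [prefix for prefix in summaries if code.startswith(prefix)]
    return summaries[max(matches, key=len)] if matches else None


def load_irs_files(sources):
    """Read each IRS CSV (path or file-like) as strings and stack them into one DataFrame."""
    frames = [pd.read_csv(src, dtype=str) for src in sources]
    return pd.concat(frames, ignore_index=True)


def filter_and_summarize(df, summaries=SERVICE_SUMMARIES):
    """Keep rows whose NTEE_CD matches a wanted code and attach its service summary."""
    codes = df["NTEE_CD"].fillna("")  # missing codes never match any prefix
    summary = codes.map(lambda code: summary_for(code, summaries))
    kept = df.assign(service_summary=summary)
    return kept[kept["service_summary"].notna()]


def drop_existing(df, existing_eins):
    """Drop repeated EINs across the IRS files, then EINs already in the collection."""
    deduped = df.drop_duplicates(subset="EIN")
    return deduped[~deduped["EIN"].isin(existing_eins)]


# Usage with in-memory stand-ins for two IRS files; EIN 1 appears in both.
file_a = io.StringIO("EIN,NAME,NTEE_CD\n1,Food Bank,K31\n2,Art Club,A20\n")
file_b = io.StringIO("EIN,NAME,NTEE_CD\n3,Shelter,P20\n1,Food Bank,K31\n")
combined = load_irs_files([file_a, file_b])
wanted = filter_and_summarize(combined)
# In the real script, existing EINs might come from collection.distinct("EIN"),
# assuming the services collection stores an EIN field.
new_rows = drop_existing(wanted, existing_eins={"3"})
print(new_rows[["EIN", "service_summary"]].to_dict("records"))
```

Under those assumptions, whatever survives `drop_existing` could be passed to pymongo's `insert_many(new_rows.to_dict("records"))`. An alternative would be a unique index on the key field combined with upserts, which pushes the duplicate check into MongoDB itself.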
Thanks!