Added the ability to name multiple collections in the script: a check_collection to run fuzzy name matches against, a dump_collection to receive the new IRS services, and a dupe_collection to receive any duplicates found during a run.
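As a rough sketch of how the three collections could interact: plain Python lists stand in for the database collections here, and the standard library's difflib stands in for whatever fuzzy matcher the script actually uses. The names name_similarity and route_services, the "name" field, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def route_services(new_services, check_collection, threshold=0.85):
    """Split incoming services into (dump_collection, dupe_collection).

    A service goes to dupe_collection if its name is a fuzzy match for any
    name in check_collection; otherwise it goes to dump_collection.
    The threshold is an illustrative value, not a tuned one.
    """
    dump_collection, dupe_collection = [], []
    existing_names = [doc["name"] for doc in check_collection]
    for service in new_services:
        is_dupe = any(
            name_similarity(service["name"], existing) >= threshold
            for existing in existing_names
        )
        (dupe_collection if is_dupe else dump_collection).append(service)
    return dump_collection, dupe_collection
```

Keeping the suspected duplicates in their own collection, rather than discarding them, leaves room to review borderline matches by hand.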
Added an early step to delete any services whose EIN is already in the database.
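A minimal sketch of that EIN step, assuming "delete" means dropping incoming IRS records before any further processing, and assuming EINs may appear with or without the hyphen (e.g. 12-3456789 vs 123456789). The helper names and the "ein" field are hypothetical.

```python
def normalize_ein(ein: str) -> str:
    """Keep only the digits so '12-3456789' and '123456789' compare equal."""
    return "".join(ch for ch in ein if ch.isdigit())


def drop_known_eins(irs_services, existing_eins):
    """Return only the incoming services whose EIN is not already stored."""
    known = {normalize_ein(e) for e in existing_eins}
    return [s for s in irs_services if normalize_ein(s["ein"]) not in known]
```

Doing this before the fuzzy name comparison means exact EIN matches never reach the more expensive name check.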
Added a step to scrape the date of the last update from the IRS website and compare it to the last update date in our DB.
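The comparison itself could look something like the sketch below. It assumes the scraped text has already been reduced to a bare date string; the MM/DD/YYYY format is a guess for illustration, not the IRS site's confirmed format, and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_irs_date(text: str, fmt: str = "%m/%d/%Y") -> date:
    """Parse the scraped date string; the default format is an assumption."""
    return datetime.strptime(text.strip(), fmt).date()


def needs_refresh(irs_text: str, db_last_update: Optional[date]) -> bool:
    """True if the IRS data is newer than our DB, or if the DB has no date yet."""
    if db_last_update is None:
        return True
    return parse_irs_date(irs_text) > db_last_update
```

With this check in place, the rest of the script can be skipped whenever the IRS data has not changed since the last run.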
I haven't resolved the case where services have very different names but the same address, as handling the potential variability in address strings seems like a pretty big job.