data-liberation-project / aphis-inspection-reports

Inspection data and PDFs from the USDA's Animal and Plant Health Inspection Service.

Fix fetch logic, preventing removal of old results #4

Closed jsvine closed 1 year ago

jsvine commented 1 year ago

With this update, data/fetched/inspections.csv is now treated as a cache of all previously identified inspections, and is updated (rather than replaced) by scripts/00-fetch-inspection-list.py. (This was already the logic in scripts/01-refresh-inspection-list.py.)
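For illustration, the update-instead-of-replace behavior could be sketched roughly as follows. This is not the code from the scripts themselves: `merge_into_cache`, `example_key`, and the `date`/`customer`/`site` columns are hypothetical names chosen for the example, and the real key function and CSV columns may differ.

```python
def merge_into_cache(cached_rows, fetched_rows, key_fn):
    """Return the cached rows updated with the fetched rows.

    Rows that are missing from the latest fetch stay in the result,
    so old inspections are never dropped.
    """
    merged = {key_fn(row): row for row in cached_rows}
    for row in fetched_rows:
        # New keys are added; rows with an existing key are refreshed.
        merged[key_fn(row)] = row
    # Sort by key so the output order is stable between runs.
    return [merged[key] for key in sorted(merged)]


def example_key(row):
    # Hypothetical stand-in for the repository's key function.
    return (row["date"], row["customer"], row["site"])


# Hypothetical data: the first cached row no longer appears in the fetch.
cached = [
    {"date": "2023-01-05", "customer": "101", "site": "001"},
    {"date": "2023-02-10", "customer": "102", "site": "001"},
]
fetched = [
    {"date": "2023-02-10", "customer": "102", "site": "001"},
    {"date": "2023-03-15", "customer": "103", "site": "002"},
]

result = merge_into_cache(cached, fetched, example_key)
```

Here `result` holds three rows: the older inspection that the search no longer returns, the one present in both, and the newly fetched one.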

jsvine commented 1 year ago

> As long as you're confident that get_sort_key guarantees uniqueness

Unfortunately, because APHIS doesn't provide any formal/official inspection ID in the data returned by the search tool, "guarantee" may be too strong a word, but (a) it appears to be working so far (# of deduplicated results == # of results stated in the search tool interface), and (b) I'm keeping an eye on it.
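The sanity check in (a) could look something like the sketch below. It is only an assumption about how such a check might be written: `example_key` is a hypothetical stand-in for get_sort_key (whose actual implementation isn't shown in this thread), and the row fields and `stated_total` values are made up for the example.

```python
from collections import Counter


def example_key(row):
    # Hypothetical stand-in for get_sort_key; the real fields may differ.
    return (row["date"], row["customer"], row["site"])


def repeated_keys(rows, key_fn):
    """Map each key that occurs more than once to its number of occurrences."""
    counts = Counter(key_fn(row) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def matches_stated_total(rows, key_fn, stated_total):
    """True if the number of distinct keys equals the total the search tool reports."""
    return len({key_fn(row) for row in rows}) == stated_total


# Hypothetical rows: the second inspection was picked up twice.
rows = [
    {"date": "2023-01-05", "customer": "101", "site": "001"},
    {"date": "2023-02-10", "customer": "102", "site": "001"},
    {"date": "2023-02-10", "customer": "102", "site": "001"},
]

ok = matches_stated_total(rows, example_key, stated_total=2)
suspicious = matches_stated_total(rows, example_key, stated_total=3)
repeats = repeated_keys(rows, example_key)
```

If the search tool reported 2 results, the counts agree (`ok` is `True`). If it reported 3, the mismatch (`suspicious` is `False`) would suggest that two distinct inspections share a key, and `repeats` shows which key to look at.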