Closed jsvine closed 1 year ago
Might be a good opportunity to add unittests for those utilities
Amen, issue now created for that: https://github.com/data-liberation-project/aphis-inspection-reports/issues/20
Yep. Since you are mucking with a couple now, getting them covered in this merge would be good while the expectations and edge cases are fresh in your mind.
Very reasonable! Now added.
Discovered that APHIS appears to occasionally bulk-change the `legalName` for any given `customerNumber`. Because the `sort_key` incorporated that `legalName` and the scripts merge cached results with newer ones, the previous logic was retaining multiple copies of the same inspection (i.e., for inspections for which the `legalName` changed in the fresh/recent results, but not yet in our historical cache). The good news is that the `legalName` seems to correspond directly to `customerNumber`, so we can just use that. Also adjusted `get_sort_key` to account for other aspects observed in the data, e.g., that `customerNumber` is never blank.

Also added a data dictionary for the APHIS portal data, with some things learned through figuring out the deduping approach.
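To illustrate the deduping idea, here is a minimal sketch, not the repository's actual code. It assumes results are lists of dicts and that an inspection can be identified by `customerNumber` plus two other fields; `inspectionDate`, `certNumber`, `sketch_sort_key`, and `merge_results` are hypothetical names chosen for the example.

```python
def sketch_sort_key(row):
    # Hypothetical stand-in for get_sort_key: keyed on customerNumber
    # rather than legalName, so a bulk rename does not change the key.
    # inspectionDate and certNumber are assumed field names.
    return (row["customerNumber"], row["inspectionDate"], row["certNumber"])


def merge_results(cached, fresh):
    # Merge cached and fresh rows, keeping one row per key.
    # Fresh rows are applied last, so they replace cached rows
    # that share the same key.
    merged = {}
    for row in cached:
        merged[sketch_sort_key(row)] = row
    for row in fresh:
        merged[sketch_sort_key(row)] = row
    return [merged[key] for key in sorted(merged)]


cached = [{
    "customerNumber": "1001",
    "legalName": "Old Name LLC",
    "inspectionDate": "2022-01-15",
    "certNumber": "A-1",
}]
fresh = [{
    "customerNumber": "1001",
    "legalName": "New Name LLC",
    "inspectionDate": "2022-01-15",
    "certNumber": "A-1",
}]
result = merge_results(cached, fresh)
```

With a key that included `legalName`, the two rows above would have been kept as separate inspections; keyed on `customerNumber`, they collapse into one.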
... as well as a fix for `add_hash_ids` to prevent errors when adding the IDs to a result set that already has them.
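The comment above doesn't say how the error arose or how it was fixed. One plausible way to make such a step safe to re-run is sketched below, under stated assumptions: rows are dicts, the ID lives in a `hash_id` field, and it is derived from a couple of key fields. The function name, the `hash_id` field, the key fields, and the choice of SHA-1 are all hypothetical.

```python
import hashlib


def add_hash_ids_sketch(rows, key_fields=("customerNumber", "inspectionDate")):
    # Hypothetical idempotent variant: rows that already carry an ID
    # are left untouched, so running this twice is harmless.
    for row in rows:
        if "hash_id" in row:
            continue
        raw = "|".join(str(row[field]) for field in key_fields)
        row["hash_id"] = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]
    return rows


rows = [{"customerNumber": "1001", "inspectionDate": "2022-01-15"}]
add_hash_ids_sketch(rows)
first_id = rows[0]["hash_id"]
add_hash_ids_sketch(rows)  # second pass: IDs already present, nothing changes
```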