data-liberation-project / aphis-inspection-reports

Inspection data and PDFs from the USDA's Animal and Plant Health Inspection Service.

Figure out how to separate the end-of-report notes from inspections' final citations #58

Open jsvine opened 1 year ago

jsvine commented 1 year ago

In data/combined/inspections-citations.csv, the narrative for the final citation of each inspection also contains the inspection report's end-of-report text. Those notes are semi-consistent, semi-not. Until we can figure out a reliable way to separate the end-of-report text from the final citation's actual text, I'd rather err on the side of caution (i.e., not strip anything out). But perhaps we can find a way to consistently separate those parts?

See discussion in https://github.com/data-liberation-project/aphis-inspection-reports/pull/56#issuecomment-1499669469 and the comments that follow.
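
One possible starting point, as a rough sketch rather than a settled approach: split a narrative at the earliest match of a known end-of-report marker phrase, and leave the text untouched when no marker matches (the cautious default described above). The function name `split_trailing_notes` and the phrases in `EXAMPLE_MARKERS` are hypothetical placeholders, not verified against the data; the real marker list would have to come from inspecting the narratives themselves.

```python
import re
from typing import Optional, Sequence, Tuple

# ASSUMPTION: illustrative marker phrases only. They have not been checked
# against the actual reports and would need to be replaced with phrases
# confirmed by inspecting the data.
EXAMPLE_MARKERS = (
    r"This inspection and exit interview",
    r"Additional Inspectors",
)


def split_trailing_notes(
    narrative: str,
    markers: Sequence[str] = EXAMPLE_MARKERS,
) -> Tuple[str, Optional[str]]:
    """Return (citation_text, end_of_report_notes).

    Splits at the earliest position where any marker pattern matches.
    If no marker matches, returns the narrative unchanged with None for
    the notes, so nothing is stripped without a positive match.
    """
    starts = []
    for pattern in markers:
        match = re.search(pattern, narrative, flags=re.IGNORECASE)
        if match:
            starts.append(match.start())

    if not starts:
        # No recognizable end-of-report text: err on the side of caution.
        return narrative, None

    cut = min(starts)
    return narrative[:cut].rstrip(), narrative[cut:].strip()
```

Because the notes are only semi-consistent, a helper like this is probably most useful for measuring coverage first (how many final citations match any marker at all) before anything is actually stripped from the CSV.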

jsvine commented 1 year ago

Adding a comment here to note that there's some related progress in PR #59