data-liberation-project / aphis-inspection-reports

Inspection data and PDFs from the USDA's Animal and Plant Health Inspection Service.

Parse report date from inspection PDFs #36

Closed jsvine closed 1 year ago

jsvine commented 1 year ago

The report date is a distinct field from the inspection date, and can be found at the bottom of each main page:

[Screenshot: report date field at the bottom of a report page]
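As a rough illustration of pulling a date from the bottom of a page, here is a minimal sketch. It assumes the page text has already been extracted (for example with pdfplumber's `page.extract_text()`) and that dates look like `15-SEP-2016`; that date format, the field labels in the sample, and the `find_report_date` helper are all assumptions for illustration, not the approach the project ended up using.

```python
import re
from datetime import date
from typing import Optional

# ASSUMPTION: dates are written like "15-SEP-2016". Adjust the pattern
# to whatever format the inspection PDFs actually use.
DATE_PATTERN = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{4})\b")
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def find_report_date(page_text: Optional[str], tail_lines: int = 5) -> Optional[date]:
    """Return the last valid date in the final `tail_lines` non-empty lines.

    Hypothetical helper: `page_text` would come from something like
    `pdfplumber.open(path).pages[0].extract_text()`.
    """
    lines = [ln for ln in (page_text or "").splitlines() if ln.strip()]
    tail = "\n".join(lines[-tail_lines:])
    found = None
    for day, month_name, year in DATE_PATTERN.findall(tail):
        month = MONTHS.get(month_name.upper())
        if month is None:
            continue  # three letters that are not a month abbreviation
        try:
            found = date(int(year), month, int(day))
        except ValueError:
            continue  # e.g. "31-FEB-2016"
    return found


# Made-up page text; the labels are placeholders, not the real layout.
sample_page = "\n".join([
    "Inspection Report",
    "Date of inspection: 01-SEP-2016",
    "Line a", "Line b", "Line c", "Line d", "Line e",
    "Prepared by: (name)    Date: 15-SEP-2016",
])
report_date = find_report_date(sample_page)
```

Looking only at the last few lines keeps the inspection date near the top of the page from being mistaken for the report date.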

I've noticed that some reports are dated quite some time after the inspection took place; it'd be interesting to see the distribution and the outliers.
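Once both dates are parsed, the distribution and outliers could be explored along these lines. This is only a sketch: the record layout, the placeholder IDs and dates, the helper names, and the choice of a 1.5 × IQR cutoff are all assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import quantiles
from typing import Dict, Iterable, Tuple

Record = Tuple[str, date, date]  # (inspection id, inspection date, report date)


def report_lags(records: Iterable[Record]) -> Dict[str, int]:
    """Map each inspection id to the days between inspection and report."""
    return {rid: (reported - inspected).days for rid, inspected, reported in records}


def flag_outliers(lags: Dict[str, int], k: float = 1.5) -> Dict[str, int]:
    """Return lags above Q3 + k * IQR (needs at least two records)."""
    q1, _, q3 = quantiles(sorted(lags.values()), n=4)
    upper = q3 + k * (q3 - q1)
    return {rid: lag for rid, lag in lags.items() if lag > upper}


# Made-up records: placeholder ids and dates, not real inspections.
inspected = date(2016, 9, 1)
records = [
    (rid, inspected, inspected + timedelta(days=lag))
    for rid, lag in zip("ABCDEFGH", [0, 0, 0, 1, 1, 2, 3, 60])
]
lags = report_lags(records)
outliers = flag_outliers(lags)
```

With real data, the `lags` values could be fed to a histogram to see the overall distribution, and `outliers` gives a short list of reports worth inspecting by hand.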

gcappaert commented 1 year ago

Starting in on this. I've got a decent handle on pdfplumber (which is great, by the way; I have never found a better PDF parser). I'll have something ready to pull in the next couple of days.

For testing (and my own reference), one inspection that has a post-dated report: 2016090000786401.

jsvine commented 1 year ago

Thanks for taking this on, @gcappaert! (And thanks for the kind words about pdfplumber!)

jsvine commented 1 year ago

Completed in #55 🎉