data-liberation-project / aphis-inspection-reports

Inspection data and PDFs from the USDA's Animal and Plant Health Inspection Service.

Documenting edge-case-y inspection PDFs for parser testing #22

Closed jsvine closed 1 year ago

jsvine commented 1 year ago

As preparation for a more comprehensive parsing of the inspection reports, I think it'll be helpful to document some of the quirks we're seeing in the PDFs. Here's a start:

jsvine commented 1 year ago

Here's another, where the species names span multiple lines: 185280e3821720b9 (uploaded)
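
For parser testing, a case like this could be handled by folding continuation rows back into the row above. The sketch below assumes, hypothetically, that the extracted table arrives as `(count, scientific_name, common_name)` tuples and that a wrapped line shows up as a row with an empty count cell; neither the row shape nor the function name `merge_wrapped_rows` comes from the project's code, and the sample animals are made up.

```python
def merge_wrapped_rows(rows):
    """Fold continuation rows (empty count cell) into the preceding row.

    Assumption: each row is a (count, scientific_name, common_name) tuple
    of strings, and a wrapped species name appears as an extra row whose
    count cell is blank.
    """
    merged = []
    for count, scientific, common in rows:
        count, scientific, common = count.strip(), scientific.strip(), common.strip()
        if count == "" and merged:
            prev_count, prev_scientific, prev_common = merged[-1]
            # Join non-empty fragments with a single space.
            merged[-1] = (
                prev_count,
                " ".join(filter(None, [prev_scientific, scientific])),
                " ".join(filter(None, [prev_common, common])),
            )
        else:
            merged.append((count, scientific, common))
    return merged


# Illustrative input only; not taken from the report above.
rows = [
    ("000002", "Lemur", "RING-TAILED"),
    ("", "catta", "LEMUR"),
    ("000001", "Panthera leo", "LION"),
]
result = merge_wrapped_rows(rows)
```

Keying on the empty count cell is only one possible heuristic; a real parser might need to look at text positions instead.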

jsvine commented 1 year ago

Another, where the species list is blank, but there's still a "Total" row: ccda727387d4c850 (uploaded)
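
A test for this case might check that the parser returns an empty species list while still picking up the total. The sketch below is a minimal illustration under assumptions: rows are sequences of cell strings, the total row is identified by a cell reading "Total", and the count is the first all-digit cell in that row. The function name `split_species_and_total` is hypothetical.

```python
def split_species_and_total(rows):
    """Separate species rows from the "Total" row.

    Assumption: each row is a sequence of cell strings, and the total row
    contains a cell equal to "Total" (case-insensitive) plus a numeric cell.
    Returns (species_rows, total), where total is None if no total row exists.
    """
    species = []
    total = None
    for row in rows:
        cells = [cell.strip() for cell in row]
        if any(cell.lower() == "total" for cell in cells):
            digits = [cell for cell in cells if cell.isdigit()]
            total = int(digits[0]) if digits else None
        elif any(cells):
            # Skip rows that are entirely blank.
            species.append(cells)
    return species, total


# Illustrative input: no species rows, but a "Total" row is present.
species, total = split_species_and_total([("000000", "Total", "")])
```

Returning `total` separately lets a caller compare it against the sum of the species counts as a sanity check.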

jsvine commented 1 year ago

Here's a fun one — "Page {cp} of 1": 22c3072fd5740ef1 (uploaded)
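
One way a parser could tolerate this unrendered placeholder is to accept either a number or the literal `{cp}` in the page footer. The following is a sketch only: the regular expression and the function name `parse_page_footer` are assumptions for illustration, not the project's actual parsing code.

```python
import re

# Matches "Page 2 of 3" as well as the unrendered "Page {cp} of 1".
PAGE_FOOTER = re.compile(r"Page\s+(\d+|\{cp\})\s+of\s+(\d+)")


def parse_page_footer(text):
    """Return (current_page, total_pages), or None if no footer is found.

    current_page is None when the footer contains the literal "{cp}"
    placeholder instead of a page number.
    """
    match = PAGE_FOOTER.search(text)
    if match is None:
        return None
    current, total = match.groups()
    current_page = None if current == "{cp}" else int(current)
    return current_page, int(total)


broken = parse_page_footer("Page {cp} of 1")
normal = parse_page_footer("Page 2 of 3")
```

Returning `None` for the current page keeps the quirk visible to callers instead of silently guessing a page number.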

mbpell commented 1 year ago

A zoo?

jsvine commented 1 year ago

> A zoo?

Indeed, lots of zoos in the data!

jsvine commented 1 year ago

Closing this issue since the core related tasks are done, but will pin it for future reference.

jsvine commented 1 year ago

Here's something that looks like a violation heading, but (a) doesn't have an actual statute citation, and (b) appears, on cross-referencing with the web portal metadata, not actually to be a violation that APHIS is counting: 0db69ec135a5b244.