albertisfu closed 1 year ago
Yeah, OK. Seems like we're doing as much as we can without going crazy.
Does CourtListener fail elegantly when we can't extract a number and return ""?
Does CourtListener fail elegantly when we can't extract a number and return ""?
Yeah, if a document number is not found in a PDF, we fall back to the download confirmation page to get it. If the number can't be found on the download confirmation page either, the docket entry is added without a number.
Well, 4 months ago, when this new service was released, we thought it was a good idea to fail loudly if a document number was not found in the PDF header, so we could check those PDFs and update the regex in case we had missed a document number string.
I checked the errors in Sentry, and most of them are related to weird PDF headers like:
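For reference, the fallback order described above could be sketched roughly like this. It is only an illustration: `resolve_document_number` and the two callables are hypothetical names, not the actual CourtListener functions.

```python
from typing import Callable, Optional


def resolve_document_number(
    from_pdf: Callable[[], str],
    from_confirmation_page: Callable[[], str],
) -> Optional[str]:
    """Try each source in order and return the first non-empty number.

    Both callables are hypothetical stand-ins: each returns the document
    number as a string, or "" when it cannot find one.
    """
    for source in (from_pdf, from_confirmation_page):
        number = source()
        if number:
            return number
    # Neither source had a number: the caller adds the docket entry
    # without a document number.
    return None
```

The second source is only consulted when the PDF lookup returns an empty string, so the confirmation page is not fetched unnecessarily.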
CCaassee 2211--22009955,, DDooccuummeenntt 19090, ,0 011/0/044/2/2002233, ,3 3444466266138, ,P Paaggee11 o of f2 2
And there is one where the header doesn't contain a document number at all:
Appellate Case: 22-1801 Page: 1 Date Filed: 01/19/2023 Entry ID: 5237514
We currently recognize the following ones, which we have seen so far:
Document:
,Document
,Doc:
,DktEntry:
So, since we haven't found any new document number strings to parse, I changed the logic so that when no document number is found, we just return an empty string instead of failing out loud.
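A minimal sketch of that behavior, assuming the four markers listed above are followed by a run of digits. The regex, the `extract_document_number` name, and the first sample header are illustrative assumptions, not the code that was actually changed.

```python
import re

# Markers come from the list above; the optional whitespace and the
# digits-only number pattern are assumptions made for this sketch.
DOC_NUMBER_RE = re.compile(
    r"(?:Document:|,\s*Document|,\s*Doc:|,\s*DktEntry:)\s*(\d+)"
)


def extract_document_number(header: str) -> str:
    """Return the document number from a PDF header, or "" if none is found."""
    match = DOC_NUMBER_RE.search(header)
    return match.group(1) if match else ""


# Made-up header using one of the recognized markers.
print(extract_document_number("Case 00-0000, Document 19, 01/01/2023, Page 1 of 2"))
# Header from the thread with no document number: falls through to "".
print(extract_document_number(
    "Appellate Case: 22-1801 Page: 1 Date Filed: 01/19/2023 Entry ID: 5237514"
))
```

Returning "" instead of raising lets the caller move on to the download confirmation page fallback.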
Let me know what you think.