freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

PACER Email Parsing Failure in bankruptcy corner case #577

Open sentry-io[bot] opened 1 year ago

sentry-io[bot] commented 1 year ago

Sentry Issue: COURTLISTENER-38C

ValueError: invalid literal for int() with base 10: 'doc'
  File "django/db/models/fields/__init__.py", line 1823, in get_prep_value
    return int(value)

ValueError: Field 'entry_number' expected a number but got 'doc'.
(12 additional frame(s) were not displayed)
...
  File "django/db/models/sql/query.py", line 1370, in build_filter
    condition = self.build_lookup(lookups, col, value)
  File "django/db/models/sql/query.py", line 1216, in build_lookup
    lookup = lookup_class(lhs, rhs)
  File "django/db/models/lookups.py", line 25, in __init__
    self.rhs = self.get_prep_lookup()
  File "django/db/models/lookups.py", line 77, in get_prep_lookup
    return self.lhs.output_field.get_prep_value(self.rhs)
  File "django/db/models/fields/__init__.py", line 1825, in get_prep_value
    raise e.__class__(

albertisfu commented 1 year ago

Checking this issue.

It seems the document that triggered this error doesn't have a document number, even though it's not from an appellate court.

Instead of the document number, it says "doc".

Screen Shot 2022-10-28 at 10 23 02

Screen Shot 2022-10-28 at 10 16 55

The download confirmation page doesn't have a document number either.

Screen Shot 2022-10-28 at 10 16 34

When we don't have an integer number, should we add the docket entry/recap document as unnumbered?
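A minimal sketch of the "unnumbered" option, assuming a hypothetical helper named `parse_entry_number` (it is not an existing Juriscraper or CourtListener function). It maps anything that is not a plain integer string, such as "doc", to `None` so the entry could be stored without a number:

```python
from typing import Optional


def parse_entry_number(raw: Optional[str]) -> Optional[int]:
    """Return the entry number as an int, or None when it is not numeric.

    Hypothetical helper for illustration only.
    """
    if raw is None:
        return None
    text = raw.strip()
    # Accept only plain ASCII digits such as "12"; values like "doc" or ""
    # fall through and are treated as unnumbered.
    if text.isascii() and text.isdecimal():
        return int(text)
    return None
```

With something like this, the value passed to the `entry_number` lookup would be either an integer or `None`, so `int('doc')` would never be attempted.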

mlissner commented 1 year ago

Well, this is an interesting one. If you look at the docket report, you see that the item doesn't come up at all:

image

It should be in that space below the last item, but it's not. I called the court to ask about this, and they said that the docket report form should have a checkbox for "Rule 3002.1 Claims Supplement". So, when I generate that report, I should be able to check that box and then get that item to show up. Alas, that checkbox is not visible to regular PACER users like myself. Ok....

So I tried the docket history report, and sure enough it shows up there:

image

And sure enough, the "number" seems to be doc. Greaaat.

I'm not sure how to handle this, really. The entry_number field is a big int, so we can't cram "doc" in there as is. It's probably not worth changing the field type to a CharField. We could leave the number off, but that would make certain people unhappy that we weren't mirroring PACER perfectly.

We could also ignore this, since it's a corner case in bankruptcy cases that doesn't even show up on the docket report in PACER unless you're an ECF user. If we take this route, I guess we can either ignore the Sentry issue (easiest!) or tweak Juriscraper to ignore this error.

albertisfu commented 1 year ago

Interesting. Well, if we choose to just ignore this, we can avoid adding the docket entry when we find that the document has a "document number" that is not an integer, so the error won't come up in Sentry for similar cases. Is that OK?

Or should we just ignore it in Sentry for now?
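The skip-the-entry option could look roughly like the sketch below. The function names and the `document_number` key are assumptions made for illustration, not the actual Juriscraper data structure. Entries with no document number at all are kept, and only entries whose number is present but not an integer are dropped:

```python
def is_integer_string(value) -> bool:
    """True when the value is made up only of plain ASCII digits."""
    text = str(value).strip()
    return text.isascii() and text.isdecimal()


def drop_non_integer_entries(entries):
    """Return the entries whose document number is missing or an integer.

    Hypothetical filter; `document_number` is an assumed key name.
    """
    kept = []
    for entry in entries:
        number = entry.get("document_number")
        has_number = number not in (None, "")
        # Skip entries like {"document_number": "doc"} so they never reach
        # the integer entry_number lookup.
        if has_number and not is_integer_string(number):
            continue
        kept.append(entry)
    return kept
```

The trade-off is the one raised earlier in the thread: entries filtered this way would not be mirrored from PACER at all.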

mlissner commented 1 year ago

I think we can just ignore it in Sentry, and deprioritize it for now.

sentry-io[bot] commented 1 year ago

Sentry issue: COURTLISTENER-3WX

sentry-io[bot] commented 5 months ago

Sentry issue: COURTLISTENER-638

mlissner commented 5 months ago

I think we opted to just ignore this in Sentry, but if #724 and the Sentry issue above are the same issue, it shows that ignoring this will be annoying. Maybe a quick fix in Juriscraper makes sense.