freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Support parsing `short_description` for Bankruptcy Multi Docket NEF #914

Open · sentry-io[bot] opened this issue 5 months ago

sentry-io[bot] commented 5 months ago

Sentry Issue: COURTLISTENER-6HZ

Not parsing description for Bankruptcy Multi Docket NEF for court 'paeb'

Related to #912 (parsing `docket_entry.short_description` for bankruptcy courts). We have no method for parsing short descriptions when we deal with multi-docket NEFs. I added a `logger.error` call for this case, and now we have some examples.

Filed by @grossir

sentry-io[bot] commented 4 months ago

Another instance of this...

Sentry Issue: COURTLISTENER-6Q8

Filed by @mlissner

sentry-io[bot] commented 4 months ago

And another...

Sentry Issue: COURTLISTENER-6Q7

sentry-io[bot] commented 4 months ago

Sentry Issue: COURTLISTENER-6QV

sentry-io[bot] commented 2 months ago

Sentry Issue: COURTLISTENER-71E

For mdb.

Linked by @grossir

grossir commented 2 months ago

Taking advantage of the fact that we can upload images, I will document the problem with multi-docket NEFs here. Usually, we extract the short_description from a NEF by splitting the email Subject using the docket's case name and number in various combinations. However, when there is more than one docket, we may have to try out different splits:

- Uses the 2nd docket's case name and docket number. [screenshot]
- Uses the 1st docket's number, but a partial case name. [screenshot]
- Uses the 1st docket's number and case name. [screenshot]
- Uses the 2nd docket's number and a partial case name. [screenshot]
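One way to picture these combinations is as a list of split markers to try against the Subject. This is a minimal sketch, not juriscraper's code. It assumes each docket is a `(docket_number, case_name)` tuple, that the Subject contains `"<docket number> <case name>"` right before the description, and that a "partial" case name means the first few words of the full name. `word_prefixes` and `candidate_markers` are hypothetical helpers.

```python
def word_prefixes(case_name: str) -> list[str]:
    """Return the full case name, then shorter word-level prefixes of it.

    Assumption: a "partial" case name is the first N words of the full name.
    """
    words = case_name.split()
    return [" ".join(words[:n]) for n in range(len(words), 0, -1)]


def candidate_markers(dockets: list[tuple[str, str]]) -> list[str]:
    """Build '<docket number> <case name>' markers to try when splitting a Subject.

    Each docket's number is paired with its own full and partial case names.
    Markers are sorted longest first, so a full case name is always tried
    before any of its own prefixes.
    """
    markers = [
        f"{number} {name}"
        for number, case_name in dockets
        for name in word_prefixes(case_name)
    ]
    return sorted(markers, key=len, reverse=True)


# Hypothetical dockets, for illustration only.
dockets = [("2:24-bk-10001", "John Smith"), ("2:24-bk-10002", "Jane Q Doe")]
for marker in candidate_markers(dockets):
    print(marker)
```

The more markers there are to try, the higher the risk that a short one (a bare first name, say) matches in the wrong place, which is one reason to add extra checks on whatever the split returns.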

mlissner commented 2 months ago

Looks like this might not be possible to do reliably, or it might just not be worth it.

grossir commented 2 months ago

I think I found a good-enough approach: it's basically the old parsing inside a for loop, plus some extra checks. It doesn't break anything, judging from the tests. Can you review it? Maybe it is good enough.
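A minimal sketch of "the old parsing inside a for loop, plus some extra checks" might look like the following. It is not the code under review: the `(docket_number, case_name)` tuples, the `"<docket number> <case name>"` Subject layout, and both checks are assumptions made for illustration, and `parse_multi_docket_short_description` is a hypothetical name.

```python
from typing import Optional


def parse_multi_docket_short_description(
    subject: str, dockets: list[tuple[str, str]]
) -> Optional[str]:
    """Try the single-docket split once per docket; return the first plausible result."""
    numbers = [number for number, _ in dockets]
    for number, case_name in dockets:
        # The "old" single-docket parsing: take whatever follows the marker.
        _, sep, rest = subject.partition(f"{number} {case_name}")
        rest = rest.strip()
        # Extra check 1 (assumed): the marker must appear and leave text behind.
        if not sep or not rest:
            continue
        # Extra check 2 (assumed): if another docket's number is still in the
        # remainder, this split happened too early, so try the next docket.
        if any(other in rest for other in numbers if other != number):
            continue
        return rest
    return None


# Hypothetical Subject listing two dockets before the description.
subject = "2:24-bk-10001 John Smith 2:24-bk-10002 Jane Doe Order Confirming Plan"
dockets = [("2:24-bk-10001", "John Smith"), ("2:24-bk-10002", "Jane Doe")]
print(parse_multi_docket_short_description(subject, dockets))
```

Here the first docket's split is rejected because the second docket's number is still in the remainder, so the loop falls through to the second docket and returns `Order Confirming Plan`.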

grossir commented 1 month ago

@mlissner I created a new PR to address only multi dockets on njb.

Luckily, njb NEFs follow a simple format: only the `case_name` is needed to split the short description from the Subject.

https://github.com/freelawproject/juriscraper/pull/1032
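For njb, the split therefore needs only the case name. This is a minimal sketch, assuming the short description is whatever follows the first case name (from any of the NEF's dockets) found in the Subject. `parse_njb_short_description` and the sample Subject are hypothetical; the linked PR holds the real change.

```python
from typing import Optional


def parse_njb_short_description(subject: str, case_names: list[str]) -> Optional[str]:
    """Split the Subject on the first case name found; the rest is the description."""
    for case_name in case_names:
        # Skip empty names: "" is "in" every string and would split at index 0.
        if case_name and case_name in subject:
            description = subject.split(case_name, 1)[1].strip()
            if description:
                return description
    return None


# Hypothetical njb-style Subject; only the second case name appears in it.
subject = "24-10001-ABC John Smith Order Confirming Plan"
print(parse_njb_short_description(subject, ["Jane Doe", "John Smith"]))
```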

sentry-io[bot] commented 3 weeks ago

Sentry Issue: COURTLISTENER-7GB