FLP's RECAP Archive Categorizes Docket Entries to Help You Find the Needle in the Haystack
What is the Feature?
Build classifiers for docket entry text to label them as, e.g., Complaint, Answer, Pleading (a superset encompassing both Complaint and Answer), Motion, Memorandum, Order, Judgment, Motion for Summary Judgment (a subset of Motion), Claim Construction Order, etc.
These can then become:
search facets
filters on the docket view (just show me the pleadings!)
labels visible on the docket entry, to help you see at a glance if it's what you want
(those labels could be localized to other languages to help users whose first language is not English)
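Since some labels are supersets of others (Pleading covers Complaint and Answer; Motion for Summary Judgment is a kind of Motion), the controlled vocabulary probably wants a hierarchy. A minimal sketch of one way to model that, with purely illustrative label names rather than a settled vocabulary:

```python
# Hypothetical parent relationships drawn from the examples above;
# the real controlled vocabulary would need to be designed carefully.
PARENT = {
    "Complaint": "Pleading",
    "Answer": "Pleading",
    "Motion for Summary Judgment": "Motion",
}

def with_ancestors(label):
    """Expand a specific label to itself plus every ancestor, so a
    filter or facet on "Pleading" also matches entries tagged "Complaint"."""
    labels = [label]
    while label in PARENT:
        label = PARENT[label]
        labels.append(label)
    return labels
```

Indexing every entry under `with_ancestors(label)` is one way to make the broader facets (all pleadings, all motions) fall out of the more specific classifications for free.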
What Problem Might it Solve?
A lot of dockets are hella long: hundreds or thousands of entries across many pages. This stinks if you're only interested in a subset of the documents. Today, you have to either read a zillion docket entries or ctrl-F your way through each page, and even that may not work because wording varies across courts. If we classified docket entries, users could filter on our labels instead of relying on inaccurate text searches.
Describe a Scenario in Which the Feature Might be Used
As a lawyer, I'm working on a summary judgment motion in a patent case. I know my opposing party has been in some other cases, and I want to see what arguments they raised in cases that reached the summary judgment phase. I can find the cases, but those dockets are crazy long.
Enter docket entry classification! Now, I can filter by label and easily get to all the summary judgment motions and orders.
Technical Requirements
This is medium hard at least! We'd need to come up with (or adopt) a decent controlled vocabulary for our labels, and then train models to recognize them across the corpus of RECAP docket entries.
If this is supervised ML, we need to produce a solid training set.
If we used LLMs, it could take much less time to build, but we'd need extensive testing to gain confidence, and it would be computationally and financially expensive given how many docket entries we have.
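For a sense of what the supervised route looks like, here's a toy baseline: a multinomial naive Bayes classifier over docket entry text, written with only the standard library. The training examples and labels are made up for illustration; a real training set would be drawn from RECAP docket entries and whatever vocabulary we settle on.

```python
import math
import re
from collections import Counter, defaultdict

# Toy labeled examples; purely illustrative, not real training data.
TRAIN = [
    ("COMPLAINT against Acme Corp filed by Plaintiff", "Complaint"),
    ("ANSWER to Complaint filed by Defendant", "Answer"),
    ("MOTION for Summary Judgment filed by Defendant", "Motion"),
    ("ORDER granting motion to dismiss", "Order"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Per-label word frequencies (multinomial naive Bayes).
word_counts = defaultdict(Counter)
label_counts = Counter()
for text, label in TRAIN:
    label_counts[label] += 1
    word_counts[label].update(tokenize(text))

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the most likely label under naive Bayes with add-one smoothing."""
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihood of each token given the label.
        score = math.log(label_counts[label] / len(TRAIN))
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In practice you'd want a much richer feature representation and a real evaluation set, but even a simple bag-of-words model like this gives a baseline to measure fancier approaches against.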
Existing Systems or Alternatives?
Back in the day, I did this with a rules engine that evaluated zillions of regexes for proto-Lex Machina. It was hard because there was so much variation in wording across courts, which made it complex and brittle. However, at least while I was there, it outperformed what PhD students were able to do with ML. But that was a long time ago, and a lot has changed!
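To make the rules-engine approach concrete, here's a stripped-down sketch of the idea: an ordered list of regexes, most specific first, with the first match winning. The patterns here are invented for illustration, not the actual proto-Lex Machina rules.

```python
import re
from typing import Optional

# Hypothetical rules, ordered most-specific first so that
# "Motion for Summary Judgment" beats plain "Motion".
RULES = [
    ("Motion for Summary Judgment",
     re.compile(r"\bmotion\b.*\bsummary judgment\b", re.I)),
    ("Claim Construction Order",
     re.compile(r"\bclaim construction order\b", re.I)),
    ("Answer", re.compile(r"\banswer\b", re.I)),
    ("Complaint", re.compile(r"\bcomplaint\b", re.I)),
    ("Motion", re.compile(r"\bmotion\b", re.I)),
    ("Order", re.compile(r"\border\b", re.I)),
]

def classify(entry_text) -> Optional[str]:
    """Return the label of the first matching rule, or None."""
    for label, pattern in RULES:
        if pattern.search(entry_text):
            return label
    return None
```

Even this tiny version shows the brittleness: rule ordering matters (an "ORDER granting motion to dismiss" hits the Motion rule before the Order rule), and every court's phrasing quirk means another pattern or another ordering fight, which is exactly why the real system grew complex.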
Any Additional Information?
This is also important because it's an enabler for other things:
More targeted alerts (alert only on new pleadings; never alert on pro hac vice materials; etc.)