freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Build a citator for cases #3661

Open mlissner opened 7 months ago

mlissner commented 7 months ago

Just got another request from a client to build a Shepardizing tool that can flag or score cases as good or bad law. We've talked about this a lot and have considered approaches like regexes for the easy cases plus ML for the hard ones. I think AI would probably be a useful tool here too. (A rough sketch of the regex half is below.)
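To make the regex half concrete, here's a minimal sketch; the phrase list is purely illustrative, not a settled ontology:

```python
# Minimal sketch of the "regexes for easy stuff" half: flag citing
# sentences that contain an explicit negative-treatment phrase.
# The phrase list below is illustrative, not an agreed ontology.
import re

NEGATIVE_SIGNALS = [
    r"\boverruled\s+(?:in\s+part\s+)?by\b",
    r"\babrogated\s+by\b",
    r"\bsuperseded\s+by\s+statute\b",
    r"\bdisapproved\s+(?:of\s+)?by\b",
]
NEGATIVE_RE = re.compile("|".join(NEGATIVE_SIGNALS), re.IGNORECASE)

def has_explicit_negative_signal(citing_sentence: str) -> bool:
    """True if the sentence contains an unambiguous negative signal."""
    return bool(NEGATIVE_RE.search(citing_sentence))
```

Everything these patterns miss would fall through to the ML side.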

Some things we'll need to do to get this started:

  1. We need a target. What would we want this to look like and what features should it have?
  2. We need an ontology for our flags. Which flags make sense?
  3. We need a name for the feature. Prior art includes "Shepardizer" and Fastcase's "Bad Law Bot."

For now this is something of a placeholder, but I certainly welcome discussion and ideas. It'd be a heck of a project for an intrepid developer to pick up and run with, but I suspect the way to do it correctly is to have a dedicated engineer on it for many months.


I'd expect this to be a duplicate, since we've talked about it so many times, but I can't find any issue for this. Hrm.

anseljh commented 1 month ago

Here's my notes dump! I may come back to add more to this later.


Functions of a Citator (what it does)

Attributes of a Citator (what it is/has)

Existing and Previous Citators

Quasi-citators

News items about citators

Datasets helpful for citators

Potential differentiating features of a new citator

Example Module Ideas

mlissner commented 1 month ago

This is great, Ansel! Very, very helpful and lots to go over. Really emphasizes how well-developed this area already is.

It'd be great to think about other ways to make something even better than the past. For example, Paxton mentions that their citator can group similar cases together. That's an interesting idea. I'm not sure how it works, but I could see a citator that:

There are probably a dozen similar ideas we should explore as we're launching this.
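On the grouping idea, one way it might work is to embed the passages that cite a case and cluster them. This is just a guess at the technique; the model choice and distance threshold are arbitrary placeholders:

```python
# Speculative sketch: group the passages citing a target case by textual
# similarity, so treatments of the same proposition cluster together.
# Model choice and distance threshold are arbitrary placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

citing_passages = [
    "The court in Smith held that the statute applies retroactively.",
    "Smith's retroactivity holding controls this case.",
    "Smith also discussed attorney's fees in passing.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(citing_passages)

# n_clusters=None lets the distance threshold decide how many groups emerge.
labels = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.4,
    metric="cosine",
    linkage="average",
).fit_predict(embeddings)
```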

anseljh commented 1 month ago

It will be important to fix https://github.com/freelawproject/courtlistener/issues/4290 for this.

mlissner commented 4 weeks ago

Features from SmartCite (Casetext):

mlissner commented 4 weeks ago

Two thoughts from today's discussion with @s-taube:

  1. It should have a feedback button that users can click if they don't like a flag.
  2. Being able to adjust how conservatively flags are applied could be interesting/useful. This implies that red/green flags are backed by scores between zero and one rather than being simply binary. Imagine, say, wanting to be really careful and insisting that green flags only apply to the safest cases. (A rough sketch follows this list.)
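A minimal sketch of what that could look like, assuming each flag is backed by a 0-1 score; the cutoffs here are made up:

```python
# Minimal sketch: map a 0-1 "good law" score to a flag, with a
# user-adjustable caution level. All cutoffs here are made up.
def flag_for(score: float, caution: float = 0.5) -> str:
    """Higher caution shrinks the green zone and grows the red zone."""
    green_cutoff = 0.70 + 0.25 * caution  # caution=1.0: green needs >= 0.95
    red_cutoff = 0.30 + 0.20 * caution    # caution=1.0: red below 0.50
    if score >= green_cutoff:
        return "green"
    if score < red_cutoff:
        return "red"
    return "yellow"
```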
anseljh commented 3 weeks ago

> Being able to adjust how conservatively flags are applied could be interesting/useful. This implies that red/green flags are backed by scores between zero and one rather than being simply binary.

I like this idea, but I think it would require substantial user testing among lawyers and other professionals who use citators to figure out how to communicate it effectively. Thinking probabilistically about a citator (or really anything?) would be a big mindset shift.

Counterpoint: Citator users don't understand the nuances of market-leading products all that well either. Rebecca is working on a paper that touches on this. Ask her about it!

Counter-counterpoint: The bar is higher for new entrants.

At Syntexys, we used a little linear gauge to display confidence of contract clause-type predictions, which worked okay. In this context, though, Lexis and Westlaw each do something similar already to display depth of treatment, so citator users are used to that type of widget meaning something other than probability/confidence scores.

Lexis/Shepard's (screenshot): https://perma.cc/C6PS-PCBF

Westlaw/KeyCite (screenshot): https://perma.cc/B6JF-2NLH

legaltextai commented 2 weeks ago

This is a super useful summary for a new entrant like myself. Thank you @anseljh.

Two quick thoughts:

1) With all the datasets mentioned above, we can assemble text-classification pairs like: "Whatever may have been the extent of psychological knowledge at the time of Plessy v. Ferguson, this finding is amply supported by modern authority. Any language in Plessy v. Ferguson contrary to this finding is rejected." = negative treatment.

Do we want to fine-tune a language model with good reasoning skills on these datasets? The idea is to train the model to recognize the type of treatment from the input text. (A rough sketch is below.)
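As one concrete possibility, a small encoder could be fine-tuned as a classifier; everything here (base model, label set, data format) is an assumption:

```python
# Speculative sketch: fine-tune a small encoder as a treatment
# classifier. Base model, label set, and data format are assumptions.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["negative", "positive", "neutral"]  # placeholder flag ontology

examples = [
    {"text": "Any language in Plessy v. Ferguson contrary to this "
             "finding is rejected.",
     "label": 0},  # negative treatment
    # ...more (text, label) pairs mined from the datasets listed above
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = Dataset.from_list(examples).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="treatment-clf", num_train_epochs=3),
    train_dataset=train_ds,
    tokenizer=tokenizer,  # enables padding via the default data collator
).train()
```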

2) Do we want to start by training the model to extract all cited cases? In my early attempts, even GPT-4 and Claude 3.5 missed some, extracting only around 70-80% of the cited cases in a given opinion.

Briefs from here could be a good starting point: the list of cases the model extracts can be compared against the ones mentioned in each brief's Table of Authorities. (A rough scoring sketch is below.)
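A sketch of the scoring loop; parsing the Table of Authorities into a gold set is assumed to happen upstream, and eyecite (Free Law Project's citation parser) provides a deterministic baseline to compare the model against:

```python
# Sketch: score an extractor against a gold citation set. Building the
# gold set from a Table of Authorities is assumed to happen upstream.
from eyecite import get_citations

def recall(extracted: set[str], gold: set[str]) -> float:
    """Fraction of gold citations the extractor recovered."""
    return len(extracted & gold) / len(gold) if gold else 1.0

def eyecite_extract(brief_text: str) -> set[str]:
    """Deterministic baseline: citation strings eyecite matches."""
    return {c.matched_text() for c in get_citations(brief_text)}
```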