freelawproject / foresight

Where we discuss and prioritize new features

Statute Identification in Caselaw/Opinion data #32

Open flooie opened 9 months ago

flooie commented 9 months ago

Summary:

I propose the implementation of a feature that would enable the identification and tagging of statutes within our case law, similar to linking opinion citations. This enhancement aims to improve the efficiency and accuracy of legal research by allowing users to easily find and reference relevant statutes mentioned in caselaw.

Background:

Currently, our caselaw database provides comprehensive legal decisions, but it lacks a systematic way of identifying and linking to statutes mentioned within these cases. Users often have to manually search for statutes, which is time-consuming and prone to error. Additionally, it would provide a new way to group opinions together, something that I think would be of great value to our users.

Proposed Solution:

Potential Challenges:

There is an open source project (or maybe now a for-profit org), OpenStates, that works on this issue. I think it would be smart to contact them if we start down this road. They may be more about tracking legislation than final statutes, but their insight would be invaluable.

mlissner commented 9 months ago

Thanks. We'll want to talk to LII for federal stuff too.

I don't think we'll do this any time soon, but it'd be nice to see some screenshots or gifs of other implementations of this. I imagine searching these would be really tricky to get right.

65001 commented 9 months ago

What are the date ranges for the legislation you're thinking about?

Federally, https://uscode.house.gov/ has XML data going back to 2013. If you look at the archives it goes back to 1994 if you use XHTML, PDF, or GPO.

If we are interested in supporting statutes at large (i.e., not references like 42 USC 1983 but rather the long name), we'd have to look through that data as well.

mlissner commented 9 months ago

Well, ideally we'd want this to work for all statutes across all time. That's a big lift, so chipping away at it surely would make sense. I think I'm going to need an education on statutes though. I don't know much about this stuff yet.

65001 commented 9 months ago

I'm not a lawyer, but I read some case law in my spare time.

There are a couple of formats, the easiest to move over would be

I also understand that lawyers can cite the statute at large instead of the codified version, e.g., Immigration and Nationality Act or Immigration and Nationality Act ("INA"), as opposed to 8 U.S.C. § 1158 (b)(1)(A). I don't know how prevalent it is to cite the statutes at large in modern cases without also referencing the Code.

As for statutes and the Code, the code can take parts of statutes at large and intermix them together. See Table 3 mapping.

I think you'd get the most benefit for the effort by focusing on U.S.C. regexes and CFR for federal regulations. How does Eyecite work behind the scenes? Are you using regexes, or are you using some kind of NLP library like spaCy to do the tagging automatically with confidence scores?
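To make the U.S.C./CFR suggestion concrete, here is a minimal sketch of what such regexes might look like. The patterns and the `find_federal_citations` helper are hypothetical and written only for illustration; they are not eyecite's patterns, and real-world citations vary far more than this handles.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions, not eyecite's own).
# Each captures a title number, a section, and any trailing "(b)(1)(A)"-style
# subsections.
USC_RE = re.compile(
    r"\b(?P<title>\d+)\s+U\.?S\.?C\.?\s*(?:§+\s*)?"
    r"(?P<section>\d+[A-Za-z]*)"
    r"(?P<subsections>(?:\s?\(\w+\))*)"
)
CFR_RE = re.compile(
    r"\b(?P<title>\d+)\s+C\.?F\.?R\.?\s*(?:§+\s*)?"
    r"(?P<section>\d+(?:\.\d+)?)"
    r"(?P<subsections>(?:\s?\(\w+\))*)"
)


def find_federal_citations(text):
    """Return U.S.C. and C.F.R. citations found in text, in document order."""
    found = []
    for kind, pattern in (("USC", USC_RE), ("CFR", CFR_RE)):
        for m in pattern.finditer(text):
            found.append({
                "kind": kind,
                "title": m.group("title"),
                "section": m.group("section"),
                # Split "(b)(1)(A)" into ["b", "1", "A"].
                "subsections": re.findall(r"\((\w+)\)", m.group("subsections")),
                "span": m.span(),
            })
    return sorted(found, key=lambda c: c["span"])


sample = "See 8 U.S.C. § 1158 (b)(1)(A); 42 USC 1983; 8 C.F.R. § 208.13(b)."
for cite in find_federal_citations(sample):
    print(cite["kind"], cite["title"], cite["section"], cite["subsections"])
```

One known weakness of this sketch: a one-word parenthetical right after the section number would be misread as a subsection, which is the kind of edge case that needs real data to tune against.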

Examples:

mlissner commented 9 months ago

Thanks. Eyecite has some support for identifying statutes. It uses lots and lots of regexes. I guess a good place to start would be to dig into how good those regexes are and to get eyecite really tuned up.

The way we did that with opinions was super laborious. We'd craft overly-inclusive regexes to build up a set of possible citations, and then:

If we followed that approach here, I think we could build up a solid statute finder as the first step (of course, this discussion should probably be over in eyecite, where people like @mattdahl hang out).
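As a rough illustration of the first step only (over-inclusive regexes building a set of possible citations), something like the sketch below could tally the abbreviation-like strings that appear between a number and a section or page number. The pattern and the `tally_candidate_abbreviations` helper are hypothetical; the follow-up steps are not listed in this thread, so they are not reconstructed here.

```python
import re
from collections import Counter

# Hypothetical over-inclusive pattern: a number, one to four capitalized or
# abbreviated tokens, an optional section sign, then another number. It will
# happily match case reporters ("410 U.S. 113") as well as statutes.
CANDIDATE_RE = re.compile(
    r"\b\d+\s+((?:[A-Z][\w.]*\s+){1,4})(?:§+\s*)?\d"
)


def tally_candidate_abbreviations(texts):
    """Count how often each candidate abbreviation appears across texts."""
    counts = Counter()
    for text in texts:
        for m in CANDIDATE_RE.finditer(text):
            # Collapse internal whitespace so variants are counted together.
            counts[" ".join(m.group(1).split())] += 1
    return counts


opinions = [
    "See 42 U.S.C. § 1983 and 42 U.S.C. § 1988.",
    "Roe v. Wade, 410 U.S. 113 (1973).",
    "Under 18 Pa. Cons. Stat. § 2502, the charge requires intent.",
]
for abbreviation, n in tally_candidate_abbreviations(opinions).most_common():
    print(n, abbreviation)
```

The resulting tally is just raw material: something still has to decide which candidates are statutes, which are reporters, and which are noise.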

mattdahl commented 9 months ago

My thoughts, fwiw:

  1. First, on statute recognition, what is the provenance of the regexes in https://github.com/freelawproject/reporters-db/blob/main/reporters_db/data/laws.json? They all look like they were automatically generated; do we know how comprehensive that generation was? Another source of statute regex templates is here: https://github.com/raindrum/citeurl/tree/main/citeurl/templates. Not sure where those are originally from either, but we could try to cross-check and/or merge.
  2. Second, on statute extraction, we already have a FullLawCitation abstraction in eyecite that I think could handle the extraction of any new statute regexes with little or zero tweaking. (But I may be missing something.)
  3. Third, on statute linking, this seems like the biggest challenge to me. In the above-mentioned repo (https://github.com/raindrum/citeurl/tree/main/citeurl/templates), there are official government URL templates for many of the state statute regexes, which could be handy if they actually work. But @flooie mentions "ingest[ing] bulk statute text to link to" -- do you mean that you'd like to link to internal-to-CL documents, as opposed to those of some external third-party?
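
For the external-link option in point 3, a URL template approach might look roughly like the sketch below. The template table and the `build_statute_url` helper are hypothetical, and the LII-style URL shapes are an assumption used for illustration (LII comes up earlier in the thread); they are not copied from the citeurl templates and would need checking before use.

```python
from urllib.parse import quote

# Hypothetical template table keyed by citation kind. The URL shapes are
# assumptions for illustration, not verified against any template repo.
URL_TEMPLATES = {
    "USC": "https://www.law.cornell.edu/uscode/text/{title}/{section}",
    "CFR": "https://www.law.cornell.edu/cfr/text/{title}/{section}",
}


def build_statute_url(kind, title, section):
    """Fill the template for a citation kind, or return None if unknown."""
    template = URL_TEMPLATES.get(kind)
    if template is None:
        return None
    # Percent-encode the parts so odd characters cannot break the URL.
    return template.format(title=quote(title), section=quote(section))


print(build_statute_url("USC", "42", "1983"))
print(build_statute_url("CFR", "8", "208.13"))
print(build_statute_url("Okla. Stat.", "21", "701.7"))  # no template: None
```

Returning `None` for unknown kinds keeps extraction and linking separate: a citation can still be recognized and tagged even when there is nothing to link it to yet.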

mlissner commented 9 months ago

  1. There was once a browser extension that tried to do this. I think the regexes came from there, but I might be totally forgetting something. Definitely worth making the best of any sources we can find.

  2. Awesome.

do you mean that you'd like to link to internal-to-CL documents, as opposed to those of some external third-party?

  3. I think so. Long term, we'll probably need to have statutes to feel complete, but statutes are very hard, so that's years away unless somebody steps forward with funding for a couple full time developers.

flooie commented 9 months ago

@mattdahl @mlissner I thought they were auto-generated by @jcushman when you/he added FullLawCitation. I think the easy part is going to be updating laws.json to handle all the edge cases and patterns. I played with it maybe a year ago and found it pretty easy to implement the statute patterns for a single state.

I think the hard part is going to be finding statutes and keeping them up-to-date in a good data model on CL. But I am hopeful that we can rely on data models or designs by OpenStates to tackle that problem.
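
To show why the data model is the hard part, here is a minimal sketch of one possible shape: each section keeps a list of dated versions, so a lookup can ask what the text was on a given date. Every class and field name here is hypothetical; this is not CL's schema or an OpenStates design, and the dates and text are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SectionVersion:
    text: str
    effective: date
    repealed: Optional[date] = None  # None means still in force


@dataclass
class StatuteSection:
    jurisdiction: str
    title: str
    section: str
    versions: List[SectionVersion] = field(default_factory=list)

    def as_of(self, when: date) -> Optional[SectionVersion]:
        """Return the version in force on the given date, if any."""
        newest_first = sorted(
            self.versions, key=lambda v: v.effective, reverse=True
        )
        for version in newest_first:
            in_force = version.repealed is None or when < version.repealed
            if version.effective <= when and in_force:
                return version
        return None


# Placeholder data only; the dates and text are made up for the example.
section = StatuteSection("us", "42", "1983", versions=[
    SectionVersion("placeholder text A", date(1990, 1, 1), date(2000, 1, 1)),
    SectionVersion("placeholder text B", date(2000, 1, 1)),
])
print(section.as_of(date(1995, 6, 1)).text)
print(section.as_of(date(2010, 6, 1)).text)
```

Even this toy version implies tracking amendment dates for every section, which is where the upkeep cost comes from.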

flooie commented 9 months ago

Do you want to move the issue to eyecite? I feel like it should have a place here on CL.

mlissner commented 9 months ago

I think this is really about the full project, and if we want to work on parsers, we should open a ticket for that on eyecite? That's how I think about it anyway.