freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Re-run the citation extractor #4566

Open mlissner opened 6 days ago

mlissner commented 6 days ago

For the citator project, we badly need to re-run the citation extractor.

In today's case law entmoot, we discussed a few enhancements we should make before we do:

  1. If a citation cannot be disambiguated, that's OK: we can send it to a /c/ page, which will list the multiple options.

  2. If a citation cannot be matched, we can also link to it via /c/. Here's what you see when a citation is missing:

    [Image: screenshot of the /c/ page shown for a missing citation]

    We can tune that up to talk more about neutral citations and things like that, but it's surprisingly not bad already.

    I think these citation links should be red, like Wikipedia pages that haven't yet been created.

  3. We need to fix any eyecite bugs that prevent good citation extraction.
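
As a rough illustration of items 1 and 2, here is a minimal sketch of how a citation link could be chosen from the number of matches. Everything named here is an assumption for illustration: the helper names, the CSS class names, and the `/c/<reporter>/<volume>/<page>/` path layout are hypothetical and not taken from the CourtListener codebase.

```python
from html import escape
from urllib.parse import quote


def citation_lookup_path(volume: str, reporter: str, page: str) -> str:
    """Build a /c/ lookup path.

    Assumed layout: /c/<reporter>/<volume>/<page>/ (hypothetical).
    """
    parts = (quote(p, safe="") for p in (reporter, volume, page))
    return "/c/" + "/".join(parts) + "/"


def render_citation_link(text, volume, reporter, page, match_urls):
    """Return an <a> tag for a citation.

    - exactly one match: link straight to that opinion
    - several matches:   link to /c/, which lists the options
    - no match:          link to /c/ with a class that can be styled red
    """
    if len(match_urls) == 1:
        href, css = match_urls[0], "citation"
    else:
        href = citation_lookup_path(volume, reporter, page)
        # Class names are made up; "citation-missing" would get the red style.
        css = "citation citation-ambiguous" if match_urls else "citation citation-missing"
    return f'<a class="{css}" href="{escape(href)}">{escape(text)}</a>'
```

With this shape, the red "not yet created" styling is only a CSS rule on the hypothetical `citation-missing` class, so the extractor itself does not need to know how missing links are displayed.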

That's my brain dump. We do need to get on top of this so that we have the data we need.

mlissner commented 6 days ago

I added a bunch of sub-issues. We need to at least investigate these and make a decision about how important they are to do before the big extraction.

mlissner commented 2 days ago

  1. Once @flooie lands the opinion cleanup PR, we can link to pincites via HTML anchors. We should use these so that citations link to the correct page of the decision (if possible!).
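
A minimal sketch of what pincite linking might look like, assuming the cleaned-up opinion HTML exposes one anchor per page. The anchor id format `page-<n>` and the function name are hypothetical; the real format depends on what the opinion cleanup PR produces.

```python
import re
from typing import Optional


def pincite_url(opinion_url: str, pin_cite: Optional[str]) -> str:
    """Append a page anchor to an opinion URL when a pincite is present.

    Falls back to the plain URL if there is no pincite or it has no
    page number in it. The "#page-<n>" fragment is an assumed format.
    """
    if not pin_cite:
        return opinion_url
    # Use the first number found, so "at 120-22" points at page 120.
    match = re.search(r"\d+", pin_cite)
    if match is None:
        return opinion_url
    return f"{opinion_url}#page-{match.group()}"
```

Falling back to the unanchored URL keeps the link usable when the target page has no matching anchor, which fits the "if possible!" caveat.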