freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Synonym file needs more entries #273

Closed: morninj closed this issue 8 years ago

morninj commented 10 years ago

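When I query Clapper v. Amnesty Intern. USA, it should return this (https://www.courtlistener.com/scotus/5d9s/clapper-v-amnesty-international-usa/?q=&order_by=score+desc&case_name=Clapper+v.+Amnesty&stat_Precedential=on), but instead I get no results.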

mlissner commented 10 years ago

Can you post a link to your query, please?

On Aug 1, 2014 8:13 PM, "Joseph Mornin" notifications@github.com wrote:

When I query Clapper v. Amnesty Intern. USA, it should return this https://www.courtlistener.com/scotus/5d9s/clapper-v-amnesty-international-usa/?q=&order_by=score+desc&case_name=Clapper+v.+Amnesty&stat_Precedential=on, but instead I get no results.


morninj commented 10 years ago

https://www.courtlistener.com/?q=&case_name=Clapper%20v.%20Amnesty%20Intern.%20USA&stat_Precedential=on&order_by=score+desc

mlissner commented 10 years ago

We have a very old synonym file here:

https://github.com/freelawproject/courtlistener/blob/master/Solr/conf/lang/synonyms_en.txt

It's got the barest minimum of items, but we could go a long way by adding items to it.

If you look at it, you'll see a bunch of examples of how to set up synonyms. The big question I have is whether any such lists already exist.

@emasters, did you ever make something like this?
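For anyone new to synonym files, here is a rough sketch of how lines in a file like synonyms_en.txt behave. It is a simplified model of Solr's synonyms.txt conventions (comma-separated lines are equivalents; `a => b` lines replace the left side with the right side; `#` starts a comment), not Solr's actual parser: it ignores escaping and multi-word synonyms. The function names and sample entries are hypothetical.

```python
"""Simplified model of Solr-style synonym lines (not Solr's real parser).

Assumed rules:
  - blank lines and lines starting with '#' are ignored
  - "a, b, c"   : equivalent terms; each one expands to all of them
  - "a, b => c" : terms on the left are replaced by the terms on the right
"""


def parse_synonyms(text: str) -> dict[str, set[str]]:
    """Map each lowercased term to the set of terms it expands to."""
    mapping: dict[str, set[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: left-hand terms become the right-hand terms.
            left, right = line.split("=>", 1)
            targets = {t.strip().lower() for t in right.split(",") if t.strip()}
            for term in left.split(","):
                term = term.strip().lower()
                if term:
                    mapping.setdefault(term, set()).update(targets)
        else:
            # Equivalence group: every member expands to the whole group.
            group = {t.strip().lower() for t in line.split(",") if t.strip()}
            for term in group:
                mapping.setdefault(term, set()).update(group)
    return mapping


def expand(tokens: list[str], mapping: dict[str, set[str]]) -> list[set[str]]:
    """For each query token, return the alternatives a search would match."""
    return [mapping.get(t.lower(), {t.lower()}) for t in tokens]


if __name__ == "__main__":
    # Illustrative entries only, not copied from CourtListener's file.
    rules = """
    # abbreviations
    int'l, international
    corp => corporation
    """
    m = parse_synonyms(rules)
    print([sorted(alts) for alts in expand(["Amnesty", "Int'l"], m)])
```

With an equivalence line, a query for either form matches documents containing the other, which is the behavior an abbreviated case name needs.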

morninj commented 10 years ago

Would be great to support Bluebook T6: https://i.imgur.com/sTklEaD.jpg

nowherenearithaca commented 10 years ago

Newbie question: have lawyers stuck to those abbreviations fairly consistently since the Bluebook first came out?

morninj commented 10 years ago

Yes—at least for case names in citations (as opposed to case names in sentences). This matters because someone might query CourtListener by pasting a citation.

mlissner commented 10 years ago

T6 is apparently now T11. Here's a digital form of it:

https://law.resource.org/pub/us/code/blue/IndigoBook.html#T11

That'd be a good starting point. Still, I feel like there are versions of this already floating around somewhere...
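A table like T11 pairs full words with their abbreviations, so it maps fairly directly onto synonym lines. As a rough sketch, assuming the pairs have already been extracted into (word, abbreviation) tuples, they could be grouped into one equivalence line per word. The helper name and sample pairs are hypothetical (the pairs follow common Bluebook-style abbreviations), and dropping trailing periods assumes the index's tokenizer strips them anyway.

```python
from collections import defaultdict


def table_to_synonym_lines(pairs: list[tuple[str, str]]) -> list[str]:
    """Group abbreviations under their full word and emit one
    Solr-style equivalence line per word, e.g. "international, int'l"."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word, abbrev in pairs:
        word_l = word.strip().lower()
        # Assumption: trailing periods are stripped at index time, so
        # "Corp." and "Corp" should collapse into a single entry.
        abbrev_l = abbrev.strip().lower().rstrip(".")
        if abbrev_l and abbrev_l != word_l and abbrev_l not in groups[word_l]:
            groups[word_l].append(abbrev_l)
    # Sorted output keeps the generated file stable across runs.
    return [
        ", ".join([word] + abbrevs)
        for word, abbrevs in sorted(groups.items())
        if abbrevs
    ]


if __name__ == "__main__":
    sample = [
        ("International", "Int'l"),
        ("Corporation", "Corp."),
        ("Association", "Ass'n"),
        ("Corporation", "Corp."),  # duplicate rows are ignored
    ]
    print("\n".join(table_to_synonym_lines(sample)))
```

Emitting equivalence lines rather than `=>` mappings keeps both the full word and the abbreviation searchable.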

mlissner commented 10 years ago

Another person I remember talking about synonym files was @waldoj. Did you ever have such a file, Waldo?

waldoj commented 10 years ago

Nope, but it's on my list of things I'd like to create as an @opendata project. I think there are enough State Decoded implementations to have a pretty good corpus of extracted terms and definitions to work with now, too.

mlissner commented 10 years ago

Related: https://github.com/opendata/Legal-Synonyms

waldoj commented 10 years ago

:+1:

mlissner commented 10 years ago

Looks like the synonym work upstream is basically done. The remainder of this issue is probably therefore:

mlissner commented 10 years ago

Oh yeah, and then:

mlissner commented 8 years ago

I just did this today, so this will finally be fixed once I pull it and things get reindexed. The data comes mostly from the Indigo Book, which has many tables of abbreviations, such as:

On top of that, I added a few things:

From there I did the following:

All in all, I think it's a fine list. Definitely a conservative one, which lawyers like, but also one that should make lots of searches (especially those involving abbreviations) work better.

mlissner commented 8 years ago

This is now deployed, and @morninj, your query is definitely improved: https://www.courtlistener.com/?q=Clapper+v.+Amnesty+Intern.+USA&type=o&order_by=score+desc&stat_Precedential=on

I'm loving it, actually. It's a subtle but big improvement.