Closed morninj closed 8 years ago
Can you post a link to your query please?
On Aug 1, 2014 8:13 PM, "Joseph Mornin" notifications@github.com wrote:
When I query Clapper v. Amnesty Intern. USA, it should return this https://www.courtlistener.com/scotus/5d9s/clapper-v-amnesty-international-usa/?q=&order_by=score+desc&case_name=Clapper+v.+Amnesty&stat_Precedential=on, but instead I get no results.
— Reply to this email directly or view it on GitHub https://github.com/freelawproject/courtlistener/issues/273.
We have a very old synonym file here:
https://github.com/freelawproject/courtlistener/blob/master/Solr/conf/lang/synonyms_en.txt
It's got the barest minimum of items, but we could go a long way by adding items to it.
If you look at it, it has a bunch of examples of how to set up synonyms. The big question I have is if there are any lists already out there.
@emasters, did you ever make something like this?
Would be great to support Bluebook T6: https://i.imgur.com/sTklEaD.jpg
Newbie question: do lawyers stick to those fairly consistently (since whenever BlueBook originally came out)?
Yes—at least for case names in citations (as opposed to case names in sentences). This matters because someone might query CourtListener by pasting a citation.
T6 is apparently now T11. Here's a digital form of it:
https://law.resource.org/pub/us/code/blue/IndigoBook.html#T11
That'd be a good starting point. Still, I feel like there are versions of this already floating around some place....
Another person I remember talking about synonym files was @waldoj. Did you ever have such a file, Waldo?
Nope, but it's on my list of things I'd like to create as an @opendata project. I think there are enough State Decoded implementations to have a pretty good corpus of extracted terms and definitions to work with now, too.
:+1:
Looks like the synonym work upstream is basically done. The remainder of this issue is probably therefore:
Oh yeah, and then:
I just did this today, so this is finally getting fixed once I pull it and things get reindexed. The source for this data is mostly from the Indigo Book, which has many tables of abbreviations, such as:
On top of that, I added a few things:
From there I did the following:
U.S., because those things get split by Solr anyway.
Cleaned up a bunch of items that had brackets, like trans[lator, lation], trans. In the case that the expanded word lists were semantically different, I made them into mappings. Else, I made them into synonyms. For example, here's what trans maps to:
trans => translation,translator,transgender
Whereas something else might just be:
assemb,assembly,assemblyman,assemblywoman,assemblymember
Because they're all essentially the same.
cat to turn up results for category.
All in all, I think it's a fine list. Definitely a conservative one, which lawyers like, but also one that should make lots of searches (especially those involving abbreviations) work better.
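For anyone following along, here is a rough Python sketch of the two ideas discussed above: expanding a bracketed entry such as trans[lator, lation], and the difference between a one-way mapping (`=>`) and a plain comma-separated synonym group. It is not the script used to build the list, and it is not how Solr implements synonyms internally. The function names (`expand_brackets`, `parse_synonyms`, `expand_token`) are made up for illustration, and the sketch assumes that a mapping replaces the left-hand token while a synonym group makes every member expand to the whole group.

```python
import re

# Matches entries like "trans[lator, lation]": a stem plus bracketed endings.
BRACKET_RE = re.compile(r"^(\w+)\[([^\]]*)\]$")


def expand_brackets(entry):
    """Split a bracketed entry into (stem, [full words]).

    Hypothetical helper: assumes each ending is appended directly to the stem.
    """
    match = BRACKET_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"not a bracketed entry: {entry!r}")
    stem, endings = match.groups()
    words = [stem + e.strip() for e in endings.split(",") if e.strip()]
    return stem, words


def parse_synonyms(lines):
    """Build a token -> replacement-list table from synonym-file lines."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "=>" in line:
            # One-way mapping: left-hand tokens are replaced by the right side.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                source = source.strip()
                if source:
                    table.setdefault(source, []).extend(targets)
        else:
            # Equivalent synonyms: every term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                table.setdefault(term, []).extend(group)
    return table


def expand_token(table, token):
    """Return the replacements for a token, or the token itself if unmapped."""
    return table.get(token, [token])


rules = parse_synonyms([
    "trans => translation,translator,transgender",
    "assemb,assembly,assemblyman,assemblywoman,assemblymember",
])
stem, words = expand_brackets("trans[lator, lation]")
```

With these rules, `expand_token(rules, "trans")` yields the three expanded words, while `expand_token(rules, "translation")` returns only itself, because the mapping runs in one direction. Any member of the assemb group expands to all five terms.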
This is now deployed, and @morninj, your query is definitely improved: https://www.courtlistener.com/?q=Clapper+v.+Amnesty+Intern.+USA&type=o&order_by=score+desc&stat_Precedential=on
I'm loving it, actually. It's a subtle but big improvement.