freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Enhance case name search relevancy #4366

Open legaltextai opened 3 weeks ago

legaltextai commented 3 weeks ago

Searching for "Google v Oracle" shows Oracle v Google at the top. Ideally, it should show Google v Oracle: the Supreme Court decision first, then the appeal, then the rest ordered by date. There are other examples with famous cases. Should we assign more weight to the case name field? To the phrase? Should we recognize when a user searches for a case name (e.g., when the query contains "v" or "v.")?
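A minimal Python sketch of the ordering described above, assuming the results have already been narrowed to case-name matches. The `Hit` type, its `court_level` values (`"supreme"`, `"appellate"`), and newest-first ordering within a tier are hypothetical choices for illustration, not CourtListener's data model. A real implementation would more likely do this inside Elasticsearch and blend it with the relevance score rather than replace it.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical court tiers (lower ranks higher); not CourtListener's court codes.
COURT_TIER = {"supreme": 0, "appellate": 1}
OTHER_TIER = 2  # every other court, e.g. district or state trial courts


@dataclass
class Hit:
    case_name: str
    court_level: str  # hypothetical field: "supreme", "appellate", ...
    date_filed: date


def rerank(hits: list[Hit]) -> list[Hit]:
    """Sort by court tier, then newest filing first within each tier."""
    return sorted(
        hits,
        key=lambda h: (
            COURT_TIER.get(h.court_level, OTHER_TIER),
            -h.date_filed.toordinal(),  # negate so later dates sort first
        ),
    )
```

Because `sorted` is stable, hits with the same tier and date keep their incoming (relevance) order.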

legaltextai commented 3 weeks ago

Interestingly, even if I filter further by case name, it still does not show Google v Oracle in first place.

mlissner commented 3 weeks ago

We do recognize when there's a v. or an In re in the query and boost based on case name. Maybe we need to crank that boost higher.

I don't know that we can boost based on phrases, but @albertisfu might.

Two other relevancy enhancements we have planned are:

  1. #558

  2. https://github.com/freelawproject/courtlistener/issues/4381
albertisfu commented 3 weeks ago

> We do recognize when there's a v. or an In re in the query and boost based on case name. Maybe we need to crank that boost higher.

Yeah, this is correct. Currently, we boost the caseName field to 50 if there is a "v", "v.", "vs.", or "vs" within the query, or if it starts with "in re ", "matter of ", or "ex parte ".
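The rule described above can be sketched in Python as follows. This is an illustrative reconstruction, not CourtListener's actual code: the regular expression, the function names, and the baseline boost of 1 used when no pattern matches are assumptions. Only the boost of 50 and the trigger patterns come from the comment above.

```python
import re

CASE_NAME_BOOST = 50  # boost value stated in the comment above
BASELINE_BOOST = 1    # hypothetical: boost when the query doesn't look like a case name

# "v", "v.", "vs", or "vs." as a standalone token anywhere in the query.
_VERSUS = re.compile(r"(?:^|\s)vs?\.?(?=\s|$)", re.IGNORECASE)
# Query prefixes that signal a case name.
_PREFIXES = ("in re ", "matter of ", "ex parte ")


def looks_like_case_name(query: str) -> bool:
    """True if the query contains a versus token or starts with a case-name prefix."""
    q = query.strip().lower()
    return bool(_VERSUS.search(q)) or q.startswith(_PREFIXES)


def case_name_boost(query: str) -> int:
    """Pick the caseName boost for this query (hypothetical helper)."""
    return CASE_NAME_BOOST if looks_like_case_name(query) else BASELINE_BOOST
```

Matching `v` only as a whole token keeps words like "privacy" or "vacation" from triggering the boost.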

After discussing this issue with @legaltextai, we think we could try increasing the boost on caseName and/or increasing the weight of the query_string phrase component. We think this is needed because matches in fields other than caseName currently also influence the scores.
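One way to express both knobs is an Elasticsearch `bool` query with two `should` clauses: a `query_string` clause that boosts `caseName` per field, and a separate `match_phrase` clause on `caseName` that rewards the query words appearing close together in roughly the same order. This is a sketch under assumptions, not CourtListener's query: the `text` field, the phrase boost of 10, the `slop` of 2, and `build_query` itself are hypothetical starting points for the tuning described above.

```python
import json


def build_query(user_query: str, case_name_boost: float = 50,
                phrase_boost: float = 10) -> dict:
    """Return a hypothetical Elasticsearch query body for a case-name search."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # Free-text match; caseName carries a per-field boost.
                        "query_string": {
                            "query": user_query,
                            # "text" is a hypothetical stand-in for the other fields.
                            "fields": [f"caseName^{case_name_boost}", "text"],
                            "default_operator": "AND",
                        }
                    },
                    {
                        # Extra score when the words appear together in caseName;
                        # slop allows a couple of intervening tokens (e.g. "LLC").
                        "match_phrase": {
                            "caseName": {
                                "query": user_query,
                                "slop": 2,
                                "boost": phrase_boost,
                            }
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Google v Oracle"), indent=2))
```

The returned dict could be sent as the body of a `_search` request; raising `case_name_boost` or `phrase_boost` shifts weight away from matches in other fields.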

This will require some testing within the production cluster so we can determine the best tuning for the search parameters. Perhaps this should also wait for https://github.com/freelawproject/infrastructure/issues/144 ?

mlissner commented 3 weeks ago

Yes, I'd suggest waiting. It'll be easier to do once y'all have read only access.