freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Elasticsearch Citation Search seems broken #4633

Open: flooie opened this issue 3 weeks ago

flooie commented 3 weeks ago

https://www.courtlistener.com/opinion/595912/go/#fn-s_ref

When I search Elastic for the case referenced at the link above by its citation, I receive the following message:

Showing results for “” without citation “980 F.2d 737”. It appears we don’t yet have that citation.

However, filtering by a portion of the case name confirms that both the case and citation are indeed present in the system, as shown below:

Ann Franco v. National Van Lines, Inc. Starving Students, Inc., Ann Franco, and 
Louis G. Fazzi, Esq. v. National Van Lines, Inc. Starving Students, Inc., 
980 F.2d 737 (9th Cir. 1992)
Court of Appeals for the Ninth Circuit

Filed: November 13th, 1992
Precedential Status: Non-Precedential
Citations: 980 F.2d 737, 1992 U.S. App. LEXIS 35503
Docket Number: 89-55697

https://www.courtlistener.com/?q=980%20F.2d%20737&type=o&order_by=score%20desc&case_name=Ann%20Franco%20v.%20National%20Van%20Lines&stat_Published=on&stat_Unpublished=on


It appears there may be an issue with citation-based retrieval in Elastic.

mlissner commented 3 weeks ago

OK, yeah, a couple of things are weird about that, eh?

  1. It shouldn't strip the citation from the query and search for "" when the citation is the only thing in the query.

  2. But did it do that? Somehow it still returned results.

  3. Looking it up with the citation filter works:

    https://www.courtlistener.com/?q=&type=o&order_by=score%20desc&case_name=Ann%20Franco%20v.%20National%20Van%20Lines&stat_Published=on&stat_Unpublished=on&citation=980%20F.2d%20737

  4. But this does not:

    https://www.courtlistener.com/?q=citation%3A(980%20F.2d%20737)&type=o&order_by=score%20desc&case_name=Ann%20Franco%20v.%20National%20Van%20Lines&stat_Published=on&stat_Unpublished=on

  5. But, lo, this works (with a space added between "F." and "2d"):

    https://www.courtlistener.com/?q=citation%3A(980%20F.%202d%20737)&type=o&order_by=score%20desc&case_name=Ann%20Franco%20v.%20National%20Van%20Lines&stat_Published=on&stat_Unpublished=on

I think the whitespace is the heart of the issue here, but we can put this on Alberto's backlog to dig in.
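The whitespace mismatch in points 3 to 5 can be illustrated with a minimal Python sketch, assuming a bare "volume reporter page" citation shape. The function `normalize_citation` and its regex are hypothetical, not CourtListener's code; they only show one way to build a comparison key that treats "F.2d" and "F. 2d" as the same reporter.

```python
import re
from typing import Optional

# Hypothetical pattern for a bare "volume reporter page" citation, e.g. "980 F.2d 737".
_CITATION_RE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)$")


def normalize_citation(text: str) -> Optional[str]:
    """Return a whitespace-insensitive comparison key, or None if text isn't a citation."""
    # Collapse runs of whitespace and trim the ends.
    collapsed = " ".join(text.split())
    match = _CITATION_RE.match(collapsed)
    if match is None:
        return None
    volume, reporter, page = match.groups()
    # Drop spaces after periods inside the reporter, so "F. 2d" and "F.2d" agree.
    reporter_key = re.sub(r"\.\s+", ".", reporter)
    return f"{volume} {reporter_key} {page}"


print(normalize_citation("980 F.2d 737"))   # 980 F.2d 737
print(normalize_citation("980 F. 2d 737"))  # 980 F.2d 737
```

In principle, applying the same key both when indexing and when parsing the query would let either spelling match. On its own it would not account for a miss where the spacing already agrees.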

mlissner commented 2 weeks ago

Got another instance of this today where whitespace doesn't seem to be the issue:

https://www.courtlistener.com/?q=101%20So.%20844&type=o&order_by=score%20desc&stat_Published=on


But 101 So. 844 does exist:

https://www.courtlistener.com/opinion/3239863/u-s-salvage-sales-co-v-weber/

As we're fixing this, Rebecca dropped this meme into our Slack:

[Image: meme]

So, if it makes sense, when we fix this we should also make pure citation searches take people straight to the cited opinion (as long as it doesn't add too much complexity or time).
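One possible way to send a pure citation search straight to the opinion, without ever falling back to searching for "", is shown in this minimal Python sketch. The `citation_key` and `route_query` functions and the `CITATION_INDEX` dictionary are all hypothetical; the dictionary stands in for a real database or Elasticsearch lookup, and only the 101 So. 844 opinion URL comes from this thread.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlencode

# Matches a bare "volume reporter page" citation such as "101 So. 844".
_CITATION_RE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)$")


def citation_key(text: str) -> Optional[str]:
    """Whitespace-insensitive key for a lone citation, or None if text isn't one."""
    match = _CITATION_RE.match(" ".join(text.split()))
    if match is None:
        return None
    volume, reporter, page = match.groups()
    reporter_key = re.sub(r"\.\s+", ".", reporter)  # "F. 2d" -> "F.2d"
    return f"{volume} {reporter_key} {page}"


# Hypothetical stand-in for a real citation lookup; keys are already in key form.
CITATION_INDEX: Dict[str, str] = {
    "101 So. 844": "/opinion/3239863/u-s-salvage-sales-co-v-weber/",
}


def route_query(q: str) -> str:
    """Redirect a lone, known citation to its opinion; otherwise run a normal search."""
    key = citation_key(q)
    if key is not None and key in CITATION_INDEX:
        return CITATION_INDEX[key]
    # Keep the user's text intact instead of stripping the citation and searching "".
    return "/?" + urlencode({"q": q, "type": "o"})


print(route_query("101 So. 844"))   # /opinion/3239863/u-s-salvage-sales-co-v-weber/
print(route_query("980 F.2d 737"))  # /?q=980+F.2d+737&type=o
```

A query that is not a single known citation falls through to an ordinary search with its text unchanged, so nothing is silently dropped.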