freelawproject / foresight

Where we discuss and prioritize new features

Enhance case law search relevancy #7

Open mlissner opened 2 months ago

mlissner commented 2 months ago

Feature Request Template

Please include the following information in your feature request:

Headline

What is the Feature?

"CiteGeist" is low-key the name of our search engine's ranking algorithm. There are a number of ways we can make it better:

Each of these uses the metadata of the case to provide better query relevancy and together they should make results more relevant.

What Problem Might it Solve?

Currently, search orders results using field-based boosting, phrase boosting, and TF-IDF relevancy. It's fine, but sometimes you really have to wonder why it didn't find the case you're looking for.
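For anyone unfamiliar with those three signals, here is a rough toy sketch of how they can combine into one score. It is not our actual search configuration: the field names, the boost values, the `score` helper, and the sample documents are all made up for illustration, and the phrase check is a crude substring match.

```python
import math
from collections import Counter

# Hypothetical boosts; the real values are not stated in this issue.
FIELD_BOOSTS = {"case_name": 3.0, "text": 1.0}
PHRASE_BOOST = 2.0


def tokenize(s):
    """Lowercase and split on whitespace (deliberately naive)."""
    return s.lower().split()


def idf(term, docs, field):
    """Smoothed inverse document frequency of a term within one field."""
    n = sum(1 for d in docs if term in tokenize(d[field]))
    return math.log((1 + len(docs)) / (1 + n)) + 1.0


def score(query, doc, docs):
    """Sum boosted TF-IDF over fields, then boost exact phrase hits."""
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        counts = Counter(tokenize(doc[field]))
        for term in tokenize(query):
            total += boost * counts[term] * idf(term, docs, field)
    # Phrase boosting: reward documents whose text contains the whole query.
    if query.lower() in doc["text"].lower():
        total *= PHRASE_BOOST
    return total


# Made-up documents, not real opinions.
docs = [
    {"case_name": "Smith v. Jones", "text": "the court discussed fair use at length"},
    {"case_name": "Fair Use Co. v. Doe", "text": "fair and reasonable use of the road"},
    {"case_name": "Brown v. Board", "text": "equal protection of the laws"},
]
scores = [score("fair use", d, docs) for d in docs]
```

In this toy data the second document wins purely on its case name, even though the first is the one that actually discusses fair use. That is the kind of result that makes you wonder, and it uses nothing about the case beyond its text.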

Describe a Scenario in Which the Feature Might be Used

Whenever people search, this would improve the results they get.

Technical Requirements

Existing Systems or Alternatives?

There are plenty of case law search tools, but I think the target we currently have is Google Scholar. If we do this, I expect our results should compete with theirs.

Any Additional Information?

  1. We're also working on semantic search, which would partly obviate this, but I think we'll have keyword search for the foreseeable future.

  2. We're in a cool position with the network-based search because one of the folks in our orbit is doing PhD-level research on that topic. If we can make it work, that'd be special.

  3. In the past, we used PageRank for this, but we stopped a few years ago and nobody noticed. It's not the best ranking algo for something like court cases.
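For context on the third point, PageRank over a citation graph looks roughly like the sketch below. It is a generic power-iteration version and not the code we used to run. The `pagerank` function, the damping factor, and the three-case graph are assumptions for illustration only.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `cites` maps a case ID to the list of case IDs it cites.
    """
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every case gets a baseline share of rank.
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = cites.get(node, [])
            if targets:
                # Split this case's rank among the cases it cites.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A case that cites nothing spreads its rank evenly.
                share = damping * rank[node] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


# Hypothetical graph: cases A and B both cite case C.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
```

The score depends only on who cites whom, so it is the same for every query.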