How hard is it to make, subjectively? Medium
Best guess, how long would it take to make, roughly? 4 weeks
Feature Request Template
Please include the following information in your feature request:
Headline
What is the Feature?
"CiteGeist" is low-key the name of our search engine's ranking algorithm. There are a number of ways we can make it better:
Each of these uses the case's metadata to improve query relevancy, and together they should make results noticeably more relevant.
What Problem Might it Solve?
Currently, search orders results using field-based boosting, phrase boosting, and TF-IDF relevancy. It's fine, but sometimes you really have to wonder why it didn't surface the case you were looking for.
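For anyone unfamiliar with how those three signals combine, here is a rough sketch. It is not our actual search configuration: the field names, the boost values, and the `score`/`search` helpers are all made up for illustration, and the TF-IDF formula is one simple variant among many.

```python
import math
from collections import Counter

# Hypothetical field weights; the real boosts are not given in this issue.
FIELD_BOOSTS = {"case_name": 3.0, "text": 1.0}
PHRASE_BOOST = 2.0  # assumed multiplier when the query appears as an exact phrase


def tokenize(text):
    return text.lower().split()


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def score(query, doc, docs):
    """Field-boosted TF-IDF for one document, multiplied by a phrase boost."""
    terms = tokenize(query)
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        counts = Counter(tokenize(doc[field]))
        for term in terms:
            # Document frequency of the term within this field.
            df = sum(1 for d in docs if term in tokenize(d[field]))
            if df == 0:
                continue
            idf = math.log(1 + len(docs) / df)
            total += boost * counts[term] * idf
    # Phrase boosting: reward documents that contain the whole query in order.
    if len(terms) > 1 and any(
        contains_phrase(tokenize(doc[field]), terms) for field in FIELD_BOOSTS
    ):
        total *= PHRASE_BOOST
    return total


def search(query, docs):
    """Return docs ordered from most to least relevant."""
    return sorted(docs, key=lambda d: score(query, d, docs), reverse=True)
```

Nothing in this scoring looks at how cases relate to each other, which is the gap the network-based ideas are meant to fill.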
Describe a Scenario in Which the Feature Might be Used
Whenever people search, this would improve the results they get.
Technical Requirements
What would it require that we do technically?
We would have to figure out the different ranking algorithms and how to apply them to our results in a performant way that can be updated as new content comes in (difficult with network-based algos). We would then have to calculate the network scores and associate them with the 10M cases we have.
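To make the update problem concrete, here is a minimal sketch that uses a plain citation count as a stand-in for a real network score. The `CitationIndex` class, the `blended_score` helper, and the `weight` parameter are hypothetical names and values, not anything we have built. A citation count is easy to keep current because a new case only touches the cases it cites; a global algorithm would not have that property, which is the hard part noted above.

```python
import math
from collections import defaultdict


class CitationIndex:
    """Tracks which cases cite each case, updated one new case at a time."""

    def __init__(self):
        self.cited_by = defaultdict(set)  # case id -> ids of cases citing it

    def add_case(self, case_id, cites):
        # Only the cases cited by the newcomer change, so ingesting new
        # content does not force a recomputation over the whole corpus.
        for target in cites:
            if target != case_id:  # ignore self-citations
                self.cited_by[target].add(case_id)

    def network_score(self, case_id):
        # Log-damped count, so heavily cited cases do not swamp text relevance.
        return math.log1p(len(self.cited_by.get(case_id, ())))


def blended_score(text_score, network_score, weight=0.5):
    # `weight` is an assumed tuning knob, not a value from this issue.
    return text_score * (1.0 + weight * network_score)
```

In use, each case's stored score would be multiplied into its text relevance at query time:

```python
index = CitationIndex()
index.add_case("a", [])
index.add_case("b", ["a"])
index.add_case("c", ["a", "b"])
boosted = blended_score(2.0, index.network_score("a"))
```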
Existing Systems or Alternatives?
There are plenty of case law search tools, but I think the target we currently have is Google Scholar. If we do this, I expect our results should compete with theirs.
Any Additional Information?
We're also working on semantic search, which would partly obviate this, but I think we'll have keyword search for the foreseeable future.
We're in a cool position with the network-based search because one of the folks in our orbit is doing PhD-level research on that topic. If we can make it work, that'd be special.
In the past, we used PageRank for this, but we stopped a few years ago and nobody noticed. It's not the best ranking algo for something like court cases.
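For context, this is roughly what PageRank does on a citation graph. It is a textbook power-iteration sketch, not the code we used to run; the `pagerank` function, its default `damping` of 0.85, and the iteration count are assumptions for illustration.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """cites maps each case id to the list of case ids it cites."""
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by cases that cite nothing is spread evenly over all cases.
        dangling = sum(rank[node] for node in nodes if not cites.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in cites.items():
            if targets:
                # Each citing case splits its rank across the cases it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

One possible reason it fits court cases poorly: citations only point backward in time, so rank tends to pool in older cases. In a three-case chain where the newest case cites the other two and the middle one cites the oldest, the oldest case comes out on top regardless of what the query is about.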