Hirevo / alexandrie

An alternative crate registry, implemented in Rust.
https://hirevo.github.io/alexandrie/
Apache License 2.0

Improve crate searching #19

Open Hirevo opened 4 years ago

Hirevo commented 4 years ago

Currently, crate searching, both in the frontend and the programmatic API, has limitations:

This issue will serve as the place where improvements to the search mechanism, like addressing these limitations, can be discussed.

danieleades commented 4 years ago

See here for a nice description of how crates.io handles this: https://github.com/rust-lang/crates.io/issues/1270

That approach relies on some PostgreSQL features, so it might be hard to generalise over all backends. What is the advantage, for users of the crate, of being able to use multiple backing databases?

Dalvany commented 1 year ago

Hello, I used to use Alexandrie as a frontend to a mirror of crates.io. With only a few crates the current search might be enough, but it does not work well when searching through a whole mirror. For instance, when searching for log, the log crate was on the last page of something like 50 pages. I made some changes to use Elasticsearch as the search engine to improve search. Unfortunately I lost the sources, but if you want I might come up with a pull request if I have time to work on it.

Hirevo commented 1 year ago

Hi !

I wholeheartedly agree that the search experience is not great right now: exact matches are not favored and no relevance criteria of any sort are taken into account.
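As a rough illustration of what favoring exact matches could look like, here is a minimal Rust sketch using only the standard library. The `MatchTier` enum and the `match_tier` and `rank_crates` functions are hypothetical names made up for this example, not part of Alexandrie's code, and the tie-breaking rule (shorter names first, then alphabetical) is just one possible choice.

```rust
/// Hypothetical relevance tiers for a crate name against a query.
/// Variants are declared best-first, so the derived `Ord` sorts exact matches first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MatchTier {
    Exact,
    Prefix,
    Substring,
}

/// Classifies how `name` matches `query` (case-insensitive), or `None` if it does not.
fn match_tier(name: &str, query: &str) -> Option<MatchTier> {
    let name = name.to_lowercase();
    let query = query.to_lowercase();
    if name == query {
        Some(MatchTier::Exact)
    } else if name.starts_with(&query) {
        Some(MatchTier::Prefix)
    } else if name.contains(&query) {
        Some(MatchTier::Substring)
    } else {
        None
    }
}

/// Returns the matching crate names, best tier first.
/// Ties are broken by shorter name, then alphabetically (an assumption of this sketch).
fn rank_crates<'a>(names: &[&'a str], query: &str) -> Vec<&'a str> {
    let mut hits: Vec<(MatchTier, &'a str)> = names
        .iter()
        .filter_map(|&name| match_tier(name, query).map(|tier| (tier, name)))
        .collect();
    hits.sort_by(|a, b| {
        a.0.cmp(&b.0)
            .then(a.1.len().cmp(&b.1.len()))
            .then(a.1.cmp(b.1))
    });
    hits.into_iter().map(|(_, name)| name).collect()
}
```

With a ranking along these lines, a query for log would put the log crate ahead of crates such as log4rs or env_logger instead of leaving it on the last page.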

Expanding and improving search is something that I'd like to get around to, because I think it would be good to allow things like searching based on crate descriptions, keywords or categories.

Using a system like Elasticsearch would indeed considerably improve the experience but I worry that this would be yet another moving part in an already quite involved deployment process (we already have 3 separate pieces that users need to configure: the registry itself, the git index, and the database).
But it is possible that your experiments with it went fairly smoothly and that my worry isn't well informed.

I'm thinking maybe we can use one of the full text search engines that are implemented in Rust to have it directly integrated into the registry itself (like Meilisearch, or Tantivy).
Maybe we could make it so that it automatically synchronizes itself using the database and the git index, when the registry first starts up, and then is kept up-to-date with each crate publication.
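To make that lifecycle a bit more concrete, here is a small standard-library-only sketch of an in-process index that is built once from the stored crates and then updated on each publication. It is only a stand-in for the idea: it does not use Tantivy's or Meilisearch's API, and `CrateMeta`, `SearchIndex`, `tokenize`, `build`, `add` and `search` are all hypothetical names chosen for this example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical crate metadata; the fields are assumptions for this sketch.
#[derive(Debug, Clone)]
struct CrateMeta {
    name: String,
    description: String,
    keywords: Vec<String>,
}

/// A toy inverted index: lowercase token -> names of crates containing it.
#[derive(Debug, Default)]
struct SearchIndex {
    postings: HashMap<String, BTreeSet<String>>,
}

/// Splits text on non-alphanumeric characters and lowercases each token.
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

impl SearchIndex {
    /// Startup path: rebuild the index from everything already stored.
    fn build<'a>(crates: impl IntoIterator<Item = &'a CrateMeta>) -> Self {
        let mut index = Self::default();
        for krate in crates {
            index.add(krate);
        }
        index
    }

    /// Publication path: index the name, description and keywords of one crate.
    /// Note: this sketch never removes stale tokens when metadata changes.
    fn add(&mut self, krate: &CrateMeta) {
        let fields = std::iter::once(krate.name.as_str())
            .chain(std::iter::once(krate.description.as_str()))
            .chain(krate.keywords.iter().map(String::as_str));
        for field in fields {
            for token in tokenize(field) {
                self.postings
                    .entry(token)
                    .or_default()
                    .insert(krate.name.clone());
            }
        }
    }

    /// Names of crates containing every token of the query, in alphabetical order.
    fn search(&self, query: &str) -> Vec<String> {
        let mut result: Option<BTreeSet<String>> = None;
        for token in tokenize(query) {
            let hits = self.postings.get(&token).cloned().unwrap_or_default();
            result = Some(match result {
                None => hits,
                Some(acc) => acc.intersection(&hits).cloned().collect(),
            });
        }
        result.unwrap_or_default().into_iter().collect()
    }
}
```

In this shape, the registry would call `build` once at startup and `add` from the publish handler; a real full-text engine would additionally bring scoring, stemming and persistence, which this sketch leaves out.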

I hope to have a stab at something like that rather soon.

Dalvany commented 1 year ago

I agree with your concern about adding another system; it might not be worth the trouble to have a fourth piece of external software. I chose Elasticsearch because I was quite familiar with it and I already had a small cluster. I don't know Meilisearch, but from browsing its documentation it seems that it is not a library to include but another system to deploy, so it has the same pitfall as Elasticsearch. Tantivy, though, is a library you could use inside Alexandrie's current code. I think it's the best option here.

Dalvany commented 1 year ago

Hello, @Hirevo, I found a patch file that contains the search changes I made. I was wrong: I didn't use Elasticsearch but Tantivy. I'll make a pull request so you can see what it looks like. A few things to note, though. I didn't handle uploads of new crates because we didn't need that, but I can also look into it. I also didn't build the index at start time; instead I made an HTTP endpoint that triggers it, because with a large number of crates, as in my use case, it takes several minutes to index everything.