elementary / appcenter

Pay-what-you-can app store for elementary OS
https://elementary.io
GNU General Public License v3.0

Better search ordering/ranking #1279

Open cassidyjames opened 4 years ago

cassidyjames commented 4 years ago

Right now it looks like we list apps alphabetically in search results, regardless of their relevance to the search. Performing the same search with appstreamcli seems to return better ordering, so if we can use that, we probably should—otherwise (or if we find it produces better results), we should order results by:

  • App names starting with the term
  • App names containing the term
  • Everything else

…as mentioned by @WatchMkr in https://github.com/pop-os/shop/issues/171
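
The three tiers above can be sketched roughly as follows. This is only an illustration in Python, not AppCenter code: `rank_tier`, `order_results`, and the sample app names are hypothetical, and it assumes a simple case-insensitive match on the app name alone.

```python
def rank_tier(name: str, term: str) -> int:
    """Return 0 if the name starts with the term, 1 if it contains it, else 2."""
    n = name.casefold()
    t = term.casefold()
    if n.startswith(t):
        return 0
    if t in n:
        return 1
    return 2


def order_results(names: list[str], term: str) -> list[str]:
    # sorted() is stable, so results within a tier keep their incoming order.
    return sorted(names, key=lambda name: rank_tier(name, term))


# Hypothetical sample data, already in alphabetical order.
apps = ["Audio Recorder", "Barcode Reader", "Calculator", "Code", "Codecs Pack"]
ordered = order_results(apps, "code")
print(ordered)
```

Because the sort is stable, whatever order the results arrive in (alphabetical here) is preserved inside each tier.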

cassidyjames commented 4 years ago

I would also add matches from the app's summary line, as those appear in the search view. So, if possible:

I'm not sure if AppStream also does matching on the app ID or not, but if so, we might want to figure out where that fits in. But my suspicion is that if an app ID has a search term in it, the human-readable name will as well.

davidmhewitt commented 4 years ago

Ah, yes, I'm not sure if this is a regression from when we put Flatpak support in or if it's always been like this, but it's definitely more difficult to solve now.

AppStream gives us search results in a kind of relevance order, which is the order you're seeing in appstreamcli.

But now that we have Flatpak support, we technically run two searches in two AppStream pools (PackageKit's and Flatpak's), merge the results, and then sort them. As far as I remember, this was done because I ran into some issues that made a single-pool implementation difficult. I've opened an issue to re-investigate that here: https://github.com/elementary/appcenter/issues/1284

Because of this, we can't just take the two result lists from AppStream and append one onto the other (keeping the relevance order) because then we'd be demoting either Flatpak or PackageKit packages to the bottom of the list.

I also suspect that doing some or all of the string processing needed to sort the results in the order you detail above will be too slow to be practical. I'll try it out as an experiment, but even if it's reasonable on my machine, it'll need testing on slower hardware, because multiple substring searches over hundreds or thousands of components are likely to be crippling.

I'll open an issue in AppStream to see if it would make sense for the sort score per component to be exposed as public API so we could try and use that to sort/merge the components.
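
To illustrate why a per-component score would help: if each pool returned results already sorted by a comparable numeric score, the two lists could be interleaved without demoting either source. The sketch below is Python for illustration only; the `Result` type, its `score` field, and the sample values are assumptions, not AppStream API.

```python
import heapq
from typing import NamedTuple


class Result(NamedTuple):
    app_id: str
    score: int   # hypothetical relevance score; higher means more relevant
    source: str  # which pool the result came from


def merge_by_score(packagekit: list[Result], flatpak: list[Result]) -> list[Result]:
    # Both inputs must already be sorted by descending score.
    # heapq.merge interleaves them lazily, so no full re-sort is needed.
    return list(heapq.merge(packagekit, flatpak, key=lambda r: r.score, reverse=True))


# Hypothetical sample results from the two pools.
packagekit_results = [Result("a", 90, "packagekit"), Result("b", 40, "packagekit")]
flatpak_results = [Result("c", 70, "flatpak"), Result("d", 10, "flatpak")]

merged = merge_by_score(packagekit_results, flatpak_results)
print([r.app_id for r in merged])
```

This only works if scores from the two pools are computed the same way and are therefore comparable, which is an assumption here.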

As an aside, I had a look at how gnome-software fixed this and from what I can tell, they have a custom XML library (libxmlb) that's optimized for searching for strings within the huge AppStream collection XMLs and implements all of the word stemming stuff that AppStream search does too. So it looks like they don't use the search methods built into AppStream at all and have implemented their own search.

davidmhewitt commented 4 years ago

I've been digging through AppStream to see how the search ordering works and fixed a pretty big issue with it in the process, so AppStream orders results better in master now.

I've also had some discussion with Matthias about exposing those internal result score metrics as public API, so we can use them as the sorting key to combine two lists here: https://github.com/ximion/appstream/issues/269

I'll mark this issue as blocked pending the outcome of that one, since its result will likely be the best solution here.

KarkanAlzwayed commented 3 years ago

I don't know if this is helpful or not, but I am running EOS 6 beta and this is happening for me, too. Today, I searched for "OnlyOffice" and counted 48 apps that showed up before the app I was looking for. I was about to report it, but thanks to the search on GitHub, I found this open issue. Thanks.

cassidyjames commented 3 years ago

@KarkanAlzwayed I can't reproduce that exactly; for me, searching OnlyOffice with Flathub added shows exactly one result: ONLYOFFICE Desktop Editors which I presume is what you were looking for. However, if I search for Only Office with a space, I see something similar to what you're describing. I believe this is indeed due to AppStream search considering each word individually, and our sorting not considering matches when the whitespace differs. I'm not sure if it's practical for us to work around this exact case or not, but I suppose it does reflect this overall issue.
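
One possible way to treat "Only Office" and "OnlyOffice" alike would be to compare names and queries with whitespace removed. The Python sketch below is only an illustration of that idea; `normalize` and `matches_ignoring_whitespace` are hypothetical helpers, and this is not how AppStream tokenizes or matches search terms.

```python
def normalize(text: str) -> str:
    """Casefold the text and drop all whitespace."""
    return "".join(text.casefold().split())


def matches_ignoring_whitespace(name: str, query: str) -> bool:
    # True when the query appears in the name once both are normalized.
    return normalize(query) in normalize(name)


name = "ONLYOFFICE Desktop Editors"
print(matches_ignoring_whitespace(name, "OnlyOffice"))
print(matches_ignoring_whitespace(name, "Only Office"))
```

A check like this could be used to boost such matches on top of the existing results, at the cost of an extra string pass per component.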

cassidyjames commented 3 years ago

@davidmhewitt it looks like AppStream did make the component sort scores public; does that let us do more here? https://github.com/ximion/appstream/commit/bc18c45994039bacbfc2b08a664d1e9c97b67024

KarkanAlzwayed commented 3 years ago

@cassidyjames You are correct. I have just gotten the same results as you have. I still think that it should find it first regardless of the whitespace, just to be more efficient.