Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Feature] Allow data contributors to specify keywords to improve challenge search results #2692

Closed: tschaffter closed this issue 1 month ago

tschaffter commented 1 month ago

What product(s) is this feature for?

OpenChallenges

Description

We are currently indexing the following challenge elements for the challenge search:

We use the default tokenizer, which identifies tokens using spaces and slashes as separators. More importantly, searching for a substring only returns a challenge if that substring appears as a whole token at least once in the elements above.
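The token-only matching described above can be illustrated with a minimal Python sketch. It assumes a simplified tokenizer that lowercases text and splits only on whitespace and slashes, and it treats a query as matching when all of its tokens appear as whole tokens in the indexed fields. `tokenize`, `matches`, and the sample challenge text are hypothetical and do not reflect how Elasticsearch or OpenChallenges actually implement search.

```python
import re

# Assumed, simplified separator set: whitespace and forward slashes.
# This is NOT Elasticsearch's standard tokenizer, which applies
# Unicode word-boundary rules.
SEPARATORS = re.compile(r"[\s/]+")


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of non-empty tokens."""
    return {token for token in SEPARATORS.split(text.lower()) if token}


def matches(query: str, indexed_fields: list[str]) -> bool:
    """Return True only if every query token is a whole token of some indexed field.

    Requiring all query tokens is an AND-style simplification.
    """
    index_tokens: set[str] = set()
    for text in indexed_fields:
        index_tokens |= tokenize(text)
    query_tokens = tokenize(query)
    return bool(query_tokens) and query_tokens <= index_tokens


if __name__ == "__main__":
    # Hypothetical challenge fields, not taken from the OpenChallenges data.
    fields = ["Digital Mammography DREAM Challenge", "breast cancer/screening"]
    print(matches("mammography", fields))  # True: whole token
    print(matches("mammo", fields))        # False: substring of a token only
    print(matches("screening", fields))    # True: split on the slash
```

Adding more characters to `SEPARATORS` (for example parentheses and dashes) changes which fragments become tokens, but a fragment such as "mammo" still only matches if it appears somewhere as a token of its own.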

The goal of this ticket is to allow the registration of extra keywords (tokens) for challenges.

The keywords will only be supported at the entity level, so no changes will be made to the API spec, the API clients, or the web app.
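The entity-level approach could look roughly like the following Python sketch. It assumes that the persistence entity and the API model are separate types, so extra keywords feed the search index without ever appearing in API responses. `ChallengeEntity`, `ChallengeDto`, `search_tokens`, and `to_dto` are hypothetical names chosen for illustration, not the actual OpenChallenges classes.

```python
import re
from dataclasses import dataclass, field

# Assumed, simplified tokenizer (whitespace and slashes only).
SEPARATORS = re.compile(r"[\s/]+")


def tokenize(text: str) -> set[str]:
    return {token for token in SEPARATORS.split(text.lower()) if token}


@dataclass
class ChallengeEntity:
    """Hypothetical persistence entity; `keywords` exists only at this level."""
    name: str
    description: str
    keywords: list[str] = field(default_factory=list)

    def search_tokens(self) -> set[str]:
        """Tokens sent to the search index: regular fields plus extra keywords."""
        tokens = tokenize(self.name) | tokenize(self.description)
        for keyword in self.keywords:
            tokens |= tokenize(keyword)
        return tokens


@dataclass
class ChallengeDto:
    """Hypothetical API model without a `keywords` field, so the API spec is unchanged."""
    name: str
    description: str


def to_dto(entity: ChallengeEntity) -> ChallengeDto:
    # Keywords are deliberately dropped when mapping the entity to the API model.
    return ChallengeDto(name=entity.name, description=entity.description)
```

For example, `ChallengeEntity("Digital Mammography DREAM Challenge", "Predict breast cancer", keywords=["mammo"])` makes "mammo" searchable, while `to_dto(...)` of that entity exposes only `name` and `description`.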

Anything else?

No response

Code of Conduct

tschaffter commented 1 month ago

It looks like Elasticsearch also uses parentheses and dashes as separators when tokenizing. For example, searching for "smc" returns this challenge, even though "smc" does not appear as a token delimited by the other known separators (spaces and slashes).

tschaffter commented 1 month ago

I reviewed the search issues I could remember or think of. The search results look great, and no improvement is needed at this point.

If the need arises one day, we could add a keywords property to the challenge data model, similar to the HTML keywords metadata property.

<meta name="keywords" content="HTML, CSS, JavaScript">
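If such a property were modeled on the HTML keywords metadata, its value could be stored as a comma-separated string like the `content` attribute above. A minimal Python sketch of turning that string into a keyword list follows. `parse_keywords` is a hypothetical helper, and its normalization rules (trimming, dropping empty entries, case-insensitive de-duplication) are assumptions of this sketch.

```python
def parse_keywords(content: str) -> list[str]:
    """Split a comma-separated keywords string into a clean list.

    Assumed rules: trim whitespace, drop empty entries, and drop duplicates
    case-insensitively while keeping the first spelling seen.
    """
    seen: set[str] = set()
    keywords: list[str] = []
    for raw in content.split(","):
        keyword = raw.strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords


print(parse_keywords("HTML, CSS, JavaScript"))  # ['HTML', 'CSS', 'JavaScript']
```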

https://www.w3schools.com/tags/tag_meta.asp