RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Improve text searches for miRNAs #275

Closed blakesweeney closed 6 years ago

blakesweeney commented 6 years ago

It can be very difficult to find a sequence. For example, at the recent meeting someone tried the query rat microRNA 21 AND rna_type:"miRNA" to find rat microRNA 21. This was pretty tricky, and I had to do several searches before I settled on rno-MiR-21, which finds the sequence (but not in the top 20 hits). Clearly we need to look at how query terms are matched against the indexed text.

AntonPetrov commented 6 years ago

I think this is related to #268 - if we index strings like "mir-21", it will be easier to find entries like this.

Currently the index contains "rno-mir-21" as one string but the query is "mir-21*", so nothing is found.
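
To illustrate the mismatch, here is a minimal sketch in plain Python. It is a toy model, not the actual RNAcentral search backend: the helpers `suffix_tokens` and `matches` are hypothetical, and `fnmatchcase` only stands in for a wildcard query that must match a whole indexed token. It shows why a prefix query like "mir-21*" misses an index that holds "rno-mir-21" as a single string, and how also indexing the sub-strings that start at each hyphen would let it match.

```python
from fnmatch import fnmatchcase


def suffix_tokens(identifier, sep="-"):
    """Hypothetical helper: return the lowercased identifier plus every
    suffix that starts right after a separator."""
    parts = identifier.lower().split(sep)
    return [sep.join(parts[i:]) for i in range(len(parts))]


def matches(query, tokens):
    """Hypothetical helper: True if the wildcard query matches any indexed
    token in full (the pattern is anchored at the start of the token)."""
    q = query.lower()
    return any(fnmatchcase(tok, q) for tok in tokens)


# Index that stores the identifier as one string, as described above.
whole_string_index = ["rno-mir-21"]

# Index that additionally stores "mir-21" and "21".
sub_token_index = suffix_tokens("rno-MiR-21")

print(matches("mir-21*", whole_string_index))  # False
print(matches("mir-21*", sub_token_index))     # True
```

The second index is larger, but the prefix query no longer depends on the user knowing the species prefix "rno-".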

blakesweeney commented 6 years ago

True, but the original search is a reasonable way to look for mir-21, and it doesn't find it.

blakesweeney commented 6 years ago

It is now possible to search for mir-21 and find this sequence, but it doesn't end up in the first few hits. There is more work to do on the result ordering, but at least the sequence can now be found.