glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Protein pathway search returns incorrect entries #1661

Open katewarner opened 3 weeks ago

katewarner commented 3 weeks ago

When you carry out a Protein search (on tst or prd) with the "Any" category selected and search for the example pathway "hsa:3082", the results include proteins that do not have the "hsa:3082" pathway. For example, P31749-1 is returned, but its only pathway is "hsa:207".


I'm wondering if the search is picking up "hsa" or the number 3082 in the PubMed ID, and if so, could we change the "Any" category search to an exact match?
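One way to check this hypothesis for a single returned record is to list which fields contain the full query and which contain only pieces of it. The following is a minimal sketch, not the GlyGen implementation: the helper `explain_match`, the rule for splitting the query into pieces, the field names, and the PubMed values are all invented for illustration.

```python
import re


def explain_match(record, query):
    """Return {field: [matched terms]} for the full query and each of its pieces."""
    # Split "hsa:3082" into ["hsa", "3082"]; this split rule is an assumption.
    pieces = [p for p in re.split(r"[^A-Za-z0-9]+", query) if p]
    terms = [query] + [p for p in pieces if p != query]
    hits = {}
    for field, values in record.items():
        # Case-insensitive substring test of every term against every value.
        matched = [t for t in terms if any(t.lower() in v.lower() for v in values)]
        if matched:
            hits[field] = matched
    return hits


# Hypothetical record: only the accession and pathway come from this ticket,
# the PubMed values are made up.
record = {
    "uniprot_ac": ["P31749-1"],
    "pathway": ["hsa:207"],
    "pubmed": ["12345", "93082111"],
}

result = explain_match(record, "hsa:3082")
print(result)
```

If no field contains the full query but some contain "hsa" or "3082" on their own, that would point to partial matching as the cause.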

sujeetvkulkarni commented 3 weeks ago

@katewarner It looks like partial user input is being matched in the "Any" category, which results in a large number of results. On tst:

Category "Any" Input: hsa:3082 results -> 19022

Category "Any" Input: hsa:207 results -> 19020

Even a random input like hsa:49999 returns a large result set:

Category "Any" Input: hsa:49999 results -> 18939

On prd, the same inputs give the following output:

Category "Any" Input: hsa:3082 results -> 3

Category "Any" Input: hsa:207 results -> 19010

Category "Any" Input: hsa:49999 results -> 18968

As for an exact match, you can change the category from Any to Pathway in the dropdown. On both prd and tst this gives the following output:

Category "Pathway" Input: hsa:3082 results -> prd output: 3, tst output: 3

Category "Pathway" Input: hsa:207 results -> prd output: 10, tst output: 10

Category "Pathway" Input: hsa:49999 results -> prd output: No results, tst output: No results

I believe you can raise the "Any" category issue with @rykahsay, as I don't know whether the current implementation of matching partial user input is desirable: it will always match the string "hsa", whatever the number may be (e.g. hsa:49999).

Category "Any" Input: hsa results -> prd output: 18968, tst output: 18939

Category "Pathway" Input: hsa results -> prd output: 18951, tst output: 18922

The "Any" category matches the user input string against multiple categories, such as Disease, Glycan, and Protein, including Pathway. Selecting a specific category like Pathway from the dropdown forces the input to be matched against only that category.
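The difference in scope can be sketched as follows. This is a toy model under stated assumptions, not the actual GlyGen search code: the function `search`, its `exact` flag, and the three records are hypothetical, and it assumes case-insensitive substring matching. It does not try to reproduce the large "Any" counts reported above for inputs like hsa:49999.

```python
def search(records, query, category="Any", exact=False):
    """Return ids of records whose selected fields match the query."""
    q = query.lower()
    results = []
    for rec in records:
        fields = rec["fields"]
        # "Any" looks at every field; a named category restricts to that field.
        names = list(fields) if category == "Any" else [category]
        values = [v.lower() for n in names for v in fields.get(n, [])]
        if exact:
            hit = q in values                 # whole-value equality
        else:
            hit = any(q in v for v in values)  # substring match
        if hit:
            results.append(rec["id"])
    return results


# Hypothetical records, invented for illustration only.
records = [
    {"id": "A", "fields": {"Pathway": ["hsa:3082"], "Disease": ["example disease"]}},
    {"id": "B", "fields": {"Pathway": ["hsa:207"], "Disease": ["hsa-related note"]}},
    {"id": "C", "fields": {"Pathway": [], "Disease": ["mentions hsa:3082 in free text"]}},
]

pathway_hits = search(records, "hsa:3082", category="Pathway")
any_hits = search(records, "hsa:3082", category="Any")
any_exact_hits = search(records, "hsa:3082", category="Any", exact=True)
prefix_hits = search(records, "hsa", category="Pathway")
print(pathway_hits, any_hits, any_exact_hits, prefix_hits)
```

In this toy model, restricting the category narrows where the input is looked up, while the `exact` flag changes how it is compared. The two are independent choices, which is why an exact "Any" search would be a separate change from picking Pathway in the dropdown.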

katewarner commented 2 weeks ago

Okay thanks