galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

improve tool panel search #2272

Closed martenson closed 1 year ago

martenson commented 8 years ago

Reported by @jennaj: given the number of tools on Main, the search results need to be better, mainly:

I am trying to address the first two (for Main) with: https://github.com/galaxyproject/usegalaxy-playbook/pull/19

hexylena commented 8 years ago

utvalg_999 019

martenson commented 8 years ago

@erasche Galaxy has recently started searching tool IDs as well

screenshot 2016-08-08 13 49 51

I think improvements might be made regarding the interchangeability of ' ', '_', '-'

hexylena commented 8 years ago

@martenson that's great! +1 for allowing users to substitute in ' ' for the _. I know I look for my tools by ID sometimes and fail to find them.

dannon commented 8 years ago

+1 to that, requiring users to know _'s is unfortunate. Is the input not tokenized and matched? (I guess not, if the broken string doesn't match?)
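
To make the separator question concrete, here is a minimal sketch of tokenizing both the query and the tool ID so that ' ', '_' and '-' are interchangeable. It is an illustration only, not Galaxy's actual search code; the `tokenize` and `matches` helpers are hypothetical names.

```python
import re

# Assumption from the discussion above: space, underscore and hyphen
# should all be treated as the same kind of separator.
_SEPARATORS = re.compile(r"[\s_\-]+")


def tokenize(text):
    """Lower-case the text and split it on spaces, underscores and hyphens."""
    return [token for token in _SEPARATORS.split(text.lower()) if token]


def matches(query, tool_id):
    """Return True if every query token is one of the tool ID's tokens."""
    id_tokens = set(tokenize(tool_id))
    return all(token in id_tokens for token in tokenize(query))
```

With this, `matches("random lines1", "random_lines1")` and `matches("table-direct1", "ucsc_table_direct1")` both succeed, because the query and the ID are reduced to the same tokens before comparison.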

hexylena commented 8 years ago

@martenson RFC: Things I would like to see indexed and available to search, along with my feelings on their boosts:

I just find myself frustrated when I cannot find the tool I want or the results are very limited because of what is searched upon. Of course, I do not know what the state of 16.07/dev is, have not gotten there yet.

martenson commented 8 years ago

@erasche we have these boosts on Main and these are the defaults

# tool_name_boost = 9
# tool_section_boost = 3
# tool_description_boost = 2
# tool_label_boost = 1
# tool_stub_boost = 5
# tool_help_boost = 0.5

hexylena commented 8 years ago

@martenson hey that's most of the things I need. In that case, then it would be nice to have more space to display this and where the search actually "hit". Apologies, have not been following along with this stuff closely enough to make informed comments.

martenson commented 8 years ago

we have the 'hit' information but I did not figure out a good place to display it - related to the limited canvas
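
As a rough illustration of how per-field boosts and 'hit' information fit together, the sketch below scores a tool record with the default boost values quoted earlier in this thread and also reports which fields matched. It is a toy model under stated assumptions: the field names, the tool records and the simple "boost times number of matching terms" formula are hypothetical, and the real search backend scores matches differently.

```python
# Default boost values quoted earlier in this thread.
DEFAULT_BOOSTS = {
    "name": 9,
    "section": 3,
    "description": 2,
    "label": 1,
    "stub": 5,
    "help": 0.5,
}


def score_tool(query, tool, boosts=DEFAULT_BOOSTS):
    """Return (score, fields_hit) for one tool record.

    The score is the sum over fields of boost * number of query terms
    that appear as whole words in that field.
    """
    terms = query.lower().split()
    score = 0.0
    fields_hit = []
    for field, boost in boosts.items():
        words = tool.get(field, "").lower().split()
        hits = sum(1 for term in terms if term in words)
        if hits:
            score += boost * hits
            fields_hit.append(field)
    return score, fields_hit


# Hypothetical tool records, for illustration only.
tools = [
    {"name": "MultiQC", "section": "Quality control", "help": "aggregates fastqc reports"},
    {"name": "FastQC", "section": "Quality control", "description": "read quality reports"},
]
ranked = sorted(tools, key=lambda tool: score_tool("fastqc", tool)[0], reverse=True)
```

The `fields_hit` list is the kind of information that could be shown next to a result if the panel had room for it.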

jennaj commented 8 years ago

I would add in that it would be nice to have the underlying tool (binary) name be part of the search, if wrapped under a slightly different tool name or short label of some type.

Related utilized/dependent binaries would be included in this. (lower "boost" probably)

martenson commented 6 years ago

xref https://github.com/galaxyproject/galaxy/issues/1084

martenson commented 6 years ago

digging through Main usage metrics any improvements to toolpanel search should be very well worth it

screenshot 2018-10-11 11 19 36

martenson commented 5 years ago

from @erasche

https://usegalaxy.eu/api/tools?q=compute+an+expression - 0 results
https://usegalaxy.eu/api/tools?q=compute+expression - numerous results + correct one
https://usegalaxy.eu/api/tools?q=compute+an - numerous results + correct one
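
For anyone who wants to repeat this kind of comparison, here is a small sketch that builds the same search URLs and counts the results. It assumes the endpoint answers with a JSON list (as the ID lists quoted elsewhere in this thread suggest); `search_url` and `count_results` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query):
    """Build the tool search URL; urlencode turns spaces into '+'."""
    return base_url + "/api/tools?" + urlencode({"q": query})


def count_results(base_url, query):
    """Fetch the search results and return how many entries came back.

    Assumption: the response body is a JSON list.
    """
    with urlopen(search_url(base_url, query)) as response:
        return len(json.load(response))
```

Calling `count_results("https://usegalaxy.eu", query)` for each of the three queries would give the numbers to compare; the results will of course depend on the server's current configuration.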

hexylena commented 5 years ago

Another concrete issue:

https://usegalaxy.eu/api/tools?q=peakachu returns 2 results, neither is displayed on the frontend. Looks like a client issue.

martenson commented 5 years ago

@erasche I cannot reproduce

screenshot 2019-01-18 11 02 38

hexylena commented 5 years ago

Firefox on linux, cannot repro in chrome.

martenson commented 5 years ago

Two consecutive searches for peakachu yielded different results for me in Firefox: the first does not show in the client, the second does

["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0"]
["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2"]

nsoranzo commented 5 years ago

@erasche I can also reproduce this ~50% of the time in the UI, on both Firefox and Chrome on Linux. One of the web handlers probably hasn't reloaded the toolbox.
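
A quick way to see what differs between two such responses is a set difference. The sketch below uses the two ID lists quoted above; `missing_from` is a hypothetical helper name.

```python
# The two responses quoted above.
first = [
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0",
]
second = [
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2",
]


def missing_from(reference, other):
    """Return the IDs present in `reference` but absent from `other`, sorted."""
    return sorted(set(reference) - set(other))


only_in_second = missing_from(second, first)
```

Here `only_in_second` contains just the 0.1.0.2 entry, which would be consistent with one handler serving an older toolbox, although this comparison alone does not prove that explanation.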

martenson commented 5 years ago

xref new issue for the display bug: https://github.com/galaxyproject/galaxy/issues/7238

jennaj commented 5 years ago

another search term returning unexpected results: ncbi

browser might not matter, same results using chrome or safari under mac osx (but didn't test firefox)

FredericBGA commented 5 years ago

another search term: convert

It did not find the convert tool (Text Manipulation>Convert delimiters to TAB)

It works

Tested with Firefox.

I just discovered the boosts! What should I set to get results like usegalaxy.eu's?

# tool_name_boost = 9
# tool_section_boost = 3
# tool_description_boost = 2
# tool_label_boost = 1
# tool_stub_boost = 5
# tool_help_boost = 0.5

hexylena commented 5 years ago

@FredericBGA .eu's boosts are here https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/group_vars/gxconfig.yml#L1076 but they're pretty aggressive / strange compared to other sites'

FredericBGA commented 5 years ago

@erasche thank you! The link in Martin's post above is broken. I will try something between the defaults and .eu's values

martenson commented 5 years ago

@FredericBGA we have this on Main atm

  tool_name_boost: 12
  tool_section_boost: 5

We should probably experiment with tool_enable_ngram_search

I created a PR to mimic EU and enable ngram too: https://github.com/galaxyproject/usegalaxy-playbook/pull/228

nsoranzo commented 5 years ago

We use tool_enable_ngram_search: true, which works fine.
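
For readers unfamiliar with n-gram search, the sketch below shows the general idea: words are broken into short overlapping substrings, so a partial term such as blast can still overlap with a longer name such as blastp. This is a generic illustration, not the actual indexing code; the `min_size` and `max_size` values are assumptions chosen for the example.

```python
def ngrams(word, min_size=3, max_size=4):
    """Return all substrings of `word` with length between min_size and max_size."""
    word = word.lower()
    return {
        word[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(word) - size + 1)
    }


def ngram_overlap(query, name):
    """Count the n-grams that the query and the tool name have in common."""
    return len(ngrams(query) & ngrams(name))
```

With these settings every n-gram of blast also occurs in blastp, so the two overlap strongly, while blast and multiqc share none. The trade-off is that n-gram matching tends to return more, and looser, results.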

FredericBGA commented 5 years ago

thank you all for sharing your config with me! It works now, with: tool_name_boost: 20

jennaj commented 5 years ago

tool_name_boost: 20

Wonder if Main would benefit from that much higher boost, specifically. Searches are still a bit unpredictable and the results too limited imho. Martin is probably on that already... this is not new and we've tried a few variations already, but it could still use some tuning.

Has to be frustrating to search for a tool and not find it -- as the stats he posted above back up.

martenson commented 5 years ago

@jennaj I have proposed a radical change for Test here: https://github.com/galaxyproject/usegalaxy-playbook/pull/228

The search stats above do not include anything about the results, it is a boolean for 'has the user searched at least once?'.

jennaj commented 5 years ago

Great, I really do like the EU search results. Finds everything, and even though outputs more results, totally worth it imo. Glad we are exploring that.

The stats sort of indicate that people using tool searches spend more time in a Galaxy session. That suggests they are new and/or running tools directly from the history, and possibly are spending more time "hunting" for tools (the "hard way", eg expanding/scrolling through all). Sessions without tool searches look like they might be biased toward those running workflows - so quick login/out, no tool searches, bounce rate higher because they get whatever they want to do done quicker. But maybe I am reading too much into that :)

hexylena commented 5 years ago

New issue reported when looking for blast and expecting to find blastp.

Screenshot at time of reporting

martenson commented 5 years ago

An idea: a hybrid approach where the search result limit is high, but results are cut off at a certain hit score if there are enough of them. This could prune the less important results.
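
One possible reading of this hybrid idea, sketched under assumptions: keep a generous overall limit, always keep the first few hits, and beyond that keep a hit only if its score is at least some fraction of the best score. The function name, the parameter names and the default values are all hypothetical.

```python
def prune_hits(hits, limit=160, min_results=10, score_ratio=0.25):
    """Prune a list of (tool_id, score) pairs.

    - At most `limit` hits are considered, best score first.
    - The first `min_results` hits are always kept.
    - Further hits are kept only if score >= score_ratio * best score.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:limit]
    if not ranked:
        return []
    threshold = ranked[0][1] * score_ratio
    guaranteed = ranked[:min_results]
    extra = [hit for hit in ranked[min_results:] if hit[1] >= threshold]
    return guaranteed + extra
```

The ratio-based threshold is only one choice; an absolute score cutoff would work the same way but would need retuning whenever the boosts change.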

hexylena commented 5 years ago

image

Searching for the full name + title doesn't find the tool. Search for select lines and it is returned.

hexylena commented 5 years ago

also found https://github.com/galaxyproject/galaxy/issues/3276 when searching for this one, think that's one of the points in this thread somewhere.

jennaj commented 5 years ago

@erasche I think the point made in the linked ticket is an important one. People don't care about the actual tool order in whatever tool panel they are working with. They want to find the tool. The ranking is an intuitive way to do a search -- could even be a toggle in the GUI.

Could even be extended if ranked (exact tool name match): eg: "show all" vs "I'm feeling lucky" type of thing
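
A minimal sketch of such a toggle, assuming each hit carries a name, a score and its position in the tool panel (all hypothetical field names): in ranked mode an exact name match comes first and the rest follow by descending score, otherwise the panel order is kept.

```python
def order_results(hits, query, ranked=True):
    """Order search hits either by relevance or by tool panel position."""
    if not ranked:
        return sorted(hits, key=lambda hit: hit["panel_index"])
    wanted = query.strip().lower()
    # False sorts before True, so exact name matches come first;
    # the negative score puts higher scores earlier.
    return sorted(hits, key=lambda hit: (hit["name"].lower() != wanted, -hit["score"]))


# Hypothetical hits, for illustration only.
example_hits = [
    {"name": "MultiQC", "score": 3.0, "panel_index": 0},
    {"name": "FastQC", "score": 2.0, "panel_index": 1},
    {"name": "FASTQ Groomer", "score": 5.0, "panel_index": 2},
]
```

An "I'm feeling lucky" mode would then simply take the first element of the ranked list.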

hexylena commented 5 years ago

Agreed. I understand (what I assume was) the original intention to help users find the section on their own later, but on eu with 2k tools, they will basically always use search. Would love to have a ranking.

martenson commented 5 years ago

Meanwhile on Main:

search for fastq: the first result in the response is the fastqc tool, but the panel never shows it
search for fastqc: a result in the middle of the response is the multiqc tools, but the panel never shows it

jennaj commented 4 years ago

Main/usegalaxy.org

search for ncbi does not find any of the "NCBI SRA" Get data tools: https://toolshed.g2.bx.psu.edu/view/iuc/sra_tools/f5ea3ce9b9b0

Eu/usegalaxy.eu

search for ncbi also does not find any of the "NCBI SRA" Get data tools, but does find other Get Data tools from NCBI not in the same tool suite (none of those are loaded at .org)

hexylena commented 4 years ago

On eu:

The following two queries return different results:

Importantly, the first doesn't include random_lines1, the tool I'm looking for.

wm75 commented 4 years ago

On eu, a search for snpeff does not find any of the following tools:

Works just fine on .org.

wm75 commented 4 years ago

... and, possibly related to @hexylena's example just above: trailing whitespace in a search term seems to mess up results completely
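
Whatever the underlying cause turns out to be, one defensive measure would be to normalize whitespace before the query is sent. The sketch below is an assumption about a possible mitigation, not a description of how the client or server currently behaves; `normalize_query` is a hypothetical name.

```python
def normalize_query(query):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(query.split())
```

For example, `normalize_query("snpeff ")` gives `"snpeff"`, so a trailing space would no longer change what is searched for.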

This one on main AND eu

wm75 commented 4 years ago

a question: is there really no way to express an AND between search terms?
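
To illustrate the difference an AND would make, here is a toy sketch that matches query terms against tool text either with OR semantics (any term) or AND semantics (all terms). The tool records and the `require_all` flag are hypothetical; this says nothing about what the search backend supports.

```python
def search(tools, query, require_all=False):
    """Return the IDs of tools whose text contains any (or all) query terms."""
    terms = query.lower().split()
    if not terms:
        return []
    combine = all if require_all else any
    return [
        tool_id
        for tool_id, text in tools.items()
        if combine(term in text.lower().split() for term in terms)
    ]


# Hypothetical records, for illustration only.
example_tools = {
    "ucsc_table_direct1": "UCSC Main",
    "hypothetical_tool_a": "Main menu helper",
    "hypothetical_tool_b": "UCSC Archaea",
}
```

With OR semantics a query such as ucsc main matches all three records; with `require_all=True` only the record containing both terms remains.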

martenson commented 4 years ago

@wm75 not at the moment, can you please provide examples of searches that don't behave as you'd expect?

hexylena commented 4 years ago

Odd result on EU:

Multiqc appears in two sections

image

Searching for it yields only one:

image

Contrast with fastqc which appears in two and searches yield two (of the same version)

image

martenson commented 4 years ago

xref: https://github.com/galaxyproject/galaxy/issues/10030

hexylena commented 3 years ago

"UCSC main" is unfindable on EU: https://usegalaxy.eu/api/tools?q=ucsc+main doesn't include ucsc_table_direct1, but it does include 150 other things. @bgruening

It is included on .org, but it is nowhere near the top hit for a search on the exact tool title

martenson commented 3 years ago

on EU searching ucsc has 52 results and "Main" is the last one 😭

hexylena commented 3 years ago

right? saw that one too, and 😂 seems very odd given our boosts. https://github.com/usegalaxy-eu/infrastructure-playbook/blob/45c98a0baec381ccb0acd6cca78016985bd58fe4/group_vars/gxconfig.yml#L1190

martenson commented 3 years ago

@hexylena maybe https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/webapps/galaxy/config_schema.yml#L1955 could improve things?

hexylena commented 3 years ago

Possibly! I just expected the tool_name boost to have the biggest effect. I would love to debug the internals sometime, and see what scores x boost are being returned for each of these results that are doing 'better' than the direct text match. Like, if those are returning first, clearly they say "ucsc main" dozens of times in their descriptions or something?

mvdbeek commented 3 years ago

is it possible that exact matches overflow in score? This is a search for ucsc:

Screenshot 2020-10-29 at 11 54 15

hexylena commented 3 years ago

@mvdbeek neat! How did you obtain that?