Closed martenson closed 1 year ago
@erasche since recently Galaxy now searches tool IDs
I think improvements might be made regarding the interchangeability of ' ', '_', '-'
@martenson that's great! +1 for allowing users to substitute in ' ' for the _. I know I look for my tools by ID sometimes and fail to find them.
+1 to that, requiring users to know _'s is unfortunate. Is the input not tokenized and matched? (I guess not, if the broken string doesn't match?)
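A minimal sketch of what treating ' ', '_' and '-' as interchangeable could look like. This is not Galaxy's search code; `normalize` and `matches` are hypothetical helpers, and the plain substring match is only an assumption for illustration.

```python
import re

# Hypothetical helpers, not part of Galaxy: treat ' ', '_' and '-' as one separator.
_SEPARATORS = re.compile(r"[\s_\-]+")


def normalize(text: str) -> str:
    """Lowercase and collapse any run of spaces, underscores or hyphens to one space."""
    return _SEPARATORS.sub(" ", text).strip().lower()


def matches(query: str, tool_id: str) -> bool:
    """True if the normalized query is a substring of the normalized tool ID."""
    return normalize(query) in normalize(tool_id)


print(matches("random lines", "random_lines1"))   # True
print(matches("random-lines ", "random_lines1"))  # True, trailing space is ignored
```

Normalizing both the indexed ID and the query the same way means users would not need to know which separator the tool author picked.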
@martenson RFC: Things I would like to see indexed and available to search, along with my feelings on their boosts:
I just find myself frustrated when I cannot find the tool I want or the results are very limited because of what is searched upon. Of course, I do not know what the state of 16.07/dev is, have not gotten there yet.
@erasche we have these boosts on Main and these are the defaults
# tool_name_boost = 9
# tool_section_boost = 3
# tool_description_boost = 2
# tool_label_boost = 1
# tool_stub_boost = 5
# tool_help_boost = 0.5
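To make the effect of these numbers easier to reason about, here is a rough sketch that combines per-field matches into one score as a weighted sum. The boost values are the defaults quoted above; everything else (the toy `field_score`, the tool record, the weighted-sum model itself) is an assumption for illustration. Galaxy's real scoring happens inside its search backend and is more involved.

```python
# Boost values copied from the defaults quoted above.
BOOSTS = {
    "name": 9.0,
    "section": 3.0,
    "description": 2.0,
    "label": 1.0,
    "stub": 5.0,
    "help": 0.5,
}


def field_score(query: str, text: str) -> float:
    """Toy relevance: fraction of query terms that occur as whole words in the field."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(term in words for term in terms) / len(terms)


def boosted_score(query: str, tool: dict) -> dict:
    """Return per-field contributions (raw score x boost) plus their total."""
    parts = {
        field: field_score(query, tool.get(field, "")) * boost
        for field, boost in BOOSTS.items()
    }
    parts["total"] = sum(parts.values())
    return parts


# Made-up tool record, for illustration only.
tool = {
    "name": "Convert delimiters to TAB",
    "section": "Text Manipulation",
    "help": "convert delimiters",
}
print(boosted_score("convert", tool))  # name contributes 9.0, help 0.5, total 9.5
```

Printing the per-field breakdown like this is also one way to show where a search actually "hit".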
@martenson hey that's most of the things I need. In that case, then it would be nice to have more space to display this and where the search actually "hit". Apologies, have not been following along with this stuff closely enough to make informed comments.
we have the 'hit' information but I did not figure out a good place to display it - related to the limited canvas
I would add in that it would be nice to have the underlying tool (binary) name be part of the search, if wrapped under a slightly different tool name or short label of some type.
Related utilized/dependent binaries would be included in this. (lower "boost" probably)
Digging through the Main usage metrics, any improvement to tool panel search should be well worth it.
from @erasche
https://usegalaxy.eu/api/tools?q=compute+an+expression - 0 results
https://usegalaxy.eu/api/tools?q=compute+expression - numerous results + correct one
https://usegalaxy.eu/api/tools?q=compute+an - numerous results + correct one
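For anyone who wants to repeat this comparison, a small sketch that builds the same URLs and counts hits. It assumes the endpoint returns a JSON list with one entry per tool (as the responses quoted in this thread suggest); `search_url` and `count_hits` are hypothetical helper names, and `count_hits` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base: str, query: str) -> str:
    """Build the tool-search URL in the same form as the links above."""
    return f"{base.rstrip('/')}/api/tools?{urlencode({'q': query})}"


def count_hits(base: str, query: str) -> int:
    """Fetch the search response and count returned entries (needs network access)."""
    with urlopen(search_url(base, query), timeout=30) as response:
        return len(json.load(response))


if __name__ == "__main__":
    for q in ("compute an expression", "compute expression", "compute an"):
        # Only the URL is printed here; call count_hits(...) to query the server.
        print(q, "->", search_url("https://usegalaxy.eu", q))
```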
Another concrete issue:
https://usegalaxy.eu/api/tools?q=peakachu returns 2 results, neither are displayed on frontend. client issue.
@erasche I cannot reproduce
Firefox on linux, cannot repro in chrome.
Two subsequent searches for peakachu yielded different results for me in Firefox: the first does not show in the client, the second does.
["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0"]
["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2"]
@erasche I can also reproduce this ~50% of the time in the UI, on both Firefox and Chrome on Linux. One of the web handlers probably hasn't reloaded the toolbox.
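One way to check the stale-handler guess is to repeat the same request several times and diff the responses. A minimal sketch using the two responses quoted above; `diff_responses` is a hypothetical helper.

```python
# The two responses quoted above for the same peakachu query.
first = [
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0",
]
second = first + ["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2"]


def diff_responses(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of tool IDs."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_first, only_second = diff_responses(first, second)
print(only_first)   # []
print(only_second)  # ['toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2']
```

If the responses alternate between the two sets, that would be consistent with handlers holding different toolbox versions, though it would not prove it.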
xref new issue for the display bug: https://github.com/galaxyproject/galaxy/issues/7238
another search term returning unexpected results: ncbi
browser might not matter, same results using chrome or safari under mac osx (but didn't test firefox)
usegalaxy.org == finds "get data > NCBI bam" download tool but not "get data > NCBI fastq". This server doesn't include "get data > NCBI pileup" anymore (tool routinely failed -- data usually too large plus any represents ambiguous scientific content)
usegalaxy.eu == doesn't find any of these three (all are present under "get data")
usegalaxy.org.au == finds all three (under "get data")
usegalaxy.be == doesn't find any of these three (all are present under "get data")
another search term: convert
It did not find the convert tool (Text Manipulation > Convert delimiters to TAB)
It works
Tries made with Firefox.
I discovered the boosts! What should I set in order to get the same results as usegalaxy.eu?
# tool_name_boost = 9
# tool_section_boost = 3
# tool_description_boost = 2
# tool_label_boost = 1
# tool_stub_boost = 5
# tool_help_boost = 0.5
@FredericBGA .eu's boosts are here https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/group_vars/gxconfig.yml#L1076 but they're pretty aggressive / strange compared to other sites'
@erasche thank you! The link in Martin's post above is broken. I will try something between the defaults and .eu's values.
@FredericBGA we have this on Main atm
tool_name_boost: 12
tool_section_boost: 5
We should probably experiment with tool_enable_ngram_search
I created a PR to mimic EU and enable ngram too: https://github.com/galaxyproject/usegalaxy-playbook/pull/228
We use tool_enable_ngram_search: true, which works fine.
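For readers unfamiliar with n-gram search, a small sketch of the general idea: index overlapping character fragments so that a partial word can still match. This is not how tool_enable_ngram_search is implemented in Galaxy; the gram sizes (3 to 4) and the helper names are assumptions.

```python
def ngrams(word: str, min_size: int = 3, max_size: int = 4) -> set:
    """All character n-grams of `word` with lengths in [min_size, max_size]."""
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(word) - n + 1)
    }


def ngram_match(query: str, name: str) -> bool:
    """True if every n-gram of the query also occurs in the name."""
    grams = ngrams(query)
    return bool(grams) and grams <= ngrams(name)


print(ngram_match("blast", "blastp"))  # True: a whole-word match would miss this
print(ngram_match("blastx", "blastp"))  # False: 'stx' is not in 'blastp'
```

The trade-off is more (and noisier) results, which is why the boosts and result limit tend to need retuning alongside it.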
thank you all for sharing your config with me!
It works now, with:
tool_name_boost: 20
I wonder whether Main would benefit from that much higher boost, specifically. Searches are still a bit unpredictable and the results too limited, imho. Martin is probably on that already... this is not new and we've tried a few variations already, but it could still use some tuning.
It has to be frustrating to search for a tool and not find it, as the stats he posted above back up.
@jennaj I have proposed radical change for Test here: https://github.com/galaxyproject/usegalaxy-playbook/pull/228
The search stats above do not include anything about the results, it is a boolean for 'has the user searched at least once?'.
Great, I really do like the EU search results. Finds everything, and even though outputs more results, totally worth it imo. Glad we are exploring that.
The stats sort of indicate that people using tool searches spend more time in a Galaxy session. That suggests they are new and/or running tools directly from the history, and are possibly spending more time "hunting" for tools (the "hard way", e.g. expanding/scrolling through everything). Sessions without tool searches look like they might be biased toward people running workflows: quick login/logout, no tool searches, and a higher bounce rate because they get whatever they want done more quickly. But maybe I am reading too much into that :)
New issue reported when looking for blast and expecting to find blastp.
tool_search_limit = 160
blastp
an idea: A hybrid approach where the search result limit is high but will cut off at certain hit score if there are enough results. This could prune the less important results.
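A minimal sketch of that hybrid idea, assuming hits arrive as (tool_id, score) pairs. The limit of 160 matches the tool_search_limit mentioned above; `min_results` and `score_ratio` are made-up knobs, and `prune_hits` is a hypothetical name.

```python
def prune_hits(hits, limit=160, min_results=10, score_ratio=0.2):
    """Keep at most `limit` hits; once `min_results` are kept, stop at the
    first hit scoring below `score_ratio` times the best score."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:limit]
    if not ranked:
        return []
    threshold = ranked[0][1] * score_ratio
    kept = []
    for tool_id, score in ranked:
        if len(kept) >= min_results and score < threshold:
            break
        kept.append((tool_id, score))
    return kept


hits = [("a", 10.0), ("b", 9.0), ("c", 1.0), ("d", 0.5)]
print(prune_hits(hits, min_results=2))  # [('a', 10.0), ('b', 9.0)]
print(prune_hits(hits, min_results=3))  # [('a', 10.0), ('b', 9.0), ('c', 1.0)]
```

Using a ratio of the top score rather than an absolute cutoff keeps the behaviour stable if the boosts are changed later.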
Searching for the full name + title doesn't find the tool. Search for select lines and it is returned.
also found https://github.com/galaxyproject/galaxy/issues/3276 when searching for this one, think that's one of the points in this thread somewhere.
@erasche I think the point made in the linked ticket is an important one. People don't care about the actual tool order in whatever tool panel they are working with. They want to find the tool. The ranking is an intuitive way to do a search -- could even be a toggle in the GUI.
Could even be extended if ranked (exact tool name match): eg: "show all" vs "I'm feeling lucky" type of thing
Agreed. I understand (what I assume was) the original intention to help users find the section on their own later, but on eu with 2k tools, they will basically always use search. Would love to have a ranking.
meantime on Main:
search for fastq has the fastqc tool as the first result in the response, but the panel never shows it
search for fastqc has the multiqc tools in the middle of the results, but the panel never shows them
Main/usegalaxy.org: search for ncbi does not find any of the "NCBI SRA" Get data tools: https://toolshed.g2.bx.psu.edu/view/iuc/sra_tools/f5ea3ce9b9b0
Eu/usegalaxy.eu: search for ncbi also does not find any of the "NCBI SRA" Get data tools, but does find other Get Data tools from NCBI not in the same tool suite (none of those are loaded at .org)
On eu:
The following two queries return different results:
Importantly, the first doesn't include random_lines1, the tool I'm looking for.
On eu, a search for snpeff does not find any of the following tools:
Works just fine on .org.
... and, possibly related to @hexylena's example just above: trailing whitespace in a search term seems to mess up results completely
This one on main AND eu
a question: is there really no way to express an AND between search terms?
@wm75 not at the moment, can you please provide examples of searches that don't behave as you'd expect?
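To make the question concrete, a minimal sketch of what AND semantics between terms would mean, independent of how Galaxy's search is built. `and_search` is a hypothetical helper and the tool descriptions are made up for illustration.

```python
def and_search(query: str, tools: dict) -> list:
    """Return IDs of tools whose searchable text contains every query term."""
    terms = query.lower().split()  # split() also drops trailing whitespace
    if not terms:
        return []
    return [
        tool_id
        for tool_id, text in tools.items()
        if all(term in text.lower() for term in terms)
    ]


# Illustrative records only; the descriptions are not taken from real tools.
tools = {
    "random_lines1": "Select random lines from a file",
    "peakachu": "Calls peaks in CLIP data",
    "select_lines": "Select lines that match an expression",
}
print(and_search("select lines ", tools))   # ['random_lines1', 'select_lines']
print(and_search("select random", tools))   # ['random_lines1']
```

With AND semantics, adding a term can only narrow the result set, which is the behaviour the question above is asking about.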
Odd result on EU:
Multiqc appears in two sections
Searching for it yields only one:
Contrast with fastqc which appears in two and searches yield two (of the same version)
"UCSC main" is unfindable on EU: https://usegalaxy.eu/api/tools?q=ucsc+main doesn't include ucsc_table_direct1, but it does include 150 other things. @bgruening
It does on .org, but not nearly the top hit for a search on the exact tool title
on EU, searching ucsc has 52 results and "Main" is the last one
right? saw that one too, and :joy: seems very odd given our boosts. https://github.com/usegalaxy-eu/infrastructure-playbook/blob/45c98a0baec381ccb0acd6cca78016985bd58fe4/group_vars/gxconfig.yml#L1190
@hexylena maybe https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/webapps/galaxy/config_schema.yml#L1955 could improve things?
Possibly! I just expected the tool_name boost to have the biggest effect. I would love to debug the internals sometime, and see what score x boost values are being returned for each of these results that are doing 'better' than the direct text match. Like, if those are returning first, clearly they say "ucsc main" dozens of times in their descriptions or something?
Is it possible that exact matches overflow in score? This is a search for ucsc:
@mvdbeek neat! How did you obtain that?
Reported by @jennaj: given the number of tools on Main, the search results need to be better, mainly:
I am trying to address the first two (for Main) with: https://github.com/galaxyproject/usegalaxy-playbook/pull/19