BACKGROUND
There are many ways to improve on this, so I suggest the issue assignee do some "field research" and/or "user webinar research" to find the best solution before starting (or dismissing) this ticket; this is not a "quick fix" ticket. I came across this issue while parsing data files from Ensembl Genomes (Protists) when recreating steps from an actual user story for new BRC Analytics users coming from the legacy VEuPathDB site. (The screenshot and data file used are included in this ticket.)
ISSUE
I suggest a further enhancement to improve ALL REGULAR EXPRESSION FORM FIELDS on the Tool Form page. While the "Help" content is good, it: (1) may be too specific to the sample use case given the variety of users we have; (2) contains a "see Python" note, which is outside the Galaxy "no programming language/code" ethos; and (3) may surprise even users familiar with regular expressions, who have to re-learn Python's particular regular expression syntax (there are many regular expression engines in existence).
SOLUTION
I would look for an interactive solution (not just wording) that prevents users from wasting time trying out regular expressions on very big files that might take several minutes to process. Some possibilities (from straightforward to more complex): (1) Display a "Try your Regex" field/pop-up where the user could paste in a single line of content and see their regex match before running the Tool Form; the "single line of content" could even be auto-populated from their file (search for "online regular expression sandbox" for examples). (2) Show a "Preview" window of, say, the first 1,000 characters of their file as they type their regular expression in the form field; this could also include a "Results Count" if helpful. (3) Other solutions surely exist; see the "field research" note in the Background paragraph.
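As a rough illustration of what the preview in possibility (2) might compute, here is a minimal sketch. It assumes the tool applies Python's `re` module to each line, which the "see Python" help note suggests but which I have not verified against the tool's source. The function name `preview_regex`, its parameters, and the sample lines are hypothetical and not part of Galaxy.

```python
import re
from itertools import islice


def preview_regex(lines, pattern, replacement, max_lines=5):
    """Apply a find/replace regex to the first `max_lines` lines only.

    Returns a dict with:
      error   - message if the pattern or replacement is invalid, else None
      matched - number of previewed lines where the pattern matched
      rows    - list of (original_line, replaced_line, did_match) tuples
    """
    rows = []
    matched = 0
    try:
        compiled = re.compile(pattern)
        for line in islice(lines, max_lines):
            line = line.rstrip("\n")
            # subn returns the new string and the number of substitutions made
            new_line, count = compiled.subn(replacement, line)
            if count:
                matched += 1
            rows.append((line, new_line, count > 0))
    except re.error as exc:
        # Report invalid regexes immediately instead of after a long job
        return {"error": str(exc), "matched": 0, "rows": []}
    return {"error": None, "matched": matched, "rows": rows}


# Hypothetical sample lines, loosely shaped like a GFF attributes column
sample = [
    "ID=gene:ABC123;gene_id=ABC123;biotype=protein_coding\n",
    "no attributes here\n",
]
result = preview_regex(sample, r".*gene_id=([^;]*);.*", r"\1")
for original, replaced, did_match in result["rows"]:
    print(f"{'MATCH' if did_match else 'no match'}: {original!r} -> {replaced!r}")
```

Because only the first few lines are read, feedback like this could be shown while the user types, and an invalid regex would be reported right away rather than after a multi-minute run. The "Results Count" idea corresponds to the `matched` value.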
Galaxy Version and/or server at which you observed the bug
Galaxy Version: 24.1.3.dev0 (on production / US Main as of Oct 10, 2024)
To Reproduce
Steps to reproduce the behavior:
Go to: "Tools" panel >> find tool: "Regex Find And Replace"
Upload attached data file from: Ensembl Genomes
Complete the Regex fields: Find Regex: .*gene_id=([^;]*);.* and Replacement: \1
Receive interactive feedback on your Regex BEFORE clicking the "Run Tool" button
Expected behavior
Something that prevents users from wasting time trying out regular expressions on very big files that might take several minutes to process.
I've assigned myself on behalf of a student I'm supervising who would like to take over this project. Once we've got them set up, I will reassign this issue to them.
Screenshots
Additional context
Attached data file: Cut_on_data_2.tabular.zip