galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Enhancement: improve Regex field help on Tool Form page #18978

Open hujambo-dunia opened 1 month ago

hujambo-dunia commented 1 month ago

BACKGROUND There are many ways to potentially improve on this, so I suggest the Issue Assignee do some "field research" and/or "user webinar research" to find the best solution before starting (or dismissing) this ticket; this is not a "quick fix" ticket. I personally came across this issue while parsing data files from Ensembl Genomes (Protists), recreating steps from an actual user story for new BRC Analytics users coming from the legacy VEuPathDB site. (The screenshot and data file used are included in this ticket.)

ISSUE Suggesting a further enhancement to improve ALL REGULAR EXPRESSION FORM FIELDS on the Tool Form page, which should help resolve potential issues. While the "Help" content is good, it: (1) may be a bit too specific to the sample use-case given the various types of users we have; (2) contains a "see Python" note, which is outside the Galaxy "no programming language/code" ethos; and (3) may surprise even users familiar with regular expressions, who have to re-learn Python's particular regular expression syntax (there are many regular expression engines in existence).

SOLUTION I would look for an interactive solution (not just wording) that prevents users from wasting time trying out regular expressions on very big files that might take several minutes to process. Some possibilities (from straightforward to more complex): (1) Display a "Try your Regex" field/pop-up where the user could paste in a single line of content and see their regex match before running the Tool Form; the "single line of content" could even be auto-populated from their file; search for "online regular expression sandbox" for examples. (2) Show a "Preview" window, perhaps of the first 1,000 characters of their file, as they type their regular expression in the form field; this could also include a "Results Count" if helpful. (3) Other possible solutions are sure to exist; see the "field research" note in the first (Background) paragraph.
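To make the "Preview" idea a bit more concrete, here is a rough sketch of what a preview helper could compute. It assumes the tool applies the pattern line by line with Python's `re` module (the "see Python" help note suggests this, but I have not checked the tool source). The function name `preview_regex`, the `max_lines` parameter, and the sample lines are all made up for illustration and are not part of Galaxy.

```python
import re
from itertools import islice


def preview_regex(pattern, replacement, lines, max_lines=5):
    """Apply a find/replace regex to the first few lines only.

    Hypothetical helper: returns any regex error, how many previewed
    lines matched, and (before, after) pairs that a UI could display.
    """
    samples = []
    match_count = 0
    try:
        compiled = re.compile(pattern)
        for line in islice(lines, max_lines):
            line = line.rstrip("\n")
            if compiled.search(line):
                match_count += 1
            # sub() can also raise re.error, e.g. for a bad group reference
            samples.append((line, compiled.sub(replacement, line)))
    except re.error as exc:
        return {"error": str(exc), "match_count": 0, "samples": []}
    return {"error": None, "match_count": match_count, "samples": samples}


# Made-up sample lines standing in for the start of a user's dataset
sample_lines = [
    "ID=gene:ABC123;gene_id=ABC123;biotype=protein_coding\n",
    "no attributes here\n",
]
result = preview_regex(r".*gene_id=([^;]*);.*", r"\1", sample_lines)
for before, after in result["samples"]:
    print(f"{before!r} -> {after!r}")
print("lines matched:", result["match_count"])
```

Because only the first few lines are read, something like this could run on every keystroke and surface both syntax errors and "zero lines matched" before the user commits to a long job.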

Galaxy Version and/or server at which you observed the bug: Galaxy version 24.1.3.dev0 (on production / US Main as of Oct 10, 2024)

To Reproduce Steps to reproduce the behavior:

  1. Go to: "Tools" panel >> find tool: "Regex Find And Replace"
  2. Upload attached data file from: Ensembl Genomes
  3. Complete the Regex fields: Find Regex: .*gene_id=([^;]*);.* and Replacement: \1
  4. Receive interactive feedback on your Regex BEFORE clicking the "Run Tool" button
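For reference, this is roughly what the Find/Replacement pair from step 3 does in plain Python, assuming the tool behaves like `re.sub` applied per line (an assumption on my part). The two attribute strings are invented examples, not lines from the attached file. The second one shows the kind of silent miss that immediate feedback would catch: the pattern requires a `;` after the captured value, so a line where `gene_id` is the last attribute does not match and comes back unchanged.

```python
import re

FIND = r".*gene_id=([^;]*);.*"
REPLACEMENT = r"\1"  # Python-style backreference to the first group

# Invented attribute strings for illustration only
with_trailing = "ID=gene:ABC123;gene_id=ABC123;biotype=protein_coding"
last_attribute = "ID=gene:ABC123;gene_id=ABC123"

extracted = re.sub(FIND, REPLACEMENT, with_trailing)
unchanged = re.sub(FIND, REPLACEMENT, last_attribute)

print(extracted)                    # the captured gene_id value
print(unchanged == last_attribute)  # no match, so the line is left as-is
```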

Expected behavior Something that prevents users from wasting time trying out regular expressions on very big files that might take several minutes to process.

Screenshots Greenshot 2024-10-10 10 54 56

Additional context Attached data file: Cut_on_data_2.tabular.zip

ElectronicBlueberry commented 1 month ago

I've assigned myself on behalf of a student I'm supervising who would like to take over this project. Once we've got them set up, I will re-assign this issue to them.