galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Replace Text replaces intentionally left open "Replace with" with "[Object object]" #18545

Open Sch-Da opened 2 months ago

Sch-Da commented 2 months ago

Describe the bug

When running the "Replace Text" tool (toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy1) in a workflow, the "Replace with" field that I intentionally left blank is filled with "[object Object]". The tool runs fine outside of a workflow.

I am trying to use the "Replace Text" Tool to clean my text and delete some unnecessary passages. As I wanted to delete text, I did not fill the "Replace with" box. It works and gives a clean output when I run the tool alone. When it is part of a workflow, the "Replace with" that I intentionally left blank seems to be automatically filled with "[object Object]", resulting in a rather useless file.

Galaxy Version and/or server at which you observed the bug

Galaxy Version: version_major "24.1", version_minor "2.dev0"

Browser and Operating System

Operating System: Windows
Browser: Firefox

To Reproduce

Steps to reproduce the behavior:

  1. Go to WORKFLOW https://usegalaxy.eu/u/schnda/w/copy-of-comparing-differences-in-two-english-texts
  2. Run workflow with two text files - for example with https://openbible.com/textfiles/akjv.txt and https://openbible.com/textfiles/kjv.txt
  3. In steps 5 and 6 of the workflow (pre-processing of text 1 and text 2), I wanted to remove some text.

For "find pattern" I inserted: ^(.*?)\t

I left the "Replace with" field empty. But when I checked again after running the workflow, it had been filled with [object Object].

Expected behavior

Cleaning of text, removal of a part of the input, as shown here. This works when running the tool alone, but somehow not in the workflow.
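The expected and the reported behavior can be sketched in a few lines. This is only an illustration in Python's `re` module, not the sed call the tool actually runs, and the regex flavors differ (for example, the lazy `.*?` is a Python/PCRE feature). The sample line and the `replace_text` helper are hypothetical; the line assumes one tab between a verse reference and the verse text.

```python
import re

# Hypothetical sample line: a verse reference, one tab, then the verse text.
line = "Genesis 1:1\tIn the beginning God created the heaven and the earth."

# The "find pattern" from the report: everything up to and including the tab.
find_pattern = r"^(.*?)\t"


def replace_text(text: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in for the tool: replace each match of pattern."""
    return re.sub(pattern, replacement, text)


# Expected: an empty "Replace with" simply deletes the matched prefix.
expected = replace_text(line, find_pattern, "")

# Reported: the empty value is presumably turned into the literal string
# "[object Object]" before the tool runs, so it is inserted instead.
observed = replace_text(line, find_pattern, "[object Object]")

print(expected)
print(observed)
```

Under these assumptions, the first result is the bare verse text, while the second carries the stray "[object Object]" prefix on every line.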

Screenshots

This is the outcome if run independently: screenshot 158

What it actually looks like in the workflow: screenshot 157

Thanks for looking into it!

wm75 commented 2 months ago

Reproducing/debugging this is made more complicated by #18546 but I think this gets "fixed" by viewing the empty param on the WF run form. I'm able to reproduce the issue when just running the linked WF as is.

mvdbeek commented 2 months ago

While this is a bug, I would strongly encourage you to use a workflow parameter for this use case. https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/workflow-parameters/tutorial.html#add-an-integer-workflow-parameter shows how to do this for integer parameters, but it works the same way for text parameters.

Sch-Da commented 2 months ago

Thanks for pointing this out @mvdbeek. I checked the tutorial, but could you quickly explain the benefits of the workflow parameter here? That was not clear to me from the tutorial. Do you mean by using the parameter and setting it to text, I can likely avoid getting the error? And: Is this a best practice to use in general or rather a workaround limited to this case? Thank you!

Sch-Da commented 2 months ago

It worked in this workflow, where I got the cleaner text out. https://usegalaxy.eu/u/schnda/h/parameter-text

However, if I add the next steps of cleaning my text and want to remove the punctuation mark with the same tool, it again adds gibberish. If I am not mistaken, regex [^\w\s] should catch all the commas, dots, etc. I thought leaving the "remove" panel blank should just remove them. Instead, despite putting both inputs as a workflow parameter with text, the output replaces everything with various amounts of s and w.

screenshot 159

See history here: https://usegalaxy.eu/u/schnda/h/comparing---2-regex-text-parameters-for-removal-used

Wondering if it's me or a bug...

wm75 commented 2 months ago

Now that is not a bug (and certainly not WF-related) but just a consequence of the regex flavor used by the sed tool: escaped character class symbols like \w and \s simply don't work inside square brackets but are interpreted as literal characters. In other words, you're discarding everything that is not a backslash, a "w" or an "s". If you want to use character classes, take a look at, e.g., https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html.
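A small sketch of the difference, written in Python rather than sed. Python's `re` module does honor `\w` and `\s` inside brackets, so the sed behavior is emulated here by escaping the backslash, which makes the bracket expression a literal set of backslash, "w" and "s". The sample text is made up. In sed itself, a POSIX bracket expression such as `[^[:alnum:]_[:space:]]` should express the intended pattern.

```python
import re

# Made-up sample text containing punctuation.
text = "Hello, world; it's seven o'clock!"

# Emulates how sed reads [^\w\s]: inside brackets the backslash, "w" and "s"
# are literal characters, so everything except those three is removed.
sed_like = re.sub(r"[^\\ws]", "", text)

# What was intended: remove everything that is neither a word character
# nor whitespace, i.e. strip the punctuation.
intended = re.sub(r"[^\w\s]", "", text)

print(sed_like)   # wss
print(intended)   # Hello world its seven oclock
```

The first result matches the symptom described above: only scattered "w" and "s" characters survive.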

mvdbeek commented 2 months ago

> Thanks for pointing this out @mvdbeek. I checked the tutorial, but could you quickly explain the benefits of the workflow parameter here?

Sure, the main one is that we use the modern workflow run form, which doesn't let users alter the workflow, potentially in ways that are not valid. There are many ways in which the old workflow run form is broken (beyond the bug you found). Think of workflows like a program you write: you wouldn't want users to change things in the source code; instead, you want to clearly show and describe what the valid parameters are. This is what workflow parameters do. They're also recorded for posterity, and you can see them under the inputs tab of the executed workflow. If users change a parameter right inside the workflow, it's kind of complicated to find out whether they changed anything, and what.

Sch-Da commented 2 months ago

Thanks @wm75 and @mvdbeek for your helpful feedback! I will incorporate this from now on.