galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Implicit conversion in workflow causes results to differ if the input is compressed #19143

Closed Delphine-L closed 1 hour ago

Delphine-L commented 1 day ago

Describe the bug I am using a text transformation tool on a fasta input in a workflow. The text transformation requires a txt input. If the input is a fasta file, the output of the text transformation is a fasta, but if the input is a fasta.gz file, a conversion step converts the fasta to tabular before the text transformation tool, and the output of the workflow is then a tabular. My current workaround is to explicitly ask the user whether the inputs are compressed and to add optional decompression steps, but this adds complexity to the workflow and to the workflow form. This is related to issue #18709, but I am opening a new issue because one of the solutions discussed there was to suggest both fileA (as fasta) and fileA (as tabular) for the inputs, which wouldn't solve the issue inside the workflow.
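To make the reported behavior easier to reason about, here is a toy model of how an implicit conversion could end up picking tabular for a fasta.gz input. It is only a sketch: the `PARENT` and `CONVERTERS` tables, their ordering, and the `resolve_input` function are hypothetical and are not taken from Galaxy's datatype registry or from `datatypes_conf.xml`.

```python
# Toy model only: these tables are illustrative assumptions,
# not Galaxy's actual datatype hierarchy or converter list.
PARENT = {"fasta": "txt", "tabular": "txt", "txt": None, "fasta.gz": None}

# Hypothetical converter table: source format -> target formats,
# in the order a naive resolver would try them.
CONVERTERS = {"fasta.gz": ["tabular", "fasta"]}


def is_a(fmt, target):
    """Return True if fmt equals target or inherits from it."""
    while fmt is not None:
        if fmt == target:
            return True
        fmt = PARENT.get(fmt)
    return False


def resolve_input(fmt, accepted):
    """Return the format the tool would receive for an input of type fmt."""
    if is_a(fmt, accepted):
        return fmt  # direct match, no conversion needed
    for candidate in CONVERTERS.get(fmt, []):
        if is_a(candidate, accepted):
            return candidate  # first convertible match wins
    raise ValueError(f"no conversion from {fmt} to {accepted}")


print(resolve_input("fasta", "txt"))     # fasta
print(resolve_input("fasta.gz", "txt"))  # tabular
```

In this model the uncompressed fasta passes through unchanged because it already counts as txt, while the fasta.gz input takes whichever txt-compatible converter target happens to come first. Whether Galaxy's real resolution works this way is exactly what is in question here.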

Galaxy Version and/or server at which you observed the bug Galaxy Version: Main 24.1.3.dev0

Browser and Operating System Operating System: macOS Browser: Chrome

To Reproduce Steps to reproduce the behavior:

  1. Import workflow https://usegalaxy.org/u/delphinel/w/test-implicit-conversion
  2. Import history https://usegalaxy.org/u/delphinel/h/test-implicit-conversion-fastagz-to-txt
  3. Run the workflow on both the compressed and uncompressed fastas (datasets 1 and 2)
  4. Observe that the results are different (datasets 3 and 4)
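For anyone who downloads the two outputs to compare them outside Galaxy, a rough check like the following can tell a FASTA-looking file from a tab-separated one. This is a minimal sketch: `guess_format` is a hypothetical helper, the heuristic is deliberately crude, and the sample strings are made up rather than taken from the shared history.

```python
def guess_format(text):
    """Very rough guess based on the first non-empty line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            return "fasta"  # FASTA records start with a '>' header line
        if "\t" in line:
            return "tabular"  # tab-separated columns
        return "unknown"
    return "unknown"


# Made-up sample contents, for illustration only.
fasta_like = ">seq1\nACGT\n"
tabular_like = "seq1\tACGT\n"

print(guess_format(fasta_like))    # fasta
print(guess_format(tabular_like))  # tabular
```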

Expected behavior Suggestions of solutions:

Delphine-L commented 1 day ago

Strangely, the issue doesn't happen on vgp.usegalaxy.org. There, the fasta.gz is converted to fasta before being used as an input:

[Screenshot 2024-11-14 at 12 05 46 PM]

natefoo commented 1 day ago

A bit more detail from my testing:

usegalaxy.org and vgp.usegalaxy.org do run slightly different configs but the same copy of Galaxy itself, the same datatypes_conf.xml (the sample shipped with Galaxy at the revision it is running), and the same database, and I don't see any differing config options that would affect this.

mvdbeek commented 1 hour ago

I'm gonna close this as a duplicate of https://github.com/galaxyproject/galaxy/issues/18709 and prioritize a fix for that.