galaxyproject / tools-iuc

Tool Shed repositories maintained by the Intergalactic Utilities Commission
https://galaxyproject.org/iuc
MIT License

Picard AddOrReplaceReadGroups can be erroneously run without setting PU #1736

Closed: blankenberg closed this issue 5 years ago

blankenberg commented 6 years ago

You can execute the Picard AddOrReplaceReadGroups tool without setting Platform Unit (PU). However, PU is required, so leaving it unset results in a runtime error:

ERROR: Option 'RGPU' is required.

USAGE: AddOrReplaceReadGroups [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

Replace read groups in a BAM file.This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file.

For more information about read groups, see the GATK Dictionary entry. (https://www.broadinstitute.org/gatk/guide/article?id=6472) 

 This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH) (see http://ga4gh.org/#/documentation).
Usage example:

java -jar picard.jar AddOrReplaceReadGroups \
      I=input.bam \
      O=output.bam \
      RGID=4 \
      RGLB=lib1 \
      RGPL=illumina \
      RGPU=unit1 \
      RGSM=20

Version: 2.7.1-SNAPSHOT

This should be a validation error that prevents the job from being submitted.
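
A rough sketch of the kind of pre-submission check meant here, assuming the required options are the ones the Picard usage output marks as "Required" (RGLB, RGPL, RGPU, RGSM). The function name and the `form_values` dictionary are hypothetical and are not part of Galaxy's or Picard's API:

```python
# Hypothetical pre-submission check; names are illustrative, not Galaxy's API.
REQUIRED_READ_GROUP_OPTIONS = ("RGLB", "RGPL", "RGPU", "RGSM")


def missing_read_group_options(params):
    """Return the required options that are absent or blank in params."""
    return [
        name
        for name in REQUIRED_READ_GROUP_OPTIONS
        if not str(params.get(name) or "").strip()
    ]


# Example form values with PU left unset, as in the failing job.
form_values = {"RGID": "4", "RGLB": "lib1", "RGPL": "illumina", "RGSM": "20"}
missing = missing_read_group_options(form_values)
if missing:
    print("Cannot submit job, missing required option(s): " + ", ".join(missing))
```

With a check like this the user would see the problem on the form instead of in a failed dataset.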

jennaj commented 6 years ago

Including a default value (as for the other required options) is also an option. Or do what Dan states: have the form not submit and highlight the required value.

This came up on Gitter, so it is still impacting users in general (not just Coursera students): https://gitter.im/galaxyproject/Lobby?at=5bca0c1eae7be9401682bebe

Prior ticket for the same issue, from when the tool was under devteam (from 2016; closing it as a duplicate): https://github.com/galaxyproject/tools-devteam/issues/423

jennaj commented 5 years ago

Test in release 19.01

jennaj commented 5 years ago

@davebx The change is not present in version 2.18.2.1 as installed at main or eu. That is the most current tool version in the MTS.

Is this because the tool did not get a revision bump? Or has the MTS not been updated? Or do we just need to install again from the MTS (everywhere, on all servers)?

These smaller changes without revision bumps are hard to track. I appreciate the help figuring it out!

jennaj commented 5 years ago

I never linked the test histories for the public servers, so here they are. Once this is fixed, I can test org.au too and let them know to update as needed.

https://usegalaxy.org:/u/jen/h/test-ncbi-sra-tools https://usegalaxy.eu:/u/jenj/h/test-picard-addorreplacereadgroups-21821

jennaj commented 5 years ago

The MTS was not updated to include this fix yet. Any ideas about why or how to move this forward? Ping @davebx @bgruening cc @jmchilton

The MTS has the old code, not what this PR does: https://github.com/galaxyproject/tools-iuc/pull/2211

From read_group_macros.xml, as browsed in the MTS:

    #if $rg_param("PU")
        #set $rg_pu = str($rg_param("PU"))
    #else
        #set $rg_pu = ''

nsoranzo commented 5 years ago

@jennaj #2211 only added the default value "run" for PU, which is present in revision 22:f6ced08779c4 on the MTS. The code you pasted above was not changed by #2211.

You may want to test by starting the tool from scratch instead of re-running.

jennaj commented 5 years ago

Hmm, I copied that code from the repo's "view tip files" in the MTS. Weird. Could you double check? Or is there an MTS problem?

jennaj commented 5 years ago

Retested on both Main and EU with fresh test data/histories. Fails both places.

AddOrReplaceReadGroups add or replaces read group information (Galaxy Version 2.18.2.1)

https://usegalaxy.org:/u/jen/h/test-picard-addorreplacereadgroups https://usegalaxy.eu:/u/jenj/h/test-picard-addorreplacereadgroups

The tool form doesn't have an "Auto-assign" toggle and doesn't add "run" as the PU value at runtime. The command line doesn't have the PU default either.

Thanks for helping to sort this out!

Error

Dataset Error
An error occured while running the tool toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_AddOrReplaceReadGroups/2.18.2.1.

Tool execution generated the following messages:

Fatal error: Exit code 1 ()
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/023/422/23422629/_job_tmp -Xmx7g -Xms256m
ERROR: Option 'RGPU' is required.

USAGE: AddOrReplaceReadGroups [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

Assigns all the reads in a file to a single new read-group.

This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH)
(http://ga4gh.org/#/documentation).

Usage example:

java -jar picard.jar AddOrReplaceReadGroups \
I=input.bam \
O=output.bam \
RGID=4 \
RGLB=lib1 \
RGPL=illumina \
RGPU=unit1 \
RGSM=20

Caveats

The value of the tags must adhere (according to the SAM-spec (https://samtools.github.io/hts-specs/SAMv1.pdf)) with the
regex '^[ -~]+$'</code> (one or more characters from the ASCII range 32 through 126). In particular <Space> is the only
non-printing character allowed.

The program enables only the wholesale assignment of all the reads in the INPUT to a single read-group. If your file
already has reads assigned to multiple read-groups, the original RG value will be lost. 

For more information about read-groups, see the GATK Dictionary entry.
(https://www.broadinstitute.org/gatk/guide/article?id=6472)
Version: 2.18.2-SNAPSHOT

Options:

--help
-h                            Displays options specific to this tool.

--stdhelp
-H                            Displays options specific to this tool AND options common to all Picard command line
                              tools.

--version                     Displays program version.

INPUT=String
I=String                      Input file (BAM or SAM or a GA4GH url).  Required. 

OUTPUT=File
O=File                        Output file (BAM or SAM).  Required. 

SORT_ORDER=SortOrder
SO=SortOrder                  Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. 
                              Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate,
                              unknown} 

RGID=String
ID=String                     Read-Group ID  Default value: 1. This option can be set to 'null' to clear the default
                              value. 

RGLB=String
LB=String                     Read-Group library  Required. 

RGPL=String
PL=String                     Read-Group platform (e.g. illumina, solid)  Required. 

RGPU=String
PU=String                     Read-Group platform unit (eg. run barcode)  Required. 

RGSM=String
SM=String                     Read-Group sample name  Required. 

RGCN=String
CN=String                     Read-Group sequencing center name  Default value: null. 

RGDS=String
DS=String                     Read-Group description  Default value: null. 

RGDT=Iso8601Date
DT=Iso8601Date                Read-Group run date  Default value: null. 

RGKS=String
KS=String                     Read-Group key sequence  Default value: null. 

RGFO=String
FO=String                     Read-Group flow order  Default value: null. 

RGPI=Integer
PI=Integer                    Read-Group predicted insert size  Default value: null. 

RGPG=String
PG=String                     Read-Group program group  Default value: null. 

RGPM=String
PM=String                     Read-Group platform model  Default value: null. 

Job info/command line:

[Screenshot: job info and command line]

jennaj commented 5 years ago

BTW -- this tool might need a test case that triggers the auto-assign & default input functions .. thoughts on that? Not even sure if possible.

  <tests>
    <test>
      <param name="inputFile" value="picard_ARRG.bam" />
      <param name="LB" value="tumor-a" />
      <param name="PL" value="ILLUMINA" />
      <param name="PU" value="run-1" />
      <param name="SM" value="sample-a" />
      <param name="ID" value="id-1" />
      <output name="outFile" file="picard_ARRG_test1.bam" ftype="bam" />
    </test>
  </tests>
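
As a rough illustration of what such a test would need to exercise, the sketch below leaves PU out of the same parameter values and checks that a default is filled in. It assumes the default is "run", as reported elsewhere in this thread. The `read_group_args` helper is hypothetical and is not the tool's actual Cheetah wrapper:

```python
# Illustrative only: this is not the tool's Cheetah wrapper.
DEFAULT_PU = "run"  # default reported in this thread for Platform unit (PU)


def read_group_args(params):
    """Build Picard-style KEY=value read group arguments, defaulting PU when unset."""
    values = dict(params)
    if not str(values.get("PU") or "").strip():
        values["PU"] = DEFAULT_PU
    return ["RG{}={}".format(key, values[key]) for key in ("ID", "LB", "PL", "PU", "SM")]


# Same values as the existing test above, but with PU left out.
args = read_group_args({"ID": "id-1", "LB": "tumor-a", "PL": "ILLUMINA", "SM": "sample-a"})
assert "RGPU=run" in args
print(" ".join(args))
```

A real Galaxy test would express the same idea by omitting the PU param from the test block and checking the output or the generated command line.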

nsoranzo commented 5 years ago

> Retested on both Main and EU with fresh test data/histories. Fails both places.

That just means that both servers need to update the tool. The default for "Platform unit (PU)" is "run" on our Galaxy instance, which is updated to 22:f6ced08779c4. Ping @martenson @bgruening

martenson commented 5 years ago

added to https://github.com/galaxyproject/usegalaxy-playbook/projects/3

jennaj commented 5 years ago

@nsoranzo OK, I'm going to trust you on that. I don't see the change in the Tool Shed when browsing tip files (specifically read_group_macros.xml). But maybe something else is going on with that.

jennaj commented 5 years ago

and I added it to our tool update tracking at usegalaxy.org here https://github.com/galaxyproject/usegalaxy-playbook/projects/3#column-5164217

and pinged eu that they will want to update, too

jennaj commented 5 years ago

OK, works now on Main. Thanks everyone!!