broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License

M2FiltersArgumentCollection arguments are not recognized by Mutect2 nor do they show up in the docs #1133

Closed by sooheelee 7 years ago

sooheelee commented 7 years ago

This is likely intentional but we should still show these parameters in the #docs.

Test command:

WMCF9-CB5:hellbender-protected shlee$ ./gatk-launch Mutect2 -I ~/Documents/workshop_materials/mutect2_tutorial/mutect2_handson/hcc1143_T_subset50K.bam -tumor HCC1143_tumor -O m2_test_06012017.vcf -R ~/Documents/ref/hg38/Homo_sapiens_assembly38.fasta --tumor_lod 0 
Using GATK wrapper script /Users/shlee/Documents/branches/hellbender-protected/build/install/gatk-protected/bin/gatk-protected
Running:
    /Users/shlee/Documents/branches/hellbender-protected/build/install/gatk-protected/bin/gatk-protected Mutect2 -I /Users/shlee/Documents/workshop_materials/mutect2_tutorial/mutect2_handson/hcc1143_T_subset50K.bam -tumor HCC1143_tumor -O m2_test_06012017.vcf -R /Users/shlee/Documents/ref/hg38/Homo_sapiens_assembly38.fasta --tumor_lod 0
USAGE: Mutect2 [arguments]

Call somatic SNPs and indels via local re-assembly of haplotypes
Version:c25c2f4b-SNAPSHOT

Error message:

***********************************************************************

A USER ERROR has occurred: tumor_lod is not a recognized option

***********************************************************************
Use -DSTACK_TRACE_ON_USEREXCEPTION to print the stack trace.

Here's what the autodoc shows. Notice that there is no description for --tumor_lod. It is only 1 of the 13 parameters in M2FiltersArgumentCollection that are absent.

WMCF9-CB5:hellbender-protected shlee$ ./gatk-launch Mutect2 -h
Using GATK wrapper script /Users/shlee/Documents/branches/hellbender-protected/build/install/gatk-protected/bin/gatk-protected
Running:
    /Users/shlee/Documents/branches/hellbender-protected/build/install/gatk-protected/bin/gatk-protected Mutect2 -h
USAGE: Mutect2 [arguments]

Call somatic SNPs and indels via local re-assembly of haplotypes
Version:c25c2f4b-SNAPSHOT

Required Arguments:

--input,-I:String             BAM/SAM/CRAM file containing reads  This argument must be specified at least once.
                              Required. 

--output,-O:File              File to which variants should be written  Required. 

--reference,-R:String         Reference sequence file  Required. 

--tumorSampleName,-tumor:String
                              BAM sample name of tumor  Required. 

Optional Arguments:

--addOutputSAMProgramRecord,-addOutputSAMProgramRecord:Boolean
                              If true, adds a PG tag to created SAM/BAM/CRAM files.  Default value: true. Possible
                              values: {true, false} 

--af_of_alleles_not_in_resurce,-default_af:Double
                              Population allele fraction assigned to alleles not found in germline resource.  A
                              reasonable value is1/(2* number of samples in resource) if a germline resource is
                              available; otherwise an average heterozygosity rate such as 0.001 is reasonable.  Default
                              value: 0.001. 

--alleles,-alleles:FeatureInput
                              The set of alleles at which to genotype when --genotyping_mode is GENOTYPE_GIVEN_ALLELES 
                              Default value: null. 

--annotateNDA,-nda:Boolean    If provided, we will annotate records with the number of alternate alleles that were
                              discovered (but not necessarily genotyped) at a given site  Default value: false. Possible
                              values: {true, false} 

--arguments_file:File         read one or more arguments files and add them to the command line  This argument may be
                              specified 0 or more times. Default value: null. 

--assemblyRegionPadding,-assemblyRegionPadding:Integer
                              Number of additional bases of context to include around each assembly region  Default
                              value: 100. 

--base_quality_score_threshold,-bqst:Byte
                              Base qualities below this threshold will be reduced to the minimum (6)  Default value: 18.

--cloudIndexPrefetchBuffer,-CIPB:Integer
                              Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
                              cloudPrefetchBuffer if unset.  Default value: -1. 

--cloudPrefetchBuffer,-CPB:Integer
                              Size of the cloud-only prefetch buffer (in MB; 0 to disable).  Default value: 40. 

--contamination_fraction_to_filter,-contamination:Double
                              Fraction of contamination in sequencing data (for all samples) to aggressively remove 
                              Default value: 0.0. 

--createOutputBamIndex,-OBI:Boolean
                              If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.  Default
                              value: true. Possible values: {true, false} 

--createOutputBamMD5,-OBM:Boolean
                              If true, create a MD5 digest for any BAM/SAM/CRAM file created  Default value: false.
                              Possible values: {true, false} 

--createOutputVariantIndex,-OVI:Boolean
                              If true, create a VCF index when writing a coordinate-sorted VCF file.  Default value:
                              true. Possible values: {true, false} 

--createOutputVariantMD5,-OVM:Boolean
                              If true, create a a MD5 digest any VCF file created.  Default value: false. Possible
                              values: {true, false} 

--dbsnp,-D:FeatureInput       dbSNP file  Default value: null. 

--disableBamIndexCaching,-DBIC:Boolean
                              If true, don't cache bam indexes, this will reduce memory requirements but may harm
                              performance if many intervals are specified.  Caching is automatically disabled if there
                              are no intervals specified.  Default value: false. Possible values: {true, false} 

--disableReadFilter,-DF:StringRead filters to be disabled before analysis  This argument may be specified 0 or more
                              times. Default value: null. 

--disableSequenceDictionaryValidation,-disableSequenceDictionaryValidation:Boolean
                              If specified, do not check the sequence dictionaries from our inputs for compatibility.
                              Use at your own risk!  Default value: false. Possible values: {true, false} 

--disableToolDefaultReadFilters,-disableToolDefaultReadFilters:Boolean
                              Disable all tool default read filters  Default value: false. Possible values: {true,
                              false} 

--excludeIntervals,-XL:String One or more genomic intervals to exclude from processing  This argument may be specified 0
                              or more times. Default value: null. 

--genotyping_mode,-gt_mode:GenotypingOutputMode
                              Specifies how to determine the alternate alleles to use for genotyping  Default value:
                              DISCOVERY. Possible values: {DISCOVERY, GENOTYPE_GIVEN_ALLELES} 

--germline_resource:FeatureInput
                              Population vcf of germline sequencing containing allele fractions  Default value: null. 

--graphOutput,-graph:String   Write debug assembly graph information to this file  Default value: null. 

--group,-G:String             One or more classes/groups of annotations to apply to variant calls  This argument may be
                              specified 0 or more times. Default value: null. 

--help,-h:Boolean             display the help message  Default value: false. Possible values: {true, false} 

--heterozygosity,-hets:Double Heterozygosity value used to compute prior likelihoods for any locus.  See the GATKDocs
                              for full details on the meaning of this population genetics concept  Default value: 0.001.

--heterozygosity_stdev,-heterozygosityStandardDeviation:Double
                              Standard deviation of eterozygosity for SNP and indel calling.  Default value: 0.01. 

--indel_heterozygosity,-indelHeterozygosity:Double
                              Heterozygosity for indel calling.  See the GATKDocs for heterozygosity for full details on
                              the meaning of this population genetics concept  Default value: 1.25E-4. 

--interval_exclusion_padding,-ixp:Integer
                              Amount of padding (in bp) to add to each interval you are excluding.  Default value: 0. 

--interval_padding,-ip:IntegerAmount of padding (in bp) to add to each interval you are including.  Default value: 0. 

--interval_set_rule,-isr:IntervalSetRule
                              Set merging approach to use for combining interval inputs  Default value: UNION. Possible
                              values: {UNION, INTERSECTION} 

--intervals,-L:String         One or more genomic intervals over which to operate  This argument may be specified 0 or
                              more times. Default value: null. 

--lenient,-LE:Boolean         Lenient processing of VCF files  Default value: false. Possible values: {true, false} 

--log_somatic_prior:Double    Prior probability that a given site has a somatic allele.  Default value: -6.0. 

--maxAssemblyRegionSize,-maxAssemblyRegionSize:Integer
                              Maximum size of an assembly region  Default value: 300. 

--maxReadsPerAlignmentStart,-maxReadsPerAlignmentStart:Integer
                              Maximum number of reads to retain per alignment start position. Reads above this threshold
                              will be downsampled. Set to 0 to disable.  Default value: 50. 

--min_base_quality_score,-mbq:Byte
                              Minimum base quality required to consider a base for calling  Default value: 10. 

--min_variants_in_pileup:Integer
                              Minimum number of reads in pileup to be considered active region.  Default value: 2. 

--minAssemblyRegionSize,-minAssemblyRegionSize:Integer
                              Minimum size of an assembly region  Default value: 50. 

--minNormalVariantFraction:Double
                              Minimum number of reads in pileup to be considered active region.  Default value: 0.1. 

--nativePairHmmThreads,-threads:Integer
                              How many threads should a native pairHMM implementation use  Default value: 1. 

--normal_lod:Double           LOD threshold for calling normal non-germline  Default value: 2.2. 

--normal_panel,-PON:FeatureInput
                              VCF file of sites observed in normal  Default value: null. 

--normalSampleName,-normal:String
                              BAM sample name of tumor  Default value: null. 

--output_mode,-out_mode:OutputMode
                              Specifies which type of calls we should output  Default value: EMIT_VARIANTS_ONLY.
                              Possible values: {EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES} 

--power_constant_qscore:Integer
                              Phred scale quality score constant to use in power calculations  Default value: 30. 

--QUIET:Boolean               Whether to suppress job-summary info on System.err.  Default value: false. Possible
                              values: {true, false} 

--readFilter,-RF:String       Read filters to be applied before analysis  This argument may be specified 0 or more
                              times. Default value: null. 

--readIndex,-readIndex:String Indices to use for the read inputs. If specified, an index must be provided for every read
                              input and in the same order as the read inputs. If this argument is not specified, the
                              path to the index for each input will be inferred automatically.  This argument may be
                              specified 0 or more times. Default value: null. 

--readShardPadding,-readShardPadding:Integer
                              Each read shard has this many bases of extra context on each side. Read shards must have
                              as much or more padding than assembly regions.  Default value: 100. 

--readShardSize,-readShardSize:Integer
                              Maximum size of each read shard, in bases. For good performance, this should be much
                              larger than the maximum assembly region size.  Default value: 5000. 

--readValidationStringency,-VS:ValidationStringency
                              Validation stringency for all SAM/BAM/CRAM/SRA files read by this program.  The default
                              stringency value SILENT can improve performance when processing a BAM file in which
                              variable-length data (read, qualities, tags) do not otherwise need to be decoded.  Default
                              value: SILENT. Possible values: {STRICT, LENIENT, SILENT} 

--recoverDanglingHeads,-recoverDanglingHeads:Boolean
                              This argument is deprecated since version 3.3  Default value: false. Possible values:
                              {true, false} 

--sample_ploidy,-ploidy:Integer
                              Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in
                              each pool * Sample Ploidy).  Default value: 2. 

--secondsBetweenProgressUpdates,-secondsBetweenProgressUpdates:Double
                              Output traversal statistics every time this many seconds elapse  Default value: 10.0. 

--standard_min_confidence_threshold_for_calling,-stand_call_conf:Double
                              The minimum phred-scaled confidence threshold at which variants should be called  Default
                              value: 10.0. 

--TMP_DIR:File                Undocumented option  This argument may be specified 0 or more times. Default value: null. 

--tumor_lod_to_emit:Double    LOD threshold for emit tumor variant  Default value: 3.0. 

--tumorStandardDeviationsThreshold:Integer
                              How many standard deviations above the expected number of variant reads due to error we
                              require for a tumor pielup to be considered active.  Default value: 2. 

--use_jdk_deflater,-jdk_deflater:Boolean
                              Whether to use the JdkDeflater (as opposed to IntelDeflater)  Default value: false.
                              Possible values: {true, false} 

--use_jdk_inflater,-jdk_inflater:Boolean
                              Whether to use the JdkInflater (as opposed to IntelInflater)  Default value: false.
                              Possible values: {true, false} 

--useDoublePrecision,-useDoublePrecision:Boolean
                              use double precision in the native pairHmm. This is slower but matches the java
                              implementation better  Default value: false. Possible values: {true, false} 

--useNewAFCalculator,-newQual:Boolean
                              If provided, we will use the new AF model instead of the so-called exact model  Default
                              value: false. Possible values: {true, false} 

--verbosity,-verbosity:LogLevel
                              Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                              INFO, DEBUG} 

--version:Boolean             display the version number for this tool  Default value: false. Possible values: {true,
                              false} 

Advanced Arguments:

--activeProbabilityThreshold,-activeProbabilityThreshold:Double
                              Minimum probability for a locus to be considered active.  Default value: 0.002. 

--allowNonUniqueKmersInRef,-allowNonUniqueKmersInRef:Boolean
                              Allow graphs that have non-unique kmers in the reference  Default value: false. Possible
                              values: {true, false} 

--allSitePLs,-allSitePLs:Boolean
                              Annotate all sites with PLs  Default value: false. Possible values: {true, false} 

--annotation,-A:String        One or more specific annotations to apply to variant calls  This argument may be specified
                              0 or more times. Default value: [Coverage, DepthPerAlleleBySample, TandemRepeat,
                              OxoGReadCounts, ClippedBases, ReadPosition, BaseQuality, MappingQuality, FragmentLength,
                              StrandArtifact]. 

--bamOutput,-bamout:String    File to which assembled haplotypes should be written  Default value: null. 

--bamWriterType,-bamWriterType:WriterType
                              Which haplotypes should be written to the BAM  Default value: CALLED_HAPLOTYPES. Possible
                              values: {ALL_POSSIBLE_HAPLOTYPES, CALLED_HAPLOTYPES} 

--comp,-comp:FeatureInput     Comparison VCF file(s)  This argument may be specified 0 or more times. Default value:
                              null. 

--consensus,-consensus:Boolean1000G consensus mode  Default value: false. Possible values: {true, false} 

--contamination_fraction_per_sample_file,-contaminationFile:File
                              Tab-separated File containing fraction of contamination in sequencing data (per sample) to
                              aggressively remove. Format should be "<SampleID><TAB><Contamination>" (Contamination is
                              double) per line; No header.  Default value: null. 

--debug,-debug:Boolean        Print out very verbose debug information about each triggering active region  Default
                              value: false. Possible values: {true, false} 

--disableOptimizations,-disableOptimizations:Boolean
                              Don't skip calculations in ActiveRegions with no variants  Default value: false. Possible
                              values: {true, false} 

--doNotRunPhysicalPhasing,-doNotRunPhysicalPhasing:Boolean
                              Disable physical phasing  Default value: false. Possible values: {true, false} 

--dontIncreaseKmerSizesForCycles,-dontIncreaseKmerSizesForCycles:Boolean
                              Disable iterating over kmer sizes when graph cycles are detected  Default value: false.
                              Possible values: {true, false} 

--dontTrimActiveRegions,-dontTrimActiveRegions:Boolean
                              If specified, we will not trim down the active region from the full region (active +
                              extension) to just the active interval for genotyping  Default value: false. Possible
                              values: {true, false} 

--dontUseSoftClippedBases,-dontUseSoftClippedBases:Boolean
                              Do not analyze soft clipped bases in the reads  Default value: false. Possible values:
                              {true, false} 

--emitRefConfidence,-ERC:ReferenceConfidenceMode
                              Mode for emitting reference confidence scores  Default value: NONE. Possible values:
                              {NONE, BP_RESOLUTION, GVCF} 

--excludeAnnotation,-XA:StringOne or more specific annotations to exclude  This argument may be specified 0 or more
                              times. Default value: null. 

--gcpHMM,-gcpHMM:Integer      Flat gap continuation penalty for use in the Pair HMM  Default value: 10. 

--input_prior,-inputPrior:Double
                              Input prior for calls  This argument may be specified 0 or more times. Default value:
                              null. 

--kmerSize,-kmerSize:Integer  Kmer size to use in the read threading assembler  This argument may be specified 0 or more
                              times. Default value: [10, 25]. 

--max_alternate_alleles,-maxAltAlleles:Integer
                              Maximum number of alternate alleles to genotype  Default value: 6. 

--max_genotype_count,-maxGT:Integer
                              Maximum number of genotypes to consider at any site  Default value: 1024. 

--maxNumHaplotypesInPopulation,-maxNumHaplotypesInPopulation:Integer
                              Maximum number of haplotypes to consider for your population  Default value: 128. 

--maxProbPropagationDistance,-maxProbPropagationDistance:Integer
                              Upper limit on how many bases away probability mass can be moved around when calculating
                              the boundaries between active and inactive assembly regions  Default value: 50. 

--minDanglingBranchLength,-minDanglingBranchLength:Integer
                              Minimum length of a dangling branch to attempt recovery  Default value: 4. 

--minPruning,-minPruning:Integer
                              Minimum support to not prune paths in the graph  Default value: 2. 

--numPruningSamples,-numPruningSamples:Integer
                              Number of samples that must pass the minPruning threshold  Default value: 1. 

--pcr_indel_model,-pcrModel:PCRErrorModel
                              The PCR indel model to use  Default value: CONSERVATIVE. Possible values: {NONE, HOSTILE,
                              AGGRESSIVE, CONSERVATIVE} 

--phredScaledGlobalReadMismappingRate,-globalMAPQ:Integer
                              The global assumed mismapping rate for reads  Default value: 45. 

--showHidden,-showHidden:Boolean
                              display hidden arguments  Default value: false. Possible values: {true, false} 

--useFilteredReadsForAnnotations,-useFilteredReadsForAnnotations:Boolean
                              Use the contamination-filtered read maps for the purposes of annotating variants  Default
                              value: false. Possible values: {true, false} 

Conditional Arguments for readFilter:

Valid only if "AmbiguousBaseReadFilter" is specified:
--ambigFilterFrac:Double      Threshold fraction of ambiguous bases  Default value: 0.05. 

Valid only if "FragmentLengthReadFilter" is specified:
--maxFragmentLength,-maxFragmentLength:Integer
                              Keep only read pairs with fragment length at most equal to the given value  Default value:
                              1000000. 

Valid only if "LibraryReadFilter" is specified:
--library,-library:String     The name of the library to keep  Required. 

Valid only if "MappingQualityReadFilter" is specified:
--maximumMappingQuality,-maximumMappingQuality:Integer
                              Maximum mapping quality to keep (inclusive)  Default value: null. 

--minimumMappingQuality,-minimumMappingQuality:Integer
                              Minimum mapping quality to keep (inclusive)  Default value: 20. 

Valid only if "OverclippedReadFilter" is specified:
--dontRequireSoftClipsBothEnds,-dontRequireSoftClipsBothEnds:Boolean
                              Allow a read to be filtered out based on having only 1 soft-clipped block. By default,
                              both ends must have a soft-clipped block, setting this flag requires only 1 soft-clipped
                              block.  Default value: false. Possible values: {true, false} 

--filterTooShort,-filterTooShort:Integer
                              Value for which reads with less than this number of aligned bases is considered too short 
                              Default value: 30. 

Valid only if "PlatformReadFilter" is specified:
--PLFilterName,-PLFilterName:String
                              Keep reads with RG:PL attribute containing this string  This argument must be specified at
                              least once. Required. 

Valid only if "PlatformUnitReadFilter" is specified:
--blackListedLanes,-blackListedLanes:String
                              Keep reads with platform units not on the list  This argument must be specified at least
                              once. Required. 

Valid only if "ReadGroupBlackListReadFilter" is specified:
--blackList,-blackList:String This argument must be specified at least once. Required. 

Valid only if "ReadGroupReadFilter" is specified:
--keepReadGroup,-keepReadGroup:String
                              The name of the read group to keep  Required. 

Valid only if "ReadLengthReadFilter" is specified:
--maxReadLength,-maxReadLength:Integer
                              Keep only reads with length at most equal to the specified value  Required. 

--minReadLength,-minReadLength:Integer
                              Keep only reads with length at least equal to the specified value  Default value: 1. 

Valid only if "ReadNameReadFilter" is specified:
--readName,-readName:String   Keep only reads with this read name  Required. 

Valid only if "ReadStrandFilter" is specified:
--keepReverse,-keepReverse:Boolean
                              Keep only reads on the reverse strand  Required. Possible values: {true, false} 

Valid only if "SampleReadFilter" is specified:
--sample,-sample:String       The name of the sample(s) to keep, filtering out all others  This argument must be
                              specified at least once. Required. 

Tool returned:
0

sooheelee commented 7 years ago

Tagging @cmnbroad as requested.

cmnbroad commented 7 years ago

Looks like Mutect2 has a private Mutect2Engine, and Mutect2Engine has an M2ArgumentCollection, both of which contain fields with @Argument annotations. For these (nested) arguments to be visible to the command line parser/doc, they need to be both annotated in the containing class with @ArgumentCollection, and instantiated by the constructor (i.e., Mutect2's Mutect2Engine would have to be instantiated by the constructor). Otherwise, they may have to be refactored.
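
A minimal, self-contained sketch of the visibility rule described above. The `@Argument` and `@ArgumentCollection` annotations here are toy stand-ins declared inside the example, not the real GATK classes, and `discover`, `FilterArgs`, `Engine`, `HiddenTool` and `VisibleTool` are hypothetical names. It only illustrates why a reflection-based scanner cannot see nested arguments unless the enclosing field is annotated and already holds an instance. The actual parser may behave differently, for example by rejecting a null collection instead of skipping it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class ArgumentVisibilityDemo {

    // Toy stand-ins for the real annotations (assumption: only the names match).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Argument { String fullName(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface ArgumentCollection {}

    // Hypothetical collection holding one filtering argument.
    static class FilterArgs {
        @Argument(fullName = "tumor_lod") double tumorLod;
    }

    // Hypothetical engine that owns the collection.
    static class Engine {
        @ArgumentCollection FilterArgs filterArgs = new FilterArgs();
    }

    // Engine field is unannotated and not instantiated: nested arguments stay hidden.
    static class HiddenTool {
        @Argument(fullName = "output") String output;
        private Engine engine;
    }

    // Engine field is annotated and instantiated at construction: nested arguments are found.
    static class VisibleTool {
        @Argument(fullName = "output") String output;
        @ArgumentCollection private final Engine engine = new Engine();
    }

    // Collect the names of all arguments reachable from obj via reflection.
    static List<String> discover(Object obj) {
        List<String> names = new ArrayList<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            Argument arg = field.getAnnotation(Argument.class);
            if (arg != null) {
                names.add(arg.fullName());
            }
            // Recurse only into fields explicitly marked as collections.
            if (field.isAnnotationPresent(ArgumentCollection.class)) {
                try {
                    Object nested = field.get(obj);
                    if (nested != null) { // toy choice: skip a null collection
                        names.addAll(discover(nested));
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("HiddenTool:  " + discover(new HiddenTool()));
        System.out.println("VisibleTool: " + discover(new VisibleTool()));
    }
}
```

In this sketch the scan of `HiddenTool` finds only `output`, while the scan of `VisibleTool` also finds `tumor_lod`, because the scanner has an annotated, non-null field to walk into.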

davidbenjamin commented 7 years ago

@sooheelee @cmnbroad These are arguments for the stand-alone filtering CLI, FilterMutectCalls. There were good reasons for splitting filtering out of Mutect2, and I think they will be worth the nuisance.

vdauwera commented 7 years ago

So to be clear, you guys have a handle on how to make the arguments appear in the docs? Is this something you can do in the near future? We need the arguments to appear in the docs for the beta release.

sooheelee commented 7 years ago

Yes, all solved. The parameters I thought were missing from Mutect2 now belong to the new tool, FilterMutectCalls, and they do show up in its documentation.