broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

The CalculateContamination Bug Report #7707

Open gaze-abyss opened 2 years ago

gaze-abyss commented 2 years ago


Hello, I have a question:

I am running this command with gatk4-4.2.3.0-0: gatk CalculateContamination -I gewb.tumor.pileups.table -matched gewb.normal.pileups.table -O gewb.contamination.table

The following output is displayed:

Using GATK jar /cluster/home/jialu/miniconda3/envs/wes2/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cluster/home/jialu/miniconda3/envs/wes2/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar CalculateContamination -I gewb.tumor.pileups.table -matched gewb.normal.pileups.table -O gewb.contamination.table
19:10:31.163 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cluster/home/jialu/miniconda3/envs/wes2/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 06, 2022 7:10:31 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:10:31.437 INFO CalculateContamination - ------------------------------------------------------------
19:10:31.437 INFO CalculateContamination - The Genome Analysis Toolkit (GATK) v4.2.3.0
19:10:31.437 INFO CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
19:10:31.438 INFO CalculateContamination - Executing as haojie@node1 on Linux v3.10.0-957.el7.x86_64 amd64
19:10:31.438 INFO CalculateContamination - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_302-b08
19:10:31.438 INFO CalculateContamination - Start Date/Time: March 6, 2022 7:10:31 PM CST
19:10:31.438 INFO CalculateContamination - ------------------------------------------------------------
19:10:31.438 INFO CalculateContamination - ------------------------------------------------------------
19:10:31.439 INFO CalculateContamination - HTSJDK Version: 2.24.1
19:10:31.439 INFO CalculateContamination - Picard Version: 2.25.4
19:10:31.439 INFO CalculateContamination - Built for Spark Version: 2.4.5
19:10:31.439 INFO CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:10:31.439 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:10:31.439 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:10:31.439 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:10:31.439 INFO CalculateContamination - Deflater: IntelDeflater
19:10:31.439 INFO CalculateContamination - Inflater: IntelInflater
19:10:31.439 INFO CalculateContamination - GCS max retries/reopens: 20
19:10:31.439 INFO CalculateContamination - Requester pays: disabled
19:10:31.439 INFO CalculateContamination - Initializing engine
19:10:31.439 INFO CalculateContamination - Done initializing engine
19:10:31.451 INFO CalculateContamination - Shutting down engine
[March 6, 2022 7:10:31 PM CST] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2141192192
java.lang.IllegalArgumentException: there is no such column: contig
    at org.broadinstitute.hellbender.utils.tsv.DataLine.columnIndex(DataLine.java:483)
    at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:452)
    at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:581)
    at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary$PileupSummaryTableReader.createRecord(PileupSummary.java:193)
    at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary$PileupSummaryTableReader.createRecord(PileupSummary.java:188)
    at org.broadinstitute.hellbender.utils.tsv.TableReader.fetchNextRecord(TableReader.java:364)
    at org.broadinstitute.hellbender.utils.tsv.TableReader.access$200(TableReader.java:99)
    at org.broadinstitute.hellbender.utils.tsv.TableReader$1.hasNext(TableReader.java:472)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.broadinstitute.hellbender.utils.tsv.TableReader.toList(TableReader.java:532)
    at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary.readFromFile(PileupSummary.java:139)
    at org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination.doWork(CalculateContamination.java:116)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

I used dbsnp_146.hg38.vcf.gz in the previous step. Could you please tell me how to solve this? Thank you.

gaze-abyss commented 2 years ago

Here is an excerpt of the two input files:

chr1 69091 A AAAAAAAAAAAAAAAAAAAAAAAA <AAA<AAABBABAB=AAABBAAAB
chr1 69092 T TTTTTTTTTTTTTTTTTTTTTTTT BBBBBCBBCCBC@DBBBBDCBBCC
chr1 69093 G GGGGGGGGGGGGGGGGGGGGGGG IJ-JKKJKKJ;JEJJJJEEJJKE
chr1 69094 G GGGGGGGGGGGGGGGGGGGGGG HIHHIIIIHFHGHHHHGFHHIF
chr1 69095 T TTTTTTTTTTTTTTTTTTTTTT AAABAAABACACAAAADCAAAC

droazen commented 2 years ago

@gaze-abyss Can you check your .table input files to see whether they have a header that looks like this:

#<METADATA>SAMPLE=sample
contig  position        ref_count       alt_count       other_alt_count allele_frequency

The error message indicates that the tool is not finding the "contig" column for some reason, and a malformed header line is one possibility.
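One way to run the header check suggested above is a short script. The following is a minimal sketch, not part of GATK: the function names `read_header` and `missing_columns` are hypothetical, and it assumes that lines beginning with `#` are metadata or comments and that the column names are tab-separated on the first remaining line. The six expected column names are taken from the header shown above.

```python
import io

# Column names copied from the example header in this thread.
EXPECTED_COLUMNS = [
    "contig", "position", "ref_count",
    "alt_count", "other_alt_count", "allele_frequency",
]

def read_header(lines):
    """Return the fields of the first line that is neither blank nor a '#' line.

    Assumption: '#' lines (such as '#<METADATA>SAMPLE=sample') are not the header.
    Returns None if no such line exists.
    """
    for line in lines:
        text = line.rstrip("\r\n")
        if not text.strip() or text.startswith("#"):
            continue
        return text.split("\t")
    return None

def missing_columns(lines, expected=EXPECTED_COLUMNS):
    """List the expected column names that are absent from the header line."""
    header = read_header(lines) or []
    return [name for name in expected if name not in header]

# A table with the expected header reports nothing missing.
good = io.StringIO(
    "#<METADATA>SAMPLE=sample\n"
    "contig\tposition\tref_count\talt_count\tother_alt_count\tallele_frequency\n"
    "chr1\t69091\t24\t0\t0\t0.05\n"
)
print(missing_columns(good))

# A file that starts directly with data rows reports every column missing.
bad = io.StringIO("chr1 69091 A AAAA <AAA<\n")
print(missing_columns(bad))
```

To check a real file, pass an open file handle instead, for example `missing_columns(open("gewb.tumor.pileups.table"))`. If the rows in the excerpt posted earlier are the first lines of the file, a check like this would report all six columns as missing, which would be consistent with the "there is no such column: contig" error.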