Closed konopinski closed 4 months ago
Could you check your BAM header to see if it contains any read groups? If not, you might need to use AddOrReplaceReadGroups
That was my case. If anyone encounters the same problem the full solution is:
Check if there's an @RG (read group) line in the problematic .bam header:
samtools view -H <your bam file> | grep '@RG'
If nothing is found, add a read group with, e.g.:
samtools addreplacerg -r "@RG\tID:ReadGroup1\tSM:SampleName\tPL:Illumina\tLB:Library.fa" -o <output.bam> <input.bam>
You can set ReadGroup1, SampleName and Library.fa to something that really exists in your dataset (lane number, real sample name and the name of the fasta file). I am not sure about Illumina, but it is true for my case so I left it as it is.
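For anyone who wants to run the two steps above in one go, here is a minimal sketch that adds a read group only when the header has none. It assumes samtools is on your PATH. The function names (has_read_group, make_rg_line, ensure_read_group) and the value Library1 are made up for illustration, and the ID/SM/PL/LB values are placeholders to replace with your own. The check uses '^@RG' rather than plain '@RG' so that it only matches real read group lines and not, for example, a @PG line that happens to mention @RG in a recorded command.

```shell
#!/usr/bin/env bash
# Sketch: add a read group to a BAM file only if its header has none.
# Assumes samtools is installed; all ID/SM/PL/LB values are placeholders.

# Succeeds if the SAM header text on stdin contains at least one @RG line.
has_read_group() {
    grep -q '^@RG'
}

# Prints the @RG string for `samtools addreplacerg -r`, with literal
# backslash-t separators, in the same form as the command above.
make_rg_line() {
    local id="$1" sample="$2" platform="$3" library="$4"
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s' "$id" "$sample" "$platform" "$library"
}

# Usage: ensure_read_group <input.bam> <output.bam>
ensure_read_group() {
    local in_bam="$1" out_bam="$2"
    if samtools view -H "$in_bam" | has_read_group; then
        echo "Read group already present in $in_bam; nothing to do."
    else
        samtools addreplacerg \
            -r "$(make_rg_line ReadGroup1 SampleName Illumina Library1)" \
            -o "$out_bam" "$in_bam"
    fi
}

# Run only when exactly two file arguments are given on the command line.
if [ "$#" -eq 2 ]; then
    ensure_read_group "$1" "$2"
fi
```

Saved as, say, add_rg.sh (a hypothetical name), it would be called as `bash add_rg.sh <input.bam> <output.bam>`, and the output file can then be passed to MarkDuplicates.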
By the way, this is my call to anyone giving advice on bioinformatic tools. I really appreciate your time and effort, I really do. But think of your answer as if you were talking to a regular high-school student. Most of us here know Linux and programming only as much as is necessary to answer our biological questions. We do not know most of the IT guy's slang. Please be more patient and give us more detailed solutions, because otherwise you waste your time answering and we waste our time reading it. For us it's biology that matters, and all the bioinformatic tools are just tools. In the end, you do not need to know how the processor works to use the computer.
Instructions
The github issue tracker is for bug reports, feature requests, and API documentation requests. General questions about how to use the GATK, how to interpret the output, etc. should be asked on the official support forum.
Bug Report
Affected tool(s) or class(es)
Tool/class name(s), special parameters? MarkDuplicates
Affected version(s)
Description
Describe the problem below. Provide screenshots, stacktrace, logs where appropriate.
I'm trying to use GATK to find SNPs in an exome capture project. I get an error when trying to use MarkDuplicates; I tried running it from both Picard and GATK. The screen output is:
or from gatk
BTW, I'm not sure what a "stacktrace" is, so I have not added one.
Steps to reproduce
bam files obtained as follows:
The commands were:
gatk MarkDuplicates I=WA02_i5-537_i7-98_S11819_L004.bam O=test.dup.bam M=marked_dup_metrics.txt
picard MarkDuplicates I=WA02_i5-537_i7-98_S11819_L004.bam O=test.dup.bam M=marked_dup_metrics.txt
(in the latter, picard is an alias for "java -jar /path/to/picard.jar")
Expected behavior
I expect the tool to run normally, i.e. to produce an output file with marked duplicates.
Actual behavior
See above