broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

CollectAlignmentSummaryMetrics - Adapter sequences incorrect use of adapter sequence parameters? #162

Closed nh13 closed 9 years ago

nh13 commented 9 years ago

See: http://sourceforge.net/p/samtools/mailman/message/33435794/

So this may be more challenging than initially thought. I am working off the following branch: https://github.com/broadinstitute/picard/tree/nh_mark_duplicates_with_low_q_end

Dear all,

After various unsuccessful attempts and browsing the archive (which partially fixed my problem), I am afraid I have to post here for help with my issue.

I am trying to use CollectAlignmentSummaryMetrics for bisulfite libraries. My problem is that Picard does not detect any adapters, while I can clearly find that 3.5% of the first 1,000,000 read pairs have a first mate containing the reverse complement of the reverse strand adapter (i.e. a perfect match to the adapter sequence, found with grep). Maybe I am using Picard's ADAPTER_SEQUENCE parameter incorrectly?

As in the post https://sourceforge.net/p/samtools/mailman/message/32771613/ I was getting a lot of zeros when the reference genome sequence was not specified. However, giving a reference genome did not improve the detection of adapter sequences. Is there anything missing from my command line below that would let Picard detect the adapter sequences?

Here is the command I used (to be safe, I gave the two adapter sequences plus their two reverse complements, but apparently none is detected):

    java -jar /usr/local/src/picard-tools-1.128/picard.jar CollectAlignmentSummaryMetrics \
        INPUT=/workspace/scratch/krue/Methylation/bwa_8lanes/C12.bam \
        OUTPUT=/workspace/scratch/krue/Methylation/bwa_8lanes/C12.test_picard_summary.txt \
        ADAPTER_SEQUENCE=null \
        ADAPTER_SEQUENCE=GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT \
        ADAPTER_SEQUENCE=TACACTCTTTCCCTACACGACGCTCTTCCGATCT \
        ADAPTER_SEQUENCE=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
        ADAPTER_SEQUENCE=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA \
        IS_BISULFITE_SEQUENCED=true \
        REFERENCE_SEQUENCE=/workspace/storage/genomes/bostaurus/UMD3.1.75/source_file/Bos_taurus.UMD3.1.75.dna.toplevel.fa \
        STOP_AFTER=100000
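As a cross-check on the adapter list in the command above, the grep-style test can be sketched in a few lines of shell. This is only an illustration, not part of Picard: the function names `revcomp` and `count_adapter_reads` and the FASTQ file name in the usage comment are hypothetical, and the sketch assumes uncompressed four-line FASTQ records and counts exact matches only.

```shell
#!/bin/sh

# Reverse-complement the DNA sequence passed as the first argument.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Read FASTQ records (4 lines each) on stdin and print how many reads have
# a sequence line containing the adapter given as $1 (exact match only).
count_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 2 && index($0, adapter) { n++ }
        END { print n + 0 }
    '
}

# Hypothetical usage on the first 1,000,000 reads (4,000,000 lines) of an
# assumed first-mate FASTQ file:
#   head -n 4000000 C12_R1.fastq |
#       count_adapter_reads "$(revcomp GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT)"
```

Applying `revcomp` to the first two ADAPTER_SEQUENCE values reproduces the third and fourth, so the four sequences passed to Picard are internally consistent.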

When I tested the SAM file, Picard ValidateSamFile returned only one error:

    ERROR: Read name C12_TAGCTT_L002_R_001, A platform (PL) attribute was not found for read group

Many thanks in advance, and I hope I didn't miss an embarrassingly obvious explanation somewhere.

vdauwera commented 9 years ago

The read group information should contain a platform (PL) attribute. If the issue still persists after the problem has been corrected, the user can post a followup on the GATK forum and we will troubleshoot further there.
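One way to confirm which read groups lack the platform attribute is to scan the `@RG` lines of the header. The following is a minimal sketch, not an official Picard or GATK recipe: the helper name `rg_missing_pl` is hypothetical, and the file names and read-group values in the comments are placeholders.

```shell
#!/bin/sh

# Read a SAM header on stdin and print every @RG line that has no PL: field.
# Exit status is 0 when every read group carries PL, 1 otherwise.
rg_missing_pl() {
    awk -F '\t' '
        /^@RG/ {
            has_pl = 0
            for (i = 2; i <= NF; i++) if ($i ~ /^PL:/) has_pl = 1
            if (!has_pl) { print; missing = 1 }
        }
        END { exit (missing ? 1 : 0) }
    '
}

# Hypothetical usage, assuming samtools is installed:
#   samtools view -H C12.bam | rg_missing_pl
#
# If a read group is reported, Picard's AddOrReplaceReadGroups can rewrite it.
# All values below are placeholders to be replaced with the real metadata:
#   java -jar picard.jar AddOrReplaceReadGroups \
#       INPUT=C12.bam OUTPUT=C12.rg.bam \
#       RGID=C12 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=C12
```

Note that AddOrReplaceReadGroups assigns all reads to a single read group, so it is only a suitable fix when the file holds one library and sequencing run.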