Closed ajwije closed 4 years ago
Hmm, can't you simply do
samtools view -H S1_combine.deduplicated.bam > header.txt
samtools view S1_combine.deduplicated.bam | grep chloroplast > chloroplast_reads.sam
cat header.txt chloroplast_reads.sam | samtools -bS - >chloroplast.extraction.bam
Cheers, Felix
Thanks, Felix for your quick reply. What you suggested worked and I was able to extract reads aligned to the chloroplast genome. Following is a report from the methylation calls. Is it accurate to say the bisulfite conversion didn't work?
chloroplast.extraction.bam
Parameters used to extract methylation information:
Bismark Extractor Version: v0.22.3
Bismark result file: paired-end (SAM format)
Output specified: strand-specific (default)
No overlapping methylation calls specified
Processed 10691 lines in total
Total number of methylation call strings processed: 21382
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 213239
Total methylated C's in CpG context: 17380
Total methylated C's in CHG context: 14780
Total methylated C's in CHH context: 95012
Total C to T conversions in CpG context: 14073
Total C to T conversions in CHG context: 11432
Total C to T conversions in CHH context: 60562
C methylated in CpG context: 55.3%
C methylated in CHG context: 56.4%
C methylated in CHH context: 61.1%
Hmm, I am not an expert with plants - but isn't the chloroplast supposed to be unmethylated? .... Does your other data also look dodgy? How do the numbers compare to reports in the literature?
Yes, according to literature, the chloroplast is supposed to be unmethylated. Other data also looks suspicious. Here is the splitting report for the entire genome. CHG and CHH methylations seem way too high.
For methylation calls, if understand correct, bismark uses 5x coverage by default, correct? is there a way to change it? I couldn't find the option (I am pretty sure I missed it!).
Processed 1073866 lines in total
Total number of methylation call strings processed: 2147732
Final Cytosine Methylation Report
=================================
Total number of C's analyzed: 14918674
Total methylated C's in CpG context: 1024675
Total methylated C's in CHG context: 960082
Total methylated C's in CHH context: 11441855
Total C to T conversions in CpG context: 109747
Total C to T conversions in CHG context: 139670
Total C to T conversions in CHH context: 1242645
C methylated in CpG context: 90.3%
C methylated in CHG context: 87.3%
C methylated in CHH context: 90.2%
Bismark doesn't apply any cut-offs, so no worries there. But yes I agree, this looks like a quite failed bisuflite conversion, or possibly a few converted reads mixed with mostly unconverted genomic DNA. I am afraid this looks like you need to do it again...
I am trying to split the bam file using samtools to get reads aligned to the chloroplast genome using the following command:
samtools sort S1_combine.deduplicated.bam -o S1_combine.deduplicated.sort.bam
samtoolssamtools index S1_combine.deduplicated.sort.bam
samtools view -b S1_combine.deduplicated.sort.bam chloroplast_1 > chloroplast.extraction.bam
However, the samtools crashes after printing weird characters to the screen. The bam file is an output from duplicate removed step from these two steps:
bismark --non_directional $Genome_path -1 S1_combine_R1.fastq \ -2 S2_combine_R2.fastq -o $ALIGN
deduplicate_bismark -p *_bismark_bt2_pe.bam -o S1_combine
Any suggestions would be greatly appreciated.