brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License
144 stars 54 forks source link

repeated alignment (bwa get stuck) #88

Closed hanrong498 closed 1 year ago

hanrong498 commented 1 year ago

Hi,

I am trying to run bwa mem with

bwameth.py \
     --reference $REFERENCE \
     $FQ1 $FQ2 > output.sam

However it seems that this command got stuck and kept running, as in my error

[M::process] 0 single-end sequences; 1818182 paired-end sequences
[M::process] read 1818182 sequences (120000012 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (36, 178823, 11466, 10)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (117, 362, 1187)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 3327)
[M::mem_pestat] mean and std.dev: (635.83, 645.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 4397)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (83, 119, 169)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 341)
[M::mem_pestat] mean and std.dev: (126.55, 60.59)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 427)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (113, 314, 722)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1940)
[M::mem_pestat] mean and std.dev: (437.26, 432.76)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2549)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (119, 633, 788)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2126)
[M::mem_pestat] mean and std.dev: (557.10, 407.29)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2795)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 1818182 reads in 5503.742 CPU sec, 773.964 real sec
[M::process] 0 single-end sequences; 1818182 paired-end sequences
[M::process] read 1818182 sequences (120000012 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (33, 178223, 11445, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (109, 216, 777)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2113)
[M::mem_pestat] mean and std.dev: (373.16, 372.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2781)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (83, 119, 169)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 341)
[M::mem_pestat] mean and std.dev: (126.27, 60.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 427)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (109, 309, 722)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1948)
[M::mem_pestat] mean and std.dev: (437.98, 439.44)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2561)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF

This is not the complete log. The bwamem was executed many more times. Do you know if this is normal?

Thanks a lot!

brentp commented 1 year ago

yes, those are just chunks of reads that bwa mem operates on, you should see many of them.

hanrong498 commented 1 year ago

Hi,

Thanks for the quick reply! I actually have another question following this:

My original code is to pipe this output to samtools sort

bwameth.py --threads 12 --reference "genome.fa"\
 "FASTQ/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001_R1_001_MM_1.fastq.gz" \
"FASTQ/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001_R2_001_MM_1.fastq.gz" \
2> bwameth/logs/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.map_reads.err |          \
samtools sort -T "$MYTEMP"/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001 -m 3G -@ 4 \
-o "bwameth/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.bam" \
2>> bwameth/logs/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.map_reads.err

But here I get error [W::sam_parse1] unrecognized reference name ""; treated as unmapped

A similar issue was mentioned here and I also checked that

$ head genome.fa.bwameth.c2t
>rchr1 1

Could this cause the unrecognised reference? Thanks for the help!