brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

unrecognized reference name during sam_parse #86

Closed nash5202 closed 1 year ago

nash5202 commented 1 year ago

I have some enzymatic methyl sequencing data that I am trying to align using bwa-meth. However, when I try to convert the SAM file that the aligner generates to a BAM file, I get the following warning while the SAM file is being parsed:

[W::sam_parse1] unrecognized reference name ""; treated as unmapped

I am not sure what the issue is as I have used the same hg38 reference genome for alignment of other types of sequencing data. Also, I am using the following command for the task:

bwameth.py --reference ref_genome.fa Sample1_1.fastq.gz Sample1_2.fastq.gz -t 7 | samtools view -b - > sample1.bam

Any help is appreciated.

Thank you.

brentp commented 1 year ago

I am not sure. You were able to run bwameth.py index without problems? Do other tools (fastqc, etc) run without problems on your fastq files?

MSleeper1 commented 1 year ago

I am having the same error when converting the sam files generated by alignment with bwa-meth to bam files:

[W::sam_parse1] urecognized reference name; treated as unmapped

I tried two variations of the command to align a single-end read and convert sam to bam:

bwameth.py --threads 8 --reference ref_genome.fa sample.fastq > sample.sam
samtools view -b -o sample.bam sample.sam

and

bwameth.py --reference ref_genome.fa sample.fastq -t 8 | samtools view -b - > sample.bam

I was able to run bwameth.py index without any issues. I have tried the above commands on a few different fastq files and I am currently double-checking my fastq files with fastqc for any issues.

The environment I am using had the following package versions installed:

bwa 0.7.17
bwa-mem2 2.2.1
bwameth 0.2.5
python 3.11.0
samtools 1.6
toolshed 0.4.6

Please let me know if you have any advice on how to troubleshoot this error message.

Thank you.

haodongchen commented 1 year ago

One problem I noticed is that bwameth.py puts nothing in the chrom column when the read is unmapped, while it should put an * there:

readname\t77\t\t0\t0\t*\t*\t0\t0\tGNAATCATGTGTCTTCTTATCTCTAAATCAGAATCCCGCCCAAACGAAACGATACGACAACGCCGCGAAACCTCGATTAACCTCAAATAACCAATCCCCCACCGATCCCCGCCGCCGAACCCCCCGCGCCAGCCCGCGCCCCGCGCGGCCG\t;#CCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC-CCCCCCC;CCCCCCCCCCCCCCCCCCCCCCC;CC--CCC;CC;CCCC;C;C;;;CC-CC--C;-CC--C;-CC--C---;--C--C--;C--;---CC;CCC-----\tAS:i:0\tXS:i:0\tRG:Z:test\tYC:Z:CT\n

This may cause some software to report Unrecognized reference name.
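A minimal sketch of how one might check a SAM file for this condition. It assumes tab-separated SAM text in which the third column (RNAME, called the chrom column above) is empty for the affected records. `find_empty_rname` is a hypothetical helper written for illustration; it is not part of bwa-meth or samtools.

```python
def find_empty_rname(sam_lines):
    """Yield (line_number, read_name) for alignment records with an empty RNAME.

    Hypothetical helper for illustration only. `sam_lines` is any iterable of
    SAM text lines, for example an open file handle.
    """
    for lineno, line in enumerate(sam_lines, start=1):
        # Header lines start with "@"; read names are not allowed to.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is RNAME; unmapped reads should carry "*" there.
        if len(fields) >= 3 and fields[2] == "":
            yield lineno, fields[0]


# Small made-up example: one header line, one bad record, one good record.
records = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t77\t\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print(list(find_empty_rname(records)))  # [(2, 'r1')]
```

On a real file, the same function could be called on an open handle, e.g. `with open("sample.sam") as fh: ...`, where the file name is only an example.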

brentp commented 1 year ago

@haodongchen, thanks for diagnosing. I pushed a fix for this. Would you or others in this issue give it a try and let me know? Thanks!

MSleeper1 commented 1 year ago

@brentp, I can confirm that the blank chrom column is causing the issue, because I did not get this error when I removed reads that were blank in the chrom column.
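As a stopgap for SAM files that were already produced with blank chrom columns, a variant of the workaround above is to repair the affected records rather than remove them. The sketch below is an assumption about how that could be done, not the fix that was pushed to bwameth.py; `fix_rname` and `fix_stream` are hypothetical names.

```python
def fix_rname(line):
    """Return (line, changed). An empty RNAME (column 3) is replaced by "*".

    Hypothetical helper for illustration; header lines and records that
    already have an RNAME are passed through unchanged.
    """
    if line.startswith("@"):
        return line, False
    fields = line.rstrip("\n").split("\t")
    # A SAM alignment record has at least 11 mandatory tab-separated fields.
    if len(fields) >= 11 and fields[2] == "":
        fields[2] = "*"
        return "\t".join(fields) + "\n", True
    return line, False


def fix_stream(src, dst):
    """Copy SAM text from src to dst, repairing empty RNAME fields.

    Returns the number of records that were repaired.
    """
    repaired = 0
    for line in src:
        fixed, changed = fix_rname(line)
        dst.write(fixed)
        repaired += changed
    return repaired
```

To use this in a pipeline, one could call `fix_stream(sys.stdin, sys.stdout)` (after `import sys`) from a small script placed between bwameth.py and samtools view. Running a version of bwameth.py that already writes * is the cleaner solution.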

I pulled your most recent update and ran alignment with the updated bwameth.py:

bwameth.py --reference hg19.p13.plusMT.no_alt_analysis_set.fa.gz SRR536237.fastq > SRR536237.sam

The resulting sam file still contained blank chrom columns for some reads:

SRR536237.29    16  chr17   75537577    60  101M    *   0   0   ACTACCCCGAATAAACCACACTCCTTACAAAAACCAAACAACTACGTTAAAAAAATATTAATATTTATCAAAAAACCCTCTTCCAACCATTTTTAATTTTT  #########A>3><3<>A>=953>;;=?????>7;A@A@;?@;.<DDDDDEECCB@B<DDECECDB?16DDIEE??3A+3:FAIEE@>DDDDADDDD????   NM:i:1  MD:Z:76A24  AS:i:98 XS:i:29  RG:Z:SRR536237  YC:Z:CT YD:Z:r
SRR536237.30    4       0   0   *   *   0   0       TTGTTGTTTGGAGATGTTTTGGTTTTGTGGTTTTAAGGCTTTGGAGAAGGGAGGGGAAAATATGTGTTTTTTTTTTGAATTAGGGTTATTAAAGTTAATTT   ????8:ADD>?+2++2AEEDD<<C;FBEEI?8))*:*?*09DBB#########################################################   AS:i:0  XS:i:0  RG:Z:SRR536237  YC:Z:CT

When I ran samtools view -S -b SRR536237.sam > SRR536237.bam, it returned the same error:

[W::sam_parse1] urecognized reference name; treated as unmapped

Please let me know if any other information would be helpful. Thank you.

brentp commented 1 year ago

Thanks for following up, @MSleeper1. Can you share a fastq with 2 reads that show the problem? I think this is likely related to the reads not being paired-end, as there are likely few users with single-end reads.

haodongchen commented 1 year ago

I tried the fix and it solved the problem I got.

brentp commented 1 year ago

@MSleeper1 and @haodongchen thanks for following up! I'll tag a new release with the fix.

MSleeper1 commented 1 year ago

I dug into which script was being used when I called bwameth.py and found that it has been defaulting to miniconda3/envs/bwa/bin/bwameth.py.

When I ran the alignment and specified the absolute path miniconda3/pkgs/bwameth-0.2.5-pyh5e36f6f_0/python-scripts/bwameth.py, which contains your most recent push, my problem was solved; there are now * in the chrom columns that were previously blank.

Thank you for all the help. :)
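For anyone who suspects the same situation, where an older copy of the script is picked up first, a short sketch of one way to see which bwameth.py the shell would resolve and which other copies sit further down PATH. `find_on_path` is a hypothetical helper; `shutil.which` is the standard-library lookup. Note that this only inspects PATH directories, so a copy that is not on PATH (such as one under a conda pkgs directory) will not be listed.

```python
import os
import shutil


def find_on_path(name, path=None):
    """Return every executable file called `name` on PATH, in lookup order.

    Hypothetical helper for illustration. `path` defaults to the PATH
    environment variable of the current process.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


# The first entry is the copy that runs when "bwameth.py" is typed without a path.
print(shutil.which("bwameth.py"))
print(find_on_path("bwameth.py"))
```

If more than one path is printed, the later entries are shadowed by the first, and calling the intended copy by its absolute path (as done above) sidesteps the ambiguity.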