BenLangmead / bowtie

An ultrafast memory-efficient short read aligner
Other
257 stars 76 forks source link

bowtie 1.3.0 generated sam files error #119

Closed hidvegin closed 3 years ago

hidvegin commented 3 years ago

I used bowtie 1.3.0 for generate small RNA-seq aligments to reference genome. I used this parameters: bowtie -n 1 -l 10 -m 100 -k 1 --best --strata -p 24 -f /HDD/lculinaris/smallRNA/clean_data/LC01.clean.fa.gz -x /HDD/lculinaris/bowtie_index/lculinaris --sam --sam-nohead | samtools view -bS -o LC01_clean.bam

I would like to generate bam files with samtools view on the fly but I get this error message:

[E::sam_hrecs_error] Missing tab at line 1: "@ST-E00523:628:H2WJ5CCX2:2:1101:16143:1362 0   26094   19805   255 16M *   0   0   NCGCCTGTAGCTCAGT    IIIIIIIIIIIIIIII    XA:i:1  MD:Z:0G15   NM:i:1  XM:i:2"
samtools view: failed to add PG line to the header

The reads in fasta file format, example:

>@ST-E00523:628:H2WJ5CCX2:2:1101:14945:1221 1:N:0:NAGCTGAA
NGAAGCGGAAGCGAGAATG
>@ST-E00523:628:H2WJ5CCX2:2:1101:17462:1221 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA
>@ST-E00523:628:H2WJ5CCX2:2:1101:21501:1221 1:N:0:NAGCTGAA
NGGGATGTAGCTCAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:2696:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:6492:1309 1:N:0:NAGCTGAA
NCAACCGACTTAGAAC
>@ST-E00523:628:H2WJ5CCX2:2:1101:9191:1309 1:N:0:NAGCTGAA
NTGTGGATCTTTGACGGCAACGTAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:14407:1309 1:N:0:NAGCTGAA
NATAGACTTGAAC
>@ST-E00523:628:H2WJ5CCX2:2:1101:14955:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA
>@ST-E00523:628:H2WJ5CCX2:2:1101:15747:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA

Please help me, how could I resolve this issue.

BenLangmead commented 3 years ago

Does this work if you remove the --sam-nohead option from the bowtie command?

hidvegin commented 3 years ago

Thank @BenLangmead for your answer. I tried it, what you mention but I get the very same error message:

[E::sam_hrecs_error] Missing tab at line 256025: "@ST-E00523:628:H2WJ5CCX2:2:1101:13859:1520    0   2309    41991   255 21M *   0   0   NATGAGATACGGACACCGGCC   IIIIIIIIIIIIIIIIIIIII   XA:i:1  MD:Z:0T19T0 NM:i:2  XM:i:2"
samtools view: failed to add PG line to the header

This code was used: bowtie -n 1 -l 10 -m 100 -k 1 --best --strata -p 24 -f /HDD/lculinaris/smallRNA/clean_data/LC01.clean.fa.gz -x /HDD/lculinaris/bowtie_index/lculinaris --sam | samtools view -bS -o LC01_clean.bam

ch4rr0 commented 3 years ago

Hello @hidvegin

This appears to be a bug in samtools where it thinks that the first read is a continuation of the header, or start of the header in the case of --sam-nohead and expects there to be a TAB after @ST but instead finds a hyphen. A temporary solution may be to remove the '@' character from the first the read. Your resulting file should look something like this:

>ST-E00523:628:H2WJ5CCX2:2:1101:14945:1221 1:N:0:NAGCTGAA
NGAAGCGGAAGCGAGAATG
>@ST-E00523:628:H2WJ5CCX2:2:1101:17462:1221 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA
>@ST-E00523:628:H2WJ5CCX2:2:1101:21501:1221 1:N:0:NAGCTGAA
NGGGATGTAGCTCAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:2696:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:6492:1309 1:N:0:NAGCTGAA
NCAACCGACTTAGAAC
>@ST-E00523:628:H2WJ5CCX2:2:1101:9191:1309 1:N:0:NAGCTGAA
NTGTGGATCTTTGACGGCAACGTAAA
>@ST-E00523:628:H2WJ5CCX2:2:1101:14407:1309 1:N:0:NAGCTGAA
NATAGACTTGAAC
>@ST-E00523:628:H2WJ5CCX2:2:1101:14955:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA
>@ST-E00523:628:H2WJ5CCX2:2:1101:15747:1309 1:N:0:NAGCTGAA
NGGGATGTAGCTCAGA

Please let us know if this works for you.

hidvegin commented 3 years ago

Hello @ch4rr0

Thank you for your answer. I tried what you suggested to me, but it did not work. I get this error message:

[E::sam_hrecs_error] Missing tab at line 256025: "@ST-E00523:628:H2WJ5CCX2:2:1101:9526:1713 0   190533  6700    255 24M *   0   0   AAATCCGTCTGATCTGAATCGATC    IIIIIIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:24 NM:i:0  XM:i:2"
samtools view: failed to add PG line to the header
ch4rr0 commented 3 years ago

Here is the command line that I used on the modified input file. I kept only the arguments that were necessary to recreate the issue:

$ ./bowtie indexes/e_coli -f in.fa -S | samtools view -bS -o out.bam
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 9
# reads with at least one alignment: 3 (33.33%)
# reads that failed to align: 6 (66.67%)
Reported 3 alignments
$ file out.bam
out.bam: gzip compressed data, extra field, original size modulo 2^32 0
hidvegin commented 3 years ago

It worked for me, if I use a short index file (example miRBase mature small RNAs), but when I use a 4 Gbp reference genome index file which I made with bowtie1.3.0, I get the same error message. Also, in bowtie2 and hisat2, the samtools is worked for me very well with the latest version. bowtie 1.3.0 is the only one, which is not working with samtools.

ch4rr0 commented 3 years ago

I am getting the same error when using bowtie2 when using the original input sample:

$ ./bowtie2-align-s -x example/index/lambda_virus -f ../bowtie/in.fa | samtools view -b 
9 reads; of these:
9 (100.00%) were unpaired; of these:
9 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate
[E::sam_hrecs_error] Missing tab at line 4: "@ST-E00523:628:H2WJ5CCX2:2:1101:14945:1221 4       *       0       0       *       *       0       0       NGAAGCGGAAGCGAGAATG     IIIIIIIIIIIIIIIIIIIYT:Z:UU  YF:Z:SC"
samtools view: failed to add PG line to the header

$ ./bowtie2-align-s -x example/index/lambda_virus -f ../bowtie/in.fa --sam-nohead | samtools view -b
9 reads; of these:
9 (100.00%) were unpaired; of these:
9 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate
[E::sam_hrecs_error] Missing tab at line 1: "@ST-E00523:628:H2WJ5CCX2:2:1101:14945:1221 4       *       0       0       *       *       0       0       NGAAGCGGAAGCGAGAATG     IIIIIIIIIIIIIIIIIIIYT:Z:UU  YF:Z:SC"
samtools view: failed to add PG line to the header

Would it be possible to share a sample of the index?

hidvegin commented 3 years ago

Yes, here is the link for the index files: LINK With this index files I get the error message.

hidvegin commented 3 years ago

Also, I tried different small RNA libraries (14 different libraries) align to the miRBase mature index file and I recognized that for 11 libraries worked your suggestion but for 3 libraries did not work. So, it looks that the '@' character deletion in the firs read, is not working for all of libraries.

ch4rr0 commented 3 years ago

Hello,

I was not able to download the read file using the link provided however I would like to recommend the below command, which should work regardless of input:

bowtie -n 1 -l 10 -m 100 -k 1 --best --strata -p 24 -f /HDD/lculinaris/smallRNA/clean_data/LC01.clean.fa.gz -x /HDD/lculinaris/bowtie_index/lculinaris --sam | samtools view -b --no-PG

The only significant difference is the addition of the --no-PG flag.

hidvegin commented 3 years ago

Hello @ch4rr0,

Thank you for your answer. With your help, samtools worked very well and generated the bam files.