duncanca / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

base quality score limitation in MosaikText #79

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. MosaikText -in testseq.aligned.dat -u -sam testseq.sam

What is the expected output? What do you see instead?
------------------------------------------------------------------------------
MosaikText 1.1.0018                                                 2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- evaluating unique reads only.
- converting the alignment archive to the following formats: SAM

Converting alignment archive:
 0% [                                                                                                                        ]                                  
|ERROR: The base quality is larger than 60.

What version of the product are you using? On what operating system?
1.1.0018

Please provide any additional information below.

I'm attaching the aligned .dat file and also the "raw" paired-end reads file. 
If I clip each file to the first 100 lines (25 reads), the program works 
fine. I don't understand why MosaikAlign and MosaikSort allow the reads 
to be aligned, but MosaikText does not allow them to be converted to a SAM file.

Thanks much,
Hua

Original issue reported on code.google.com by hua.l.t...@gmail.com on 14 Nov 2010 at 12:08

GoogleCodeExporter commented 9 years ago
Hi Hua,

MosaikText is currently behaving strangely when producing SAM files.

I tried this command to print the archive to stdout, and it works well.
> ./MosaikText -in testseq.aligned.dat -screen

I am checking the SAM writer.

Original comment by WanPing....@gmail.com on 14 Nov 2010 at 3:44

GoogleCodeExporter commented 9 years ago
Dear Wan-Ping

I found that in the data structures directory (I think the file is called
*string.cpp) there is a line that prints an error message if the base
quality score is >60. Suppressing that line seems to solve the problem,
although I don't know whether a base quality score >60 makes sense.

Original comment by hua.l.t...@gmail.com on 14 Nov 2010 at 7:40

GoogleCodeExporter commented 9 years ago
Hi Hua,

Since we assumed that qualities are encoded in Phred+33 format, we put that 
check in MosaikText. We're going to extend Mosaik to support other quality 
encoding formats.

Original comment by WanPing....@gmail.com on 16 Nov 2010 at 4:06
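
For readers following along, a minimal Python sketch (not Mosaik's code) of how a quality string decodes under an assumed offset. It assumes the reads were written as Phred+64 but are interpreted as Phred+33; the function name `decode_qualities` and the sample string are hypothetical.

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


# Hypothetical quality string, as it might appear in a Phred+64 FASTQ file.
sample = "hhhgfeb_B"

as_phred33 = decode_qualities(sample, offset=33)
as_phred64 = decode_qualities(sample, offset=64)

# 'h' is ASCII 104: 104 - 33 = 71 under Phred+33, 104 - 64 = 40 under Phred+64.
print("max score read as Phred+33:", max(as_phred33))
print("max score read as Phred+64:", max(as_phred64))
```

Read with the wrong offset, the same characters yield scores above 60, which is the kind of value the error message complains about.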

GoogleCodeExporter commented 9 years ago

Original comment by WanPing....@gmail.com on 16 Nov 2010 at 4:07

GoogleCodeExporter commented 9 years ago
Thanks Hua for the trick. 

I had the same problem and this seems to fix it. Actually instead of deleting 
the line, I changed the parameter threshold to print an error message only if 
basequality is >104. 

@ Wan-Ping
The base qualities are actually encoded as Phred+64 (at least for Illumina reads), 
so setting the error threshold at >60 doesn't make sense!

Original comment by sebastie...@gmail.com on 7 Dec 2010 at 6:18
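
The value 104 corresponds to ASCII offset 64 plus a quality of 40. If it is unclear which encoding a FASTQ file uses, a rough heuristic is to look at the range of characters in the quality lines. The sketch below is an illustration only, not part of Mosaik: the function name `guess_phred_offset` is hypothetical, and the cut-offs assume that Phred+33 files do not exceed Q41 (ASCII 74) and that Phred+64 or Solexa+64 files do not go below ASCII 59.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from observed quality characters.

    Returns None when the input is empty or the range is ambiguous.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur in a +64 encoding.
        return 33
    if highest > 74:
        # Characters this high would mean Q > 41 under +33 (assumed not to occur).
        return 64
    return None


print(guess_phred_offset(["IIII#", "HHHF5"]))  # low characters present
print(guess_phred_offset(["hhhgB", "ggfeB"]))  # only high characters present
```

A guess like this is most reliable when run over many reads, since a handful of reads may fall entirely within the overlapping range.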

GoogleCodeExporter commented 9 years ago
I ran into the same problem. Where is the file you are talking about, the one 
to change for the score?
Thank you for your help.

Original comment by wenbin...@gmail.com on 17 May 2011 at 11:38

GoogleCodeExporter commented 9 years ago
Thanks Hua, I also modified the file and recompiled successfully. The code 
referred to is located at 
/mosaik-aligner/src/CommonSource/DataStructures/Mosaikstring.cpp, line 286, in 
the source code for version 1.1.0021.

Original comment by linus.fo...@gmail.com on 12 Jul 2011 at 1:11

GoogleCodeExporter commented 9 years ago
I ran 1.1.0021 modified to basequality >104. I was unable to create a BAM 
file due to the character X in the DNA sequence. Strangely, it did create a SAM file.

Original comment by shin.n....@gmail.com on 21 Oct 2011 at 11:28
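
The comment does not say which tool failed on the BAM step. One possible explanation is that BAM stores each base as a 4-bit code drawn from the alphabet `=ACMGRSVTWYHKDBN`, which has no entry for X, whereas the SAM text format accepts any letter in the sequence field. A small Python sketch for spotting such characters before conversion (the function name `unsupported_bases` is hypothetical):

```python
# The 16 base codes that BAM can represent, per the SAM/BAM specification.
BAM_BASES = set("=ACMGRSVTWYHKDBN")


def unsupported_bases(sequence):
    """Return the sorted characters in a read that BAM has no code for."""
    return sorted(set(sequence.upper()) - BAM_BASES)


print(unsupported_bases("ACGTXN"))  # contains an X
print(unsupported_bases("acgtn"))   # only representable bases
```

Reads flagged this way could be inspected, or the offending characters replaced with N, before attempting the BAM conversion.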