averagehat / pbsim

Automatically exported from code.google.com/p/pbsim

Simulated sequence contains unexpected characters! #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Simulate a FASTQ reads file from human genome chr20 using PBSIM.
2. Map the FASTQ reads back to human genome chr20 using BWA-SW with the parameters
bwa bwasw -t 4 -b 5 -q 2 -r 1 -z 20
3. Use Picard tools to convert the SAM file to BAM and sort it.

What is the expected output? What do you see instead?
In step 3 above, the conversion and sort should succeed. Instead, there is a SAM
parsing error. The sequence field contains '<<DEL>', which the SAM standard
does not allow.
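
To locate the offending records before handing the file to Picard, the SAM file can be scanned for SEQ fields that do not match the pattern given in the SAM specification (`\*|[A-Za-z=.]+`). The following is only a minimal sketch: the function name `find_invalid_seq` is hypothetical, and it assumes a plain-text, tab-separated SAM file.

```python
import re

# SEQ pattern from the SAM specification: "*" or one or more of A-Z, a-z, "=", "."
SEQ_PATTERN = re.compile(r"\*|[A-Za-z=.]+")


def find_invalid_seq(lines):
    """Yield (line_number, read_name, seq) for records whose SEQ field is invalid.

    `lines` is any iterable of SAM text lines, e.g. an open file handle.
    (Hypothetical helper, not part of PBSIM, BWA, or Picard.)
    """
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]  # SEQ is the 10th mandatory column
        if not SEQ_PATTERN.fullmatch(seq):
            yield line_no, fields[0], seq
```

Passing an open file handle, for example `find_invalid_seq(open("reads.sam"))` with a placeholder file name, lists the reads whose sequence contains characters such as '<' or '>'.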

What version of the product are you using? On what operating system?
pbsim-1.0.3-Linux-amd64.tar.gz   
Linux  3.2.0-39-generic Ubuntu 

Please provide any additional information below.

Original issue reported on code.google.com by Ruhua.Jiang on 8 May 2014 at 11:38

GoogleCodeExporter commented 9 years ago
I checked the MAF file and found '<<DEL>' in the read sequence and '<DEL>' in
the corresponding reference region. However, I scanned the whole reference FASTA and
it does not contain any such characters.

Original comment by Ruhua.Jiang on 8 May 2014 at 11:41

GoogleCodeExporter commented 9 years ago
Thanks so much for your report. PBSIM does not print '<<DEL>' unless that string
is present in the reference FASTA. I guess control characters (e.g.
Ctrl-Backspace) are the cause. Could you check whether there are any control
characters in the reference FASTA?

Original comment by ono.yuki...@gmail.com on 12 May 2014 at 3:05
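
One way to run the check suggested above is to read the reference FASTA as raw bytes and report every control byte other than the line endings. This is a minimal sketch, not a PBSIM tool: the function name `find_control_chars` is hypothetical, and it assumes an uncompressed FASTA file. A text search for the literal string '<DEL>' would miss a raw control byte, which is why the file is opened in binary mode.

```python
def find_control_chars(path):
    """Yield (line_number, column, byte_value) for each unexpected control byte.

    Flags every byte below 0x20 except LF and CR, plus 0x7F (DEL).
    Tabs are flagged too; ignore those hits if the header lines use tabs.
    (Hypothetical helper, not part of PBSIM.)
    """
    allowed = {0x0A, 0x0D}  # LF and CR line endings are expected
    with open(path, "rb") as handle:
        for line_no, line in enumerate(handle, start=1):
            for col, byte in enumerate(line, start=1):
                if (byte < 0x20 or byte == 0x7F) and byte not in allowed:
                    yield line_no, col, byte
```

For example, `for hit in find_control_chars("chr20.fa"): print(hit)` prints the position and value of each suspicious byte; the file name here is a placeholder.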