SoYeonA / TenSQR

4 stars 3 forks source link

what I should do when low reads covering regions/no SNV called showed, and why? #4

Open ZhongyiZhu opened 4 years ago

ZhongyiZhu commented 4 years ago

Dear developer: I try to call Quasispecies with a sam file of average depth 200x, and 100%region have a >20X coverage. however, I found few reads were filled in the covering regions (my reference is HBV viruse, I already set the region to the whole length of my reference, which is 1-3182). why is that? and I found the output below print "mapped_counter:1 unmapped_counter:0". what "mapped_counter" means? why there is only 1?

thankyou and below is the output information for ExtractMatrix call:

  1. filename of reference sequence (FASTA) : NC_003977.2.fasta
  2. filname of the aligned reads (sam format) : Week0.sam
  3. SNV_thres : 0.001
  4. reconstruction_start : 0
  5. reconstruction_stop: 3182
  6. min_mapping_qual : 5
  7. min_read_length : 10
  8. max_insert_length : 500
  9. characteristic zone name : Week0
  10. seq_err (assumed sequencing error rate(%)) : 2
  11. MEC improvement threshold : 0.0312
  12. initial population size : 1 mapped_counter:1 unmapped_counter:0 num_lowQSseq:0 seq_count:1 filtered_counter1:37922 filtered_counter2:603480 filtered_cond2:1 pair_counter:1 singleton_counter:5656553 total_counter:6297956

After parsing 6297956 reads in file Week0.sam, there are 1 paired_end reads(mean lengths 52) covering regions 0-3182. CPU time for SAM parsing: 33.87

After calling SNVs from 3183 bases in regions between 0 and 3182, 0 SNVs are detected. After correcting error, 0 fragments are used for quasi-species reconstruction. reduced number of fragment:1 CPU time for SNV calling: 0

WuLoli commented 4 years ago

Hi Zhongyi,

The results show that only 1 read covering region 0 - 3182.

By checking the position index of each read in the SAM file, are all the reads lying in the region between 1 and 3182?

mikemc commented 4 years ago

I also had this problem. In my case, the problem was because I had sorted the SAM file by reference position with samtools sort and so the read-pairs were not grouped together, which it seems that TenSQR requires. I fixed the problem by resorting by read name, using samtools sort -n.