cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Improve amplicon detector #487

Open donkirkby opened 4 years ago

donkirkby commented 4 years ago

The technique that the assembled version currently uses to detect amplicons has two problems:

  1. Some lengths are identified as amplicons when they shouldn't be.
  2. Some lengths are not identified as amplicons when they should be. Usually, this is because there's another amplicon of a similar length, and it raises the average count within the 20-length window.

The 12 Jun 2015 run has examples of both problems.

Instead of choosing the threshold for amplicon peaks as a multiple of the average count, use a multiple of the 90th percentile. There are several parameters we could try to optimize:
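
A minimal sketch of the percentile idea, not MiCall's actual code. The function names, the `length_counts` dictionary (merged read length mapped to read count), and the default values for `window`, `fraction`, and `multiplier` are all assumptions made for illustration; they are exposed as arguments only because they look like natural candidates for tuning.

```python
import math


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def find_amplicon_lengths(length_counts, window=20, fraction=0.9,
                          multiplier=3.0):
    """Return lengths whose count exceeds a multiple of the local percentile.

    length_counts: dict of read length -> read count (hypothetical input).
    window, fraction, multiplier: assumed defaults, not taken from MiCall.
    """
    peaks = []
    half = window // 2
    for length in sorted(length_counts):
        # Counts for every length in the window centred on this length;
        # lengths with no reads count as zero.
        nearby = [length_counts.get(other, 0)
                  for other in range(length - half, length + half + 1)]
        threshold = multiplier * percentile(nearby, fraction)
        if length_counts[length] > threshold:
            peaks.append(length)
    return peaks


# Made-up data: flat background of 5 reads, two amplicons 5 bases apart.
example_counts = {length: 5 for length in range(290, 321)}
example_counts[300] = 1000
example_counts[305] = 800
print(find_amplicon_lengths(example_counts))  # [300, 305]
```

With these made-up counts, the two tallest values in each window sit above the 90th percentile, so a neighbouring amplicon does not raise the threshold the way it raises a window average.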

donkirkby commented 4 years ago

The first round of changes didn't seem to improve the results. Lots of amplicons in the 12 Jun 2015 run didn't get detected.

After discussion with Chanson, we decided to display the results of the G2P analysis separately from the remapped reads. In other words, a read pair could get counted twice: once in the G2P analysis and once in the remapping. The remapped reads would be displayed in GP120, and the G2P analysis would be displayed in V3LOOP.