Open donkirkby opened 4 years ago
The first round of changes didn't seem to improve the results. Lots of amplicons in the 12 Jun 2015 run didn't get detected.
After discussion with Chanson, we decided to display the results of the G2P analysis separately from the remapped reads. In other words, a read pair could get counted twice: once in the G2P analysis and once in the remapping. The remapped reads would be displayed in GP120, and the G2P analysis would be displayed in V3LOOP.
The current technique that the assembled version is using to detect amplicons has two problems:
The 12 Jun 2015 run has examples of both problems.
Instead of choosing the threshold for amplicon peaks as a multiple of the average count, use a multiple of the 90th percentile. There are several parameters we could try to optimize:
- window size to compare each peak to, currently +/- 20
- percentile to compare to within the window, currently 90
- multiple of comparison, currently 50
- minimum count, currently 50
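
To make the four parameters concrete, here is a minimal sketch of a percentile-based peak test. It is not the pipeline's actual code: the function names, the assumption that `counts` is a list of read counts indexed by position, the nearest-rank percentile, and the choice to leave the candidate position out of its own window are all illustrative guesses.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def find_amplicon_peaks(counts, window=20, percentile=90, multiple=50,
                        min_count=50):
    """Return positions whose count stands out from the surrounding window.

    Hypothetical helper: a position is reported when its count is at least
    min_count and at least `multiple` times the chosen percentile of the
    counts within +/- `window` positions (the position itself excluded).
    """
    peaks = []
    for i, count in enumerate(counts):
        if count < min_count:
            continue
        start = max(0, i - window)
        end = min(len(counts), i + window + 1)
        neighbours = counts[start:i] + counts[i + 1:end]
        if not neighbours:
            continue
        baseline = nearest_rank_percentile(neighbours, percentile)
        if count >= multiple * baseline:
            peaks.append(i)
    return peaks


if __name__ == '__main__':
    # Synthetic data only: a flat background of 1 with one tall spike.
    example = [1] * 100
    example[50] = 5000
    print(find_amplicon_peaks(example))
```

Because the baseline comes from the local window rather than the overall average, a spike is judged against its immediate surroundings. When the surrounding percentile is zero, only the minimum count keeps a position from being reported.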
- [x] change from average to percentile, and choose reasonable thresholds.
- [ ] optimize the thresholds over a large number of samples.