drgirasol / FSAnalyzer

DNA Fragment Analysis using Matlab
GNU General Public License v3.0

Size Standard Peak Detection #2

Open drgirasol opened 8 years ago

drgirasol commented 8 years ago

When the size standard peaks include one very strong peak caused by an off-scale error, the current peak detection method cannot detect all standard fragments. Neither the "minimum threshold" nor the "ignore peaks below x" setting seems able to cope with this problem. Possible solutions: a) change the peak detection algorithm to ignore outliers, b) change the peak detection to use a threshold instead of a minimum threshold, or c) adapt the detection algorithm to consider only peaks above the "ignore" threshold.

ufcyg commented 8 years ago

Peak detection algorithm reworked, aka the "adjusting threshold bug".

The "ignore approx x bp" setting does not affect the detection algorithm in any way. The minimum threshold is only used if the calculated "maximum peaks of the lower ones" is below 300 RFU. The previous version of peakAdaptM first scanned the data for peaks at a minimum threshold of 150 RFU. After this first detection, some variables (maximum distance between peaks, peak prominence, and threshold) were adjusted, and then the highest peak was removed if it was "far enough" away from the median of the other peaks.

The main problem with the new data was the false signal, which is massive in comparison to the other peaks. It caused the threshold adaptation to set the adjusted threshold much higher than intended.

This "from the borders to the middle" approach is replaced by a new version. After the first scan, the percentage difference between the median of all found peak heights and the mean of all found peaks except the biggest is calculated. If the difference is higher than 2.5%, the biggest peaks are removed until the difference is smaller than 2.5%.
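
The removal loop can be sketched roughly as follows. This is only an illustration in Python, not the MATLAB code from peakAdaptM. The function names are hypothetical, and the exact formula is an assumption: the sketch compares the mean and the median of the peaks that are still kept after each removal, which is one possible reading of the description above.

```python
from statistics import mean, median

def percent_difference(heights):
    """Assumed formula: |mean - median| relative to the median, in percent."""
    med = median(heights)
    return abs(mean(heights) - med) / med * 100.0

def remove_high_outliers(heights, limit=2.5):
    """Drop the largest peak until mean and median differ by at most `limit` percent."""
    kept = sorted(heights)
    # Keep at least two peaks so the statistics stay meaningful (assumption).
    while len(kept) > 2 and percent_difference(kept) > limit:
        kept.pop()  # remove the current biggest peak
    return kept

# Five ladder-like peaks around 1000 RFU plus one off-scale signal (made-up values).
print(remove_high_outliers([1000, 1010, 990, 1005, 995, 8000]))
```

With the made-up heights above, only the 8000 RFU peak is dropped, because the remaining five peaks have the same mean and median.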

In the next step the threshold will be adjusted as in the previous version.

A second peak detection is then run with the adjusted variables. If it finds no more peaks than the ladder file specifies, the threshold is lowered by 10 RFU and peak detection is run again. Once there are enough peaks, the difference from the median is calculated again and the bigger peaks are removed until it is below 2.5%. If no peaks are removed this way, the lowest peak is removed, because the threshold adjustment continues until there is at least one more peak than the ladder file specifies.
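
The threshold lowering described above can be sketched like this, again in Python as an illustration only. The simple local-maximum detector, the function names, and the `floor` parameter (added so the loop always terminates) are assumptions and not part of FSAnalyzer.

```python
def find_peaks(signal, threshold):
    """Hypothetical detector: indices of local maxima strictly above `threshold`."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]

def lower_threshold_until_enough(signal, threshold, n_ladder, step=10, floor=0):
    """Lower the threshold by `step` RFU until more than `n_ladder` peaks are found."""
    peaks = find_peaks(signal, threshold)
    # "Less or equal" peaks than the ladder file expects: keep lowering.
    while len(peaks) <= n_ladder and threshold - step >= floor:
        threshold -= step
        peaks = find_peaks(signal, threshold)
    return threshold, peaks

# Made-up trace with peaks of 50, 120, 200 and 95 RFU; the ladder expects 2 fragments.
trace = [0, 50, 0, 120, 0, 200, 0, 95, 0]
print(lower_threshold_until_enough(trace, threshold=150, n_ladder=2))
```

In this made-up trace the loop stops at 90 RFU, the first threshold that yields one peak more than the ladder expects. The surplus peak would then be handled by the removal step.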

Other files affected by this change: FSAnalyzerGUIv3.mat, FSAdebug.mat, callplotM.mat, calcpeakqual.mat

drgirasol commented 8 years ago

How did you determine the 2.5 %? Did you evaluate good size marker FSAs for that?

If you have tested the new implementation, please create a pull request for it.

drgirasol commented 8 years ago

It would be nice to have a wiki entry for the peak detection. It should include the steps as described here as well as the names of the MATLAB functions (linked to documentation) that are used. I already adapted your description in the wiki, but there are still some open questions...

ufcyg commented 8 years ago

On every tested file (around 100), the difference was in the range of 1.2% to 1.8% once all false high peaks had been removed, in contrast to around 20-60% when all peaks are considered. I tried a preset of 5% before, but this led to less satisfying results because false high peaks were not removed in several samples, while 2% and lower tends to falsely remove high but real peaks.

At the end of the day, I want to point out that this pre-analysis is a way of making the work easier. With this small amount of data it is not possible to create a learning algorithm, and therefore there is no way to replace a human being checking the data before an analysis is started.