biosustain / OpenMS

The codebase of the OpenMS project
https://www.openms.de

Refactor/mrm transition group picker no consensus boundaries #141

Closed hroest closed 6 years ago

hroest commented 6 years ago

A first idea for how we could get per-transition-group peak boundaries.

Basically we pick the highest intensity peak for each transition in the window and then use this going forward. This may be problematic if we have more than one peak in that window (e.g. the peaks may not match up well).
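A minimal Python sketch of that selection step, for illustration only. It is not the OpenMS C++ code: the `Peak` tuple, the function names, and the rule that a peak belongs to the window when its apex falls inside it are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Optional


class Peak(NamedTuple):
    # Hypothetical peak record; all positions are retention times.
    start: float
    apex: float
    end: float
    intensity: float


def highest_peak_in_window(
    peaks: List[Peak], window_start: float, window_end: float
) -> Optional[Peak]:
    """Return the most intense peak whose apex lies in the window, or None."""
    candidates = [p for p in peaks if window_start <= p.apex <= window_end]
    return max(candidates, key=lambda p: p.intensity, default=None)


def pick_per_transition(
    traces: Dict[str, List[Peak]], window_start: float, window_end: float
) -> Dict[str, Optional[Peak]]:
    """Pick one peak per transition, each independently of the other traces."""
    return {
        name: highest_peak_in_window(peaks, window_start, window_end)
        for name, peaks in traces.items()
    }


traces = {
    "transition_1": [Peak(400, 450, 500, 1000.0)],
    "transition_2": [Peak(350, 420, 430, 200.0), Peak(430, 460, 490, 800.0)],
}
picked = pick_per_transition(traces, 400, 500)
```

Because each trace is handled on its own, nothing here guarantees that the picked peaks line up across traces when a window holds more than one peak, which is the problem mentioned above.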

hroest commented 6 years ago

Note that this is just conceptual code. I have not tested it, but I hope it gets the conversation started.

dmccloskey commented 6 years ago

Hi Hannes. I think this is a great start and helps us a lot in pointing out exactly where the modifications need to be made.

One very beneficial feature of the class is that it aligns the retention times of the transitions in a transition group. I want to make sure this is not lost in the modifications above. For example, in many cases the RT difference between the transitions will vary by only a few seconds, but the best left and right positions could differ by quite a bit. What we are looking for is for the algorithm to still align the peaks within a transition group, but then let the peak integrator fine-tune the exact retention time of each peak and use the local_left/local_right of each peak for more precise peak area, background, and metrics calculations.

hroest commented 6 years ago

Ok, let's look at a few examples, with peaks given as (start, apex, end). Note that I assume the apex is picked independently in each trace.

Ex1 well behaved:

Trace 1, 1 peak: (400,450, 500) Trace 2, 1 peak: (350, 425, 450)

Result: 1 peakgroup -> ( (400,450, 500) ; (350, 425, 450) ). This result is different to the consensus result as the peak boundaries are now different in each trace. This is what we would like to happen.

Ex2 missing peak:

Trace 1, 1 peak: (400,450, 500) Trace 2, no peak:

Result: 1 peakgroup -> ( (400,450, 500) ; (400,450, 500) ) -> peak boundaries will be transferred. This result is equal to the consensus result

Ex3 shifted peak:

Trace 1, 1 peak: (400,450, 500) Trace 2, 1 peak shifted: (350, 395, 450)

Result: 2 peakgroups -> ( (400,450, 500) ; (400,450, 500) ) from trace 1 and ((350, 395, 450); (350, 395, 450)) from trace 2. This result is equal to the consensus result

Ex3 two peaks:

Trace 1, 1 peak: (400,450, 500) Trace 2, 2 peaks: (350, 420, 430); (430, 460, 490)

Result: 2 peakgroups -> ( (400,450, 500) ; (390, 460, 490) ) from trace 1, which combines two peaks, and ( (350, 420, 430); (350, 420, 430)) from the secondary peak in trace 2. Note that in the current implementation with consensus, we would have deleted the secondary peak in Trace 2, leaving only a single peak group as a result. Also, the result here depends on which peak is more intense; this assumes that the 460 peak is more intense than the 420 peak.
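The well-behaved, missing-peak, and shifted-peak examples can be reproduced with a small Python sketch. The matching rule used here (a trace keeps its own boundaries only if its apex lies strictly inside the boundaries of the seeding peak, otherwise the seeding peak's boundaries are transferred) is an assumption chosen to match those three results, not a rule stated in this thread, and `build_peak_group` is a hypothetical name. The two-peak example is left out because it first needs a choice between candidate peaks.

```python
from typing import Optional, Tuple

Peak = Tuple[float, float, float]  # (start, apex, end), as in the examples


def build_peak_group(seed: Peak, other: Optional[Peak]) -> Tuple[Peak, Peak]:
    """Build one peak group from a seeding peak and another trace's peak.

    Assumption: the other trace keeps its own boundaries only when its apex
    is strictly inside the seed's (start, end) interval; otherwise the seed's
    boundaries are transferred, as in the consensus approach.
    """
    start, _, end = seed
    if other is not None and start < other[1] < end:
        return (seed, other)
    return (seed, seed)


# Well behaved: each trace keeps its own boundaries.
ex1 = build_peak_group((400, 450, 500), (350, 425, 450))

# Missing peak: boundaries are transferred from trace 1.
ex2 = build_peak_group((400, 450, 500), None)

# Shifted peak: one group seeded from each trace, both with transferred boundaries.
ex3_from_trace1 = build_peak_group((400, 450, 500), (350, 395, 450))
ex3_from_trace2 = build_peak_group((350, 395, 450), (400, 450, 500))
```

The strict inequality matters in the shifted-peak case: the apex at 450 sits exactly on the right boundary of the trace 2 peak, so it is not treated as belonging to it.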

dmccloskey commented 6 years ago

Ex3 two peaks:

Trace 1, 1 peak: (400,450, 500) Trace 2, 2 peaks: (350, 420, 430); (430, 460, 490)

What I would like to see in this case is (400,450, 500);(430, 460, 490), because more of the second peak from trace 2 is contained in trace 1.

hroest commented 6 years ago

What I would like to see in this case is (400,450, 500);(430, 460, 490) because more of second peak from trace 2 is contained in trace 1.

Maybe we can again have a switch for whether we want to consider the "wider" peak or the "highest intensity" peak first.

dmccloskey commented 6 years ago

I would be hesitant to use a switch like that for this case. There will be many cases like Ex3 with two peaks, where one of the traces has multiple peaks that overlap the window of one of the other traces. For the example given, choosing (350, 420, 430) would most likely be a mistake, whereas choosing (430, 460, 490) would be correct, as the majority of the (430, 460, 490) peak is contained within the first trace whereas the (350, 420, 430) peak is not.

Essentially, I am quite happy with how the original algorithm works, using either the highest or the widest peak to derive the initial peak boundaries for all traces. What I would like to see is a subsequent refinement of those boundaries so that they more accurately reflect the peaks within each of the traces.

Going back to Ex3 with two peaks, I would envision the algorithm working like this:

  1. assuming Trace 1 (400,450, 500) is the largest peak from which all other borders are derived, the algorithm would apply those borders to all other traces as it currently does
  2. This would result in consensus Trace 2 peak (400, 450, 500)
  3. The algorithm would then go trace by trace and refine the peak boundaries and peak apex for the consensus peaks
  4. This would result in Trace 2 peak (430, 460, 490).

hroest commented 6 years ago

Maybe in this case it would make sense to pick the peak with the largest overlap with the base peak instead of the one with the most intensity?
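A rough Python sketch of that overlap-based choice, applied as the refinement step after the consensus boundaries have been set. Everything here is hypothetical: the function names are made up, and scoring a candidate by the fraction of its own width that lies inside the base peak is one possible reading of "largest overlap".

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float, float]  # (start, apex, end)


def overlap(a: Peak, b: Peak) -> float:
    """Length of the retention-time interval shared by two peaks (0 if none)."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def pick_by_overlap(base: Peak, candidates: List[Peak]) -> Optional[Peak]:
    """Return the candidate with the largest fraction of its width inside base."""
    best, best_fraction = None, 0.0
    for cand in candidates:
        width = cand[2] - cand[0]
        if width <= 0:
            continue  # skip degenerate peaks
        fraction = overlap(base, cand) / width
        if fraction > best_fraction:
            best, best_fraction = cand, fraction
    return best


def refine(base: Peak, candidates: List[Peak]) -> Peak:
    """Start from the consensus (base) boundaries, then refine per trace.

    Falls back to the base boundaries when no candidate overlaps at all.
    """
    picked = pick_by_overlap(base, candidates)
    return picked if picked is not None else base


base_peak = (400, 450, 500)
trace2_peaks = [(350, 420, 430), (430, 460, 490)]
refined = refine(base_peak, trace2_peaks)
```

For the two-peak example, (350, 420, 430) shares 30 of its 80 time units with the base peak, while (430, 460, 490) lies entirely inside it, so the second peak is chosen. Ranking by absolute overlap (30 versus 60) gives the same answer here.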