ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0

grouprank not properly set #71

Closed: chubukov closed this issue 7 years ago

chubukov commented 7 years ago

It looks to me like, during automated peak picking with a compound database, the grouprank attribute is either not being computed or not being set properly.

When I allow multiple groups per compound and then export to mzroll, I get grouprank 0 for all groups. The grouprank formula was supposed to be (1.1 - maxQuality) * (1/log(maxIntensity + 1)) * (deltaRT^2) (whether that's the best formula is an issue for another day).
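The grouprank formula quoted above can be sketched as a small Python function. This is a minimal illustration, not ElMaven's code: the function and parameter names are hypothetical, the natural log is an assumption (the issue does not state the base), and lower values are treated as better, consistent with the later comment in this thread that the earlier, higher-intensity group has the lower groupRank.

```python
import math


def group_rank(max_quality: float, max_intensity: float, delta_rt: float) -> float:
    """Hypothetical sketch of the grouprank formula from the issue; lower is better.

    Assumes the natural log. Changing the base only rescales every rank by the
    same constant factor, so the relative ordering of groups is unaffected.
    """
    if max_intensity <= 0:
        # log(0 + 1) == 0 would cause a division by zero
        raise ValueError("max_intensity must be positive")
    quality_term = 1.1 - max_quality                     # higher quality -> smaller term
    intensity_term = 1.0 / math.log(max_intensity + 1)  # higher intensity -> smaller term
    rt_term = delta_rt ** 2                              # closer to expected RT -> smaller term
    return quality_term * intensity_term * rt_term


# Illustrative numbers: a strong peak near the expected RT vs a weak peak far from it
near = group_rank(max_quality=0.9, max_intensity=5e6, delta_rt=0.1)
far = group_rank(max_quality=0.6, max_intensity=2e5, delta_rt=0.8)
print(near < far)  # True
```

Note that, as written, the formula gives a rank of exactly 0 to any group whose deltaRT is 0, regardless of its quality or intensity.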

I noticed this because of very strange behavior regarding which peak gets selected when there are multiple peaks within the RT window.

chubukov commented 7 years ago

@sahil21 thanks for looking at this -- I see that you've already made some progress. Let me know when you think it's ready for testing -- this is a major bug, so I'd like to get a fix deployed as soon as we can. Let me know if I can help in any way.

sahil21 commented 7 years ago

@chubukov This is resolved.

chubukov commented 7 years ago

@sahil21 confirming that this appears to be resolved.

Noticed a related issue that I will post about separately.

chubukov commented 7 years ago

Something is still wrong here. The grouprank is properly set for the purposes of writing the mzroll file, but not for the purposes of choosing the right peakgroup during automatic peak picking (compound database search).

Here's a reproducible example:

Look at the integration for L-Asparagine. You can see that in the maxGroups==1 case, it's choosing a group that's far from expectedRt and has low intensity. Indeed, if you export to mzroll from the maxGroups==3 table, you can see that of the two groups for L-Asparagine, the earlier, higher-intensity one has a lower groupRank, as expected. But it's not the one chosen in the maxGroups==1 case, which makes me think that groupRank is not properly set during the choice.
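To make the expected behavior concrete, here is a hedged sketch of how a maxGroups limit might interact with grouprank: rank every candidate group, sort ascending, and keep the first maxGroups. This is an assumption about the intended logic, not ElMaven's implementation; `PeakGroup`, `pick_groups`, and the numbers below are hypothetical (they are not the L-Asparagine data). Under this logic, the max_groups=1 result is always the first entry of the max_groups=3 result, which is the consistency the reporter expected.

```python
import math
from dataclasses import dataclass


@dataclass
class PeakGroup:
    # Hypothetical stand-in for a peak group; field names are illustrative only
    name: str
    rt: float             # observed retention time
    max_intensity: float  # must be positive for the log term
    max_quality: float


def group_rank(g: PeakGroup, expected_rt: float) -> float:
    # Formula quoted in the issue (natural log assumed); lower is better
    delta_rt = g.rt - expected_rt
    return (1.1 - g.max_quality) * (1.0 / math.log(g.max_intensity + 1)) * delta_rt ** 2


def pick_groups(groups, expected_rt: float, max_groups: int):
    # Sort ascending so the best-ranked (lowest) groups come first,
    # then keep at most max_groups of them
    ranked = sorted(groups, key=lambda g: group_rank(g, expected_rt))
    return ranked[:max_groups]


groups = [
    PeakGroup("late", rt=6.0, max_intensity=3e5, max_quality=0.5),
    PeakGroup("early", rt=5.1, max_intensity=8e6, max_quality=0.8),
]
print([g.name for g in pick_groups(groups, expected_rt=5.0, max_groups=1)])  # ['early']
```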

Perhaps the problem is something else, but I have verified that Maven 776 picks the correct group.

-------------------SETTINGS-------------------

alignSamplesFlag=0
alignMaxItterations=10
alignPolynomialDegree=5
--------------------------------MASS SLICING
rtStepSize=20
ppmMerge=20
limitGroupCount=2147483647
minMz=0
maxMz=0
minRt=0
maxRt=0
minIntensity=0
maxIntensity=0
minCharge=0
maxCharge=0
------------------------------COMPOUND MATCHING
ionizationMode=1
matchRtFlag=1
compoundPPMWindow=10
compoundRTWindow=1
matchFragmentation=0
fragmentMatchPPMTolr=1000
------------------------------EIC CONSTRUCTION
eic_smoothingWindow=20
eic_smoothingAlgorithm=0
baseline_smoothingWindow=5
baseline_dropTopX=80
------------------------------PEAK GROUPING
grouping_maxRtWindow=1
eicMaxGroups=3
------------------------------GROUP FILTERING
minGoodPeakCount=3
minSignalBlankRatio=10
minNoNoiseObs=3
minSignalBaseLineRatio=2
minGroupIntensity=1e+06
-----------------------------------OUTPUT
outputdir=reports\
writeCSVFlag=0
keepFoundGroups=1
showProgressFlag=1

GeorgeSabu commented 7 years ago

@chubukov The ordering problem with the group rank was caused by a wrong inequality. We have fixed it.
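The offending line is not shown in this thread, so the following is only an illustration of the failure mode: in a "keep the best group" loop, flipping a single comparison silently selects the highest-ranked (worst) group instead of the lowest. The function names and rank values are hypothetical and do not come from the ElMaven source.

```python
def best_index(ranks):
    # Keep the group with the LOWEST grouprank (lower is better)
    best = 0
    for i in range(1, len(ranks)):
        if ranks[i] < ranks[best]:
            best = i
    return best


def best_index_buggy(ranks):
    # Identical loop with the inequality flipped: now the WORST group wins
    best = 0
    for i in range(1, len(ranks)):
        if ranks[i] > ranks[best]:
            best = i
    return best


# Hypothetical ranks: a far/weak group first, then a near/strong group
ranks = [0.0475, 0.0002]
print(best_index(ranks), best_index_buggy(ranks))  # 1 0
```

A bug of this kind would fit the symptom described above: the grouprank values written to the export can be correct while the selection is wrong, because the flipped comparison does not change the values themselves, only which one is picked.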

chubukov commented 7 years ago

@GeorgeSabu @sahil21 Great. Looks like this was a typo during refactoring? (Just trying to make sure I understand why it worked fine in previous versions.)

GeorgeSabu commented 7 years ago

@chubukov yes it was a typo during refactoring.