ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0
87 stars 52 forks source link

|charge|>1 #17

Closed sahil21 closed 7 years ago

sahil21 commented 7 years ago

Reported by @chubukov on Bitbucket

One of the things that was never handled properly in Maven was multiple charged species (even though some parts of the code are designed to handle it). You can see https://bitbucket.org/elucidatainc/qews/commits/fd181d02608f3e0d1ad32709f8ea62fa0561b76f for an example of some basic edits to make it work in the CLI. Browsing the new code, I can see that this bug is still present. E.g.

bool mzSlice::calculateMzMinMax(float CompoundppmWindow, int ionizationMode) {
     float ppmScale = 1e6;

    //Calculating the mzmin and mzmax
    if (!this->compound->formula.empty()) {
        //Computing the mass if the formula is given
        double mass = MassCalculator::computeMass(this->compound->formula, ionizationMode);

should really be compound->charge instead of ionizationMode. I'm certain this is far from the only place. Charge for a known should be read in when compoundDb is read (that functionality already exists). When the user types a formula in the main window, we should probably default to charge=ionizationMode, though ideally there would be either an extra box for the charge or a way to specify it in the formula (e.g. C50H20N10/3) meaning charge=3*ionizationMode. If we do this, we have to make the ionizationMode option front and center in the main options. I can't think of any other problematic cases at the moment. In terms of priority: on the one hand, it's definitely a bug and it will become critical if we ever apply Maven to proteomics, or, to some degree, lipidomics. On the other hand, it has never really worked, and we're probably never going to make Maven into a tool that's heavily used by the proteomics community. So in that sense, I'd say it's not critical for initial release.

chubukov commented 7 years ago

reminding myself to upload some proteomics data so you can test this

chubukov commented 7 years ago

Sample data and source files uploaded in https://www.dropbox.com/home/MAVEN/bug_reports/2017_02_09_proteomics_multicharge_sample_data . Some notes in readme.txt

chubukov commented 7 years ago

@sahil21 @rkdahmiwal We had a phone conversation about this a long time ago now, but I wanted to note that this fix is still a little broken -- it treats charge as sort of a hybrid between a compound parameter and a main window parameter. In particular, it's still not possible to use a compound database with charge specified, since the main window option overrides it.

Most places that currently look at the mainWindow option should really be using compound->charge instead -- I think the main window option should probably be used only when a user types a formula into the text box (maybe some other edge case I'm not thinking about).

As before, this is really low priority for us because we rarely deal with multicharged ions, but I would consider it a "core" issue.

sahil21 commented 7 years ago

@chubukov

A getCharge function has been introduced which returns the compound charge if it is available. If the user enters a formula or when the compound charge is 0, the function returns ionization mode*charge taken from the main window. In all other cases, it returns the compound charge. Giving precedence to compound charge can lead to errors in some cases. For example, a compound DB file with charge calculated for positive ionization mode is used for a sample set with negative ionization. In this case, the compound charge will be used for mass calculation which is incorrect. One of the solutions is if the sign of the compound charge does not match the ionization mode of samples, the user should be given a warning.

Known Issue:

image An issue has been observed while changing the charge value. Suppose we perform peak detection with a particular ionization mode and charge as the user input. Changing the ionization and charge and clicking on a compound in the peak table will change the EIC but the peak quality circles stay the same as shown above as they are only calculated once during peak detection. We are reporting this as a separate issue.

chubukov commented 7 years ago

This seems to be mostly working now, although I've mainly tested in positive mode. We'll make new issues if something else comes up.