RogerGinBer / qHermes

GNU General Public License v3.0

Tests - SOI peak detection / ChromPeak annotation #6

Open RogerGinBer opened 3 years ago

RogerGinBer commented 3 years ago

In this issue we will track tests of different SOI-detection-related functions (RHermes -> XCMS format) and also ROI/ChromPeak-annotation functions (XCMS -> RHermes).

The general objectives of these tests are:

Of course, that'll be quite a bit of work, but if you have some spare time I'd appreciate it if you could give these quick tests a try 🙌:

Test 1 - ChromPeak annotation using RHermes ionic formulas

First, download this zip, which contains a csv with molecular formulas from KEGG and ECMDB: merge_KEGG_ECMDB.zip

Then try annotateROI():

library(RHermes)
library(faahKO)
library(XCHermes)
library(xcms)

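# Three knockout samples shipped with the faahKO package
faahko_3_files <- c(system.file("cdf", "KO", "ko15.CDF", package = "faahKO"),
                    system.file("cdf", "KO", "ko16.CDF", package = "faahKO"),
                    system.file("cdf", "KO", "ko18.CDF", package = "faahKO"))

# Load the files in on-disk mode and detect chromatographic peaks with centWave
data <- readMSData(faahko_3_files, mode = "onDisk")
xdata <- findChromPeaks(data, param = CentWaveParam(snthresh = 2))

# Annotate each ChromPeak with ionic formulas derived from the DB csv
anot_xdata <- annotateROI(xdata, DBfile = "your_path/merge_KEGG_ECMDB.csv", ppm = 50)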

annotateROI() generates a large set of ionic formulas from the DB csv and tries to match them to the m/z of each ChromPeak, using the ppm argument as the mass tolerance. The results are then added to the ChromPeakData DataFrame.
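To make the matching step concrete, here is a minimal base-R sketch of ppm-based m/z matching. It is not the actual annotateROI() implementation: the helper `match_ppm()`, the small `ions` table, and the example peak m/z values are all hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not part of RHermes/XCHermes): for each peak m/z,
# return the indices of candidate ions whose m/z lies within `ppm`.
match_ppm <- function(peak_mz, ion_mz, ppm = 50) {
    lapply(peak_mz, function(mz) {
        tol <- mz * ppm / 1e6          # absolute tolerance in Da
        which(abs(ion_mz - mz) <= tol)
    })
}

# Toy table of ionic formulas with their theoretical m/z (illustrative only)
ions <- data.frame(
    ion = c("[C6H12O6+H]+", "[C6H12O6+Na]+", "[C3H7NO2+H]+"),
    mz  = c(181.0707, 203.0526, 90.0550),
    stringsAsFactors = FALSE
)

# Made-up peak m/z values standing in for the ChromPeak "mz" column
peaks <- c(181.0750, 203.0500, 300.1000)

hits <- match_ppm(peaks, ions$mz, ppm = 50)

# Collapse all matching ions per peak into one string; no match gives ""
annotation <- vapply(hits,
                     function(i) paste(ions$ion[i], collapse = ";"),
                     character(1))
print(data.frame(mz = peaks, annotation = annotation))
```

A per-peak string like this is one possible shape for the value that ends up in the extra ChromPeakData column.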

jorainer commented 3 years ago

Hi @RogerGinBer, sorry for my late reply. The function and results look good to me! I do, however, have some (minor) suggestions 😉

To better integrate into the xcms user experience, I would suggest making this function a method with its own parameter object to define the settings:

setMethod("annotateChromPeaks",
          signature(object = "XCMSnExp", param = "ChemFormulaParam"),
          function(object, param, ...) {
   ...
})

I would name the method annotateChromPeaks because that's exactly what you are doing: you are annotating ChromPeaks. I would also use a parameter object (maybe you can find a better name than the one I chose) so that we could add other annotation method implementations in the future.
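The dispatch pattern described here could look roughly like the sketch below. It only uses the methods package so that it runs on its own: the `ChemFormulaParam` class definition, its slots, and the generic are assumptions for illustration, and a plain `matrix` stands in for `XCMSnExp`. The method body is a placeholder, not real annotation logic.

```r
library(methods)

# Hypothetical parameter class; slot names follow the arguments
# mentioned in this thread (DBfile, adfile, ppm).
setClass("ChemFormulaParam",
         representation(DBfile = "character",
                        adfile = "character",
                        ppm    = "numeric"))

# Generic that dispatches on both the data object and the parameter object,
# so other annotation approaches can add their own param class later.
setGeneric("annotateChromPeaks",
           function(object, param, ...) standardGeneric("annotateChromPeaks"))

# Stand-in method: "matrix" is used instead of "XCMSnExp" so that the
# example does not need xcms. The body only returns placeholder values.
setMethod("annotateChromPeaks",
          signature(object = "matrix", param = "ChemFormulaParam"),
          function(object, param, ...) {
              message("Annotating ", nrow(object), " peaks at ",
                      param@ppm, " ppm")
              rep(NA_character_, nrow(object))
          })

# Usage with a toy peak matrix
pk  <- matrix(c(181.07, 203.05), ncol = 1, dimnames = list(NULL, "mz"))
cfp <- new("ChemFormulaParam", DBfile = "db.csv", adfile = "ad.csv", ppm = 50)
res <- annotateChromPeaks(pk, cfp)
```

Dispatching on the `param` class is what would let a different annotation strategy plug in later without changing the method name.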

The parameter object would then take the parameters DBfile, adfile and ppm. I guess adfile defines the expected adducts? I was wondering if you could also have a look at the way we define and calculate adduct m/z in MetaboCoreUtils (e.g. https://rformassspectrometry.github.io/MetaboCoreUtils/reference/adductNames.html); maybe there is something you could re-use? Also, IMHO, providing the names of the expected adducts as a character vector instead of a text file might be simpler.
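As a rough illustration of the "adducts as a character vector" idea, the base-R sketch below looks up mass shifts by adduct name. The `adduct_shift` table and `adduct_mz()` helper are hypothetical and hand-written for this example (singly charged, one molecule M); maintained adduct definitions such as those in MetaboCoreUtils would be the better source in practice.

```r
# Hypothetical lookup table: mass shift in Da for singly charged adducts of M.
# Values are hand-entered for illustration only.
adduct_shift <- c("[M+H]+"  = 1.007276,
                  "[M+Na]+" = 22.989218,
                  "[M+K]+"  = 38.963158)

# Hypothetical helper: adducts are passed by name as a character vector
# rather than read from a text file.
adduct_mz <- function(mass, adducts = c("[M+H]+", "[M+Na]+")) {
    unknown <- setdiff(adducts, names(adduct_shift))
    if (length(unknown) > 0)
        stop("Unknown adduct(s): ", paste(unknown, collapse = ", "))
    # Rows correspond to neutral masses, columns to the requested adducts
    outer(mass, adduct_shift[adducts], `+`)
}

# Monoisotopic neutral masses for two example compounds
mz <- adduct_mz(c(glucose = 180.06339, alanine = 89.04768))
print(mz)
```

Validating the adduct names up front gives the user an immediate error for a typo, which is harder to achieve with a free-form text file.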

It might even be nice to do any preparatory work (reading the text files, checking that their format is correct, possibly other calculations that only need these files) in the constructor function of ChemFormulaParam, so that when the user calls cfp <- ChemFormulaParam(DBfile, adfile, ppm), all of this is already done before the annotateChromPeaks method is called.
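A constructor along these lines might look like the following sketch. Everything in it is an assumption made for illustration: the extra `formulas` and `adducts` slots, the expected `MolecularFormula` column name, and reading the adduct names from the first column of the adduct file. The real file formats are not specified in this thread.

```r
library(methods)

# Hypothetical class: keeps the file paths plus the already-parsed content.
setClass("ChemFormulaParam",
         representation(DBfile   = "character",
                        adfile   = "character",
                        ppm      = "numeric",
                        formulas = "character",
                        adducts  = "character"))

# Constructor that reads and validates both files once, up front.
ChemFormulaParam <- function(DBfile, adfile, ppm = 5) {
    for (f in c(DBfile, adfile))
        if (!file.exists(f)) stop("File not found: ", f)
    if (!is.numeric(ppm) || length(ppm) != 1 || ppm <= 0)
        stop("'ppm' must be a single positive number")

    db <- read.csv(DBfile, stringsAsFactors = FALSE)
    # Assumed column name; the real DB csv may use a different one
    if (!"MolecularFormula" %in% colnames(db))
        stop("DBfile must contain a 'MolecularFormula' column")
    ad <- read.csv(adfile, stringsAsFactors = FALSE)

    new("ChemFormulaParam",
        DBfile = DBfile, adfile = adfile, ppm = ppm,
        formulas = unique(db$MolecularFormula),
        adducts  = as.character(ad[[1]]))
}

# Usage with small temporary files standing in for the real inputs
db_path <- tempfile(fileext = ".csv")
ad_path <- tempfile(fileext = ".csv")
write.csv(data.frame(MolecularFormula = c("C6H12O6", "C3H7NO2", "C6H12O6")),
          db_path, row.names = FALSE)
write.csv(data.frame(adduct = c("M+H", "M+Na")), ad_path, row.names = FALSE)

cfp <- ChemFormulaParam(db_path, ad_path, ppm = 50)
```

With this layout, format problems surface when the parameter object is created rather than partway through an annotation run.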

Finally, the way you added the results to the ChromPeakData is exactly how it was intended to be used. I would maybe just use a more self-explanatory name for the column, such as "annotation".