Metabolomics-MPC / NTAnnotationWorkflow

An R script for the annotation of small-molecule LC-MS data

Workflow structure #4

Closed michaelwitting closed 2 years ago

michaelwitting commented 2 years ago

The workflow can/should be split into different "modules". One main script handles the input/output and then simply calls the "modules" via source().

Ideas for potential "modules":
00_Setup.R (takes care of all libraries, helper functions, etc.)
01_MS1Import (imports positive and negative mode MS1 data from .csv files and does some sanity checks)
02_MS2Import (imports positive and negative mode MS2 data from .mgf files and does some sanity checks)
03_MS1Annotation_pos
04_MS1Annotation_neg

XX_SpectralSimilarityNetwork_pos (calculates different spectral similarity networks; might be computationally expensive and should only be run if necessary, as defined in the parameter file)
YY_SpectralSimilarityNetwork_neg (calculates different spectral similarity networks; might be computationally expensive and should only be run if necessary, as defined in the parameter file)
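
A minimal sketch of how a main script could call such modules via source(), assuming the modules are stored as numbered .R files in one folder. The function name run_modules(), the run_networks switch and the two-digit file name pattern are hypothetical and not taken from the repository; the XX_/YY_ placeholders would need real numbers for this pattern to pick them up.

```r
# Hypothetical driver: source all numbered module scripts in order.
# Assumes module files are named like "00_Setup.R", "01_MS1Import.R", ...
run_modules <- function(module_dir, run_networks = FALSE, envir = new.env()) {
  scripts <- sort(list.files(module_dir,
                             pattern = "^[0-9]{2}_.*\\.R$",
                             full.names = TRUE))

  # The network modules may be expensive, so they are skipped unless requested
  # (in the real workflow this switch would come from the parameter file).
  if (!run_networks) {
    scripts <- scripts[!grepl("SpectralSimilarityNetwork", basename(scripts))]
  }

  for (script in scripts) {
    message("Running ", basename(script))
    source(script, local = envir)  # all modules share one environment
  }
  invisible(envir)
}
```

Sourcing every module into one shared environment lets later modules see the objects created by earlier ones without filling the global environment.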

Any thoughts @chufz ?

michaelwitting commented 2 years ago

If interested, @liesasalzer could add some networks, e.g. mass difference networks etc...

chufz commented 2 years ago

I would have each script contain a wrapper function, and keep only one 03_MS1Annotation.R and one 04_MS2Annotation.R, with each function (MS1Annotation() and MS2Annotation()) taking a pos/neg parameter, since the calculation steps should be the same except for the different ions etc. Ok?
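
A rough sketch of what such a wrapper with a pos/neg parameter could look like. Only the function name MS1Annotation() comes from the comment above; the argument names, the single adduct per mode and the neutral mass calculation are placeholders for the real annotation steps.

```r
# Hypothetical wrapper: one function for both modes, selected by `polarity`.
MS1Annotation <- function(mz, polarity = c("pos", "neg")) {
  polarity <- match.arg(polarity)  # errors on anything other than "pos"/"neg"
  proton <- 1.007276               # proton mass in Da

  # Only the ion-specific part differs between the two modes.
  adduct <- switch(polarity, pos = "[M+H]+", neg = "[M-H]-")
  shift  <- switch(polarity, pos = -proton, neg = proton)

  # Placeholder for the shared calculation steps: here just a neutral mass.
  data.frame(mz = mz,
             polarity = polarity,
             adduct = adduct,
             neutral_mass = mz + shift)
}

pos_result <- MS1Annotation(c(181.0707, 203.0526), polarity = "pos")
neg_result <- MS1Annotation(c(179.0561), polarity = "neg")
```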

michaelwitting commented 2 years ago

I'm totally fine with that.

chufz commented 2 years ago

After thinking about it, I would keep input and output separate, since for my computations we use differently mounted drives (a large one for storage as input, a fast writable one for output). Also, the library should not be included in the project folder, as it will be large and is often reused across projects; instead, an absolute file path in settings_MetaboAnnotation.yaml will point to it. The workflow would then be run from the command line as: Rscript workflow.R projectdir outputdir. Ok?
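
A minimal sketch of how workflow.R could read the two positional arguments from that command line, using base R's commandArgs(). The helper name parse_workflow_args() and the choice to create a missing output folder are assumptions, not the repository's actual behaviour.

```r
# Hypothetical helper for: Rscript workflow.R projectdir outputdir
parse_workflow_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) != 2) {
    stop("Usage: Rscript workflow.R projectdir outputdir", call. = FALSE)
  }

  # The input folder must already exist (it may sit on a read-only drive).
  projectdir <- normalizePath(args[1], mustWork = TRUE)

  # The output folder may sit on a different drive; create it if missing.
  outputdir <- args[2]
  if (!dir.exists(outputdir)) {
    dir.create(outputdir, recursive = TRUE)
  }

  list(projectdir = projectdir, outputdir = normalizePath(outputdir))
}
```

Inside workflow.R this would be called once at the top, e.g. dirs <- parse_workflow_args(), and all later reads and writes would use dirs$projectdir and dirs$outputdir.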

michaelwitting commented 2 years ago

I often use multiple libraries, mostly to avoid having to combine them into a single large file. I typically iterate over these libraries and annotate against each one individually. The collection of libraries can change from project to project. One would then need to define multiple file paths, which is fine for me.
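
A sketch of iterating over several library paths and annotating against each one separately. Everything here is a stand-in: the function names, the .csv library format with name and mz columns, and the simple m/z tolerance match are assumptions for illustration, not the matching the workflow actually performs.

```r
# Hypothetical per-library annotation: match m/z values within a tolerance.
# Assumes each library is a .csv file with columns `name` and `mz`.
annotate_one_library <- function(mz, library_path, tol = 0.005) {
  lib <- read.csv(library_path, stringsAsFactors = FALSE)
  hits <- lapply(mz, function(m) lib$name[abs(lib$mz - m) <= tol])
  data.frame(mz = mz,
             match = vapply(hits, paste, character(1), collapse = ";"),
             library = basename(library_path))
}

# Iterate over all configured libraries and keep one result per library.
annotate_all_libraries <- function(mz, library_paths) {
  missing <- library_paths[!file.exists(library_paths)]
  if (length(missing) > 0) {
    stop("Library not found: ", paste(missing, collapse = ", "))
  }
  results <- lapply(library_paths, function(p) annotate_one_library(mz, p))
  names(results) <- basename(library_paths)
  results
}
```

The vector of library paths would come from the settings file, so the set of libraries can change per project without touching the code.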

chufz commented 2 years ago

This should be implemented now.