AutoFlowResearch / SmartPeak

Fast and Accurate CE-, GC- and LC-MS(/MS) Data Processing
MIT License

Formal metaData extraction method from MzML reader #91

Closed dmccloskey closed 5 years ago

dmccloskey commented 6 years ago

Description

Updates to the MzML reader.

Objectives

Validation

pcolaianni commented 5 years ago

This metadata should be added during the execution of OpenMSFile::loadMSExperiment(), correct?

dmccloskey commented 5 years ago

Yes that is correct

pcolaianni commented 5 years ago

We either load data through OpenMS':

Here is an example of the ChromeleonFile input (header part):

File Path   chrom://UV_VIS_2.chm
Channel UV_VIS_2

Injection Information:
Data Vault  ChromeleonLocal
Injection   20171013_C61_ISO_P1_GA1
Injection Number    22
Position    GA1
Comment 
Processing Method   New ProcMethod
Instrument Method   HM_metode_ZorBax_0,02%_Acetic_acid_ver6
Type    Unknown
Status  Finished
Injection Date  10/13/2017
Injection Time  6:28:26 PM
Injection Volume (µl)   1.000
Dilution Factor 1.0000
Weight  1.0000

Raw Data Information:
Time Min. (min) 0.000000
Time Max. (min) 11.000000
Data Points 3301
Detector    UV
Exporting Data System   Chromeleon 7.1.3.2425
Operator    
Signal Quantity Absorbance
Signal Unit mAU
Signal Min. -1.072552
Signal Max. 155.296276
Channel UV_VIS_2
Driver Name DAD3000.dll
Channel Type    Evaluation
Min. Step (s)   0.200
Max. Step (s)   0.200
Average Step (s)    0.200

Signal Parameter Information:
Signal Info WVL:280 nm

I would like to know where I could find the information about:

I can find some of these in the ChromeleonFile case (they have basically the same names, e.g. Injection Volume and its units), but I do not know where to fetch the same info from an MSExperiment loaded through a FileHandler (i.e. not from a Chromeleon file).

dmccloskey commented 5 years ago

Let's use channel as a proxy for acquisition method name. The others that are not in the file can be left blank. Batch name should be specified in the sequence file and not the raw data file.

pcolaianni commented 5 years ago

Would I add it as a metavalue?

const String acq_method_name = "UV_VIS_2"; // placeholder: the value parsed from the "Channel" line
experiment.setMetaValue("acq_method_name", acq_method_name);

Or do you see a better place for this info? Useful link: https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/nightly/html/classOpenMS_1_1MSExperiment.html (and its base classes)
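To make the idea concrete, here is a minimal sketch of pulling the Channel value out of a header like the one above. It is standalone C++ and does not use OpenMS; `parseHeader` and `fieldOrBlank` are hypothetical names, and the sketch assumes the export separates key and value with a tab (the spacing shown above may have been flattened). It is not the actual ChromeleonFile parser.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of OpenMS or SmartPeak): collects
// "key<TAB>value" header lines into a map. Assumption: key and value are
// tab-separated. Lines without a tab (section titles, blank lines) are skipped.
// If a key appears twice (e.g. "Channel"), the later value wins.
std::map<std::string, std::string> parseHeader(const std::string& text)
{
  std::map<std::string, std::string> fields;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line))
  {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) continue;
    const std::string key = line.substr(0, tab);
    // Skip any run of tabs between key and value.
    const std::string::size_type start = line.find_first_not_of('\t', tab);
    fields[key] = (start == std::string::npos) ? std::string() : line.substr(start);
  }
  return fields;
}

// Returns the value for `key`, or an empty string when the field is absent,
// so that missing metadata simply stays blank.
std::string fieldOrBlank(const std::map<std::string, std::string>& fields,
                         const std::string& key)
{
  const std::map<std::string, std::string>::const_iterator it = fields.find(key);
  return it == fields.end() ? std::string() : it->second;
}

// Small usage example with a shortened header.
void exampleUsage()
{
  const std::map<std::string, std::string> fields =
    parseHeader("File Path\tchrom://UV_VIS_2.chm\nChannel\tUV_VIS_2\n");
  assert(fieldOrBlank(fields, "Channel") == "UV_VIS_2");
}
```

The string returned by `fieldOrBlank(fields, "Channel")` is what would then be passed to `experiment.setMetaValue("acq_method_name", ...)` as in the snippet above.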

dmccloskey commented 5 years ago

I think that should work for the HPLC text files.

For the MS data there does appear to be some useful information potentially in experimental settings. I think getHPLC might be close to what we want but I would need to see an example.

Could you post the contents of getHPLC for one of the LC-MS examples?

pcolaianni commented 5 years ago

Reading in the file LCMS_MRM_Standards/mzML/150516_CM1_Level1.mzML, I got all empty strings and zero values, except for temperature, which shows 21. I'm skipping the gradient_ member, which does not seem relevant.

pcolaianni commented 5 years ago

I will push a PR to OpenMS to update the ChromeleonFile class, so that it can parse the acq_method_name metavalue.

dmccloskey commented 5 years ago

Regarding the HPLC attribute, that is too bad. The lack of useful information is most likely due to the file converters. I went through a couple of the LC-MS mzML files and there are no attributes for anything similar to acq_method, so it looks like we will have to rely on reading this from the sequence file.

pcolaianni commented 5 years ago

Is it OK if the acquisition date and time stay blank when they are not found? The issue with getting the "last modified time" from the filesystem is that it is currently not part of the standard library (it's part of C++17, but it's still experimental; I've used it in some code of mine). And doing that on multiple platforms requires ad hoc code for each platform.

dmccloskey commented 5 years ago

I have run into this problem as well. Let's leave it blank if not found for now, but please do add some pseudocode or a note that we will need to add this in. Creating an issue for it would also be a good step.
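As a note for later, here is a rough sketch of what the filesystem fallback could look like once C++17's `std::filesystem` is available on all target compilers. `lastModifiedOrBlank` is a hypothetical name, not existing SmartPeak code. C++17 has no portable conversion from `file_time_type` to `system_clock`, so the sketch uses the common "offset between the two clocks" approximation; the result can be off by a small amount, and a file's modification time is only a stand-in for the real acquisition time.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the file's last-modified time formatted as
// "YYYY-MM-DD HH:MM:SS" (UTC), or an empty string if it cannot be read,
// so the acquisition date/time stays blank when nothing is found.
std::string lastModifiedOrBlank(const fs::path& file)
{
  std::error_code ec;
  const fs::file_time_type ftime = fs::last_write_time(file, ec);
  if (ec) return std::string();  // e.g. the file does not exist

  // Approximate conversion to system_clock: shift by the current offset
  // between the file clock and the system clock.
  using namespace std::chrono;
  const auto sys_time = time_point_cast<system_clock::duration>(
    ftime - fs::file_time_type::clock::now() + system_clock::now());
  const std::time_t tt = system_clock::to_time_t(sys_time);

  // Note: std::gmtime is not thread-safe; acceptable for a sketch.
  const std::tm* utc = std::gmtime(&tt);
  if (utc == nullptr) return std::string();

  char buf[32];
  const std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", utc);
  return std::string(buf, n);
}

// Small usage example: a missing file yields a blank value.
void exampleUsage()
{
  assert(lastModifiedOrBlank("no_such_file_for_this_sketch.mzML").empty());
}
```

Returning an empty string instead of throwing keeps the "leave it blank if not found" behaviour, so callers would not need platform-specific error handling.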

pcolaianni commented 5 years ago

Created a new issue: https://github.com/dmccloskey/SmartPeak2/issues/97