HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

Representing Spectra Collection #222

Open YasinEl opened 2 months ago

YasinEl commented 2 months ago

Hello, and thank you for maintaining this tool!

We propose adding the following parameters to the MTD section so that a dataset can reference mgf files, or other file types such as msp, that contain consensus scans (e.g., the result of spectral clustering with a tool like mscluster, or consensus MS2 scans from feature extraction software). To make this possible, I suggest the following parameters:

"spectral_representation[x]-location": filename or file path of the file
"spectral_representation[x]-key-mz": key used for the precursor m/z in the file
"spectral_representation[x]-key-rt": key used for the precursor retention time in the file
"spectral_representation[x]-key-rt-unit": unit of the retention time in the file (minutes or seconds)
"spectral_representation[x]-key-mslevel": either a number (most commonly 2) or the key that gives the MS level (mgf files can sometimes include MS1 and MS2, or MS3, MS4, etc.)
"spectral_representation[x]-key-featureID": key used for the feature ID, which can be matched to feature IDs in the SMF table

nilshoffmann commented 6 days ago

Thanks for your input on this. I think it makes sense to place this information in the metadata section. For location, I would propose using URIs, as we have done for other file references in mzTab-M (e.g. the ms_run location). Could you add an example here of what key-mz and key-rt would look like? Please note that key-rt-unit may not be necessary, unless it is needed for external references, since we decided to always represent retention time in seconds within mzTab-M. What would key-featureID look like? Would this be a bar-separated list of feature IDs?

YasinEl commented 1 day ago

Thank you for implementing!

I agree regarding the metadata part and using a URI for the location.

Here is an example of what this would look like for the mgf entry shown below. key-rt-unit is needed because it refers to mgf, msp, or other formats outside mzTab-M, where retention time is not necessarily given in seconds. key-featureID is the key in the mgf/msp file whose value identifies the feature in the SMF table that the scan is associated with.


spectral_representation[x]-location: "path/to/mzmineOutput.mgf"
spectral_representation[x]-key-mz: "PEPMASS"
spectral_representation[x]-key-rt: "RTINSECONDS"
spectral_representation[x]-key-rt-unit: "seconds"
spectral_representation[x]-key-mslevel: "MSLEVEL"
spectral_representation[x]-key-featureID: "FEATURE_ID"
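
To illustrate how a reader might consume these proposed entries, here is a minimal Python sketch that groups them by index. It assumes the `name: "value"` line syntax used in this comment (not the tab-separated MTD layout of an actual mzTab-M file) and a concrete index such as `[1]` in place of `[x]`. The function name `parse_spectral_representations` is hypothetical and not part of any mzTab library.

```python
import re
from collections import defaultdict

# Matches lines such as:  spectral_representation[1]-key-mz: "PEPMASS"
# Assumption: the 'name: "value"' syntax from this issue, not the final MTD format.
LINE_RE = re.compile(
    r'^spectral_representation\[(\d+)\]-([\w-]+):\s*"?(.*?)"?\s*$',
    re.IGNORECASE,
)


def parse_spectral_representations(lines):
    """Group proposed spectral_representation entries by their index.

    Field names are lower-cased so that 'key-featureID' and
    'key-featureid' end up under the same dictionary key.
    """
    reps = defaultdict(dict)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            index, field, value = match.groups()
            reps[int(index)][field.lower()] = value
    return dict(reps)


example_lines = [
    'spectral_representation[1]-location: "path/to/mzmineOutput.mgf"',
    'spectral_representation[1]-key-mz: "PEPMASS"',
    'spectral_representation[1]-key-rt: "RTINSECONDS"',
    'spectral_representation[1]-key-rt-unit: "seconds"',
    'spectral_representation[1]-key-mslevel: "MSLEVEL"',
    'spectral_representation[1]-key-featureID: "FEATURE_ID"',
]

reps = parse_spectral_representations(example_lines)
print(reps[1])
```

Lines that do not match the pattern are skipped, so the same function could be run over a larger block of metadata lines.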

One entry from path/to/mzmineOutput.mgf (peak list shortened):

BEGIN IONS
FEATURE_ID=2
MSLEVEL=2
RTINSECONDS=18.91
PEPMASS=110.00862
CHARGE=1+
MERGED_SCANS=1451,1698,1944,1572,1326,1818,2064,2310,2556
MERGED_STATS=9 / 10 (0 removed due to low quality, 1 removed due to low cosine).
FILENAME=015_Sa02_Water_POS.mzML;015_Sa02_Water_POS.mzML;015_Sa02_Water_POS.mzML;021_Sa07_Water_POS.mzML;021_Sa07_Water_POS.mzML;021_Sa07_Water_POS.mzML;021_Sa07_Water_POS.mzML;021_Sa07_Water_POS.mzML;021_Sa07_Water_POS.mzML
SCANS=2
Num peaks=116
55.018024 0.862
55.054058 0.465
56.964603 0.439
56.99868 0.527
57.489437 0.8
END IONS
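
As a rough sketch of how the proposed keys could be applied to an entry like the one above, the following Python code reads `BEGIN IONS` / `END IONS` blocks and uses the key mapping to pull out the precursor m/z, the retention time (converted to seconds, as mzTab-M uses), the MS level, and the feature ID. The helper names `read_mgf_entries` and `summarise_entry` and the `keys` dictionary layout are assumptions made for illustration. The sample text is a shortened copy of the entry above.

```python
def read_mgf_entries(text):
    """Yield (headers, peaks) for each BEGIN IONS ... END IONS block."""
    headers, peaks, inside = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            headers, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                yield headers, peaks
            inside = False
        elif inside and line:
            if "=" in line:
                # Header line: split on the first '=' only.
                key, value = line.split("=", 1)
                headers[key.strip()] = value.strip()
            else:
                # Peak line: m/z and intensity separated by whitespace.
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


def summarise_entry(headers, keys):
    """Apply the proposed key mapping to one entry's headers."""
    rt = float(headers[keys["key-rt"]])
    if keys.get("key-rt-unit", "seconds") == "minutes":
        rt *= 60.0  # normalise to seconds
    level = keys["key-mslevel"]
    # key-mslevel is either a literal number or the name of a header key.
    ms_level = int(level) if level.isdigit() else int(headers[level])
    # PEPMASS may carry an optional intensity after the m/z value.
    mz = float(headers[keys["key-mz"]].split()[0])
    return {
        "feature_id": headers[keys["key-featureID"]],
        "mz": mz,
        "rt_seconds": rt,
        "ms_level": ms_level,
    }


keys = {
    "key-mz": "PEPMASS",
    "key-rt": "RTINSECONDS",
    "key-rt-unit": "seconds",
    "key-mslevel": "MSLEVEL",
    "key-featureID": "FEATURE_ID",
}

sample = """BEGIN IONS
FEATURE_ID=2
MSLEVEL=2
RTINSECONDS=18.91
PEPMASS=110.00862
CHARGE=1+
55.018024 0.862
55.054058 0.465
END IONS
"""

entries = list(read_mgf_entries(sample))
summary = summarise_entry(entries[0][0], keys)
print(summary)
```

The returned `feature_id` is kept as a string so it can be compared directly with the identifier column of the SMF table, whatever type the consumer uses there.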