Stortebecker / metabolomics

A repo to exchange and discuss metabolomics issues at the ZBSA Freiburg.

Free spectra database formats (`.msp` and `.msl`) #3

Open Stortebecker opened 7 years ago

Stortebecker commented 7 years ago

.msp seems to be the commonly used open, text-based format for metabolomics spectra. Free databases such as the Fiehn database and the Golm database are provided in this format. The commercial NIST database is provided in a closed format (NIST db file), but it can be converted to .msp using the freeware "NIST MS search".

Yesterday we saw that each database uses a different syntax for its .msp files. All variants seem to contain exactly the same information, but the layout of the text file differs slightly. The obvious differences we noticed immediately were:

Manuel, can you please provide examples for each .msp (one spectrum will be sufficient for now - if possible exactly the same spectrum in the three formats)?

@korseby Do converters exist for .msp file formats? Are there international conventions for the format?

@korseby @blankclemens If no converter exists yet, please coordinate who is going to write one.
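As a possible starting point for such a converter, here is a minimal sketch of a tolerant .msp reader in Python. It rests on assumptions that still need checking against the real files: records start with a `Name:` line, header lines have the form `Key: value`, the peak list follows a `Num Peaks:` line, and peaks are m/z intensity pairs written either one per line or several per line with separators such as `;`. The function name `parse_msp` and the dictionary layout are hypothetical, and repeated keys (for example several synonym lines) simply overwrite each other here.

```python
import re

# Numbers inside a peak line; any other character is treated as a separator.
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")
# Quoted peak annotations are removed before numbers are extracted.
QUOTED = re.compile(r'"[^"]*"')
# "Key: value" header line, split at the first colon.
HEADER = re.compile(r"^([^:]+):\s*(.*)$")


def parse_msp(text):
    """Parse .msp-like text into a list of {"meta": dict, "peaks": list} records."""
    records, current, peaks_left = [], None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record.
            if current is not None:
                records.append(current)
            current, peaks_left = None, 0
            continue
        if peaks_left > 0 and not line[0].isalpha():
            # Peak line: pair up the numbers as (m/z, intensity).
            values = [float(v) for v in NUMBER.findall(QUOTED.sub("", line))]
            pairs = list(zip(values[0::2], values[1::2]))
            current["peaks"].extend(pairs)
            peaks_left -= len(pairs)
            continue
        peaks_left = 0
        match = HEADER.match(line)
        if match is None:
            continue  # neither header nor expected peak line: skip it
        key = match.group(1).strip().lower()  # keys compared case-insensitively
        value = match.group(2).strip()
        if key == "name" and current is not None:
            # A new Name line without a preceding blank line also starts a record.
            records.append(current)
            current = None
        if current is None:
            current = {"meta": {}, "peaks": []}
        if key == "num peaks":
            peaks_left = int(value)
        else:
            current["meta"][key] = value
    if current is not None:
        records.append(current)
    return records
```

Because keys are lower-cased and peak separators are ignored, two files that differ only in capitalisation or in how the peaks are laid out should parse to the same records, which would make the syntax differences between the databases easy to test for.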

Stortebecker commented 7 years ago

Also, there is a second free (?) format .msl. Do converters exist for .msp to .msl and vice-versa?

@korseby @blankclemens Which format is superior? Which format is more compatible with open source / closed source software? Please consider the .msl format if you start writing a converter.

Stortebecker commented 7 years ago

@korseby Could you please help me fill in my list of spectra databases?

@blankclemens Could you please check why this file is not displayed correctly?

korseby commented 7 years ago

@Stortebecker The MONA guys are using the format. AFAIK it is a plain-text format. See: http://mona.fiehnlab.ucdavis.edu/downloads under Download Spectra.

blankclemens commented 7 years ago

@Stortebecker Done. 8f005d61dff71280b6f9dd7a4d9506adce385715

MSchlprt commented 7 years ago

@Stortebecker Unfortunately we do not have one particular spectrum in the different .msp formats. However, we do have the same database in both .msl and .msp (see the Golm database).

Golm database in .msp and .msl format: databases.zip

The fiehnlib database (the best open-source one) in .msp, generated with the free lib2nist conversion tool (http://chemdata.nist.gov/mass-spc/ms-search/Library_conversion_tool.html): fiehn_alk_simplenames_nist.zip

In-house database, built up and exported with NIST MSSearch in .msp format: CFM-Standards.zip (Flo mentioned that we can also export the commercial NIST mainlib to .msp. Since that export is also done by NIST MSSearch, it should produce the same variant of .msp.)

MSchlprt commented 7 years ago

@Stortebecker @blankclemens Additionally, there is an updated fiehnlib database for GC-MS (MONA_export_GC-MS) in .msp, which again uses a different file structure: MoNA-export-GC-MS-msp.zip

Stortebecker commented 7 years ago

@korseby Do converters exist for .msp file formats? Are there international conventions for the format?

Stortebecker commented 7 years ago

International conventions should be here, if they already exist: http://metabolomicssociety.org/

Stortebecker commented 7 years ago

@korseby told me that W4M probably has a converter for the different formats, because they built up an internal database using Golm and other free databases.

@blankclemens Can you check if you can find this converter and maybe even a Galaxy wrapper for it? If it is not freely available, maybe the W4Ms are willing to share it? Also MONA must have some way of converting the data into their own format, maybe they are more cooperative?

MSchlprt commented 7 years ago

@Stortebecker @blankclemens @korseby

We tried several things with the metaMS.runGC tool (Galaxy - Freiburg VS1.1). First: using subgroups (treated and untreated) for the dataset works. Uploading and processing to obtain a peakspectra.msp (suitable for GOLM metabolomics annotation) was successful. However, the GOLM database is not well suited to our application.

Second: the database option of metaMS.runGC. We now know what the .msp file has to look like to be used with the database option of metaMS.runGC. Here is the layout of the example .msp database (unfortunately as .txt, because GitHub does not accept .msp uploads):

threeStdsDB.txt

Running metaMS.runGC with that database works. However, the database is only an example, so we get no annotations. We added an entry for glutamine manually, but we were still not able to annotate glutamine in the sample set.

Third: using the xcms.xcmsSet tool (Galaxy Version 2.1.0) for preprocessing before metaMS.runGC leads to an error.

I also shared my history with Björn, Clemens and Flo if you want to have a more detailed view.