ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0

Make freely available databases importable from within El-Maven #579

Open sp-eldata opened 6 years ago

sp-eldata commented 6 years ago

From a user at UC Denver:

Perfect. Thank you. The only issue we had is that the link to the KEGG database (which was there in previous versions) disappeared. Though I understand you still have a "known" database, it would be great if other freely available databases (MS and MS2) like HMDB, Metlin, KEGG, LipidMaps could be automatically imported. I understand this is asking a little bit too much... We also have in-house Compound Discoverer, Progenesis LC-MS, etc., but we always liked Maven better.

chubukov commented 6 years ago

@sp-eldata Agreed. The "databases" included by default are unlikely to be useful.

sp-eldata commented 6 years ago

@chubukov Ideas around what we could do? Compile something more extensive or link to public databases?

chubukov commented 6 years ago

Best would be to grab the latest version of HMDB or KEGG, but both have a lot of errors. We have a mildly curated version we could contribute, but it's still not perfect. You could also compile some recent public metabolomics datasets and take just the formula and the HMDB or KEGG ID for the metabolites detected.

sp-eldata commented 6 years ago

@chubukov Could we start with the one that Agios has? That will be a significant improvement if not perfect.

@Raghavdata @sahil21 Do we have anything internally? Another user had requested a DB for MS2.

chubukov commented 6 years ago

I think if you're going to include ms/ms databases, it would be good to make a big effort to make sure the fragmentation widget and all the other tools that would interact with that database are actually working properly.

I'll try to get you what we have.

V.

Raghavdata commented 6 years ago

@sp-eldata Internally we use a 1700-compound DB (metabolites only) and a 2700-compound DB (metabolites plus a few other small molecules such as drugs). Both have been curated from KEGG and are specifically for MS1. We don't have anything like this for MS2.

chubukov commented 6 years ago

I guess the "knowns" table is actually pretty close to my second suggestion (list from a typical publication). I would take off the retention times though (no reason to think they'd match anyone's method).

@Raghavdata that sounds like a good list.
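A minimal sketch of the retention-time suggestion above, assuming the compound list is a plain CSV. The column names (`rt`, `expectedRt`, `retention_time`) and the sample rows are hypothetical; the retention-time values are made up for illustration, and the actual headers in the "knowns" table may differ.

```python
import csv
import io

# Assumed names for a retention-time column (hypothetical; adjust to the real file).
RT_COLUMNS = {"rt", "expectedrt", "retention_time"}


def drop_retention_times(csv_text: str) -> str:
    """Return the CSV text with any retention-time column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every column whose header is not a known retention-time name.
    kept = [
        name
        for name in (reader.fieldnames or [])
        if name.strip().lower() not in RT_COLUMNS
    ]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # dropped columns are ignored by the writer
    return out.getvalue()


# Made-up sample rows; the rt values are placeholders, not measured data.
sample = (
    "compound,id,formula,rt\n"
    "Glucose,C00031,C6H12O6,7.8\n"
    "Citrate,C00158,C6H8O7,12.1\n"
)
cleaned = drop_retention_times(sample)
print(cleaned)
```

Name, ID and formula are left untouched, so the list stays usable for MS1 matching regardless of the chromatography method.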

sp-eldata commented 6 years ago

@Raghavdata Let's ship these out for MS?

@chubukov How do we ship out a good MS/MS database? I don't think we are using anything internally. Are any publicly available databases worth looking at? I know about METLIN.

chubukov commented 6 years ago

@sp-eldata I don't think it makes sense to ship an "ms/ms database" that only has a single precursor and product m/z (which is what the maven source files have). I think most people expect such a database to have a full fragmentation pattern. That's why I was asking if we even really support the related features in Maven.

If you do go in that direction, there is some public stuff on MassBank. HMDB also has spectra. NIST and Metlin are good but not free. I'm sure there are many others -- I'm actually not an expert on this.
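To make the distinction above concrete, here is a rough sketch of the two kinds of record. The class and field names are hypothetical and are not El-Maven's actual data model; the m/z and intensity values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transition:
    """A single precursor/product m/z pair (hypothetical record type)."""
    compound: str
    precursor_mz: float
    product_mz: float


@dataclass
class Spectrum:
    """A full fragmentation pattern: many (m/z, relative intensity) peaks."""
    compound: str
    precursor_mz: float
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def to_transitions(spectrum: Spectrum) -> List[Transition]:
    """Reduce a spectrum to one transition per fragment peak.

    Intensities are lost here, so the reverse direction (rebuilding a
    spectrum from a single transition) is not possible.
    """
    return [
        Transition(spectrum.compound, spectrum.precursor_mz, mz)
        for mz, _intensity in spectrum.peaks
    ]


def strongest_transition(spectrum: Spectrum) -> Transition:
    """Pick the transition to the most intense fragment peak."""
    mz, _intensity = max(spectrum.peaks, key=lambda peak: peak[1])
    return Transition(spectrum.compound, spectrum.precursor_mz, mz)


# Invented placeholder values, not real spectral data.
spec = Spectrum("example", 200.0, [(50.0, 10.0), (120.0, 100.0), (182.0, 35.0)])
transitions = to_transitions(spec)
best = strongest_transition(spec)
print(len(transitions), best)
```

A database that stores only `Transition` rows can be derived from spectra, but it cannot support spectral matching, which is the feature most users would expect from an MS/MS library.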

Raghavdata commented 6 years ago

@sahil21 2717_Compounds_DB (1).csv.zip