Open dmccloskey opened 7 years ago
Other resources from Hanne:
./src/topp/SpecLibSearcher.cpp ./src/utils/MetaboliteSpectralMatcher.cpp https://github.com/OpenMS/OpenMS/pull/2874 -- FeatureFinderMetaboIdent
Sebastian Boecker paper https://www.biorxiv.org/content/early/2017/02/17/109389
Spectral comparison functions: https://github.com/OpenMS/OpenMS/tree/develop/src/openms/source/COMPARISON/SPECTRA
Hi @dmccloskey, I recently rewrote the underlying classes of SpecLibSearcher. When I looked at the code, I realized that it should be easily applicable to metabolomics. This renders MetaboliteSpectralMatcher a bit superfluous. There are some minor differences, though, in terms of supported file formats. MetaboliteSpectralMatcher only supports an in-house mzML file as its database. SpecLibSearcher tries to support more of the existing database formats, but right now it is more focused on proteomics. I think it would be a good idea to consider SpecLibSearcher, as it has the better underlying data structures. It's also reasonably fast with the new data structures (about 250,000 comparisons/s on our node).
Hi @timosachsenberg, OK, we will base the matchSpectrum method on the functionality of SpecLibSearcher. Do you have a list of the input file formats that SpecLibSearcher supports? It would be interesting for us to see whether they match what is provided by, e.g., the NIST database.
I recently gave it a try when I improved the core data structures and I realized that reading the spectral databases is still a weak part in OpenMS (e.g., they are also not very well standardized). It should not be too much work to get this working but probably requires some additional code for parsing. I just did not find the time yet to do so but I could certainly provide some help.
Hi @timosachsenberg, we should have some time to tackle this problem. Do you mind giving us an overview of what is required to parse the spectral databases? We can also set up a Skype meeting to go over it if there are too many technical aspects to discuss in the comments.
Hmm, I honestly don't know. I think we currently don't have access to the NIST spectra, as these seem to be only commercially available. Happy to have a quick Skype session this week.
@timosachsenberg: I have a copy of it actually. I am very unfamiliar with the file formats so I am not sure what files correspond to the correct format.
A Skype session would be great. Tentative goals for the meeting:
Reformatting the info:
The sample was derivatized with MSTFA. I got a NIST match for:
bin_size and bin_spread (e.g., bin_size = 1.0, bin_spread = 0.0)
Check for the case of e.g., "Hexestrol"
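To make the role of bin_size and bin_spread concrete, here is a minimal standalone sketch of binned spectral comparison. It does not use the OpenMS classes. The names `binSpectrum` and `cosineSimilarity` are hypothetical. The sketch assumes that bin_spread is an integer number of neighboring bins on each side that also receive a peak's intensity, and it uses cosine similarity as the score, which may differ from what SpecLibSearcher actually computes.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// A peak is an (m/z, intensity) pair.
using Peak = std::pair<double, double>;
// Sparse binned spectrum: bin index -> summed intensity.
using BinnedSpectrum = std::map<long, double>;

// Assign each peak to bin floor(mz / bin_size). With bin_spread = k the
// intensity is also added to the k neighboring bins on each side
// (assumption made for this sketch).
BinnedSpectrum binSpectrum(const std::vector<Peak>& peaks, double bin_size, int bin_spread)
{
  assert(bin_size > 0.0 && bin_spread >= 0);
  BinnedSpectrum bins;
  for (const Peak& p : peaks)
  {
    const long center = static_cast<long>(std::floor(p.first / bin_size));
    for (long b = center - bin_spread; b <= center + bin_spread; ++b)
    {
      bins[b] += p.second;
    }
  }
  return bins;
}

// Cosine similarity of two binned spectra; returns 0 if either one is empty.
double cosineSimilarity(const BinnedSpectrum& a, const BinnedSpectrum& b)
{
  double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
  for (const auto& kv : a)
  {
    norm_a += kv.second * kv.second;
    const auto it = b.find(kv.first);
    if (it != b.end()) dot += kv.second * it->second;
  }
  for (const auto& kv : b) norm_b += kv.second * kv.second;
  if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
  return dot / std::sqrt(norm_a * norm_b);
}
```

With bin_size = 1.0 and bin_spread = 0, two peaks only match if they fall into the same unit-width bin. Increasing bin_spread lets peaks in adjacent bins contribute to the score.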
problem: MSP file parsing speed was slow
solution: switching from vector to set, optimization of regular expressions (simpler and fewer), and removal of LOG_DEBUG
problem: spectral data for tests using Database X
solution: utilize a few dummy spectra instead of the Database X spectra
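The MSP parsing fix above can be illustrated with a small standalone sketch. This is not the OpenMS parser: `LibrarySpectrum`, `startsWith`, and `parseMSP` are hypothetical names. The sketch assumes the set is used to skip records whose name was already seen, and that each peak sits on its own line as "m/z intensity". It also replaces regular expressions with plain prefix checks, which goes one step further than the "simpler and fewer" regular expressions described in the thread.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one library entry.
struct LibrarySpectrum
{
  std::string name;
  std::vector<std::pair<double, double>> peaks; // (m/z, intensity)
};

// Returns true if `line` begins with `prefix`.
static bool startsWith(const std::string& line, const std::string& prefix)
{
  return line.compare(0, prefix.size(), prefix) == 0;
}

// Parse a simplified MSP stream, keeping the first record for each name.
std::vector<LibrarySpectrum> parseMSP(std::istream& in)
{
  std::vector<LibrarySpectrum> library;
  std::set<std::string> seen_names; // O(log n) duplicate check instead of a vector scan
  LibrarySpectrum current;
  bool in_record = false;
  bool keep = false;
  std::string line;

  // Store the record collected so far, if it is one we decided to keep.
  auto flush = [&]()
  {
    if (in_record && keep) library.push_back(current);
    in_record = false;
    keep = false;
  };

  while (std::getline(in, line))
  {
    if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF
    if (startsWith(line, "Name: "))
    {
      flush();
      current = LibrarySpectrum();
      current.name = line.substr(6);
      in_record = true;
      keep = seen_names.insert(current.name).second; // false for duplicates
    }
    else if (in_record && keep && !line.empty() && line.find(':') == std::string::npos)
    {
      // Lines without a colon are treated as peak lines; metadata is skipped.
      std::istringstream fields(line);
      double mz = 0.0, intensity = 0.0;
      if (fields >> mz >> intensity) current.peaks.emplace_back(mz, intensity);
    }
  }
  flush();
  assert(library.size() == seen_names.size()); // one kept record per unique name
  return library;
}
```

The same function can feed tests from a few hand-written dummy records passed in through a `std::istringstream`, so no commercial database file is needed.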
Objectives
Feature plan
Pre-existing OpenMS classes that may be of help: