EuBIC / EuBIC2023

EuBIC 2023 developer's meeting
https://eubic-ms.org/events/2023-developers-meeting/

Metabolomics hackathon: MS2 spectra matching for metabolite identification #13

Open mmattano opened 1 year ago

mmattano commented 1 year ago

MS2 spectra matching for metabolite identification

Abstract

One major open topic in untargeted metabolomics is identifying unknown compounds from mass spectra. Because MS1 comparisons can be ambiguous (especially for small molecules), we need to look at MS2 spectra and compare them to public MS2 databases to differentiate compounds in the same mass range. Currently, the best-performing methods for compound identification are GNPS and Sirius. They provide the user with a list of potential compounds, but in some cases the uncertainty is very high or multiple candidates are suggested, which makes the downstream analysis labor-intensive. GNPS improves its predictions by using molecular networks and taking biological information into account. Sirius improves its predictions by comparing the structural similarity of the compounds. We would like to set up a novel system with modular parts that can be tested separately. Each part of the pipeline can be improved or modified individually, and multiple methods can be combined as an ensemble. Such a system can also serve as a benchmark of existing scoring and matching functions and as a testing playground for novel ideas.
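To make the modular, ensemble idea from the abstract more concrete, here is a minimal sketch in plain Python. It is not the hackathon's actual code: the names `bin_spectrum`, `cosine_score`, `shared_peak_fraction`, `ensemble_score`, and `SCORERS`, the fixed-width binning, and the weighted-mean combination are all assumptions chosen for illustration.

```python
from math import sqrt
from typing import Callable, Dict, List, Optional, Tuple

# A spectrum is assumed to be a list of (m/z, intensity) pairs.
Spectrum = List[Tuple[float, float]]
Scorer = Callable[[Spectrum, Spectrum], float]


def bin_spectrum(spectrum: Spectrum, bin_width: float = 0.01) -> Dict[int, float]:
    """Sum intensities into fixed-width m/z bins (a simplification of peak matching)."""
    binned: Dict[int, float] = {}
    for mz, intensity in spectrum:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(query: Spectrum, reference: Spectrum) -> float:
    """Cosine similarity between the binned intensity vectors."""
    q, r = bin_spectrum(query), bin_spectrum(reference)
    dot = sum(v * r.get(k, 0.0) for k, v in q.items())
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm > 0 else 0.0


def shared_peak_fraction(query: Spectrum, reference: Spectrum) -> float:
    """Fraction of occupied bins shared by both spectra (Jaccard index, ignores intensity)."""
    q, r = set(bin_spectrum(query)), set(bin_spectrum(reference))
    union = q | r
    return len(q & r) / len(union) if union else 0.0


# Hypothetical registry: each scorer is an independent, separately testable module.
SCORERS: Dict[str, Scorer] = {
    "cosine": cosine_score,
    "shared_peaks": shared_peak_fraction,
}


def ensemble_score(
    query: Spectrum,
    reference: Spectrum,
    scorers: Dict[str, Scorer],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of the individual scores; equal weights if none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scorers}
    total = sum(weights[name] for name in scorers)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * fn(query, reference) for name, fn in scorers.items()) / total
```

Because every scorer shares the same signature, a new scoring function can be added to the registry and benchmarked without touching the rest of the pipeline.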

Project Plan

The general purpose is to build an (automated) workflow for MS2 spectra matching that does not rely solely on cosine similarity scoring. Subsequently, we would like to

Technical Details

Main language: Python

Contact Information

Members of the metabolomics research group led by Thomas Moritz at the NNF Center for Basic Metabolic Research, Faculty of Health Research, University of Copenhagen

Matthias Mattanovich (matthias.mattanovich@sund.ku.dk)
Muyao Xi (muyao.xi@sund.ku.dk)
Lawrence Egyir (lawrence.egyir@sund.ku.dk)

tobiasko commented 1 year ago

Dear @mmattano,

I am happy to inform you that your proposal has been selected for the DevMeeting2023! Participants will decide which hackathon to join after the pitch on Monday.

Best, Tobi

tobiasko commented 1 year ago

Maybe Muyao and Lawrence could leave a short comment here so they also become participants in this issue! THX!

lawtrea commented 1 year ago

Thank you @tobiasko

MuyaoXi9271 commented 1 year ago

Thanks @tobiasko Please add me in👍

tobiasko commented 1 year ago

Hello everyone,

I just created a Slack workspace for the DevMeeting and a channel named metabolomics for this hack. You should receive an invite to join by email.

Best, Tobi

mmattano commented 1 year ago

Summary paragraph

During the metabolomics-related hackathon, spectral similarity scoring was explored. To identify a metabolite from an MS1 or MS2 spectrum, different scores are applied to match the spectrum in question to a database entry or, more commonly, to an in-house library. Currently, the cosine similarity score is the one most frequently used in the field. Here, we set up a pipeline to compare multiple ways of scoring spectral similarity, as well as an array of their variations and respective input parameters. The data, which was prepared specifically for the hackathon, also allowed for statistics on false positives, false negatives, etc. Furthermore, we set up systems to test the robustness of these scores to intensity perturbations, which are very common when dealing with biological samples, and tested for a possible correlation between structural and spectral similarity.
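The robustness test mentioned above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code written at the hackathon: the function names, the multiplicative Gaussian noise model, the `intensity_power` parameter, and peak matching by rounding m/z to two decimals are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple

# A spectrum is assumed to be a list of (m/z, intensity) pairs.
Spectrum = List[Tuple[float, float]]


def cosine_score(
    query: Spectrum,
    reference: Spectrum,
    intensity_power: float = 1.0,
    decimals: int = 2,
) -> float:
    """Cosine similarity; peaks match if their m/z values round to the same number."""

    def to_vector(spectrum: Spectrum) -> Dict[float, float]:
        vec: Dict[float, float] = {}
        for mz, intensity in spectrum:
            key = round(mz, decimals)
            # Optional intensity transform, e.g. 0.5 for square-root scaling.
            vec[key] = vec.get(key, 0.0) + intensity ** intensity_power
        return vec

    q, r = to_vector(query), to_vector(reference)
    dot = sum(v * r.get(k, 0.0) for k, v in q.items())
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm > 0 else 0.0


def perturb_intensities(spectrum: Spectrum, rel_noise: float, rng: random.Random) -> Spectrum:
    """Scale each intensity by (1 + Gaussian noise); negative results are clipped to zero."""
    return [
        (mz, max(0.0, intensity * (1.0 + rng.gauss(0.0, rel_noise))))
        for mz, intensity in spectrum
    ]


def robustness(
    spectrum: Spectrum,
    rel_noise: float,
    n_trials: int = 200,
    seed: int = 0,
    intensity_power: float = 1.0,
) -> float:
    """Mean score between a spectrum and randomly perturbed copies of itself."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    return mean(
        cosine_score(
            perturb_intensities(spectrum, rel_noise, rng),
            spectrum,
            intensity_power=intensity_power,
        )
        for _ in range(n_trials)
    )


if __name__ == "__main__":
    example = [(85.03, 120.0), (127.04, 980.0), (145.05, 340.0), (163.06, 55.0)]
    for noise in (0.0, 0.1, 0.3):
        print(noise, robustness(example, noise), robustness(example, noise, intensity_power=0.5))
```

Running the same perturbation experiment for each scoring function and parameter setting gives one comparable robustness number per configuration.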