EuBIC / EuBIC2020


Building a Gold-Standard Protein Sequence Dataset for Functional Annotation #5

Closed by rababerladuseladim 4 years ago

rababerladuseladim commented 4 years ago

Abstract

Metaproteomics is the analysis of proteins in samples composed of multiple organisms. One major use case is investigating the functional composition of a sample. Several tools can link identified sequences to functional information (e.g. Unipept, Prophane, MetaGOmics). Unfortunately, the performance of these tools is hard to assess because data with known ground truth at the functional level are lacking. The target benchmark dataset would consist of a diverse range of peptides/proteins with high-quality, experimentally validated functional annotations. Two obstacles need to be overcome to create such a dataset: (1) protein inference is more complicated in metaproteomics than in single-organism proteomics, because peptides can match homologues within the same organism and across multiple organisms, and (2) annotation levels are low in the metaproteomic context: many proteins have no function assigned to them, not even a putative one. We plan to develop a concept for how the ideal gold-standard dataset should be composed and to generate it accordingly. Based on this dataset, a functional benchmark of the aforementioned tools can be initiated.
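To make the idea of a functional benchmark more concrete, here is a minimal sketch of how a tool's output could be scored against a gold-standard dataset. It is only an illustration under assumptions: the proposal does not specify a scoring scheme, and the function name `score_annotations`, the peptide sequences and the `term_*` labels are hypothetical placeholders, not real data or any tool's actual output format.

```python
from typing import Dict, Set, Tuple


def score_annotations(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (peptide, term) pairs.

    Assumption: only peptides present in the gold standard are scored;
    predictions for other peptides are ignored.
    """
    tp = fp = fn = 0
    for peptide, gold_terms in gold.items():
        pred_terms = predicted.get(peptide, set())
        tp += len(gold_terms & pred_terms)   # correctly predicted terms
        fp += len(pred_terms - gold_terms)   # predicted but not in gold
        fn += len(gold_terms - pred_terms)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up example: placeholder peptides and function labels.
gold = {"PEPTIDEA": {"term_1", "term_2"}, "PEPTIDEB": {"term_3"}}
predicted = {"PEPTIDEA": {"term_1"}, "PEPTIDEB": {"term_3", "term_4"}}

precision, recall = score_annotations(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A flat set comparison like this ignores the hierarchy of functional ontologies, so a real benchmark would likely need a more nuanced measure.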

Work plan

Technical details

Contact information

Henning Schiebenhoefer - Robert Koch-Institut (Germany) - schiebenhoeferh@rki.de

RalfG commented 4 years ago

This hackathon project will be merged with #1 by @pverscha. See #1 for more info.