Ian Walsh, Dmytro Fishman, Dario Garcia-Gasulla, Tiina Titma, Gianluca Pollastri, The ELIXIR Machine Learning focus group, Jennifer Harrow, Fotis E. Psomopoulos, & Silvio C.E. Tosatto
Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for closer scrutiny of machine learning performance and its possible limitations.
The aim of these community-wide recommendations is to help establish standards of supervised machine learning validation in biology, by adopting a structured methods description for machine learning based on Data, Optimization, Model and Evaluation (DOME). The recommendations are formulated as questions to anyone wishing to pursue implementation of a machine learning algorithm. Answers to these questions can be easily included in the supplementary material of published papers.
Our goal is to act as a single point of reference for best practices, guidelines and recommendations for Machine Learning in Life Sciences. The current set of recommendations is made primarily for the case of supervised learning in biology in the absence of direct experimental validation, as this is the most common type of ML approach used.
The data, which also include a YAML form of the DOME recommendations, are under a [...] license. The code that parses the YAML to produce tabular output as an Excel file is under a [...] license.
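As a rough illustration of the YAML-to-table step described above, the following minimal sketch flattens a nested set of DOME answers into rows. It is not the repository's parser. The `dome_answers` mapping, its question keys and its answers are hypothetical stand-ins for whatever a YAML loader (for example PyYAML's `yaml.safe_load`) would return from the real file, and CSV output from the standard library is swapped in for Excel so that the example has no third-party dependencies.

```python
import csv
import io

# Hypothetical stand-in for the mapping a YAML loader would return.
# The four section names follow DOME; the question keys and answers
# are illustrative only and do not reproduce the repository's schema.
dome_answers = {
    "Data": {
        "provenance": "Public database",
        "dataset splits": "80/10/10 train/validation/test",
    },
    "Optimization": {"algorithm": "Random forest"},
    "Model": {"interpretability": "Black box"},
    "Evaluation": {"evaluation method": "10-fold cross-validation"},
}


def flatten(answers):
    """Turn {section: {question: answer}} into (section, question, answer) rows."""
    rows = []
    for section, questions in answers.items():
        for question, answer in questions.items():
            rows.append((section, question, answer))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Section", "Question", "Answer"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(dome_answers)
table = to_csv(rows)
print(table)
```

One row per question keeps the table easy to paste into supplementary material. Writing an actual Excel file would require a spreadsheet library, which this sketch deliberately leaves out.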
Our goal is to extend the DOME recommendations to other fields of ML, such as unsupervised, semi-supervised and reinforcement learning, as well as to other Life Science domains.
As we gather feedback, and as the field evolves, we plan to publish comprehensive updates to the DOME recommendations.
The DOME machine learning summary table and examples for it can be found in the /data directory.
A CodeOcean capsule is available here.
More info to be added here