davidemms / Open_Orthobench

Orthobench Revisited

This is the GitHub repository for a complete reanalysis of the Orthobench benchmarks for orthogroup inference accuracy. It contains 70 curated Bilaterian orthogroups based on the orthogroups from the original Orthobench study.

There have been a number of major corrections to the curated orthogroups, which together improve the accuracy of the benchmarks considerably. The updated orthogroups together with the complete set of data supporting the analysis are provided in this repository. The issues page is open, so anyone can identify any further corrections that are required and can submit the data supporting the correction.

Citation

If using this work please cite:

D M Emms, S Kelly, Benchmarking Orthogroup Inference Accuracy: Revisiting Orthobench, Genome Biology and Evolution, evaa211, https://doi.org/10.1093/gbe/evaa211

As well as the original study:

Trachana, K., Larsson, T.A., Powell, S., Chen, W.‐H., Doerks, T., Muller, J. and Bork, P. (2011), Orthology prediction methods: A quality assessment using curated protein families. Bioessays, 33: 769-780. doi:10.1002/bies.201100062

Benchmarking an Orthogroup Inference Method

  1. Download BENCHMARKS.tar.gz from https://github.com/davidemms/Open_Orthobench/releases
  2. Run the method on the proteomes in BENCHMARKS/Input/
  3. Write the predicted orthogroups to a file, one orthogroup per line
  4. Run the benchmark script on the results file: python benchmark.py orthogroups_results_file.txt

The predicted orthogroups file should have one orthogroup per line. The script uses regular expressions to extract the genes from each line, so a wide variety of formats are accepted. Lines starting with '#' are ignored.
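
As an illustration of the file format described above, here is a minimal sketch in Python of writing predicted orthogroups to a file, one per line, and reading them back. The function names, the gene identifiers and the choice of a space as the separator are assumptions made for this example; they are not taken from benchmark.py, and the reader below is not the script's own parsing logic. Check the separators your method emits against benchmark.py itself.

```python
import re


def write_orthogroups(orthogroups, path):
    """Write one orthogroup per line, genes separated by single spaces."""
    with open(path, "w") as fh:
        # Lines starting with '#' are ignored by the benchmark script
        fh.write("# predicted orthogroups, one orthogroup per line\n")
        for genes in orthogroups:
            fh.write(" ".join(genes) + "\n")


def read_orthogroups(path):
    """Read the file back, skipping blank lines and '#' comment lines.

    Splitting on whitespace or commas is an assumption of this sketch,
    not the regular expression used by benchmark.py.
    """
    groups = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            groups.append([g for g in re.split(r"[,\s]+", line) if g])
    return groups


# Hypothetical gene identifiers, for illustration only
example = [["geneA1", "geneB1"], ["geneA2", "geneB2", "geneC2"]]
```

Calling write_orthogroups(example, "orthogroups_results_file.txt") would produce a file that can then be passed to the script as in step 4.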

Description of Files

The repository contains two main directories:

BENCHMARKS - data & script for benchmarking orthogroup inference methods

Supporting_Data - Data supporting the inferred RefOGs

Supporting_Data contains two subdirectories:

Data_for_RefOGs - The data associated with the inference of each RefOG

Additional_Files