WormBase / genedesc_generator

Automated gene descriptions generator for model organism databases

Using common ancestor for GO terms #8

Closed valearna closed 6 years ago

valearna commented 6 years ago

If the terms selected for a GO aspect and a given evidence group include terms that share a common ancestor, we want to remove those terms and keep the ancestor instead. This must happen after #6. Requirements:

  1. the ancestor cannot be a root term
  2. we want to apply this rule only if the number of terms in the set is higher than a specified threshold
  3. we want to have a threshold on the number of links to traverse to find the common ancestor

new doc 2018-02-21
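
The three requirements above can be sketched roughly as follows. This is only an illustrative sketch, not the project's implementation: the toy ontology `PARENTS`, the function names, and the parameters `min_terms` and `max_links` are all hypothetical, and the greedy choice of which ancestor to keep is an assumption.

```python
from collections import deque

# Hypothetical toy ontology (child -> parents); these are not real GO IDs.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "A3": ["A", "B"],  # a term can have more than one parent
    "B1": ["B"],
}
ROOTS = {"root"}


def ancestors_within(term, parents, max_links):
    """Return {ancestor: distance} for ancestors at most max_links links away."""
    dist = {}
    queue = deque([(term, 0)])
    while queue:
        node, d = queue.popleft()
        if d == max_links:
            continue  # requirement 3: do not traverse further than max_links
        for parent in parents.get(node, []):
            if parent not in dist:
                dist[parent] = d + 1
                queue.append((parent, d + 1))
    return dist


def merge_by_common_ancestor(terms, parents, roots, min_terms=3, max_links=2):
    """Replace groups of terms sharing a non-root ancestor with that ancestor."""
    result = set(terms)
    if len(result) <= min_terms:
        return result  # requirement 2: only apply above the threshold
    covered = {}  # ancestor -> selected terms that descend from it
    for term in terms:
        for anc in ancestors_within(term, parents, max_links):
            if anc not in roots:  # requirement 1: never keep a root term
                covered.setdefault(anc, set()).add(term)
    # Greedy assumption: try ancestors covering the most terms first.
    for anc, kids in sorted(covered.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        remaining = kids & result
        if len(remaining) >= 2:
            result -= remaining
            result.add(anc)
    return result


merged = merge_by_common_ancestor({"A1", "A2", "A3", "B1"}, PARENTS, ROOTS)
print(sorted(merged))
```

In this toy example `A1`, `A2` and `A3` collapse into `A`, while `B1` is left alone because `B` would cover only one remaining term.
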
valearna commented 6 years ago

The situation is much more complicated than the scenario described above, since each term can have more than one parent. Why don't we use the GO slim ontology and apply only the trimming step in #6? GO slim is a less detailed version of the GO ontology, designed specifically for use cases that need gene summary information. We can map each GO term to its slim version using goatools (already used in other parts of the project).
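
To illustrate the idea of mapping terms to a slim, here is a minimal, self-contained sketch. It deliberately does not call goatools: the toy ontology `PARENTS`, the slim set `SLIM`, and the function `map_to_slim` are hypothetical names used only for illustration, and real GO relations are richer than this child-to-parents map.

```python
# Hypothetical toy ontology (child -> parents); these are not real GO IDs.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "A3": ["A", "B"],
    "B1": ["B"],
}
# Hypothetical slim: the subset of less detailed terms we want to report.
SLIM = {"A", "B"}


def map_to_slim(term, parents, slim_terms):
    """Return the first slim term met on each upward path from term."""
    if term in slim_terms:
        return {term}
    found, seen, stack = set(), set(), [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in slim_terms:
                found.add(parent)  # stop climbing this branch
            else:
                stack.append(parent)
    return found


selected = ["A1", "A2", "A3", "B1"]
slim_set = set()
for term in selected:
    slim_set |= map_to_slim(term, PARENTS, SLIM)
print(sorted(slim_set))
```

Because `A3` has two parents, it maps to both `A` and `B`. The multiple-parent case is then resolved by the slim itself rather than by a common-ancestor search, and the trimming step from #6 can be applied to the resulting set.
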

valearna commented 6 years ago

These slides explain the naive solution that I implemented to select common ancestors and trim the terms: https://docs.google.com/presentation/d/14cad8DOPA1ZFk2VdtM7NCLib6G1ciQPO_oq13-rbYeM/edit#slide=id.p

valearna commented 6 years ago

Implement trimming strategy based on GO Slim