MEGA-GO / MegaGO

Calculate semantic distance for sets of Gene Ontology terms
MIT License

Improve performance #24

Closed pverscha closed 4 years ago

pverscha commented 4 years ago

This PR drastically improves the script's performance by implementing two optimizations:

1) Precompute the information content of the most informative ancestor. On startup, the script checks whether the precomputed data is available. If not, it computes the data, stores it as a JSON file, and loads it into memory. Subsequent invocations of the script automatically use the precomputed data.

2) Cache the computation of the deepest common ancestor. Once this value has been computed for a given set of inputs, it is reused instead of being recomputed.
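
For readers unfamiliar with the two techniques, here is a minimal sketch of how they might look. It is not the code from this PR: the function names (`load_or_precompute`, `deepest_common_ancestor`), the toy `PARENTS` ontology, and the use of `functools.lru_cache` are all assumptions made for illustration.

```python
import json
from functools import lru_cache
from pathlib import Path


def load_or_precompute(cache_path, compute):
    """Load JSON data from cache_path, or compute and store it on first use.

    `compute` is a hypothetical zero-argument callable that returns
    JSON-serializable data (e.g. a dict of precomputed values).
    """
    path = Path(cache_path)
    if path.exists():
        # A previous invocation already stored the data: just load it.
        with path.open() as handle:
            return json.load(handle)
    # First invocation: compute once, persist, and return.
    data = compute()
    with path.open("w") as handle:
        json.dump(data, handle)
    return data


# Toy ontology (hypothetical): each term maps to its direct parents.
PARENTS = {
    "root": (),
    "a": ("root",),
    "b": ("root",),
    "c": ("a",),
    "d": ("a", "b"),
}


@lru_cache(maxsize=None)
def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return frozenset(result)


@lru_cache(maxsize=None)
def depth(term):
    """Length of the longest path from the root to the term."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def deepest_common_ancestor(term_a, term_b):
    """Return the deepest term shared by both ancestor sets.

    Results are memoized per (term_a, term_b) pair, so repeated
    queries with the same inputs skip the set intersection entirely.
    Ties in depth are broken by term name to keep the result stable.
    """
    common = ancestors(term_a) & ancestors(term_b)
    return max(common, key=lambda term: (depth(term), term))
```

In this sketch, calling `load_or_precompute("ic.json", compute)` twice invokes `compute` only the first time, and a repeated call to `deepest_common_ancestor("c", "d")` is served from the in-memory cache. Note that `lru_cache` keys on argument order, so `("c", "d")` and `("d", "c")` are cached separately unless the inputs are normalized first.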

I also split the code into smaller, more focused files and renamed a few input files for consistency.