INCATools / ontology-access-kit

Ontology Access Kit: A python library and command line application for working with ontologies
https://incatools.github.io/ontology-access-kit/
Apache License 2.0

Performance: `SqlImplementation.entity_metadata_map()` #676

Open joeflack4 opened 11 months ago

joeflack4 commented 11 months ago

Overview

This method seems far too slow.

Example

In this PR, using mondo.db:

    exclusion_rels: List[Tuple[str, str]] = []
    for rel in oi.entities_metadata_statements(mondo_ids, ['obo:mondo#excluded_subClassOf']):
        if rel:
            exclusion_rels.append((rel[0], rel[2]))

Each iteration of this loop took 15.5 seconds on average. At that rate, getting through ~25k mondo_ids would take roughly 100 hours.
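For reference, the extrapolation can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the two figures quoted above; the variable names are illustrative.

```python
# Back-of-the-envelope extrapolation from the figures quoted above.
seconds_per_iteration = 15.5  # average time per loop iteration, as reported
n_ids = 25_000                # approximate number of mondo_ids

total_seconds = seconds_per_iteration * n_ids
total_hours = total_seconds / 3600

print(f"{total_hours:.1f} hours")  # prints "107.6 hours"
```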

matentzn commented 10 months ago

Thank you @joeflack4, this is a high priority for me! @hrshdhgd, how can we get this up the ladder of priorities?

hrshdhgd commented 10 months ago

https://github.com/INCATools/ontology-access-kit/pull/679: PR on the way!

hrshdhgd commented 10 months ago

The latest version, v0.5.22, should be quicker!

matentzn commented 4 months ago

I am trying this on 0.6.6 and it is still much too slow:

    for curie in all_descendants:
        # One adapter call per CURIE; this is the slow step.
        metadata_map = adapter.entity_metadata_map(curie)
        if "oio:inSubset" in metadata_map:
            list_of_subsets = metadata_map["oio:inSubset"]
            for subset in list_of_subsets:
                row = {
                    "id": curie,
                    "subset": subset
                }
                # Append inside the inner loop so every subset yields a row.
                data_subsets.append(row)

I stopped this after 16 minutes.
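The loop above makes one adapter call per CURIE. A possible alternative, sketched below, is to request only the `oio:inSubset` statements for all CURIEs in a single call to `entities_metadata_statements`, the method used in the first comment. This is a sketch under assumptions, not a measured fix: `StubAdapter` and `subset_rows` are hypothetical names used only to make the example runnable, the result tuples are assumed to carry the subject at index 0 and the value at index 2 (as in the first comment's snippet), and whether the batched call is actually faster on a real database has not been verified here.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


class StubAdapter:
    """Hypothetical stand-in for an OAK adapter, only so the sketch runs."""

    def __init__(self, statements: List[Tuple[str, str, Any]]) -> None:
        self._statements = statements

    def entities_metadata_statements(
        self, curies: Iterable[str], predicates: Optional[List[str]] = None
    ) -> Iterator[Tuple[str, str, Any]]:
        # Yield (subject, predicate, value) tuples matching the filters.
        wanted = set(curies)
        for subject, predicate, value in self._statements:
            if subject in wanted and (predicates is None or predicate in predicates):
                yield subject, predicate, value


def subset_rows(adapter: Any, curies: Iterable[str]) -> List[Dict[str, str]]:
    """Build one row per (entity, subset) pair from a single batched call."""
    rows = []
    for rel in adapter.entities_metadata_statements(list(curies), ["oio:inSubset"]):
        rows.append({"id": rel[0], "subset": rel[2]})
    return rows


# Made-up example data; the subset names are placeholders.
adapter = StubAdapter([
    ("MONDO:0000001", "oio:inSubset", "ex:subset_a"),
    ("MONDO:0000001", "oio:inSubset", "ex:subset_b"),
    ("MONDO:0000002", "rdfs:label", "example label"),
])
data_subsets = subset_rows(adapter, ["MONDO:0000001", "MONDO:0000002"])
print(data_subsets)
```

With a real adapter, `subset_rows(adapter, all_descendants)` would replace the loop, provided the adapter's method accepts the same arguments as in the first comment.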

joeflack4 commented 4 months ago

@matentzn and others, I'm not sure if the code for .entity_metadata_map() is similar to .relationships_metadata(), but if so, it might be worth looking at PR #659, where Chris and I discuss a few different performance refactoring options for that method.
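Before choosing among refactoring options, it can help to see where the time actually goes. Below is a minimal profiling sketch using the standard library's `cProfile` and `pstats`. `slow_lookup` and `run_lookups` are hypothetical stand-ins for the per-entity call and the surrounding loop; nothing here reflects the actual internals of the methods discussed.

```python
import cProfile
import io
import pstats
import time
from typing import Dict, Iterable, List


def slow_lookup(curie: str) -> Dict[str, str]:
    """Hypothetical stand-in for a slow per-entity metadata call."""
    time.sleep(0.001)
    return {"id": curie}


def run_lookups(curies: Iterable[str]) -> List[Dict[str, str]]:
    """Call the slow lookup once per CURIE, mirroring the loops above."""
    return [slow_lookup(c) for c in curies]


# Profile only the code between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
results = run_lookups([f"EX:{i}" for i in range(20)])
profiler.disable()

# Write the ten most expensive entries (by cumulative time) to a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Swapping `run_lookups` for the real loop would show whether the time is spent in query construction, in the database, or in post-processing.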