SynBioDex / pySBOL3

Native Python implementation of the SBOL 3.0 specification
MIT License

Refactor refobj #441

Open bbartley opened 1 year ago

bbartley commented 1 year ago

This experimental feature branch aims to eliminate costly lookup operations. It was motivated by the fact that LabOP protocols, even relatively simple ones, bog down during execution. Profiling showed that most of the time was going to lookups and finds, which are slow because they traverse the Document tree. As the protocol executes, new objects are generated dynamically, making these document traversals ever more costly. Moreover, a conventional caching approach (e.g., Python's functools) was ineffective, because the lookups are predominantly performed on nascent, uncached objects.
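To make the cost difference concrete, here is a minimal sketch of a tree-traversal lookup next to a lookup backed by an identity index that is updated whenever an object is added. It is an illustration only, not the code in this branch: `Node`, `ToyDocument`, `find_by_traversal`, and `find_by_index` are hypothetical names and are not part of the pySBOL3 API.

```python
from typing import Dict, List, Optional


class Node:
    """Hypothetical stand-in for a document object; not a pySBOL3 class."""

    def __init__(self, identity: str) -> None:
        self.identity = identity
        self.children: List["Node"] = []


class ToyDocument:
    """Hypothetical document that keeps an identity index alongside its tree."""

    def __init__(self) -> None:
        self.roots: List[Node] = []
        self._index: Dict[str, Node] = {}

    def add(self, node: Node, parent: Optional[Node] = None) -> Node:
        """Attach a node to the tree and register it in the index."""
        if parent is None:
            self.roots.append(node)
        else:
            parent.children.append(node)
        # Newly created objects are indexed immediately, so a lookup on a
        # nascent object does not depend on a previously cached result.
        self._index[node.identity] = node
        return node

    def find_by_traversal(self, identity: str) -> Optional[Node]:
        """Depth-first search; cost grows with the number of objects."""
        stack = list(self.roots)
        while stack:
            node = stack.pop()
            if node.identity == identity:
                return node
            stack.extend(node.children)
        return None

    def find_by_index(self, identity: str) -> Optional[Node]:
        """Dictionary lookup; cost does not depend on the tree size."""
        return self._index.get(identity)


doc = ToyDocument()
protocol = doc.add(Node("https://example.org/protocol"))
step = doc.add(Node("https://example.org/protocol/step1"), parent=protocol)

# Both strategies return the same object; only the cost differs.
assert doc.find_by_index(step.identity) is doc.find_by_traversal(step.identity)
```

A real implementation would also have to keep such an index consistent when objects are removed or renamed, which this sketch leaves out.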

jakebeal commented 1 year ago

I'm quite sympathetic to this idea, having experienced the significant time costs myself.

My key concern is the implementation of out-of-document objects via anonymous stubs.

What would you think of changing from anonymous stubs to objects of a subclass with a name like MissingObject or OutOfDocument? (I'm not sure whether it should be a subclass of TopLevel or of SBOLObject.) A designated subclass would let a program traversing a document know explicitly and positively that it has encountered an out-of-document link, and then take an action such as trying to resolve the link or ignoring it.
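A rough sketch of what I have in mind, using toy classes rather than the real pySBOL3 ones. `ToyObject`, `lookup`, and `split_links` are hypothetical names invented for this illustration, and the choice of base class is deliberately left open here.

```python
from typing import Dict, Iterable, List, Tuple


class ToyObject:
    """Hypothetical base class standing in for an SBOL object."""

    def __init__(self, identity: str) -> None:
        self.identity = identity


class OutOfDocument(ToyObject):
    """Placeholder for a reference whose target is not in the document."""


def lookup(index: Dict[str, ToyObject], identity: str) -> ToyObject:
    """Return the indexed object, or an explicit OutOfDocument placeholder."""
    found = index.get(identity)
    return found if found is not None else OutOfDocument(identity)


def split_links(
    index: Dict[str, ToyObject], identities: Iterable[str]
) -> Tuple[List[ToyObject], List[ToyObject]]:
    """Separate references into in-document objects and out-of-document ones."""
    resolved: List[ToyObject] = []
    missing: List[ToyObject] = []
    for identity in identities:
        obj = lookup(index, identity)
        # The isinstance check is the "explicit and positive" signal:
        # the caller never has to guess whether a stub is a real object.
        if isinstance(obj, OutOfDocument):
            missing.append(obj)
        else:
            resolved.append(obj)
    return resolved, missing


index = {"https://example.org/a": ToyObject("https://example.org/a")}
resolved, missing = split_links(
    index, ["https://example.org/a", "https://example.org/b"]
)
assert [obj.identity for obj in resolved] == ["https://example.org/a"]
assert [obj.identity for obj in missing] == ["https://example.org/b"]
```

A traversal could then decide per placeholder whether to attempt resolution elsewhere or skip the link.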

tcmitchell commented 1 year ago

@bbartley we should merge main into this PR because two other PRs have been merged. One of the other PRs fixes the read-the-docs issue that is causing this PR's build to break. Would you like me to do the merge? I don't want to step on toes if you're actively developing.

bbartley commented 1 year ago

@tcmitchell go for it! This PR is at a stable point, and the only thing failing is the Read the Docs build. It would be great to see it go green.