ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
224 stars 147 forks

Performance Improvement on the run CLI (ersilia-os#1299) #1353

Closed Abellegese closed 3 weeks ago

Abellegese commented 3 weeks ago

Thank you for taking the time to contribute to Ersilia. Just a few checks before we proceed.

Description

The overall performance of the run CLI has been improved substantially through the changes below (comparison results will be posted at #1299).

  1. Asynchronous, concurrent key conversion in the compound identifier
  2. Caching (with a dynamic, custom cache size) of key conversion results for SMILES, both from the APIs and from RDKit (the main performance boost)
  3. Adding a valid SMILES checker
  4. Improving the functions that read the API schema and the information file
  5. Reducing unnecessary heavy app layers
  6. More
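As a rough illustration of items 1 and 2, here is a minimal sketch of concurrent key conversion backed by a size-limited cache. It is not the code from this PR: `KeyCache`, `slow_convert`, and `convert_all` are hypothetical names, and the converter is a stand-in that sleeps briefly instead of calling a real API or RDKit.

```python
import asyncio
from collections import OrderedDict


class KeyCache:
    """Small LRU cache whose size is chosen at construction time (hypothetical)."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, smiles):
        if smiles in self._data:
            self._data.move_to_end(smiles)  # mark as most recently used
            return self._data[smiles]
        return None

    def put(self, smiles, key):
        self._data[smiles] = key
        self._data.move_to_end(smiles)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry


calls = []  # records which inputs actually reached the slow converter


async def slow_convert(smiles: str) -> str:
    """Hypothetical stand-in for a remote API or RDKit SMILES-to-key lookup."""
    calls.append(smiles)
    await asyncio.sleep(0.01)  # simulated latency
    return "KEY-" + smiles.upper()


async def convert_all(smiles_list, cache):
    """Convert all inputs concurrently, skipping work for cached entries."""

    async def one(smiles):
        cached = cache.get(smiles)
        if cached is not None:
            return cached
        key = await slow_convert(smiles)
        cache.put(smiles, key)
        return key

    # gather preserves input order in its result list
    return await asyncio.gather(*(one(s) for s in smiles_list))


cache = KeyCache(maxsize=128)
first = asyncio.run(convert_all(["CCO", "c1ccccc1"], cache))
second = asyncio.run(convert_all(["CCO", "c1ccccc1", "CC"], cache))
```

In the second call only `"CC"` reaches the slow converter, because the other two results are served from the cache. That is the effect the description credits with the main speed-up.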

Related to #1299

DhanshreeA commented 3 weeks ago

Hey @Abellegese this looks fantastic! Please address the small comments around code clean up/refactoring. Also, could you add a note explaining your usage of nest_asyncio, with as much detail as possible - what problem you encountered and which pipelines were failing (you can paste logs as well)? This would be good documentation for coming back to it in the future.

Abellegese commented 3 weeks ago

Issue Summary

In one of the stages I encountered an error caused by multiple asynchronous event loops. The app worked fine when run locally and with nox. However, GitHub Actions raised a RuntimeError related to "multiple async event loops": the Jupyter notebook in the final test stage leaked its async event loop, which interfered with other stages.

Files Impacted

These files initialize asynchronous event loops:

GitHub Workflow Impacted Stages

The issue occurs between the first and last stages:

  1. Deploy and test ersilia on PR / build (pull_request)
  2. test-colab-notebook (notebook execution leaks async event loop)

Solution: Using nest_asyncio

To handle nested event loops and prevent inter-stage communication issues, I integrated nest_asyncio, which allows us to reuse the event loop in each file without conflicts. nest_asyncio is compatible across Python versions and has resolved the issue in GitHub Actions.
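To illustrate the general class of problem, the sketch below starts an event loop from code that is already running inside one, which is what happens when synchronous helpers are called from a notebook. It is an assumption-laden illustration, not the failing code or the exact error from the CI logs: `inner` and `outer` are hypothetical names, and `nest_asyncio` is a third-party package that is only used here if it is installed.

```python
import asyncio

try:
    import nest_asyncio  # third-party; install with: pip install nest_asyncio
except ImportError:
    nest_asyncio = None


async def inner():
    return "inner done"


async def outer(apply_patch: bool):
    """Try to start a nested event loop from inside a running one."""
    if apply_patch and nest_asyncio is not None:
        nest_asyncio.apply()  # patches asyncio so that loops can be re-entered
    coro = inner()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


# Without the patch, standard asyncio refuses to nest event loops.
result_plain = asyncio.run(outer(apply_patch=False))
print(result_plain)

# With the patch (only if nest_asyncio is available), the nested call is
# expected to succeed.
if nest_asyncio is not None:
    print(asyncio.run(outer(apply_patch=True)))
```

Note that `nest_asyncio.apply()` patches `asyncio` for the whole process, so it is usually called once, near the entry point, rather than inside individual coroutines as in this demonstration.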