ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐅 Epic: Test and incorporate GDI caching functionalities #1359

Open miquelduranfrigola opened 4 weeks ago

miquelduranfrigola commented 4 weeks ago

Summary

Hi @Abellegese and @DhanshreeA

The GDI engagement has come to an end, and we need to plan how to adopt their contributions. Below are some thoughts that I hope we can turn into a clear sequence of tasks.

Background

The GDI contribution is aimed at caching calculations in an S3 bucket. These precalculations can be queried with Athena. At a high level, GDI worked on two fronts. First, they created a model-inference-pipeline repository that uses GitHub Actions to run precalculations on a reference library of 2M compounds for a given model identifier. This pipeline eventually caches the results in AWS. Second, they contributed a client to the Ersilia CLI that can query the cached results seamlessly using the Ersilia commands.
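To make the intended behavior concrete, here is a minimal sketch of the cache-first lookup pattern described above. It is not GDI's implementation: the function names (`predict_with_cache`, `fetch_cached`, `compute`) are hypothetical, and an in-memory dictionary stands in for the S3/Athena store so that the example runs without AWS access.

```python
from typing import Callable, Dict, Iterable, List


def predict_with_cache(
    inputs: Iterable[str],
    fetch_cached: Callable[[List[str]], Dict[str, float]],
    compute: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    """Return one result per input, preferring precalculated values.

    fetch_cached and compute are hypothetical callables: the first would
    wrap the query against the precalculation store, the second would run
    the model locally for inputs that are not cached.
    """
    keys = list(dict.fromkeys(inputs))  # de-duplicate while keeping order
    results = dict(fetch_cached(keys))  # cache hits
    misses = [k for k in keys if k not in results]
    if misses:
        # Only compute what the cache could not provide.
        results.update(compute(misses))
    return {k: results[k] for k in keys}


# Toy stand-ins (assumptions for illustration only).
precalculated = {"CCO": 0.12, "c1ccccc1": 0.87}


def fetch_cached(keys: List[str]) -> Dict[str, float]:
    # Stand-in for a query against the cached results.
    return {k: precalculated[k] for k in keys if k in precalculated}


def compute(keys: List[str]) -> Dict[str, float]:
    # Stand-in for local model inference; returns a dummy value.
    return {k: float(len(k)) for k in keys}


out = predict_with_cache(["CCO", "CCN", "c1ccccc1"], fetch_cached, compute)
print(out)
```

In this sketch only `CCN` falls through to `compute`, since the other two inputs are already in the toy cache. Whether the real client should fall back to local inference on a miss, or simply report the miss, is a design decision we still need to make.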

Below is a tentative list of tasks to be completed in order.

1. The model inference pipeline

2. The Ersilia CLI client

3. Scheduled running

As discussed, let's use this as a framework to start this work. Please feel free to revise the list of tasks and convert them into more granular batches and tasks. Also, feel free to add or remove tasks.

Objective(s)

  1. Test GDI's precalculations pipeline.
  2. Integrate the querying system in the Ersilia CLI.
  3. Schedule prediction runs.

Documentation

Check this folder for more information about this project. The information may currently be outdated, but it gives a good idea of the full scope of the project.