ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐕 Batch: A more comprehensive PyTest pipeline #1368

Open miquelduranfrigola opened 3 weeks ago

miquelduranfrigola commented 3 weeks ago

A more comprehensive PyTest pipeline

Below I copy-paste some items that @Abellegese and I listed in today's 1-on-1 meeting (04/11/2024).

Wishlist

Comments

Improvements

Objective(s)

Documentation

N/A

miquelduranfrigola commented 2 weeks ago

Hi @Abellegese, please mark the tasks that are completed so we can get an idea of the current status of this.

miquelduranfrigola commented 2 weeks ago

@Abellegese - let's document PyTesting extensively so it becomes easier to maintain and extend.

Noting here a few comments and questions based on previous meetings:

  1. Mocking is used for most of the functions.
  2. Testing the standard runner class was the most challenging.
  3. The README file can be parsed and run. This is a great achievement.
  4. Model identifiers should be specified as variables, not inline in the text. For example, MODEL_ID = "eos3b5e".
  5. How is the GitBook processed, @Abellegese ?
miquelduranfrigola commented 2 weeks ago

Playground tests — brainstorming

Hi @Abellegese, @DhanshreeA and @GemmaTuron. We are facing quite a few challenges these last couple of weeks with the Ersilia Model Hub. We should be testing the CLI more, and more exhaustively.

General thoughts

Predefined input file

We can use these molecules as input (input.csv):

smiles
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O
C1=CN=CC=C1C(=O)NN
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O
CC(=O)OC1=CC=CC=C1C(=O)O
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
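
Rather than committing this file as a fixture, the playground could generate it at test setup time. A minimal sketch (the file name and helper are assumptions, not existing Ersilia code):

```python
from pathlib import Path

# The seven SMILES strings listed above.
SMILES = [
    "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
    "C1=CN=CC=C1C(=O)NN",
    "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O",
    "CC(=O)OC1=CC=CC=C1C(=O)O",
    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    "CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C",
    "COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5",
]

def write_input_csv(path: str = "input.csv") -> str:
    """Write the predefined molecules to a single-column CSV with a 'smiles' header."""
    Path(path).write_text("smiles\n" + "\n".join(SMILES) + "\n")
    return path
```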

Example pipelines

Below I suggest a few pipelines to test. The bullet points in each section should map 1-to-1 to tests.

Fetch a model from GitHub, run it and finally completely delete it.

ersilia -v fetch eos3b5e --from_github
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete

Fetch a model from DockerHub, run it and finally completely delete it.

ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete
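
In a pytest setting, each of these sequences could be driven through `subprocess`. A sketch (not the actual playground implementation): the command list is built by a separate helper so the sequence itself can be asserted on without invoking the CLI.

```python
import subprocess

def pipeline_commands(model_id: str, source_flag: str) -> list:
    """Command sequence for fetch -> serve -> run -> close -> delete, as listed above."""
    return [
        ["ersilia", "-v", "fetch", model_id, source_flag],
        ["ersilia", "-v", "serve", model_id],
        ["ersilia", "-v", "run", "-i", "input.csv", "-o", f"output_{model_id}.csv"],
        ["ersilia", "close"],
        ["ersilia", "delete"],
    ]

def run_pipeline(model_id: str, source_flag: str) -> None:
    """Execute the sequence, failing fast if any step exits non-zero."""
    for cmd in pipeline_commands(model_id, source_flag):
        subprocess.run(cmd, check=True)
```

The same helper covers both pipelines by swapping `--from_github` for `--from_dockerhub`.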

Automatically decide the fetch mode when Docker is inactive or active

ersilia -v fetch eos3b5e
ersilia -v fetch eos3b5e

Fetch and serve multiple models

ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v fetch eos4e40 --from_dockerhub
ersilia -v fetch eos7d58 --from_dockerhub
ersilia -v fetch eos9gg2 --from_dockerhub
ersilia -v serve eos3b5e
ersilia -v serve eos4e40
ersilia -v serve eos7d58
ersilia -v serve eos9gg2
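
The multi-model case maps naturally to a parametrized test over a single list of model identifiers (per the earlier comment about keeping IDs in variables). A plain-subprocess sketch; in the real playground this would presumably become a `pytest.mark.parametrize` over `MODEL_IDS`:

```python
import subprocess

# The four models listed above, kept in one place instead of inline in each command.
MODEL_IDS = ["eos3b5e", "eos4e40", "eos7d58", "eos9gg2"]

def fetch_and_serve_all(model_ids=MODEL_IDS) -> None:
    """Fetch every model from DockerHub first, then serve each one in turn."""
    for model_id in model_ids:
        subprocess.run(["ersilia", "-v", "fetch", model_id, "--from_dockerhub"], check=True)
    for model_id in model_ids:
        subprocess.run(["ersilia", "-v", "serve", model_id], check=True)
```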

Standard runner and conventional runner

ersilia -v run -i input.csv > output_eos9gg2_0.json
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" > output_eos9gg2_1.json
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" > output_eos9gg2_2.json
ersilia -v run -i input.csv -o output_eos9gg2_0.csv
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" -o output_eos9gg2_1.csv
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" -o output_eos9gg2_2.csv

Note (for @Abellegese): it is not clear to me when the --standard flag is used. As in, what happens if we run the following?

ersilia -v run -i input.csv > output_eos9gg2.json --standard

Are we going to run the standard runner or the conventional runner? In my opinion, the useful flag is the opposite: --non-standard, which would force the following command to run in non-standard mode:

ersilia -v run -i input.csv -o output_eos9gg2.csv --non-standard

More thoughts: If the input file does not have the appropriate header, then it is not safe to run the standard runner and we should fall back to the old runner. For example, when a file contains many columns, the standard runner is not able to resolve the right column. Please make sure that the internal resolver for standard/non-standard runs is still able to take this into account.

Even more thoughts: Likewise, when the size of the file is very large, the standard runner should not be used.
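
The two fallback rules above (header check and file-size check) could be captured in a small pure function inside the resolver. This is a sketch for discussion, not the actual resolver logic; the expected column name and the row threshold are assumptions:

```python
import csv

MAX_STANDARD_ROWS = 10_000  # hypothetical threshold; the real limit is up for discussion

def use_standard_runner(path: str) -> bool:
    """Return True only when the file is safe for the standard runner:
    a single 'smiles' column and a row count below the size threshold."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    if [col.strip().lower() for col in header] != ["smiles"]:
        return False  # wrong or multi-column header: fall back to the conventional runner
    return n_rows <= MAX_STANDARD_ROWS
```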

To be continued...

Let's start by testing these items, and then we can expand more! Please let me know if this sounds good.

Abellegese commented 1 week ago

Hi @miquelduranfrigola, I had not seen this, and yet I already created a pipeline for everything you specified above, runnable with just nox -s test_cli. I could update the code to support pipeline batching, but that will definitely mess things up.

I created a branch in my forked ersilia, so you can pass all those parameters in the config.yml file.

Regarding the --standard flag: it is true by default, so there is no need to use it. I added it in case developers come up with something and want to disable it. It should be there, I guess.

Abellegese commented 1 week ago

[screenshot: playground-success.png]

Failed log

[screenshot: playground-error.png]

miquelduranfrigola commented 1 week ago

Hi @Abellegese, thanks for this; it goes in the right direction, I believe. It is not just about the execution failing or not failing, though. We need to be sure that the commands are actually doing what is expected of them, as specified above.

Abellegese commented 1 week ago

Hi @miquelduranfrigola, yes, I saw those checkups and the code will be updated accordingly.

miquelduranfrigola commented 1 week ago

Bringing model testing a step further

Hi @Abellegese and @DhanshreeA, here are some thoughts around model testing.

Background

@DhanshreeA — as you know, the testing workflows on the ersilia CLI code are much improved now, including (a) unit testing (mainly with pytest) and (b) a playground module to test sequences of CLI commands. These tests are run on a pre-selected list of models (simple ones, like eos3b5e). The primary goal of all these tests is to ensure safe commits on the ersilia CLI code.

In a meeting yesterday, @Abellegese expressed his interest in developing tests that apply to the models specifically. That is, tests for the eos repos. I told him that this is the goal of the ModelTester class. I am not very familiar with this class, but what is clear is that it contains several elements that are very useful and that we are still underusing. So, in my opinion, to avoid redundancy, we need to build on top of the ModelTester class. I'll let you guys take it from here.

As a reminder, there is also a ModelInspector class that was originally developed for the ersilia-maintenance repository. Although these two classes may overlap to some degree, and at first they may look redundant, there is a fundamental difference between them:

Here, we will focus only on the ModelTester class.

Ideas

I will now list some items that I believe are worth checking for each of the models, especially at model contribution time. Many of these checkups are already implemented in the ModelTester class.

1. Tests on the repository folder structure

This set of tests should check that a minimal set of files exists. Note that we have models that were packaged with BentoML and, for the new ones, we have a completely new folder structure.

For BentoML-styled models

For Ersilia Pack models

Common

2. Tests on specific files

I am unsure about testing other files specifically. Perhaps it is not necessary at this stage. In my experience, the files that give the most problems are the metadata and the installation instructions. The installation instructions will be tested anyway in point 3 below.

3. Tests on different running modes

This is the true test that models need to pass, i.e. are we able to fetch and run the model? It is important to note that some of these tests can be done immediately before contribution, and some only after contribution. For example, for models to be on DockerHub, they will have to have already passed the workflows (unless we build the model locally, which I am not sure is what we want to do).

Running modes before the GitHub Actions workflows (i.e. before push/merge)

Running modes after the GitHub Actions workflows

4. Tests on computational performance and resource consumption

The ModelTester class should already provide several performance tests, including the number of inputs run, memory consumption, etc. I am not sure these tests can be categorized as passed or failed, to be honest. Let me just list here what I believe is most important and then we can decide together:
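
Whether or not these become pass/fail checks, recording the numbers is straightforward with the standard library. A sketch of a measurement helper (not what ModelTester currently does):

```python
import time
import tracemalloc

def profile_call(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds, peak Python memory in bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```

The returned numbers could be logged per model run and compared against soft thresholds rather than hard pass/fail gates.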

5. Other tests

Finally, a few extra tests that we may want to consider.

Abellegese commented 1 week ago

I think this is well defined @miquelduranfrigola and I will take it from here. In my opinion this will require careful design and will take a while. This test is gonna be the most important one as well.

miquelduranfrigola commented 5 days ago

Hi @Abellegese, whenever you have time, please provide updates on this issue.

Abellegese commented 5 days ago

Hi @miquelduranfrigola, I couldn't modify the checkboxes (a privilege issue?).

GemmaTuron commented 5 days ago

Could be, as I can tick the boxes... I don't know, maybe you need to be an admin of Ersilia to do so...

Abellegese commented 5 days ago

Yes @GemmaTuron, just give me the privilege and I will use it responsibly :).

miquelduranfrigola commented 5 days ago

Oh ok. Thanks @Abellegese - then I'll let @GemmaTuron or @DhanshreeA take action as they see fit

DhanshreeA commented 4 days ago

@Abellegese we can quickly chat whenever works for you and take care of this together.