ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐕 Batch: A more comprehensive PyTest pipeline #1368

Open miquelduranfrigola opened 3 weeks ago

miquelduranfrigola commented 3 weeks ago

A more comprehensive PyTest pipeline

Below I copy-paste some items that @Abellegese and I listed in today's 1-on-1 meeting (04/11/2024).

Wishlist

Comments

Improvements

Objective(s)

Documentation

N/A

miquelduranfrigola commented 2 weeks ago

Hi @Abellegese, please mark the tasks that are completed so we can get an idea of the current status of this.

miquelduranfrigola commented 2 weeks ago

@Abellegese - let's document PyTesting extensively so it becomes easier to maintain and extend.

Noting here a few comments and questions based on previous meetings:

  1. Mocking is used for most of the functions.
  2. Testing the standard runner class was the most challenging.
  3. The README file can be parsed and run. This is a great achievement.
  4. Model identifiers should be specified as variables, not inline in the text. For example, MODEL_ID = "eos3b5e".
  5. How is the GitBook processed, @Abellegese ?
miquelduranfrigola commented 2 weeks ago

Playground tests — brainstorming

Hi @Abellegese, @DhanshreeA and @GemmaTuron. We are facing quite a few challenges these last couple of weeks with the Ersilia Model Hub. We should be testing the CLI more, and more exhaustively.

General thoughts

Predefined input file

We can use these molecules as input (input.csv):

smiles
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O
C1=CN=CC=C1C(=O)NN
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O
CC(=O)OC1=CC=CC=C1C(=O)O
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
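
Rather than committing this file as a fixture, the playground could generate it at test setup time. A minimal sketch (the file name and helper are assumptions, not existing Ersilia code):

```python
from pathlib import Path

# The seven SMILES strings listed above.
SMILES = [
    "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
    "C1=CN=CC=C1C(=O)NN",
    "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O",
    "CC(=O)OC1=CC=CC=C1C(=O)O",
    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    "CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C",
    "COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5",
]

def write_input_csv(path: str = "input.csv") -> str:
    """Write the predefined molecules to a single-column CSV with a 'smiles' header."""
    Path(path).write_text("smiles\n" + "\n".join(SMILES) + "\n")
    return path
```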

Example pipelines

Below I suggest a few pipelines to test. The bullet points in each section should map 1-to-1 to tests.

Fetch a model from GitHub, run it and finally completely delete it.

ersilia -v fetch eos3b5e --from_github
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete

Fetch a model from DockerHub, run it and finally completely delete it.

ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete
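
In a pytest setting, each of these sequences could be driven through `subprocess`. A sketch (not the actual playground implementation): the command list is built by a separate helper so the sequence itself can be asserted on without invoking the CLI.

```python
import subprocess

def pipeline_commands(model_id: str, source_flag: str) -> list:
    """Command sequence for fetch -> serve -> run -> close -> delete, as listed above."""
    return [
        ["ersilia", "-v", "fetch", model_id, source_flag],
        ["ersilia", "-v", "serve", model_id],
        ["ersilia", "-v", "run", "-i", "input.csv", "-o", f"output_{model_id}.csv"],
        ["ersilia", "close"],
        ["ersilia", "delete"],
    ]

def run_pipeline(model_id: str, source_flag: str) -> None:
    """Execute the sequence, failing fast if any step exits non-zero."""
    for cmd in pipeline_commands(model_id, source_flag):
        subprocess.run(cmd, check=True)
```

The same helper covers both pipelines by swapping `--from_github` for `--from_dockerhub`.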

Automatically decide the fetch mode when Docker is inactive or active

ersilia -v fetch eos3b5e
ersilia -v fetch eos3b5e

Fetch and serve multiple models

ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v fetch eos4e40 --from_dockerhub
ersilia -v fetch eos7d58 --from_dockerhub
ersilia -v fetch eos9gg2 --from_dockerhub
ersilia -v serve eos3b5e
ersilia -v serve eos4e40
ersilia -v serve eos7d58
ersilia -v serve eos9gg2
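
The multi-model case maps naturally to a parametrized test over a single list of model identifiers (per the earlier comment about keeping IDs in variables). A plain-subprocess sketch; in the real playground this would presumably become a `pytest.mark.parametrize` over `MODEL_IDS`:

```python
import subprocess

# The four models listed above, kept in one place instead of inline in each command.
MODEL_IDS = ["eos3b5e", "eos4e40", "eos7d58", "eos9gg2"]

def fetch_and_serve_all(model_ids=MODEL_IDS) -> None:
    """Fetch every model from DockerHub first, then serve each one in turn."""
    for model_id in model_ids:
        subprocess.run(["ersilia", "-v", "fetch", model_id, "--from_dockerhub"], check=True)
    for model_id in model_ids:
        subprocess.run(["ersilia", "-v", "serve", model_id], check=True)
```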

Standard runner and conventional runner

ersilia -v run -i input.csv > output_eos9gg2_0.json
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" > output_eos9gg2_1.json
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" > output_eos9gg2_2.json
ersilia -v run -i input.csv -o output_eos9gg2_0.csv
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" -o output_eos9gg2_1.csv
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" -o output_eos9gg2_2.csv

Note (for @Abellegese): it is not clear to me when the --standard flag is used. As in, what happens if we run the following?

ersilia -v run -i input.csv > output_eos9gg2.json --standard

Are we going to run the standard runner or the conventional runner? In my opinion, the useful flag is the opposite: --non-standard, which would force the following command to run in non-standard mode:

ersilia -v run -i input.csv -o output_eos9gg2.csv --non-standard

More thoughts: If the input file does not have the appropriate header, then it is not safe to run the standard runner and we should fall back to the old runner. For example, when a file contains many columns, the standard runner is not able to resolve the right column. Please make sure that the internal resolver for standard/non-standard runs is still able to take this into account.

Even more thoughts: Likewise, when the size of the file is very large, the standard runner should not be used.
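
The two fallback rules above (header check and file-size check) could be captured in a small pure function inside the resolver. This is a sketch for discussion, not the actual resolver logic; the expected column name and the row threshold are assumptions:

```python
import csv

MAX_STANDARD_ROWS = 10_000  # hypothetical threshold; the real limit is up for discussion

def use_standard_runner(path: str) -> bool:
    """Return True only when the file is safe for the standard runner:
    a single 'smiles' column and a row count below the size threshold."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    if [col.strip().lower() for col in header] != ["smiles"]:
        return False  # wrong or multi-column header: fall back to the conventional runner
    return n_rows <= MAX_STANDARD_ROWS
```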

To be continued...

Let's start by testing these items, and then we can expand more! Please let me know if this sounds good.

Abellegese commented 1 week ago

Hi @miquelduranfrigola, I had not seen this, and yet I already created a pipeline for everything you specified above, runnable with just nox -s test_cli. I could update the code to support pipeline batching, but that will definitely mess things up.

I created a branch in my forked ersilia, so you can pass all those parameters in the config.yml file.

Regarding the --standard flag: it is true by default, so there is no need to use it. I added it in case developers come up with something and want to disable it. It should be there, I guess.

Abellegese commented 1 week ago

[screenshot: playground-success.png]

Failed log

[screenshot: playground-error.png]

miquelduranfrigola commented 1 week ago

Hi @Abellegese, thanks for this; it goes in the right direction, I believe. It is not just about the execution failing or not failing, though. We need to be sure that the commands are actually doing what is expected of them, as specified above.

Abellegese commented 1 week ago

Hi @miquelduranfrigola, yes, I saw those checkups and the code will be updated accordingly.

miquelduranfrigola commented 1 week ago

Bringing model testing a step further

Hi @Abellegese and @DhanshreeA, here are some thoughts around model testing.

Background

@DhanshreeA — as you know, the testing workflows on the ersilia CLI code are much improved now, including (a) unit testing (mainly with pytest) and (b) a playground module to test sequences of CLI commands. These tests are run on a pre-selected list of models (simple ones, like eos3b5e). The primary goal of all these tests is to ensure safe commits on the ersilia CLI code.

In a meeting yesterday, @Abellegese expressed his interest in developing tests that apply to the models specifically. That is, tests for the eos repos. I told him that this is the goal of the ModelTester class. I am not very familiar with this class, but what is clear is that it contains several elements that are very useful and that we are still underusing. So, in my opinion, to avoid redundancy, we need to build on top of the ModelTester class. I'll let you guys take it from here.

As a reminder, there is also a ModelInspector class that was originally developed for the ersilia-maintenance repository. Although these two classes may overlap to some degree, and at first they may look redundant, there is a fundamental difference between them:

Here, we will focus only on the ModelTester class.

Ideas

I will now list some items that I believe are worth checking for each of the models, especially at model contribution time. Many of these checkups are already implemented in the ModelTester class.

1. Tests on the repository folder structure

This set of tests should check that a minimal set of files exists. Note that we have models that were packaged with BentoML and, for the new ones, we have a completely new folder structure.

For BentoML-styled models

For Ersilia Pack models

Common

2. Tests on specific files

I am unsure about testing other files specifically. Perhaps it is not necessary at this stage. In my experience, the files that give the most problems are the metadata and the installation instructions. The installation instructions will be tested anyway in point 3 below.

3. Tests on different running modes

This is the true test that models need to pass, i.e. are we able to fetch and run the model? It is important to note that some of these tests can be done immediately before contribution, and some only after contribution. For example, for models to be on DockerHub, they will have to have already passed the workflows (unless we build the model locally, which I am not sure is what we want to do).

Running modes before the GitHub Actions workflows (i.e. before push/merge)

Running modes after the GitHub Actions workflows

4. Tests on computational performance and resource consumption

The ModelTester class should already provide several performance tests, including the number of inputs run, memory consumption, etc. I am not sure these tests can be categorized as passed or failed, to be honest. Let me just list here what I believe is most important and then we can decide together:
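
Whether or not these become pass/fail checks, recording the numbers is straightforward with the standard library. A sketch of a measurement helper (not what ModelTester currently does):

```python
import time
import tracemalloc

def profile_call(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds, peak Python memory in bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```

The returned numbers could be logged per model run and compared against soft thresholds rather than hard pass/fail gates.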

5. Other tests

Finally, a few extra tests that we may want to consider.

Abellegese commented 1 week ago

I think this is well defined @miquelduranfrigola and I will take it from here. In my opinion this will require careful design and will take a while. This test is gonna be the most important one as well.

miquelduranfrigola commented 5 days ago

Hi @Abellegese, whenever you have time, please provide updates on this issue.

Abellegese commented 5 days ago

Hi @miquelduranfrigola, I couldn't modify the checkboxes (a privilege issue?).

GemmaTuron commented 5 days ago

Could be, as I can tick the boxes... I don't know, maybe you need to be an admin of Ersilia to do so...

Abellegese commented 5 days ago

Yes @GemmaTuron, just give me the privilege and I will use it responsibly :).

miquelduranfrigola commented 5 days ago

Oh ok. Thanks @Abellegese - then I'll let @GemmaTuron or @DhanshreeA take action as they see fit

DhanshreeA commented 4 days ago

@Abellegese we can quickly chat whenever works for you and take care of this together.