Open miquelduranfrigola opened 3 weeks ago
Hi @Abellegese, please mark the tasks that are completed so we can have an idea of the current status of this.
@Abellegese - let's document PyTesting extensively so it becomes easier to maintain and extend.
Noting here a few comments and questions based on previous meetings:
MODEL_ID = "eos3b5e"
Hi @Abellegese, @DhanshreeA and @GemmaTuron. We have been facing quite a few challenges over the last couple of weeks with the Ersilia Model Hub. We should be testing the CLI more, and more exhaustively.
We can use these molecules as input (`input.csv`):
```
smiles
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O
C1=CN=CC=C1C(=O)NN
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O
CC(=O)OC1=CC=CC=C1C(=O)O
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
```
Below I suggest a few pipelines to test. The bullet points in each section should map 1-to-1 to tests.
```
ersilia -v fetch eos3b5e --from_github
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete
```
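To automate pipelines like this one, a thin subprocess wrapper can drive the CLI from pytest. This is only a sketch, assuming `ersilia` is installed and on the PATH; the wrapper name `run_cli` is mine, not part of Ersilia:

```python
import subprocess

def run_cli(args, timeout=600):
    """Run one CLI command, capture output, fail loudly on non-zero exit."""
    proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    assert proc.returncode == 0, f"{' '.join(args)} failed:\n{proc.stderr}"
    return proc

# The pipeline above, verbatim; running it step by step makes the first
# failing command easy to identify in CI logs.
PIPELINE = [
    ["ersilia", "-v", "fetch", "eos3b5e", "--from_github"],
    ["ersilia", "-v", "serve", "eos3b5e"],
    ["ersilia", "-v", "run", "-i", "input.csv", "-o", "output_eos3b5e.csv"],
    ["ersilia", "close"],
    ["ersilia", "delete"],
]

if __name__ == "__main__":
    for step in PIPELINE:
        run_cli(step)
```

Each step asserts individually, so a failure points at a specific command rather than at the pipeline as a whole.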
- `fetch`: Check that `eos/repository` and `eos/dest/` folders are created for the model, and make sure that files are there (no need to check all of them!). Models fetched from GitHub are cloned into the `dest` folder, so we expect to find specific files there.
- `serve`: Check that the model is not served (`SRV`) with `pulled_docker`, since the model should have been fetched from GitHub.
- `run`: Check that the `output_eos3b5e.csv` file does not contain `null` values.
- `close`: Check that the model is no longer served after closing. Additionally, we could check that, after closing, `run` raises a message since no model is served.
- `delete`: Check that no traces of the model are left in the `eos` folder.

```
ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v serve eos3b5e
ersilia -v run -i input.csv -o output_eos3b5e.csv
ersilia close
ersilia delete
```
- `fetch`: Check that the image of the model is pulled. Additionally, check that the `eos` folder contains a few files that are characteristic of fetching from Docker.
- `serve`: Check that the model is served (`SRV`) with `pulled_docker`.
- `run`: Check that the `output_eos3b5e.csv` file does not contain `null` values.
- `close`: Check that the model is no longer served after closing. Additionally, we could check that, after closing, `run` raises a message since no model is served. Also, check that no container related to the image remains active.
- `delete`: Check that no traces of the model are left in the `eos` folder. Make sure that the image of the model is removed from the system.

```
ersilia -v fetch eos3b5e
ersilia -v fetch eos3b5e
ersilia -v fetch eos3b5e --from_dockerhub
ersilia -v fetch eos4e40 --from_dockerhub
ersilia -v fetch eos7d58 --from_dockerhub
ersilia -v fetch eos9gg2 --from_dockerhub
```
The first model (`eos3b5e`) should not be fetched again, since it was already fetched with Docker previously.

```
ersilia -v serve eos3b5e
ersilia -v serve eos4e40
ersilia -v serve eos7d58
ersilia -v serve eos9gg2
```
`eos9gg2` should be served at the end of this pipeline.

```
ersilia -v run -i input.csv > output_eos9gg2_0.json
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" > output_eos9gg2_1.json
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" > output_eos9gg2_2.json
```
Check that the output JSON files do not contain `null` values.

```
ersilia -v run -i input.csv -o output_eos9gg2_0.csv
ersilia -v run -i "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O" -o output_eos9gg2_1.csv
ersilia -v run -i "['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']" -o output_eos9gg2_2.csv
```
Check that the output CSV files do not contain `null` values.

Note (for @Abellegese): It is not clear to me when the `--standard` flag is used. As in, what happens if we run the following?

```
ersilia -v run -i input.csv > output_eos9gg2.json --standard
```
Are we going to run the standard runner or the conventional runner? In my opinion, the useful flag is the opposite, `--non-standard`, which would force the following command to run in non-standard mode:

```
ersilia -v run -i input.csv -o output_eos9gg2.csv --non-standard
```
More thoughts: If the input file does not have the appropriate header, then it is not safe to run the standard runner and we should fall back to the old runner. For example, when a file contains many columns, the standard runner is not able to resolve the right column. Please make sure that the internal resolver for standard/non-standard runs is still able to take this into account.
Even more thoughts: Likewise, when the size of the file is very large, the standard runner should not be used.
Let's start by testing these items, and then we can expand more! Please let me know if this sounds good.
Hi @miquelduranfrigola, I had not seen this yet, but I created a pipeline for everything you specified above with just `nox -s test_cli`. I could update the code to support pipeline batching, but that will definitely mess things up.
I created a branch in my forked ersilia, so you can pass all those parameters in the `config.yml` file.
On the `--standard` flag: it is true by default, so there is no need to use it. I put it there in case developers come up with something and want to disable it. It should be there, I guess.
Failed log
Hi @Abellegese Thanks for this, it goes in the right direction I believe. It is not just about failing or not failing the execution, though. We need to be sure that the commands are actually doing what is expected from them, as specified above.
Hi @miquelduranfrigola, yes, I saw those checkups and the code will be updated accordingly.
Hi @Abellegese and @DhanshreeA, here are some thoughts around model testing.
@DhanshreeA — as you know, the testing workflows on the ersilia CLI code are much improved now, including (a) unit testing (mainly with pytest) and (b) a playground module to test sequences of CLI commands. These tests are run on a pre-selected list of models (simple ones, like eos3b5e). The primary goal of all these tests is to ensure safe commits on the ersilia CLI code.
In a meeting yesterday, @Abellegese expressed his interest in developing tests that apply to the models specifically. That is, tests for the eos repos. I told him that this is the goal of the `ModelTester` class. I am not very familiar with this class, but what is clear is that it contains several elements that are very useful and that we are still underusing. So, in my opinion, to avoid redundancy, we need to build on top of the `ModelTester` class. I'll let you guys take it from here.
As a reminder, there is also a `ModelInspector` class that was originally developed for the ersilia-maintenance repository. Although these two classes may overlap to some degree, and at first they may look redundant, there is a fundamental difference between them:

- `ModelTester` is principally aimed at testing models that are under development. That is, it helps us make sure everything is in order before the model is officially marked as Ready by the team.
- `ModelInspector` is aimed at maintenance. So, in principle, the model inspector makes sure that, once the model is incorporated into Ersilia, all the relevant assets are persistent. Imagine a file is deleted from an eos repository post hoc; the model inspector is supposed to find it.

Here, we will focus only on the `ModelTester` class.

I will now list some items that I believe are worth checking for each of the models, especially at model contribution time. Many of these checkups are already implemented in the `ModelTester` class.
This set of tests should check that a minimal set of files exist. Note that we have models that were packaged with BentoML and, for the new ones, we have a completely new folder structure.

- A `src/service.py` file should exist.
- A `pack.py` file should exist.
- A `Dockerfile` should exist.
- A `metadata.json` file should exist.
- An `install.yml` file should exist.
- A `metadata.json` file or a `metadata.yml` file should exist.
- A `model/framework/example/input.csv` and a `model/framework/example/output.csv` file should exist. (Note that this should be a requirement for the Ersilia Pack models but it was not for the BentoML models; we may want to rethink this.)
- The `mock.txt` file should be removed.
- A `README.md` file should exist.
- A `model/framework/run.sh` file should exist.
- A `LICENSE` file should exist.
- The metadata should pass the checks of the `Information` class, which actually contains many tests for each of the fields in the metadata. For example, that class checks that the model description is long enough, or that tags are valid, etc.

I am unsure about testing other files specifically. Perhaps it is not necessary at this stage. In my experience, the files that give the most problems are the metadata and the installation instructions. The installation instructions will be tested anyway in point 3 below.
This is the true test that models need to pass, i.e. are we able to fetch and run the model? It is important to note that some of these tests will be possible immediately before contribution, and some after contribution. For example, for models to be in DockerHub, they will have to have already passed the workflows (unless we build the model locally, which I am not sure is what we want to do).
- `ersilia fetch $MODEL_ID --from_dir $DIR_NAME` works.
- `ersilia serve` works.
- `ersilia example -f input.csv -n 3` works.
- `ersilia run -i input.csv -o output.csv` works and produces an output that is not null.
- `ersilia close` works.
- `ersilia delete $MODEL_ID` works and completely deletes the model from `~/eos/dest` and `~/eos/repository`.
- `ersilia fetch $MODEL_ID --from_github`, `--from_s3` and, especially, `--from_dockerhub` work.

The `ModelTester` class should already provide several performance tests, including number of inputs run, memory consumption, etc. I am not sure these tests can be categorized as passed or failed, to be honest. Let me just list here what I believe is most important and then we can decide together.
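Rather than hard pass/fail, each run could be profiled and compared against a soft budget. A stdlib-only sketch; the budget values are placeholders, and `tracemalloc` only sees Python allocations, so a subprocess-based model run would need a different memory probe:

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Run fn and report wall time (seconds) and peak Python memory (bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"seconds": elapsed, "peak_bytes": peak}

def within_budget(metrics, max_seconds=60.0, max_bytes=512 * 1024 * 1024):
    """Soft pass/fail: True when the run stays inside the time/memory budget."""
    return metrics["seconds"] <= max_seconds and metrics["peak_bytes"] <= max_bytes
```

Recording the raw metrics alongside the verdict would let us tune budgets per model over time instead of guessing thresholds up front.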
Finally, a few extra tests that we may want to consider.
- `ersilia run -i $INPUT_CSV -o $OUTPUT_CSV`.
- `ersilia run -i $INPUT_CSV`.

I think this is well defined @miquelduranfrigola and I will take it from here. In my opinion this will require careful design and will take a while. This test is gonna be the most important as well.
Hi @Abellegese, whenever you have time, please provide updates on this issue.
Hi @miquelduranfrigola, I couldn't modify the checkboxes (privilege?).
Could be, as I can tick the boxes... I don't know, maybe you need to be an admin of Ersilia to do so...
Yes @GemmaTuron, just give me the privilege and I will use it responsibly :).
Oh ok. Thanks @Abellegese - then I'll let @GemmaTuron or @DhanshreeA take action as they see fit
@Abellegese we can quickly chat whenever works for you and take care of this together.
A more comprehensive PyTest pipeline
Below I copy-paste some items we listed together with @Abellegese in today's 1-on-1 meeting (04/11/2024).
Wishlist

- Fetch with `--from_dockerhub` and `--from_github`.
- Serve and run (via the `ErsiliaModel` class).
- `delete`, `catalog`, etc. (less of a priority at the moment, since we already have some tests for those).

Comments
Improvements

- `ErsiliaModel` and related Python classes may have to be improved along this process. If necessary, open a separate issue (batch or task) for this.

Objective(s)

Documentation

N/A