ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
210 stars 146 forks

🐈 Task: Merge card command with catalog. #1264

Open DhanshreeA opened 2 weeks ago

DhanshreeA commented 2 weeks ago

Summary

All Ersilia models have associated metadata, which users often find useful to view quickly while working with them. At present this can be done in two different ways: through the ersilia card and ersilia info commands. On top of this, Ersilia also has a catalog command, which lets users view the catalog of Ersilia models available in the Hub or on their system. To reduce the complexity of Ersilia's suite of CLI commands, we want to fold the card command's metadata-viewing functionality into the ersilia catalog command.

To get an overview of the commands present within the Ersilia CLI, check out this related issue: #1262.

The JSON snippets below give an overview of how both ersilia card and ersilia info operate.

ersilia card <model-id> outputs information that is similar to what ersilia info --as_json outputs. For example, running ersilia card eos3b5e prints the following:


{
    "pack_mode": "conda",
    "service_class": "pulled_docker",
    "apis_list": [
        "run"
    ],
    "api_schema": {
        "run": {
            "input": {
                "key": {
                    "type": "string"
                },
                "input": {
                    "type": "string"
                },
                "text": {
                    "type": "string"
                }
            },
            "output": {
                "outcome": {
                    "type": "numeric_array",
                    "shape": [
                        1
                    ],
                    "meta": [
                        "value"
                    ]
                }
            }
        }
    },
    "size": 891.7064189910889,
    "metadata": {
        "Identifier": "eos3b5e",
        "Slug": "molecular-weight",
        "Status": "Ready",
        "Title": "Molecular weight",
        "Description": "The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKIT. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.\n",
        "Mode": "Pretrained",
        "Input": [
            "Compound"
        ],
        "Input Shape": "Single",
        "Task": [
            "Regression"
        ],
        "Output": [
            "Other value"
        ],
        "Output Type": [
            "Float"
        ],
        "Output Shape": "Single",
        "Interpretation": "Calculated molecular weight (g/mol)",
        "Tag": [
            "Molecular weight"
        ],
        "Publication": "https://www.rdkit.org/docs/RDKit_Book.html",
        "Source Code": "https://github.com/rdkit/rdkit",
        "License": "BSD-3.0",
        "Contributor": "miquelduranfrigola",
        "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip",
        "DockerHub": "https://hub.docker.com/r/ersiliaos/eos3b5e",
        "Docker Architecture": [
            "AMD64",
            "ARM64"
        ]
    },
    "card": {
        "Identifier": "eos3b5e",
        "Slug": "molecular-weight",
        "Status": "Ready",
        "Title": "Molecular weight",
        "Description": "The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKIT. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.\n",
        "Mode": "Pretrained",
        "Input": [
            "Compound"
        ],
        "Input Shape": "Single",
        "Task": [
            "Regression"
        ],
        "Output": [
            "Other value"
        ],
        "Output Type": [
            "Float"
        ],
        "Output Shape": "Single",
        "Interpretation": "Calculated molecular weight (g/mol)",
        "Tag": [
            "Molecular weight"
        ],
        "Publication": "https://www.rdkit.org/docs/RDKit_Book.html",
        "Source Code": "https://github.com/rdkit/rdkit",
        "License": "BSD-3.0",
        "Contributor": "miquelduranfrigola",
        "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip",
        "DockerHub": "https://hub.docker.com/r/ersiliaos/eos3b5e",
        "Docker Architecture": [
            "AMD64",
            "ARM64"
        ]
    }
}

while running ersilia info --as_json when the model is served prints:

{
    "pack_mode": "conda",
    "service_class": "pulled_docker",
    "apis_list": [
        "run"
    ],
    "api_schema": {
        "run": {
            "input": {
                "key": {
                    "type": "string"
                },
                "input": {
                    "type": "string"
                },
                "text": {
                    "type": "string"
                }
            },
            "output": {
                "outcome": {
                    "type": "numeric_array",
                    "shape": [
                        1
                    ],
                    "meta": [
                        "value"
                    ]
                }
            }
        }
    },
    "size": 891.7064189910889,
    "metadata": {
        "Identifier": "eos3b5e",
        "Slug": "molecular-weight",
        "Status": "Ready",
        "Title": "Molecular weight",
        "Description": "The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKIT. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.\n",
        "Mode": "Pretrained",
        "Input": [
            "Compound"
        ],
        "Input Shape": "Single",
        "Task": [
            "Regression"
        ],
        "Output": [
            "Other value"
        ],
        "Output Type": [
            "Float"
        ],
        "Output Shape": "Single",
        "Interpretation": "Calculated molecular weight (g/mol)",
        "Tag": [
            "Molecular weight"
        ],
        "Publication": "https://www.rdkit.org/docs/RDKit_Book.html",
        "Source Code": "https://github.com/rdkit/rdkit",
        "License": "BSD-3.0",
        "Contributor": "miquelduranfrigola",
        "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip",
        "DockerHub": "https://hub.docker.com/r/ersiliaos/eos3b5e",
        "Docker Architecture": [
            "AMD64",
            "ARM64"
        ]
    },
    "card": {
        "Identifier": "eos3b5e",
        "Slug": "molecular-weight",
        "Status": "Ready",
        "Title": "Molecular weight",
        "Description": "The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKIT. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.\n",
        "Mode": "Pretrained",
        "Input": [
            "Compound"
        ],
        "Input Shape": "Single",
        "Task": [
            "Regression"
        ],
        "Output": [
            "Other value"
        ],
        "Output Type": [
            "Float"
        ],
        "Output Shape": "Single",
        "Interpretation": "Calculated molecular weight (g/mol)",
        "Tag": [
            "Molecular weight"
        ],
        "Publication": "https://www.rdkit.org/docs/RDKit_Book.html",
        "Source Code": "https://github.com/rdkit/rdkit",
        "License": "BSD-3.0",
        "Contributor": "miquelduranfrigola",
        "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip",
        "DockerHub": "https://hub.docker.com/r/ersiliaos/eos3b5e",
        "Docker Architecture": [
            "AMD64",
            "ARM64"
        ]
    }
}

Both outputs above consist of session information about the model as well as model metadata. The key distinction between ersilia info and ersilia card is that the former only works for a served model, because it reads a locally available information.json file, whereas ersilia card can draw on remote resources such as the database of Ersilia models maintained in S3 or AirTable.
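
To make this distinction concrete, here is a minimal Python sketch of the two lookup strategies. It assumes that a served model keeps its session file at <model_dir>/information.json and that remote metadata can be fetched through a callable; info_lookup, card_lookup and fetch_remote are hypothetical names, not Ersilia's API, and preferring local metadata when it exists is an assumption about card rather than documented behaviour.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def info_lookup(model_dir: Path) -> dict:
    """Sketch of `ersilia info`: read the local session file of a served model.

    Assumption: a served model keeps its session file at <model_dir>/information.json.
    """
    info_file = model_dir / "information.json"
    if not info_file.exists():
        # Without a served model there is no local file to read.
        raise FileNotFoundError(f"{info_file} not found; serve the model first")
    return json.loads(info_file.read_text())


def card_lookup(
    model_id: str,
    model_dir: Optional[Path],
    fetch_remote: Callable[[str], dict],
) -> dict:
    """Sketch of `ersilia card`: works whether or not the model is served.

    Assumptions: local metadata is preferred when present, and `fetch_remote`
    is a hypothetical stand-in for querying S3 or AirTable.
    """
    if model_dir is not None and (model_dir / "information.json").exists():
        # Served model: reuse the "card" block stored in the session file.
        return info_lookup(model_dir)["card"]
    # Not served, or not even fetched: fall back to the remote metadata source.
    return fetch_remote(model_id)
```

The design point is the fallback: info_lookup fails outright for a model that is not served, while card_lookup only needs the model ID.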

The card command was designed as a drop-in replacement for info for models that are not running on a user's system, or are not even present on it. Since ersilia catalog already provides the functionality for listing models within the Hub, it would make sense to give it a --card flag that prints this metadata when a user requests it. However, we would need to constrain this to one model at a time; otherwise the command would fetch metadata for every model in the Hub and flood the terminal with output.

In a nutshell, we want the command to look as follows: ersilia catalog --card <model-id>, for example ersilia catalog --card eos3b5e.
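
A minimal sketch of the proposed interface follows. Ersilia's real CLI is built on Click; this sketch uses the standard library's argparse only to stay self-contained, and build_parser, run_cli, fetch_card and list_catalog are hypothetical names (in catalog.py, fetch_card would presumably wrap the ModelCard class). Because --card takes exactly one value, the one-model-at-a-time constraint falls out of the parser itself.

```python
import argparse
import json
from typing import Callable, List, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ersilia")
    subcommands = parser.add_subparsers(dest="command", required=True)
    # The standalone "card" subcommand is deliberately no longer registered.
    catalog = subcommands.add_parser("catalog", help="list models in the hub")
    # With the default nargs, --card accepts exactly one model ID.
    catalog.add_argument(
        "--card",
        metavar="MODEL_ID",
        default=None,
        help="print the metadata card of a single model",
    )
    return parser


def run_cli(
    argv: Sequence[str],
    fetch_card: Callable[[str], dict],
    list_catalog: Callable[[], List[str]],
) -> str:
    """Return what this hypothetical CLI would print for `argv`."""
    args = build_parser().parse_args(argv)
    if args.card is not None:
        # Only one model's metadata is fetched, never the whole hub.
        return json.dumps(fetch_card(args.card), indent=4)
    return "\n".join(list_catalog())
```

With this parser, ersilia card eos3b5e is rejected with an "invalid choice" error (the argparse counterpart of Click's "No such command 'card'"), and ersilia catalog --card eos3b5e eos935d fails with "unrecognized arguments", so no extra validation code is needed.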

Objective(s)

Steps

  1. Make sure you are able to run ersilia card <model-id> for any ersilia model. You can browse a list of available models here.
  2. Share the output of the command here.
  3. Then proceed to de-register the card command from the CLI.
  4. In the [catalog.py](https://github.com/ersilia-os/ersilia/blob/master/ersilia/cli/commands/catalog.py) module, create a new --card flag and re-implement the card function there. In this step we can ignore the card command's own flags and simply use the ModelCard class.
  5. Share your output from running the command ersilia catalog --card <model-id> here.

Documentation

No response

teddyCodex commented 5 days ago

Hi @DhanshreeA, I'd like to work on this task, please.

DhanshreeA commented 5 days ago

Hi @teddyCodex, thank you for your interest. Currently we are only assigning issues to Outreachy contributors. If you are not an approved Outreachy applicant, we cannot assign you this issue. Thank you for your understanding.

teddyCodex commented 5 days ago

Hi @DhanshreeA, I am an approved applicant. My tag on Slack is Samted Uche.

DhanshreeA commented 5 days ago

Hi @teddyCodex go ahead.

teddyCodex commented 5 days ago

Step 1 Ersilia Model Cards.pdf

Hi @DhanshreeA, I checked the model cards (ersilia card <model>) for the first 10 models listed by ersilia catalog.

I noticed that some of the models appear in the catalog but do not exist on https://www.ersilia.io/model-hub or on Docker Hub. This may require further investigation.

Please clarify whether I should repeat this process (checking model cards) for all available models. Pending your response, I'll start working on Step 2.
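
Rather than checking cards by hand, Step 1 could be repeated across many models with a small script. This is a sketch under stated assumptions: ersilia card <model-id> is assumed to print only the JSON card on stdout and to exit with a non-zero code on failure, the model IDs are supplied by the caller (for example copied from ersilia catalog), and check_cards is a hypothetical helper, not part of Ersilia.

```python
import json
import subprocess
from typing import Callable, Dict, Iterable


def check_cards(
    model_ids: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, str]:
    """Run `ersilia card` per model; return {model_id: problem} for failures only."""
    problems: Dict[str, str] = {}
    for model_id in model_ids:
        result = run(["ersilia", "card", model_id], capture_output=True, text=True)
        if result.returncode != 0:
            problems[model_id] = f"exit code {result.returncode}"
            continue
        try:
            card = json.loads(result.stdout)
        except json.JSONDecodeError:
            # Assumption: stdout holds only JSON; log lines would land here.
            problems[model_id] = "output is not valid JSON"
            continue
        # The `ersilia card` output nests metadata under "card"; fall back to top level.
        fields = card.get("card", card) if isinstance(card, dict) else {}
        identifier = fields.get("Identifier")
        if identifier != model_id:
            problems[model_id] = f"unexpected Identifier {identifier!r}"
    return problems
```

The run parameter defaults to subprocess.run but can be swapped for a fake in tests; calling check_cards(["eos3b5e", "eos935d"]) for real requires ersilia to be on the PATH.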

teddyCodex commented 5 days ago

Hello @DhanshreeA, I forked the repo and de-registered the card command in my local copy.

I then re-implemented the card command in the catalog.py file, using the ModelCard class to fetch the metadata.

Command tested locally: ersilia catalog --card eos935d

output:

{
    "Identifier": "eos935d",
    "Input": [
        "Compound"
    ],
    "Mode": "Pretrained",
    "GitHub": "https://github.com/ersilia-os/eos935d",
    "Publication": "https://pubs.rsc.org/en/content/articlelanding/2020/sc/d0sc02639e#fn1",
    "Source Code": "https://github.com/KavrakiLab/MetaTrans",
    "License": "BSD-3.0",
    "Output": [
        "Compound"
    ],
    "Description": "Small molecules are metabolized by the liver in what is known as phase I and phase II reactions. Those can lead to reduced drug efficacy and generation of toxic metabolites, causing serious side effects. This model predicts the human metabolites of small molecules using a molecular transformer pr-trained on general chemical reactions and fine tuned to human metabolism. It provides up to 10 metabolites for each input molecule.\n",
    "Status": "Ready",
    "Slug": "meta-trans",
    "Title": "MetaTrans: human drug metabolites",
    "Tag": [
        "Metabolism"
    ],
    "Input Shape": "Single",
    "Interpretation": "A maximum of 10 human metabolites generated from the input molecule",
    "Task": [
        "Generative"
    ],
    "Contributor": "carcablop",
    "Output Shape": "List",
    "Output Type": [
        "String"
    ],
    "DockerHub": "https://hub.docker.com/r/ersiliaos/eos935d",
    "Docker Architecture": [
        "AMD64"
    ],
    "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos935d.zip",
    "Deployment": "Local",
    "Repository": {
        "label": "GitHub",
        "url": "https://github.com/ersilia-os/eos935d"
    },
    "Code": "$ ersilia serve meta-trans\n$ ersilia api -i 'CCCOCCC'\n$ ersilia close",
    "Date": "2022-12-20",
    "Calculation": "https://github.com/carcablop"
}

teddyCodex commented 5 days ago

Second test: ersilia catalog --card eos3b5e

output:

{
    "Identifier": "eos3b5e",
    "Input": [
        "Compound"
    ],
    "Mode": "Pretrained",
    "GitHub": "https://github.com/ersilia-os/eos3b5e",
    "Publication": "https://www.rdkit.org/docs/RDKit_Book.html",
    "Source Code": "https://github.com/rdkit/rdkit",
    "License": "BSD-3.0",
    "Output": [
        "Other value"
    ],
    "Description": "The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKIT. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.\n",
    "Status": "Ready",
    "Slug": "molecular-weight",
    "Title": "Molecular weight",
    "Tag": [
        "Molecular weight"
    ],
    "Input Shape": "Single",
    "Interpretation": "Calculated molecular weight (g/mol)",
    "Task": [
        "Regression"
    ],
    "Contributor": "miquelduranfrigola",
    "Output Shape": "Single",
    "Output Type": [
        "Float"
    ],
    "DockerHub": "https://hub.docker.com/r/ersiliaos/eos3b5e",
    "Docker Architecture": [
        "AMD64",
        "ARM64"
    ],
    "S3": "https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip",
    "Runtime": [
        "CPU"
    ],
    "Deployment": "Local",
    "Repository": {
        "label": "GitHub",
        "url": "https://github.com/ersilia-os/eos3b5e"
    },
    "Code": "$ ersilia serve molecular-weight\n$ ersilia api -i 'CCCOCCC'\n$ ersilia close",
    "Date": "2021-09-13",
    "Calculation": "https://github.com/miquelduranfrigola"
}
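
Compared with the card block of the original ersilia card eos3b5e output earlier in this issue, this output keeps every original field and adds seven (GitHub, Runtime, Deployment, Repository, Code, Date, Calculation), while the session fields (pack_mode, service_class, apis_list, api_schema, size) and the duplicated metadata block are no longer printed. A small helper like the hypothetical one below can make that comparison repeatable for other models; it assumes both outputs have already been parsed into dicts, e.g. with json.load.

```python
from typing import Dict, Set


def card_fields(output: dict) -> dict:
    """Return the metadata part of a card output.

    The old `ersilia card` output nests the metadata under "card", while
    `ersilia catalog --card` prints it at the top level.
    """
    nested = output.get("card")
    return nested if isinstance(nested, dict) else output


def diff_card_keys(old_output: dict, new_output: dict) -> Dict[str, Set[str]]:
    """Compare the metadata fields of two card outputs by their top-level keys."""
    old_keys = set(card_fields(old_output))
    new_keys = set(card_fields(new_output))
    return {"added": new_keys - old_keys, "removed": old_keys - new_keys}
```

Only key names are compared; values such as Description are not checked.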

teddyCodex commented 5 days ago

Additionally, running ersilia card eos935d now produces an error:

Usage: ersilia [OPTIONS] COMMAND [ARGS]...
Try 'ersilia --help' for help.

Error: No such command 'card'.