🐛 Bug: Ersilia Test Command: False Positive Test Failure

kurysauce commented 2 months ago

Describe the bug.

When running the test command on certain models, an error message reports a failure in Ersilia even though the model appears to produce the desired output, which suggests a false positive test failure. The specific error message is: ExampleGenerator.example() missing 1 required positional argument: 'try_predefined'.

Log Output and Error Message:

{
    "input": {
        "key": "VQPBIJGXSXEOCU-UHFFFAOYSA-N",
        "input": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1",
        "text": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1"
    },
    "output": {
        "outcome": [
            0.6682353222434173,
            1.0
        ]
    }
}

11:59:01 | DEBUG    | Reading card from eos1pu1
11:59:01 | DEBUG    | Reading shape from eos1pu1
11:59:01 | DEBUG    | Input Shape: Single
11:59:01 | DEBUG    | Input type is: compound
11:59:01 | DEBUG    | Input shape is: Single
11:59:01 | DEBUG    | Importing module: .types.compound
11:59:01 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
11:59:01 | DEBUG    | InputShapeSingle shape: Single

🚨Something went wrong with Ersilia 🚨 Error message: ExampleGenerator.example() missing 1 required positional argument: 'try_predefined'

Describe the steps to reproduce the behavior

No response

Operating environment

MacOS, Ersilia Virtual Environment

kurysauce commented 2 months ago

Hi @DhanshreeA . Could you check my logic of the bug for further understanding?

After navigating the codebase, I believe the error is produced because some calls to the example method do not pass the try_predefined parameter. I am using the command git grep -n "example(" to find the calls where the parameter is missing. It seems the solution would be to check each call and decide whether to set this value to True or False.

From your explanation this morning, were you suggesting that you want to do a complete overhaul of the try_predefined parameter, or do you wish to keep it and adjust the codebase to ensure that it is being defined properly?

I was also looking into how the example method was being defined in the ExampleGenerator class (located in ersilia/io/input.py) and I noticed that we also have predefined_done set here, but I am unsure if this is related to the bug.
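
To make the reported error concrete, here is a minimal sketch of how a call that omits a required positional argument fails. The class below is a hypothetical stand-in, not Ersilia's ExampleGenerator, and every parameter name other than try_predefined is an assumption made for illustration.

```python
class ExampleGeneratorSketch:
    """Hypothetical stand-in for illustration; not the real ExampleGenerator."""

    def example(self, n_samples, file_name, simple, try_predefined):
        # try_predefined has no default value, so every caller must pass it.
        source = "predefined" if try_predefined else "random"
        return [f"{source}-{i}" for i in range(n_samples)]


generator = ExampleGeneratorSketch()

try:
    # A call site that omits try_predefined.
    generator.example(5, None, True)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Passing the argument explicitly satisfies the signature.
samples = generator.example(5, None, True, try_predefined=False)
print(samples)
```

Searching for call sites with git grep, as described above, is then a matter of finding every call shaped like the first one.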

kurysauce commented 2 months ago

Hi @miquelduranfrigola and @GemmaTuron I created a script to test the different methods of the ExampleGenerator class.

It seems that the example and random example methods are able to run. I tried creating a test for predefined examples and the output states it does not work, but this may be because the test I created was not correct. Here is the output after running it on my local machine:

> (ersilia) kurtenriquez@1KurtsMacbookProM1-4 ersilia % python test_example.py
> Testing Example Method
> 22:34:53 | DEBUG    | Reading card from eos1pu1
> 22:34:53 | DEBUG    | Reading shape from eos1pu1
> 22:34:53 | DEBUG    | Input Shape: Single
> 22:34:53 | DEBUG    | Input type is: compound
> 22:34:53 | DEBUG    | Input shape is: Single
> 22:34:53 | DEBUG    | Importing module: .types.compound
> 22:34:53 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> 22:34:54 | DEBUG    | InputShapeSingle shape: Single
> 22:34:54 | DEBUG    | Trying with predefined input
> 22:34:54 | DEBUG    | Randomly sampling input
> Example method with try_predefined=True ran successfully.
> 22:34:54 | DEBUG    | Randomly sampling input
> Example method with try_predefined=False ran successfully.
> 
>  Testing random example Method
> 22:34:55 | DEBUG    | Reading card from eos1pu1
> 22:34:55 | DEBUG    | Reading shape from eos1pu1
> 22:34:55 | DEBUG    | Input Shape: Single
> 22:34:55 | DEBUG    | Input type is: compound
> 22:34:55 | DEBUG    | Input shape is: Single
> 22:34:55 | DEBUG    | Importing module: .types.compound
> 22:34:55 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> 22:34:55 | DEBUG    | InputShapeSingle shape: Single
> random_example method ran successfully.
> 
>  Testing predefined example Method
> 22:34:55 | DEBUG    | Reading card from eos1pu1
> 22:34:55 | DEBUG    | Reading shape from eos1pu1
> 22:34:55 | DEBUG    | Input Shape: Single
> 22:34:55 | DEBUG    | Input type is: compound
> 22:34:55 | DEBUG    | Input shape is: Single
> 22:34:55 | DEBUG    | Importing module: .types.compound
> 22:34:55 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> 22:34:55 | DEBUG    | InputShapeSingle shape: Single
> Model path: /Users/kurtenriquez/eos/dest/eos1pu1
> Checking if file exists: /Users/kurtenriquez/eos/dest/eos1pu1/example1.txt
> File exists: True
> Checking if file exists: /Users/kurtenriquez/eos/dest/eos1pu1/example2.txt
> File exists: True
> predefined_example method did not find a predefined example.
> Dummy files created at /Users/kurtenriquez/eos/dest/eos1pu1
> Temporary directory at /Users/kurtenriquez/eos/dest/eos1pu1 has been removed.

kurysauce commented 2 months ago

Additionally, I am trying to print the input and output info, but I am not seeing it in the log output. I have pushed the print statements and debugging onto this branch. I also added the return statement in the example method, and the error remains.

Log output:

> ersilia % ersilia -v test eos1pu1
> 23:03:49 | DEBUG    | Reading model information from /Users/kurtenriquez/eos/dest/eos1pu1/information.json
> 23:03:49 | DEBUG    | Reading model information from /Users/kurtenriquez/eos/dest/eos1pu1/information.json
> 23:03:49 | DEBUG    | Checking that model information is correct
> Beginning checks for eos1pu1 model information:
> Checking model ID...
> Checking model slug...
> Checking model description...
> Checking model task...
> Checking model input...
> Checking model input shape...
> Checking model output...
> Checking model output type...
> Checking model output shape...
> SUCCESS! Model information verified.
> 
> 23:03:49 | DEBUG    | Getting session from /Users/kurtenriquez/eos/session.json
> Testing model on single smiles input...
> 
> 23:03:49 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
> 23:03:49 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
> 23:03:50 | DEBUG    | Is fetched: True
> 23:03:50 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> 23:03:50 | DEBUG    | Setting BentoML AutoService for eos1pu1
> 23:03:50 | INFO     | Service class provided
> 23:03:51 | DEBUG    | Using port 51186
> 23:03:51 | DEBUG    | Starting Docker Daemon service
> 23:03:51 | DEBUG    | Creating temporary folder /var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/ersilia-0ewmbnj9 and mounting as volume in container
> 23:03:51 | DEBUG    | Image ersiliaos/eos1pu1:latest is available locally
> 23:03:51 | DEBUG    | Using port 51187
> 23:03:51 | DEBUG    | Starting Docker Daemon service
> 23:03:51 | DEBUG    | Creating temporary folder /var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/ersilia-z9ve92zg and mounting as volume in container
> 23:03:51 | INFO     | Done with initialization!
> 23:03:51 | INFO     | Starting runner
> 23:03:51 | DEBUG    | Trying standard API
> 23:03:51 | INFO     | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
> 23:03:51 | DEBUG    | Standard API processor started at http://0.0.0.0:51160
> 23:03:51 | DEBUG    | This is the input type: ['Compound']
> 23:03:52 | DEBUG    | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
> 23:03:52 | DEBUG    | Standard CSV Api runner is not amenable for this model, input and output
> 23:03:52 | DEBUG    | Trying conventional run
> 23:03:52 | DEBUG    | No file splitting necessary!
> 23:03:52 | DEBUG    | Reading card from eos1pu1
> 23:03:52 | DEBUG    | Reading shape from eos1pu1
> 23:03:52 | DEBUG    | Input Shape: None
> 23:03:52 | DEBUG    | Input type is: compound
> 23:03:52 | DEBUG    | Input shape is: Single
> 23:03:52 | DEBUG    | Importing module: .types.compound
> 23:03:52 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
> Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
> Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
> Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
> Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
> Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
> Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
> Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
> Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
> Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
> Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
> Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
> Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
> Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
> Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
> 23:03:53 | DEBUG    | InputShapeSingle shape: Single
> 23:03:53 | WARNING  | Could not resolve pack method
> 23:03:53 | DEBUG    | API eos1pu1:run initialized at URL http://0.0.0.0:51160
> 23:03:53 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> Printing output...
> 23:03:53 | DEBUG    | Posting to run
> 23:03:53 | DEBUG    | Batch size 100
> 23:03:53 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> 23:04:00 | DEBUG    | Status code: 200
> 23:04:00 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> 23:04:00 | DEBUG    | Done with unique posting
> {
>     "input": {
>         "key": "VQPBIJGXSXEOCU-UHFFFAOYSA-N",
>         "input": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1",
>         "text": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1"
>     },
>     "output": {
>         "outcome": [
>             0.6682353222434173,
>             1.0
>         ]
>     }
> }
> 23:04:00 | DEBUG    | Getting session from /Users/kurtenriquez/eos/session.json
> 23:04:01 | DEBUG    | Reading card from eos1pu1
> 23:04:01 | DEBUG    | Reading shape from eos1pu1
> 23:04:01 | DEBUG    | Input Shape: None
> 23:04:01 | DEBUG    | Input type is: compound
> 23:04:01 | DEBUG    | Input shape is: Single
> 23:04:01 | DEBUG    | Importing module: .types.compound
> 23:04:01 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
> Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
> Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
> Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
> Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
> Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
> Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
> Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
> Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
> Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
> Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
> Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
> Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
> Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
> Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
> 23:04:02 | DEBUG    | InputShapeSingle shape: Single
> 23:04:02 | DEBUG    | Randomly sampling input
> 
> Testing model on input of 5 smiles given by 'example' command...
> 
> 23:04:02 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
> 23:04:02 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
> 23:04:04 | DEBUG    | Is fetched: True
> 23:04:04 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> 23:04:04 | DEBUG    | Setting BentoML AutoService for eos1pu1
> 23:04:04 | INFO     | Service class provided
> 23:04:04 | DEBUG    | Using port 51196
> 23:04:04 | DEBUG    | Starting Docker Daemon service
> 23:04:04 | DEBUG    | Creating temporary folder /var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/ersilia-pmg12fsg and mounting as volume in container
> 23:04:04 | DEBUG    | Image ersiliaos/eos1pu1:latest is available locally
> 23:04:05 | DEBUG    | Using port 51197
> 23:04:05 | DEBUG    | Starting Docker Daemon service
> 23:04:05 | DEBUG    | Creating temporary folder /var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/ersilia-1v51tlpf and mounting as volume in container
> 23:04:05 | INFO     | Done with initialization!
> 23:04:05 | INFO     | Starting runner
> 23:04:05 | DEBUG    | Trying standard API
> 23:04:05 | INFO     | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
> 23:04:05 | DEBUG    | Standard API processor started at http://0.0.0.0:51160
> 23:04:05 | DEBUG    | This is the input type: ['Compound']
> 23:04:05 | DEBUG    | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
> 23:04:05 | DEBUG    | Standard CSV Api runner is not amenable for this model, input and output
> 23:04:05 | DEBUG    | Trying conventional run
> 23:04:05 | DEBUG    | No file splitting necessary!
> 23:04:05 | DEBUG    | Reading card from eos1pu1
> 23:04:05 | DEBUG    | Reading shape from eos1pu1
> 23:04:05 | DEBUG    | Input Shape: None
> 23:04:05 | DEBUG    | Input type is: compound
> 23:04:05 | DEBUG    | Input shape is: Single
> 23:04:05 | DEBUG    | Importing module: .types.compound
> 23:04:05 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
> Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
> Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
> Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
> Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
> Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
> Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
> Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
> Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
> Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
> Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
> Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
> Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
> Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
> Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
> Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
> 23:04:07 | DEBUG    | InputShapeSingle shape: Single
> 23:04:07 | WARNING  | Could not resolve pack method
> 23:04:07 | DEBUG    | API eos1pu1:run initialized at URL http://0.0.0.0:51160
> 23:04:07 | DEBUG    | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
> Printing output...
> 23:04:07 | DEBUG    | Posting to run
> 23:04:07 | DEBUG    | Batch size 100
> 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
> 
> Error message:
> 
> 'NoneType' object is not iterable

DhanshreeA commented 2 months ago

Hey @kurysauce,

You are absolutely right: in [ModelTester.check_example_input()](https://github.com/ersilia-os/ersilia/blob/61bb8857649949fb5d90646d673e2ca969fc2fb8/ersilia/publish/test.py#L356), the call to example from ExampleGenerator does not respect the method's signature. We do indeed need to set the try_predefined argument, and I think we can safely set it to False for now, because not every model has an example file and the test command needs to work for both older and current models. Another solution is to add a predefined flag to the test command (much like the one that exists in the example command) and pass it on to the example call when that part of the code runs. I'm inclined towards the latter.
No, we do not need to overhaul the try_predefined approach. Generally speaking, the ExampleGenerator class and all the code related to it is fine as it is.
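
The second option could look roughly like the sketch below, where a flag chosen by the user is stored on the tester and forwarded to the example call. Both classes are hypothetical stand-ins with assumed signatures; they are not the actual ModelTester or ExampleGenerator, and the CLI wiring is left out.

```python
class ExampleGeneratorSketch:
    """Hypothetical stand-in; the parameter list is assumed for illustration."""

    def example(self, n_samples, file_name, simple, try_predefined):
        # Echo the arguments so the forwarding can be inspected.
        return {"n_samples": n_samples, "try_predefined": try_predefined}


class ModelTesterSketch:
    """Hypothetical tester that forwards a user-chosen flag."""

    def __init__(self, model_id, try_predefined=False):
        self.model_id = model_id
        # Defaults to False because not every model ships an example file.
        self.try_predefined = try_predefined

    def check_example_input(self):
        generator = ExampleGeneratorSketch()
        return generator.example(
            5, None, True, try_predefined=self.try_predefined
        )


default_call = ModelTesterSketch("eos1pu1").check_example_input()
flagged_call = ModelTesterSketch("eos1pu1", try_predefined=True).check_example_input()
print(default_call, flagged_call)
```

Keeping False as the default preserves the behavior of the first option for anyone who does not pass the flag.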

DhanshreeA commented 2 months ago

https://github.com/kurysauce/ersilia_test/blob/master/ersilia/test_example.py

Hey @kurysauce, good effort on writing a test for the ExampleGenerator class. However, there's a major flaw in your test for predefined_example: you're defining PREDEFINED_EXAMPLE_FILES as [example1.txt, example2.txt], whereas the predefined_example method looks at this PREDEFINED_EXAMPLE_FILES list:

PREDEFINED_EXAMPLE_FILES = [
    "model/framework/examples/input.csv",
    "model/framework/input.csv",
    "model/framework/example.csv",
    "example.csv",
]

That is, overwriting the PREDEFINED_EXAMPLE_FILES variable in your script does not change it in the scope where the predefined_example method is called.
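
The scoping point can be reproduced in isolation. The sketch below builds a small throwaway module (a hypothetical stand-in for the module that defines the constant) and shows that rebinding a same-named variable in the test script has no effect, whereas patching the attribute on the module itself does. The function name candidate_files is made up for this illustration.

```python
import types
from unittest import mock

# Hypothetical stand-in for the module that owns the constant.
source = '''
PREDEFINED_EXAMPLE_FILES = ["example.csv"]

def candidate_files():
    # Looks the name up in this module's globals at call time.
    return list(PREDEFINED_EXAMPLE_FILES)
'''
fake_input = types.ModuleType("fake_input")
exec(source, fake_input.__dict__)

# Rebinding a same-named variable in the test script changes nothing
# inside fake_input.
PREDEFINED_EXAMPLE_FILES = ["example1.txt", "example2.txt"]
unchanged = fake_input.candidate_files()

# Patching the attribute on the module is what the function actually sees.
with mock.patch.object(
    fake_input, "PREDEFINED_EXAMPLE_FILES", ["example1.txt", "example2.txt"]
):
    patched = fake_input.candidate_files()

# The original value is restored when the context manager exits.
restored = fake_input.candidate_files()
print(unchanged, patched, restored)
```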

DhanshreeA commented 2 months ago

Hey @kurysauce, I think I understand what's going on, let's look at this step by step. We see in the logs that you shared, there's a line that says:

> Testing model on input of 5 smiles given by 'example' command...

Now if you look for this line in the codebase, you'd find it here. This gives us some clues! You'll notice that this line comes after the call to ExampleGenerator.example, i.e., the call itself is not what raises the error; the error more likely comes from how its result is used. If we inspect this method, we can see that it returns nothing, yet we still try to capture its return value in input, which of course gets set to None. The method only has a side effect: it creates an input.csv in the current working directory. To do anything useful with it here, the method has to return its result, which I see you have already done in your fork. At this point, for me, the following checks work:

        self.check_information(output_file)
        self.check_single_input(output_file)
        self.check_example_input(output_file)

The next one, check_consistent_output also suffers from an incorrect call to ExampleGenerator.example, which you should be able to fix. I think it gets more interesting from there - keep us posted!
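
For reference, the 'NoneType' object is not iterable error described above can be reproduced with a minimal sketch. The two functions are hypothetical: one only writes a file (the side effect), the other also returns the values it wrote.

```python
import csv
import os
import tempfile


def write_example_only(path, smiles):
    """Writes the examples to disk and implicitly returns None."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["input"])
        writer.writerows([s] for s in smiles)


def write_example_and_return(path, smiles):
    """Same side effect, but the caller also gets the values back."""
    write_example_only(path, smiles)
    return list(smiles)


smiles = ["CCO", "c1ccccc1"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.csv")

    result = write_example_only(path, smiles)  # result is None
    try:
        for item in result:
            pass
    except TypeError as err:
        error_message = str(err)

    returned = write_example_and_return(path, smiles)

print(error_message)
print(returned)
```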

kurysauce commented 2 months ago

Running into a raised error in the run_bash script: Check halted. Either run.sh file does not exist, or model was not fetched via --from_github or --from_s3.

I might be cd'd into the wrong directory, leading to incorrect relative path resolution. The base directory returned by self.conda_prefix(self.is_base()) might not be as expected, causing the relative path ../eos/dest/{model_id}/model/framework/run.sh to point to a non-existent location. I am getting 0 as the size and the output is:
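
A small sketch of why the relative path goes wrong: joining ".." onto the prefix only steps one level up from wherever that prefix happens to live. The values below are taken from the output in this comment and from the earlier logs; that the models live under the home directory is inferred from those logs, not from the Ersilia code.

```python
import posixpath

conda_prefix = "/opt/homebrew"  # value implied by the output in this comment
model_id = "eos1pu1"

# Path built relative to the prefix.
relative_guess = posixpath.join(conda_prefix, "..", "eos", "dest", model_id)
resolved_guess = posixpath.normpath(relative_guess)

# Earlier logs show the model under the user's home directory instead.
# In real code the home directory would come from os.path.expanduser("~").
home = "/Users/kurtenriquez"
home_based = posixpath.join(home, "eos", "dest", model_id)

print(resolved_guess)
print(home_based)
```

The two results differ, which is consistent with the "Model path does not exist" message and the reported size of 0.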

Model path does not exist: /opt/homebrew/../eos/dest/eos1pu1

Model Size:
KB: 0.0
MB: 0.0
GB: 0.0

Running the model bash script...
Checking if run.sh exists at: /opt/homebrew/../eos/dest/eos1pu1/model/framework/run.sh
Check halted. Either run.sh file does not exist, or model was not fetched via --from_github or --from_s3.
(ersilia) kurtenriquez@1KurtsMacbookProM1-4 eos %

kurysauce commented 2 months ago

The tests:

self.check_information(output_file)
self.check_single_input(output_file)
self.check_example_input(output_file)
self.check_consistent_output()

are working for me locally. Can others verify?

miquelduranfrigola commented 2 months ago

Thanks @kurysauce ! Should I use any specific branch?

DhanshreeA commented 2 months ago

Hi @kurysauce - I'll take a look into this, but could you also update your fork against the latest code in ersilia? There's a slim chance that it might fix the issue.

kurysauce commented 2 months ago

Link to Fork https://github.com/kurysauce/ersilia_test/blob/master/ersilia/publish/test.py

kurysauce commented 2 months ago

Thanks @DhanshreeA! I synced my fork, but the issue remains. I just had my meeting with Miquel, and we discussed that the problem is that this line of code in the run_bash test: https://github.com/kurysauce/ersilia_test/blob/85f4e9df8ce47ec472ea3bc6d903308cf7eff652/ersilia/publish/test.py#L542C15-L543C57 is not navigating to the model path correctly (the same issue affects finding the path to the run.sh file).

Additionally, we discussed that the method of calculating the model size is incorrect: the size of the model parameters and the dependencies must be calculated by walking the model's folders. This is what I'm currently working on!

kurysauce commented 2 months ago

Hi @miquelduranfrigola, a quick question.

The paths eos/dest/"model_id" and eos/repository/"model_id" have no subfolders labeled frameworks or checkpoints, unlike what we discussed yesterday in our meeting. However, the subfolders do exist in my local download of the model, just not in the eos folder. How should I resolve this discrepancy? I have attached the output on Slack.

miquelduranfrigola commented 2 months ago

Mmm... this is interesting. And you fetched the model successfully before testing?

kurysauce commented 2 months ago

@miquelduranfrigola @DhanshreeA Yes, I was able to fetch and serve the model before testing. From Dhanshree's message on Slack, it seems that the test command tests "a model that has been specified through repo_path". I've been fetching models without using the repo_path (specifically ersilia fetch eos1pu1 and eos7yti).

Should the test command parsing be different then, perhaps navigating the directories starting with model_id/model/framework and model_id/model/checkpoints, similar to how we write the main.py script (https://github.com/kurysauce/eos1pu1/blob/320af07082e9a8d253fb932f8247503fad561631/model/framework/code/main.py#L14C1-L17C68) for model incorporation? Further clarification would be appreciated!

kurysauce commented 2 months ago

I went ahead and added path parsing based on the repo_path specified by the user. I am assuming that the user follows the Ersilia Book template and specifies the path as --repo_path ~/Desktop/"MODEL_ID", if I am understanding correctly.

Additionally, I added more logging of the count and types of file extensions in both the checkpoints and framework folders. @miquelduranfrigola, I recalculated the model size by walking both folders and adding their sizes together, calling the analyze_files function on each subfolder.
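
As a rough sketch of that approach (not the analyze_files function from the fork, whose code is not shown here), the folder walk and unit conversion could look like the following. The helper names summarize_folder and to_units are hypothetical, and the units assume 1 KB = 1024 bytes.

```python
import os
import tempfile
from collections import Counter


def summarize_folder(folder):
    """Return (total size in bytes, Counter of file extensions)."""
    total_bytes = 0
    extensions = Counter()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            extensions[os.path.splitext(name)[1]] += 1
    return total_bytes, extensions


def to_units(num_bytes):
    """Convert a byte count, dividing by 1024 once per unit step."""
    return {
        "bytes": num_bytes,
        "KB": num_bytes / 1024,
        "MB": num_bytes / 1024**2,
        "GB": num_bytes / 1024**3,
    }


# Tiny throwaway layout with the two subfolders mentioned above.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("checkpoints", "model.pkl", 2048), ("framework", "run.sh", 1024)]
    for sub, name, size in layout:
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        with open(os.path.join(tmp, sub, name), "wb") as handle:
            handle.write(b"\0" * size)
    sizes = {
        sub: summarize_folder(os.path.join(tmp, sub))
        for sub in ("checkpoints", "framework")
    }

total = sum(size for size, _ in sizes.values())
units = to_units(total)
print(units)
```

Under this conversion, the byte counts reported in the log further down (2961606 + 19093 = 2980699 bytes) would come to roughly 2910.84 KB, or about 2.84 MB.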

With the modified paths, the run.sh script is able to run. However, there is an error I am getting:

Ersilia run completed
VALUES: ['GEFQWZLICWMTKF-CDUCUWFYSA-N', 'CC@H C@Hc1ccc(O)c(O)c1', '0.7299036850891121', '1.0'] ['Float']
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

could not convert string to float: 'GEFQWZLICWMTKF-CDUCUWFYSA-N'

Since the Ersilia run completes and the next line called is ersilia_run = self.read_csv(output_file), I assume there is a problem reading the output file. One thing I notice is that VALUES should be [Probability, Prediction] (both of type Float), but I am also seeing the strings that identify the molecule. Attached is the log for reference. Any feedback will be appreciated!
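
One way to picture the failure: if every column of a row is passed to float(), the identifier columns fail first. The sketch below assumes the header seen in the earlier log (key, input, Probability, Prediction) and reuses the values from the single-input output at the top of this issue. The helper read_numeric_outputs is hypothetical and is not the read_csv method in test.py.

```python
import csv
import io

# Header taken from the "expected header" line in the earlier log.
csv_text = (
    "key,input,Probability,Prediction\n"
    "VQPBIJGXSXEOCU-UHFFFAOYSA-N,"
    "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1,"
    "0.6682353222434173,1.0\n"
)

# Assumption: these two columns identify the molecule and are never numeric.
NON_NUMERIC_COLUMNS = {"key", "input"}


def read_numeric_outputs(handle):
    """Convert only the output columns to float, skipping identifiers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                name: float(value)
                for name, value in row.items()
                if name not in NON_NUMERIC_COLUMNS
            }
        )
    return rows


# Converting an identifier column reproduces the reported error type.
try:
    float("VQPBIJGXSXEOCU-UHFFFAOYSA-N")
except ValueError as err:
    error_message = str(err)

outputs = read_numeric_outputs(io.StringIO(csv_text))
print(error_message)
print(outputs)
```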

(ersilia) kurtenriquez@1KurtsMacbookProM1-4 framework % ersilia -v test eos1pu1 14:57:58 | DEBUG | Reading model information from /Users/kurtenriquez/eos/dest/eos1pu1/information.json 14:57:58 | DEBUG | Reading model information from /Users/kurtenriquez/eos/dest/eos1pu1/information.json Calculating model size and checking model path validity... 14:57:58 | DEBUG | Size of 'checkpoints' subfolder: 2961606 bytes 14:57:58 | DEBUG | File types & count in checkpoints folder: defaultdict(<class 'int'>, {'.sav': 1, '.md': 1, '.pkl': 1}) 14:57:58 | DEBUG | Size of frameworks folder: 19093 bytes 14:57:58 | DEBUG | File types & count in frameworks folder: defaultdict(<class 'int'>, {'.csv': 2, '': 1, '.sh': 1, '.md': 1, '.py': 3, '.pyc': 2})

Model Size: KB: 2980699 MB: 2910.8388671875 GB: 2.842616081237793

Running the model bash script...
14:57:59 | DEBUG | Reading card from eos1pu1
14:57:59 | DEBUG | Reading shape from eos1pu1
14:57:59 | DEBUG | Input Shape: None
14:57:59 | DEBUG | Input type is: compound
14:57:59 | DEBUG | Input shape is: Single
14:57:59 | DEBUG | Importing module: .types.compound
14:57:59 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
14:58:01 | DEBUG | InputShapeSingle shape: Single
14:58:01 | DEBUG | Randomly sampling input
Checking if run.sh exists at: /Users/kurtenriquez/Desktop/eos1pu1/model/framework/run.sh
14:58:01 | DEBUG | Changing directory to: /Users/kurtenriquez/Desktop/eos1pu1/model/framework
Executing 'bash run.sh'...
Error encountered while running the bash script.
Captured Output:

Captured Error:
Traceback (most recent call last):
  File "/Users/kurtenriquez/Desktop/eos1pu1/model/framework/./code/main.py", line 3, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

Executing ersilia run...
14:58:03 | DEBUG | Getting session from /Users/kurtenriquez/eos/sessions/session_51048/session.json
14:58:03 | WARNING | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
14:58:03 | ERROR | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
14:58:05 | DEBUG | Is fetched: True
14:58:05 | DEBUG | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
14:58:05 | DEBUG | Setting BentoML AutoService for eos1pu1
14:58:05 | INFO | Service class provided
14:58:05 | DEBUG | Using port 62321
14:58:05 | DEBUG | Starting Docker Daemon service
14:58:05 | DEBUG | Creating container tmp logs folder /Users/kurtenriquez/eos/sessions/session_51048/_logs/tmp and mounting as volume in container
14:58:05 | DEBUG | Image ersiliaos/eos1pu1:latest is available locally
14:58:05 | DEBUG | Using port 62322
14:58:05 | DEBUG | Starting Docker Daemon service
14:58:05 | DEBUG | Creating container tmp logs folder /Users/kurtenriquez/eos/sessions/session_51048/_logs/tmp and mounting as volume in container
14:58:05 | INFO | Done with initialization!
14:58:05 | INFO | Starting runner
14:58:05 | DEBUG | Trying standard API
14:58:05 | INFO | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
14:58:05 | DEBUG | Standard API processor started at http://0.0.0.0:59787
14:58:05 | DEBUG | This is the input type: ['Compound']
14:58:05 | DEBUG | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
14:58:05 | DEBUG | Standard CSV Api runner is not amenable for this model, input and output
14:58:05 | DEBUG | Trying conventional run
14:58:06 | DEBUG | Reading card from eos1pu1
14:58:06 | DEBUG | Reading shape from eos1pu1
14:58:06 | DEBUG | Input Shape: None
14:58:06 | DEBUG | Input type is: compound
14:58:06 | DEBUG | Input shape is: Single
14:58:06 | DEBUG | Importing module: .types.compound
14:58:06 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
14:58:08 | DEBUG | InputShapeSingle shape: Single
14:58:08 | DEBUG | Expected number: 1
14:58:08 | DEBUG | Entity is list: False
14:58:08 | DEBUG | Resolving columns
14:58:08 | DEBUG | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
14:58:08 | DEBUG | Candidate header is ['smilesCN[C@@H]1Cc2cccc3[nH]c(=O)n(C1)c23']
14:58:08 | DEBUG | Matching for input is [0]
14:58:08 | DEBUG | Has header True
14:58:08 | DEBUG | Schema {'input': [0], 'key': None}
14:58:08 | DEBUG | Standardizing input single
14:58:08 | DEBUG | Writing standardized input to /var/folders/d9/g2m8l123zgj7vpqbnhsm3c0000gn/T/ersilia-aqhf8x7i/standard_input_file.csv
14:58:08 | DEBUG | Reading standard file from /var/folders/d9/g2m8l123zgj7vpqbnhsm3c0000gn/T/ersilia-aqhf8x7i/standard_input_file.csv
14:58:08 | DEBUG | File has 5 lines
14:58:08 | DEBUG | No file splitting necessary!
14:58:08 | DEBUG | Reading card from eos1pu1
14:58:08 | DEBUG | Reading shape from eos1pu1
14:58:08 | DEBUG | Input Shape: None
14:58:08 | DEBUG | Input type is: compound
14:58:08 | DEBUG | Input shape is: Single
14:58:08 | DEBUG | Importing module: .types.compound
14:58:08 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
Requirement already satisfied: chembl_webresource_client in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (0.10.9)
Requirement already satisfied: urllib3 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.2.2)
Requirement already satisfied: requests>=2.18.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (2.31.0)
Requirement already satisfied: requests-cache~=1.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.2.1)
Requirement already satisfied: easydict in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from chembl_webresource_client) (1.13)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests>=2.18.4->chembl_webresource_client) (2024.7.4)
Requirement already satisfied: attrs>=21.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.0)
Requirement already satisfied: cattrs>=22.2 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (23.2.3)
Requirement already satisfied: platformdirs>=2.5 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (4.2.2)
Requirement already satisfied: url-normalize>=1.4 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from requests-cache~=1.2->chembl_webresource_client) (1.4.3)
Requirement already satisfied: exceptiongroup>=1.1.1 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (1.2.2)
Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache~=1.2->chembl_webresource_client) (4.12.2)
Requirement already satisfied: six in /Users/kurtenriquez/miniconda3/envs/ersilia/lib/python3.10/site-packages (from url-normalize>=1.4->requests-cache~=1.2->chembl_webresource_client) (1.16.0)
14:58:10 | DEBUG | InputShapeSingle shape: Single
14:58:10 | WARNING | Could not resolve pack method
14:58:10 | DEBUG | API eos1pu1:run initialized at URL http://0.0.0.0:59787
14:58:10 | DEBUG | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
14:58:10 | DEBUG | Posting to run
14:58:10 | DEBUG | Batch size 100
14:58:10 | DEBUG | Expected number: 1
14:58:10 | DEBUG | Entity is list: False
14:58:10 | DEBUG | Resolving columns
14:58:10 | DEBUG | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
14:58:10 | DEBUG | Candidate header is ['smilesCN[C@@H]1Cc2cccc3[nH]c(=O)n(C1)c23']
14:58:10 | DEBUG | Matching for input is [0]
14:58:10 | DEBUG | Has header True
14:58:10 | DEBUG | Schema {'input': [0], 'key': None}
14:58:10 | DEBUG | Standardizing input single
14:58:10 | DEBUG | Writing standardized input to /var/folders/d9/g2m8l123zgj7vpqbnhsm3c0000gn/T/ersilia-71o3up2z/standard_input_file.csv
14:58:10 | DEBUG | Reading standard file from /var/folders/d9/g2m8l123zgj7vpqbnhsm3c0000gn/T/ersilia-71o3up2z/standard_input_file.csv
14:58:10 | DEBUG | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
14:58:30 | DEBUG | Status code: 200
14:58:30 | DEBUG | Schema available in /Users/kurtenriquez/eos/dest/eos1pu1/api_schema.json
14:58:30 | DEBUG | Done with unique posting
14:58:32 | DEBUG | Data: outcome
14:58:32 | DEBUG | Values: [0.6342247198070918, 0.0]
14:58:32 | DEBUG | Getting pure dtype for outcome
14:58:32 | DEBUG | This is the pure datatype: numeric_array
14:58:32 | DEBUG | Datatype: numeric_array
14:58:32 | DEBUG | Datatype has been matched: numeric_array over {'array', 'numeric_array', 'mixed_array', 'string_array'}
14:58:32 | DEBUG | No merge key
14:58:32 | DEBUG | [0.6342247198070918, 0.0]
14:58:32 | DEBUG | numeric_array
14:58:32 | DEBUG | outcome
14:58:32 | DEBUG | [0.613037498439379, 0.0]
14:58:32 | DEBUG | numeric_array
14:58:32 | DEBUG | outcome
14:58:32 | DEBUG | [0.6505330763410392, 1.0]
14:58:32 | DEBUG | numeric_array
14:58:32 | DEBUG | outcome
14:58:32 | DEBUG | [0.6142044909803329, 0.0]
14:58:32 | DEBUG | numeric_array
14:58:32 | DEBUG | outcome
Ersilia run completed!

VALUES: ['QGWIQIAWOCJRPI-WSCVWKGISA-N', 'COc1ccc2cc(ccc2c1)S(=O)(=O)NC@Hcc1)C(=O)N(C)C(C)C)c1ccc2OCOc2c1', '0.6342247198070918', '0.0']
['Float']
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

could not convert string to float: 'QGWIQIAWOCJRPI-WSCVWKGISA-N'

If this error message is not helpful, open an issue at:
https://github.com/ersilia-os/ersilia

Or feel free to reach out to us at:
hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
You will find the console log file in: /Users/kurtenriquez/eos/current.log

(ersilia) kurtenriquez@1KurtsMacbookProM1-4 framework %

miquelduranfrigola commented 2 months ago

Hello @kurysauce

Quick answer. Indeed, there is a problem with parsing the output file. The first two columns correspond to the molecule InChIKey (column named key) and the molecule SMILES (column named input). Therefore, we need to read from the third column onwards. I hope this makes sense?
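To illustrate reading from the third column onwards, here is a minimal sketch using only the standard library. The function name read_output_columns and the sample row are illustrative assumptions, not the code currently in the test command. The point it shows is that the header and the values have to be sliced the same way, so that each value stays attached to its own column name.

```python
import csv
import io


def read_output_columns(csv_text, skip=2):
    """Return one dict per data row, keeping only the columns after the first `skip`.

    `skip=2` drops the `key` (InChIKey) and `input` (SMILES) columns.
    The function name and signature are hypothetical.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)[skip:]  # slice the header exactly like the values
    return [dict(zip(header, row[skip:])) for row in reader if row]


# Illustrative sample only: ethanol with made-up model outputs.
sample = (
    "key,input,Probability,Prediction\n"
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N,CCO,0.5,0.0\n"
)
rows = read_output_columns(sample)
```

With this shape, float conversion only ever sees the model outputs and never the InChIKey string.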

kurysauce commented 2 months ago

Great, thanks @miquelduranfrigola! I have two questions I would like to clarify:

Should I implement the EOS paths we discussed on Monday, or stick with the repo_path I currently have implemented?

I fetched using the repo path flag, but the code for the EOS path still did not have the subfolders.

Since the output file is not being parsed correctly, would it make sense to edit the read_csv function to read from the third column onward? I hesitate because this would change how the outputs of other models are read, not only the one I am currently testing (eos1pu1).
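One way to address that hesitation is to drop the metadata columns by name instead of by position, so files that never contained key and input are left untouched. The sketch below assumes the column names shown in the logs (key, input, Probability, Prediction). The helper name read_model_columns is hypothetical.

```python
import csv
import io

# Assumption: these are the only metadata columns Ersilia prepends to the output.
METADATA_COLUMNS = ("key", "input")


def read_model_columns(csv_text, metadata=METADATA_COLUMNS):
    """Parse CSV text and drop metadata columns by name, when they are present."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if name not in metadata}
        for row in reader
    ]


# Illustrative inputs: one Ersilia-style file and one bash-style file.
ersilia_text = "key,input,Probability,Prediction\nLFQSCWFLJHTTHZ-UHFFFAOYSA-N,CCO,0.5,0.0\n"
bash_text = "Probability,Prediction\n0.5,0\n"

ersilia_rows = read_model_columns(ersilia_text)
bash_rows = read_model_columns(bash_text)
```

Both calls return rows with the same column names, which is what a later comparison step needs. A model whose real output columns happen to be called key or input would still need special handling.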

kurysauce commented 2 months ago

As an update: I tried hard-coding the read_csv function to skip to the third column, and the float conversion error was resolved. However, there is now an error when executing the bash script, which I believe is set up in this section.

The output is:

Ersilia run completed!
TESTING...
VALUES: ['0.836212474909928', '1.0']
['Float']
VALUES: ['0.7751439196065594', '1.0']
['Float']
VALUES: ['0.6591778773800523', '1.0']
['Float']
VALUES: ['0.5331598029304332', '0.0']
['Float']
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

[Errno 2] No such file or directory: '/var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/tmpc8mlip5z/bash_output.csv'

I verified that the bash_output.csv path is in the same temporary directory as the other temporary files in the logs. I believe the error comes from two places: 1) sourcing the conda.sh file in the line source {0}/etc/profile.d/conda.sh, and 2) importing pandas in main.py (from model eos1pu1).

I am not sure how to fix access to the conda.sh file, since the script is created in a temp directory. I verified that the model eos1pu1 is able to run and that pandas was installed in the Dockerfile, so I am not sure why the second error is appearing. To be clear, the Ersilia run was successful; only the bash script failed.

Relevant Log Output:


Executing 'bash run.sh'...
Error encountered while running the bash script.
14:50:34 | DEBUG    | STDOUT: 
14:50:34 | DEBUG    | STDERR: /var/folders/d9/g2m8l__123zgj7vpqbnhsm3c0000gn/T/tmpc8mlip5z/script.sh: line 2: /opt/homebrew/etc/profile.d/conda.sh: No such file or directory

CondaError: Run 'conda init' before 'conda activate'

CondaError: Run 'conda init' before 'conda deactivate'

Captured Output:

Captured Error:
Traceback (most recent call last):
  File "/Users/kurtenriquez/Desktop/eos1pu1/model/framework/./code/main.py", line 3, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I tried fixing the first two lines of the bash script using:

source {0}$CONDA_PREFIX/etc/profile.d/conda.sh
conda init {1}

but this did not work either.
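The STDERR above shows the script sourcing /opt/homebrew/etc/profile.d/conda.sh, while the pip output earlier in this thread points to a conda installation under /Users/kurtenriquez/miniconda3. A possible direction, sketched below, is to derive the conda base from the path of the conda executable rather than from a guessed prefix. The helper names, the environment name eos1pu1, and the exact script layout are assumptions for illustration, not the current Ersilia implementation.

```python
from pathlib import PurePosixPath


def conda_base_from_exe(conda_exe):
    """Derive the conda base directory from the conda executable path.

    Assumes the usual layout <base>/bin/conda or <base>/condabin/conda.
    POSIX paths are used because the result is written into a bash script.
    """
    return str(PurePosixPath(conda_exe).parent.parent)


def build_activation_script(conda_base, env_name, command):
    """Build bash script text that activates `env_name` before running `command`."""
    lines = [
        f'source "{conda_base}/etc/profile.d/conda.sh"',
        f'conda activate "{env_name}"',
        command,
        "conda deactivate",
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only; the environment name is a guess.
base = conda_base_from_exe("/opt/miniconda3/bin/conda")
script = build_activation_script(base, "eos1pu1", "bash run.sh")
```

In practice the executable path could come from the CONDA_EXE environment variable, which conda normally sets in an activated shell, or from shutil.which("conda"). If the model environment is not the one that gets activated, the pandas import in main.py would fail in the same way as in the traceback.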

kurysauce commented 1 month ago

Updated logs for the empty-output error. The logs show that both runs do produce output:

Model Size:
KB: 994239493
MB: 970937.0048828125
GB: 948.1806688308716
05:06:43 | DEBUG    | Sizes of directories:
05:06:43 | DEBUG    | dest_size: 942221 bytes
05:06:43 | DEBUG    | bundle_size: 2994988 bytes
05:06:43 | DEBUG    | bentoml_size: 2994988 bytes
05:06:43 | DEBUG    | env_size: 987307296 bytes

Running the model bash script...
05:06:44 | DEBUG    | Reading card from eos1pu1
05:06:44 | DEBUG    | Reading shape from eos1pu1
05:06:44 | DEBUG    | Input Shape: None
05:06:44 | DEBUG    | Input type is: compound
05:06:44 | DEBUG    | Input shape is: Single
05:06:44 | DEBUG    | Importing module: .types.compound
05:06:44 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
05:06:44 | DEBUG    | InputShapeSingle shape: Single
05:06:44 | DEBUG    | Randomly sampling input
Checking if run.sh exists at: /root/eos/dest/eos1pu1/model/framework/run.sh
run.sh exists!
05:06:44 | DEBUG    | Changing directory to: /root/eos/dest/eos1pu1/model/framework
05:06:44 | DEBUG    | Script path: /tmp/tmph7i204kx/script.sh
05:06:44 | DEBUG    | bash output path: /tmp/tmph7i204kx/bash_output.csv
05:06:44 | DEBUG    | Output log path: /tmp/tmph7i204kx/output.txt
05:06:44 | DEBUG    | Error log path: /tmp/tmph7i204kx/error.txt
Executing 'bash run.sh'...
Bash execution completed!

Captured Bash Output:
Probability,Prediction
0.7998176704883471,1
0.74573027133634,1
0.45618433101799605,0
0.9222907483436306,1

Captured Error:

Executing ersilia run...
Ersilia output will be written to: /tmp/tmph7i204kx/ersilia_output.csv
05:06:47 | DEBUG    | Getting session from /root/eos/sessions/session_32279/session.json
05:06:47 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
05:06:47 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
05:06:47 | DEBUG    | Is fetched: True
05:06:47 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
05:06:47 | DEBUG    | Setting BentoML AutoService for eos1pu1
05:06:47 | INFO     | Service class provided
05:06:47 | DEBUG    | Pack method is: bentoml
05:06:48 | DEBUG    | Pack method is: bentoml
05:06:48 | INFO     | Done with initialization!
05:06:48 | INFO     | Starting runner
05:06:48 | DEBUG    | Trying standard API
05:06:48 | INFO     | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
05:06:48 | DEBUG    | Standard API processor started at http://127.0.0.1:53515
05:06:48 | DEBUG    | This is the input type: ['Compound']
05:06:48 | DEBUG    | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
05:06:48 | DEBUG    | Standard CSV Api runner is not amenable for this model, input and output
05:06:48 | DEBUG    | Trying conventional run
05:06:48 | DEBUG    | Reading card from eos1pu1
05:06:48 | DEBUG    | Reading shape from eos1pu1
05:06:48 | DEBUG    | Input Shape: None
05:06:48 | DEBUG    | Input type is: compound
05:06:48 | DEBUG    | Input shape is: Single
05:06:48 | DEBUG    | Importing module: .types.compound
05:06:48 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
05:06:48 | DEBUG    | InputShapeSingle shape: Single
05:06:48 | DEBUG    | Expected number: 1
05:06:48 | DEBUG    | Entity is list: False
05:06:48 | DEBUG    | Resolving columns
05:06:48 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
05:06:48 | DEBUG    | Candidate header is ['smiles[H][C@@]12COCCN1C(=O)c1c(O)c(=O)ccn1N2[C@@H]1c2ccccc2SCc2c(F)c(F)ccc12']
05:06:48 | DEBUG    | Matching for input is [0]
05:06:48 | DEBUG    | Has header True
05:06:48 | DEBUG    | Schema {'input': [0], 'key': None}
05:06:48 | DEBUG    | Standardizing input single
05:06:48 | DEBUG    | Writing standardized input to /tmp/ersilia-z66c7632/standard_input_file.csv
05:06:48 | DEBUG    | Reading standard file from /tmp/ersilia-z66c7632/standard_input_file.csv
05:06:48 | DEBUG    | File has 5 lines
05:06:48 | DEBUG    | No file splitting necessary!
05:06:49 | DEBUG    | Reading card from eos1pu1
05:06:49 | DEBUG    | Reading shape from eos1pu1
05:06:49 | DEBUG    | Input Shape: None
05:06:49 | DEBUG    | Input type is: compound
05:06:49 | DEBUG    | Input shape is: Single
05:06:49 | DEBUG    | Importing module: .types.compound
05:06:49 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
05:06:49 | DEBUG    | InputShapeSingle shape: Single
05:06:49 | DEBUG    | API eos1pu1:run initialized at URL http://127.0.0.1:53515
05:06:49 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
05:06:49 | DEBUG    | Posting to run
05:06:49 | DEBUG    | Batch size 100
05:06:49 | DEBUG    | Expected number: 1
05:06:49 | DEBUG    | Entity is list: False
05:06:49 | DEBUG    | Resolving columns
05:06:49 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
05:06:49 | DEBUG    | Candidate header is ['smiles[H][C@@]12COCCN1C(=O)c1c(O)c(=O)ccn1N2[C@@H]1c2ccccc2SCc2c(F)c(F)ccc12']
05:06:49 | DEBUG    | Matching for input is [0]
05:06:49 | DEBUG    | Has header True
05:06:49 | DEBUG    | Schema {'input': [0], 'key': None}
05:06:49 | DEBUG    | Standardizing input single
05:06:49 | DEBUG    | Writing standardized input to /tmp/ersilia-9okgnmf4/standard_input_file.csv
05:06:49 | DEBUG    | Reading standard file from /tmp/ersilia-9okgnmf4/standard_input_file.csv
05:06:49 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
05:06:55 | DEBUG    | Status code: 200
05:06:55 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
05:06:55 | DEBUG    | Done with unique posting
05:06:56 | DEBUG    | Data: outcome
05:06:56 | DEBUG    | Values: [0.7998176704883474, 1.0]
05:06:56 | DEBUG    | Getting pure dtype for outcome
05:06:56 | DEBUG    | This is the pure datatype: numeric_array
05:06:56 | DEBUG    | Datatype: numeric_array
05:06:56 | DEBUG    | Datatype has been matched: numeric_array over {'mixed_array', 'array', 'string_array', 'numeric_array'}
05:06:56 | DEBUG    | No merge key
05:06:56 | DEBUG    | [0.7998176704883474, 1.0]
05:06:56 | DEBUG    | numeric_array
05:06:56 | DEBUG    | outcome
05:06:56 | DEBUG    | [0.74573027133634, 1.0]
05:06:56 | DEBUG    | numeric_array
05:06:56 | DEBUG    | outcome
05:06:56 | DEBUG    | [0.4561843310179961, 0.0]
05:06:56 | DEBUG    | numeric_array
05:06:56 | DEBUG    | outcome
05:06:56 | DEBUG    | [0.9222907483436306, 1.0]
05:06:56 | DEBUG    | numeric_array
05:06:56 | DEBUG    | outcome
Ersilia run completed!

Captured Ersilia Output:
key,input,Probability,Prediction
HOBAELRKJCKHQD-QNEBEIHSSA-N,CCCCC\C=C/C\C=C/C\C=C/CCCCCCC(O)=O,0.7998176704883474,1.0
YKJYKKNCCRKFSL-RDBSUJKOSA-N,COc1ccc(C[C@H]2NC[C@H](O)[C@H]2OC(C)=O)cc1,0.74573027133634,1.0
MNHNIVNAFBSLLX-UHFFFAOYSA-N,Cc1cccc(C)c1N(CC(=O)Nc1ccc(cc1)-c1ncon1)C(=O)C1CCS(=O)(=O)CC1,0.4561843310179961,0.0
YASBOGFWAMXINH-TZMCWYRMSA-N,CN1CC[C@@H]2CN3CCc4cccc([C@@H]2C1)c34,0.9222907483436306,1.0

05:06:56 | DEBUG    | Captured Ersilia Output:
05:06:56 | DEBUG    | key,input,Probability,Prediction
HOBAELRKJCKHQD-QNEBEIHSSA-N,CCCCC\C=C/C\C=C/C\C=C/CCCCCCC(O)=O,0.7998176704883474,1.0
YKJYKKNCCRKFSL-RDBSUJKOSA-N,COc1ccc(C[C@H]2NC[C@H](O)[C@H]2OC(C)=O)cc1,0.74573027133634,1.0
MNHNIVNAFBSLLX-UHFFFAOYSA-N,Cc1cccc(C)c1N(CC(=O)Nc1ccc(cc1)-c1ncon1)C(=O)C1CCS(=O)(=O)CC1,0.4561843310179961,0.0
YASBOGFWAMXINH-TZMCWYRMSA-N,CN1CC[C@@H]2CN3CCc4cccc([C@@H]2C1)c34,0.9222907483436306,1.0

TESTING...
VALUES: ['0.7998176704883474', '1.0']
['Float']
VALUES: ['0.74573027133634', '1.0']
['Float']
VALUES: ['0.4561843310179961', '0.0']
['Float']
VALUES: ['0.9222907483436306', '1.0']
['Float']
Ersilia run after read_csv :
 [{'key': 0.7998176704883474, 'input': 1.0}, {'key': 0.74573027133634, 'input': 1.0}, {'key': 0.4561843310179961, 'input': 0.0}, {'key': 0.9222907483436306, 'input': 1.0}]
TESTING...
VALUES: []
['Float']
VALUES: []
['Float']
VALUES: []
['Float']
VALUES: []
['Float']
Bash output:
 [{}, {}, {}, {}]

Ersilia output:
 [{}, {}, {}, {}]

 Ersilia columns:  set()

 Bash columns:  set()
Common columns: set() 

SUCCESS! Bash run and Ersilia run produce consistent results.
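In the log above, both column sets are empty, so there is nothing to compare and the check passes vacuously. A small guard along the following lines would turn that case into an explicit failure. This is a sketch with hypothetical names (check_comparable), assuming each run has already been parsed into a list of dicts.

```python
def check_comparable(bash_rows, ersilia_rows):
    """Return the columns shared by both outputs, or raise if nothing can be compared."""
    bash_cols = set().union(*(row.keys() for row in bash_rows))
    ersilia_cols = set().union(*(row.keys() for row in ersilia_rows))
    common = bash_cols & ersilia_cols
    if not common:
        raise ValueError(
            "No common columns between bash and Ersilia outputs; nothing was compared."
        )
    if len(bash_rows) != len(ersilia_rows):
        raise ValueError(
            f"Row count differs: bash={len(bash_rows)}, ersilia={len(ersilia_rows)}"
        )
    return common


# The situation from the log: four empty rows on each side.
try:
    check_comparable([{}, {}, {}, {}], [{}, {}, {}, {}])
    vacuous_case_passed = True
except ValueError:
    vacuous_case_passed = False
```

Only after this guard passes would a SUCCESS message say anything about the model.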

miquelduranfrigola commented 1 month ago

This is looking good, @kurysauce - Tagging @DhanshreeA so she is in the loop

kurysauce commented 1 month ago

As an update, I implemented a potential fix called updated_read_csv, which extracts only the last two columns of the bash and Ersilia outputs along with their values, since I believe this is what we care about. This is still a work in progress (only the first values of the last two columns are processed so far), but I do believe that the original read_csv implementation was the issue.
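Once all rows of both outputs are parsed, the values still will not match as strings: the bash output prints 1 where Ersilia prints 1.0, and the probabilities in this thread's logs differ in the last digit. A tolerance-based comparison handles both. The sketch below uses math.isclose, and the function names and the tolerance values are assumptions.

```python
import math


def values_match(bash_value, ersilia_value, rel_tol=1e-5, abs_tol=1e-8):
    """Compare two cells numerically when possible, otherwise as plain strings."""
    try:
        return math.isclose(
            float(bash_value), float(ersilia_value), rel_tol=rel_tol, abs_tol=abs_tol
        )
    except ValueError:
        return bash_value == ersilia_value


def rows_match(bash_rows, ersilia_rows, columns):
    """Check every row and every listed column, not only the first values."""
    if len(bash_rows) != len(ersilia_rows):
        return False
    return all(
        values_match(b[col], e[col])
        for b, e in zip(bash_rows, ersilia_rows)
        for col in columns
    )


# Values taken from the first data row of the outputs logged in this thread.
bash_rows = [{"Probability": "0.9168235886886539", "Prediction": "1"}]
ersilia_rows = [{"Probability": "0.9168235886886538", "Prediction": "1.0"}]
consistent = rows_match(bash_rows, ersilia_rows, ["Probability", "Prediction"])
```

The tolerance is a judgment call; a stricter value would flag differences that come only from floating-point formatting.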

Log Output:

Running the model bash script...
01:14:47 | DEBUG    | Reading card from eos1pu1
01:14:47 | DEBUG    | Reading shape from eos1pu1
01:14:47 | DEBUG    | Input Shape: None
01:14:47 | DEBUG    | Input type is: compound
01:14:47 | DEBUG    | Input shape is: Single
01:14:47 | DEBUG    | Importing module: .types.compound
01:14:47 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:47 | DEBUG    | InputShapeSingle shape: Single
01:14:47 | DEBUG    | Randomly sampling input
Checking if run.sh exists at: /root/eos/dest/eos1pu1/model/framework/run.sh
run.sh exists!
01:14:47 | DEBUG    | Changing directory to: /root/eos/dest/eos1pu1/model/framework
01:14:47 | DEBUG    | Script path: /tmp/tmpxju9ed25/script.sh
01:14:47 | DEBUG    | bash output path: /tmp/tmpxju9ed25/bash_output.csv
01:14:47 | DEBUG    | Output log path: /tmp/tmpxju9ed25/output.txt
01:14:47 | DEBUG    | Error log path: /tmp/tmpxju9ed25/error.txt
Executing 'bash run.sh'...
Bash execution completed! Return code: 0 

Captured Raw Bash Output:
Probability,Prediction
0.9168235886886539,1
0.58042319950123,0
0.529730521761002,0
0.1026072802857601,0

Captured Error:

Executing ersilia run...
01:14:51 | DEBUG    | Ersilia output will be written to: /tmp/tmpxju9ed25/ersilia_output.csv
01:14:51 | DEBUG    | Getting session from /root/eos/sessions/session_564/session.json
01:14:51 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
01:14:51 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
01:14:51 | DEBUG    | Is fetched: True
01:14:51 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:51 | DEBUG    | Setting BentoML AutoService for eos1pu1
01:14:51 | INFO     | Service class provided
01:14:51 | DEBUG    | Pack method is: bentoml
01:14:51 | DEBUG    | Pack method is: bentoml
01:14:51 | INFO     | Done with initialization!
01:14:51 | INFO     | Starting runner
01:14:51 | DEBUG    | Trying standard API
01:14:51 | INFO     | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
01:14:51 | DEBUG    | Standard API processor started at http://127.0.0.1:38873
01:14:51 | DEBUG    | This is the input type: ['Compound']
01:14:51 | DEBUG    | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
01:14:51 | DEBUG    | Standard CSV Api runner is not amenable for this model, input and output
01:14:51 | DEBUG    | Trying conventional run
01:14:52 | DEBUG    | Reading card from eos1pu1
01:14:52 | DEBUG    | Reading shape from eos1pu1
01:14:52 | DEBUG    | Input Shape: None
01:14:52 | DEBUG    | Input type is: compound
01:14:52 | DEBUG    | Input shape is: Single
01:14:52 | DEBUG    | Importing module: .types.compound
01:14:52 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:52 | DEBUG    | InputShapeSingle shape: Single
01:14:52 | DEBUG    | Expected number: 1
01:14:52 | DEBUG    | Entity is list: False
01:14:52 | DEBUG    | Resolving columns
01:14:52 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
01:14:52 | DEBUG    | Candidate header is ['smilesCCS(=O)(=O)N(C)[C@@H]1[C@@H](O)C(C)(C)Oc2ccc(cc12)C#N']
01:14:52 | DEBUG    | Matching for input is [0]
01:14:52 | DEBUG    | Has header True
01:14:52 | DEBUG    | Schema {'input': [0], 'key': None}
01:14:52 | DEBUG    | Standardizing input single
01:14:52 | DEBUG    | Writing standardized input to /tmp/ersilia-77jyyqf9/standard_input_file.csv
01:14:52 | DEBUG    | Reading standard file from /tmp/ersilia-77jyyqf9/standard_input_file.csv
01:14:52 | DEBUG    | File has 5 lines
01:14:52 | DEBUG    | No file splitting necessary!
01:14:53 | DEBUG    | Reading card from eos1pu1
01:14:53 | DEBUG    | Reading shape from eos1pu1
01:14:53 | DEBUG    | Input Shape: None
01:14:53 | DEBUG    | Input type is: compound
01:14:53 | DEBUG    | Input shape is: Single
01:14:53 | DEBUG    | Importing module: .types.compound
01:14:53 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:53 | DEBUG    | InputShapeSingle shape: Single
01:14:53 | DEBUG    | API eos1pu1:run initialized at URL http://127.0.0.1:38873
01:14:53 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:53 | DEBUG    | Posting to run
01:14:53 | DEBUG    | Batch size 100
01:14:53 | DEBUG    | Expected number: 1
01:14:53 | DEBUG    | Entity is list: False
01:14:53 | DEBUG    | Resolving columns
01:14:53 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
01:14:53 | DEBUG    | Candidate header is ['smilesCCS(=O)(=O)N(C)[C@@H]1[C@@H](O)C(C)(C)Oc2ccc(cc12)C#N']
01:14:53 | DEBUG    | Matching for input is [0]
01:14:53 | DEBUG    | Has header True
01:14:53 | DEBUG    | Schema {'input': [0], 'key': None}
01:14:53 | DEBUG    | Standardizing input single
01:14:53 | DEBUG    | Writing standardized input to /tmp/ersilia-pjmauqy3/standard_input_file.csv
01:14:53 | DEBUG    | Reading standard file from /tmp/ersilia-pjmauqy3/standard_input_file.csv
01:14:53 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:59 | DEBUG    | Status code: 200
01:14:59 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:59 | DEBUG    | Done with unique posting
01:15:00 | DEBUG    | Data: outcome
01:15:00 | DEBUG    | Values: [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | Getting pure dtype for outcome
01:15:00 | DEBUG    | This is the pure datatype: numeric_array
01:15:00 | DEBUG    | Datatype: numeric_array
01:15:00 | DEBUG    | Datatype has been matched: numeric_array over {'array', 'string_array', 'numeric_array', 'mixed_array'}
01:15:00 | DEBUG    | No merge key
01:15:00 | DEBUG    | [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.58042319950123, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.529730521761002, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.1026072802857601, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
Ersilia run completed!

Captured Raw Ersilia Output:
key,input,Probability,Prediction
BNRNXUUZRGQAQC-UHFFFAOYSA-N,CCCc1nn(C)c2c1nc([nH]c2=O)-c1cc(ccc1OCC)S(=O)(=O)N1CCN(C)CC1,0.9168235886886538,1.0
ODYAQBDIXCVKAE-UHFFFAOYSA-N,Oc1ccc(NC(=O)CCCc2ccc(cc2)-c2ccccc2F)cc1,0.58042319950123,0.0
ZRYMMWAJAFUANM-INIZCTEOSA-N,Cc1c(cccc1-n1c(=O)n(C)c2c(F)cccc2c1=O)-c1c(F)cc(C(N)=O)c2[nH]c3C[C@H](CCc3c12)C(C)(C)O,0.529730521761002,0.0
KGFYHTZWPPHNLQ-AWEZNQCLSA-N,Clc1ccc(s1)C(=O)NC[C@H]1CN(C(=O)O1)c1ccc(cc1)N1CCOCC1=O,0.1026072802857601,0.0

Processing ersilia csv output...
Header: ['key', 'input', 'Probability', 'Prediction']
01:15:00 | DEBUG    | Header: ['key', 'input', 'Probability', 'Prediction']
Selected Columns: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Selected Columns: ['Probability', 'Prediction']
Selected Values: [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | Selected Values: [0.9168235886886538, 1.0]
Data appended: {'Probability': 0.9168235886886538, 'Prediction': 1.0}
01:15:00 | DEBUG    | Data appended: {'Probability': 0.9168235886886538, 'Prediction': 1.0}
Processing raw bash output...
Header: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Header: ['Probability', 'Prediction']
Selected Columns: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Selected Columns: ['Probability', 'Prediction']
Selected Values: [0.9168235886886539, 1.0]
01:15:00 | DEBUG    | Selected Values: [0.9168235886886539, 1.0]
Data appended: {'Probability': 0.9168235886886539, 'Prediction': 1.0}
01:15:00 | DEBUG    | Data appended: {'Probability': 0.9168235886886539, 'Prediction': 1.0}

Bash output:
 [{'Probability': 0.9168235886886539, 'Prediction': 1.0}]

Ersilia output:
 [{'Probability': 0.9168235886886538, 'Prediction': 1.0}]

 Ersilia columns:  {'Prediction', 'Probability'}

 Bash columns:  {'Prediction', 'Probability'}
Common columns: {'Prediction', 'Probability'} 

New Section... printing output types between ersilia and bash run
<class 'float'>
1.0
<class 'float'>
1.0
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

'float' object is not iterable

kurysauce commented 1 month ago

As an update, I implemented a potential fix called updated_read_csv, which extracts just the last two columns of the bash/Ersilia run and their values, since I believe this is what we care about. This is still a work in progress (only the first values of the last two columns are processed so far), but I do believe that the original read_csv implementation was the issue.

Log Output:

Running the model bash script...
01:14:47 | DEBUG    | Reading card from eos1pu1
01:14:47 | DEBUG    | Reading shape from eos1pu1
01:14:47 | DEBUG    | Input Shape: None
01:14:47 | DEBUG    | Input type is: compound
01:14:47 | DEBUG    | Input shape is: Single
01:14:47 | DEBUG    | Importing module: .types.compound
01:14:47 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:47 | DEBUG    | InputShapeSingle shape: Single
01:14:47 | DEBUG    | Randomly sampling input
Checking if run.sh exists at: /root/eos/dest/eos1pu1/model/framework/run.sh
run.sh exists!
01:14:47 | DEBUG    | Changing directory to: /root/eos/dest/eos1pu1/model/framework
01:14:47 | DEBUG    | Script path: /tmp/tmpxju9ed25/script.sh
01:14:47 | DEBUG    | bash output path: /tmp/tmpxju9ed25/bash_output.csv
01:14:47 | DEBUG    | Output log path: /tmp/tmpxju9ed25/output.txt
01:14:47 | DEBUG    | Error log path: /tmp/tmpxju9ed25/error.txt
Executing 'bash run.sh'...
Bash execution completed! Return code: 0 

Captured Raw Bash Output:
Probability,Prediction
0.9168235886886539,1
0.58042319950123,0
0.529730521761002,0
0.1026072802857601,0

Captured Error:

Executing ersilia run...
01:14:51 | DEBUG    | Ersilia output will be written to: /tmp/tmpxju9ed25/ersilia_output.csv
01:14:51 | DEBUG    | Getting session from /root/eos/sessions/session_564/session.json
01:14:51 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
01:14:51 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
01:14:51 | DEBUG    | Is fetched: True
01:14:51 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:51 | DEBUG    | Setting BentoML AutoService for eos1pu1
01:14:51 | INFO     | Service class provided
01:14:51 | DEBUG    | Pack method is: bentoml
01:14:51 | DEBUG    | Pack method is: bentoml
01:14:51 | INFO     | Done with initialization!
01:14:51 | INFO     | Starting runner
01:14:51 | DEBUG    | Trying standard API
01:14:51 | INFO     | You are running the app with a standard runner. Beware that this runner does not do as many checks on the input as the conventional runner: use it at your own risk.
01:14:51 | DEBUG    | Standard API processor started at http://127.0.0.1:38873
01:14:51 | DEBUG    | This is the input type: ['Compound']
01:14:51 | DEBUG    | This is the expected header (max 10): ['key', 'input', 'Probability', 'Prediction']
01:14:51 | DEBUG    | Standard CSV Api runner is not amenable for this model, input and output
01:14:51 | DEBUG    | Trying conventional run
01:14:52 | DEBUG    | Reading card from eos1pu1
01:14:52 | DEBUG    | Reading shape from eos1pu1
01:14:52 | DEBUG    | Input Shape: None
01:14:52 | DEBUG    | Input type is: compound
01:14:52 | DEBUG    | Input shape is: Single
01:14:52 | DEBUG    | Importing module: .types.compound
01:14:52 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:52 | DEBUG    | InputShapeSingle shape: Single
01:14:52 | DEBUG    | Expected number: 1
01:14:52 | DEBUG    | Entity is list: False
01:14:52 | DEBUG    | Resolving columns
01:14:52 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
01:14:52 | DEBUG    | Candidate header is ['smilesCCS(=O)(=O)N(C)[C@@H]1[C@@H](O)C(C)(C)Oc2ccc(cc12)C#N']
01:14:52 | DEBUG    | Matching for input is [0]
01:14:52 | DEBUG    | Has header True
01:14:52 | DEBUG    | Schema {'input': [0], 'key': None}
01:14:52 | DEBUG    | Standardizing input single
01:14:52 | DEBUG    | Writing standardized input to /tmp/ersilia-77jyyqf9/standard_input_file.csv
01:14:52 | DEBUG    | Reading standard file from /tmp/ersilia-77jyyqf9/standard_input_file.csv
01:14:52 | DEBUG    | File has 5 lines
01:14:52 | DEBUG    | No file splitting necessary!
01:14:53 | DEBUG    | Reading card from eos1pu1
01:14:53 | DEBUG    | Reading shape from eos1pu1
01:14:53 | DEBUG    | Input Shape: None
01:14:53 | DEBUG    | Input type is: compound
01:14:53 | DEBUG    | Input shape is: Single
01:14:53 | DEBUG    | Importing module: .types.compound
01:14:53 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:14:53 | DEBUG    | InputShapeSingle shape: Single
01:14:53 | DEBUG    | API eos1pu1:run initialized at URL http://127.0.0.1:38873
01:14:53 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:53 | DEBUG    | Posting to run
01:14:53 | DEBUG    | Batch size 100
01:14:53 | DEBUG    | Expected number: 1
01:14:53 | DEBUG    | Entity is list: False
01:14:53 | DEBUG    | Resolving columns
01:14:53 | DEBUG    | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
01:14:53 | DEBUG    | Candidate header is ['smilesCCS(=O)(=O)N(C)[C@@H]1[C@@H](O)C(C)(C)Oc2ccc(cc12)C#N']
01:14:53 | DEBUG    | Matching for input is [0]
01:14:53 | DEBUG    | Has header True
01:14:53 | DEBUG    | Schema {'input': [0], 'key': None}
01:14:53 | DEBUG    | Standardizing input single
01:14:53 | DEBUG    | Writing standardized input to /tmp/ersilia-pjmauqy3/standard_input_file.csv
01:14:53 | DEBUG    | Reading standard file from /tmp/ersilia-pjmauqy3/standard_input_file.csv
01:14:53 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:59 | DEBUG    | Status code: 200
01:14:59 | DEBUG    | Schema available in /root/eos/dest/eos1pu1/api_schema.json
01:14:59 | DEBUG    | Done with unique posting
01:15:00 | DEBUG    | Data: outcome
01:15:00 | DEBUG    | Values: [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | Getting pure dtype for outcome
01:15:00 | DEBUG    | This is the pure datatype: numeric_array
01:15:00 | DEBUG    | Datatype: numeric_array
01:15:00 | DEBUG    | Datatype has been matched: numeric_array over {'array', 'string_array', 'numeric_array', 'mixed_array'}
01:15:00 | DEBUG    | No merge key
01:15:00 | DEBUG    | [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.58042319950123, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.529730521761002, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
01:15:00 | DEBUG    | [0.1026072802857601, 0.0]
01:15:00 | DEBUG    | numeric_array
01:15:00 | DEBUG    | outcome
Ersilia run completed!

Captured Raw Ersilia Output:
key,input,Probability,Prediction
BNRNXUUZRGQAQC-UHFFFAOYSA-N,CCCc1nn(C)c2c1nc([nH]c2=O)-c1cc(ccc1OCC)S(=O)(=O)N1CCN(C)CC1,0.9168235886886538,1.0
ODYAQBDIXCVKAE-UHFFFAOYSA-N,Oc1ccc(NC(=O)CCCc2ccc(cc2)-c2ccccc2F)cc1,0.58042319950123,0.0
ZRYMMWAJAFUANM-INIZCTEOSA-N,Cc1c(cccc1-n1c(=O)n(C)c2c(F)cccc2c1=O)-c1c(F)cc(C(N)=O)c2[nH]c3C[C@H](CCc3c12)C(C)(C)O,0.529730521761002,0.0
KGFYHTZWPPHNLQ-AWEZNQCLSA-N,Clc1ccc(s1)C(=O)NC[C@H]1CN(C(=O)O1)c1ccc(cc1)N1CCOCC1=O,0.1026072802857601,0.0

Processing ersilia csv output...
Header: ['key', 'input', 'Probability', 'Prediction']
01:15:00 | DEBUG    | Header: ['key', 'input', 'Probability', 'Prediction']
Selected Columns: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Selected Columns: ['Probability', 'Prediction']
Selected Values: [0.9168235886886538, 1.0]
01:15:00 | DEBUG    | Selected Values: [0.9168235886886538, 1.0]
Data appended: {'Probability': 0.9168235886886538, 'Prediction': 1.0}
01:15:00 | DEBUG    | Data appended: {'Probability': 0.9168235886886538, 'Prediction': 1.0}
Processing raw bash output...
Header: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Header: ['Probability', 'Prediction']
Selected Columns: ['Probability', 'Prediction']
01:15:00 | DEBUG    | Selected Columns: ['Probability', 'Prediction']
Selected Values: [0.9168235886886539, 1.0]
01:15:00 | DEBUG    | Selected Values: [0.9168235886886539, 1.0]
Data appended: {'Probability': 0.9168235886886539, 'Prediction': 1.0}
01:15:00 | DEBUG    | Data appended: {'Probability': 0.9168235886886539, 'Prediction': 1.0}

Bash output:
 [{'Probability': 0.9168235886886539, 'Prediction': 1.0}]

Ersilia output:
 [{'Probability': 0.9168235886886538, 'Prediction': 1.0}]

 Ersilia columns:  {'Prediction', 'Probability'}

 Bash columns:  {'Prediction', 'Probability'}
Common columns: {'Prediction', 'Probability'} 

New Section... printing output types between ersilia and bash run
<class 'float'>
1.0
<class 'float'>
1.0
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

'float' object is not iterable

Update: Completed the updated_read_csv method and I believe it is correct according to the log output. I worry that it may be too specific to the eos model I picked (eos1pu1) so testing this against other models + feedback would be appreciated. Advice on approaching the comparison section would also be appreciated.

Updated Log

Captured Raw Ersilia Output:
key,input,Probability,Prediction
XLXSAKCOAKORKW-AQJXLSMYSA-N,CC(C)C[C@H](NC(=O)CNC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CO)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@@H]1CCC(=O)N1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)NCC(N)=O,0.8729711752454675,1.0
OLROWHGDTNFZBH-XEMWPYQTSA-N,[H][C@]1(C)NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)N(C)C(=O)[C@@H](NC(=O)[C@H](C(C)C)N(CC)C(=O)[C@@H](C)N(C)C(=O)[C@H](CC)NC(=O)[C@]([H])([C@H](O)[C@H](C)C\C=C\C)N(C)C(=O)[C@H](C(C)C)N(C)C(=O)[C@]([H])(CC(C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C1=O)C(C)C,0.7067416550261579,1.0
MOVBBVMDHIRCTG-LJQANCHMSA-N,Clc1ccc2[nH]c(=O)c(-c3nc4ccccc4[nH]3)c(N[C@@H]3CN4CCC3CC4)c2c1,0.7477766936837326,1.0
ZPANWZBSGMDWON-UHFFFAOYSA-N,Oc1ccc2ccccc2c1Cc1c(O)ccc2ccccc12,0.6063569932426209,0.0

Processing ersilia csv output...

03:36:13 | DEBUG    | Processing line: XLXSAKCOAKORKW-AQJXLSMYSA-N,CC(C)C[C@H](NC(=O)CNC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CO)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@@H]1CCC(=O)N1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)NCC(N)=O,0.8729711752454675,1.0

03:36:13 | DEBUG    | Selected Values: ['0.8729711752454675', '1.0'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.8729711752454675, 1.0]
03:36:13 | DEBUG    | Processing line: OLROWHGDTNFZBH-XEMWPYQTSA-N,[H][C@]1(C)NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)N(C)C(=O)[C@@H](NC(=O)[C@H](C(C)C)N(CC)C(=O)[C@@H](C)N(C)C(=O)[C@H](CC)NC(=O)[C@]([H])([C@H](O)[C@H](C)C\C=C\C)N(C)C(=O)[C@H](C(C)C)N(C)C(=O)[C@]([H])(CC(C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C1=O)C(C)C,0.7067416550261579,1.0

03:36:13 | DEBUG    | Selected Values: ['0.7067416550261579', '1.0'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.7067416550261579, 1.0]
03:36:13 | DEBUG    | Processing line: MOVBBVMDHIRCTG-LJQANCHMSA-N,Clc1ccc2[nH]c(=O)c(-c3nc4ccccc4[nH]3)c(N[C@@H]3CN4CCC3CC4)c2c1,0.7477766936837326,1.0

03:36:13 | DEBUG    | Selected Values: ['0.7477766936837326', '1.0'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.7477766936837326, 1.0]
03:36:13 | DEBUG    | Processing line: ZPANWZBSGMDWON-UHFFFAOYSA-N,Oc1ccc2ccccc2c1Cc1c(O)ccc2ccccc12,0.6063569932426209,0.0

03:36:13 | DEBUG    | Selected Values: ['0.6063569932426209', '0.0'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.6063569932426209, 0.0]
Captured Raw Bash Output:
Probability,Prediction
0.8729711752454676,1
0.7067416550261579,1
0.7477766936837327,1
0.606356993242621,0

Processing raw bash output...: 

03:36:13 | DEBUG    | Processing line: 0.8729711752454676,1

03:36:13 | DEBUG    | Selected Values: ['0.8729711752454676', '1'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.8729711752454676, 1.0]
03:36:13 | DEBUG    | Processing line: 0.7067416550261579,1

03:36:13 | DEBUG    | Selected Values: ['0.7067416550261579', '1'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.7067416550261579, 1.0]
03:36:13 | DEBUG    | Processing line: 0.7477766936837327,1

03:36:13 | DEBUG    | Selected Values: ['0.7477766936837327', '1'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.7477766936837327, 1.0]
03:36:13 | DEBUG    | Processing line: 0.606356993242621,0

03:36:13 | DEBUG    | Selected Values: ['0.606356993242621', '0'] and their type ['Float']
03:36:13 | DEBUG    | these values are floats: [0.606356993242621, 0.0]

Bash output:
 [{'Probability': 0.8729711752454676, 'Prediction': 1.0}, {'Probability': 0.7067416550261579, 'Prediction': 1.0}, {'Probability': 0.7477766936837327, 'Prediction': 1.0}, {'Probability': 0.606356993242621, 'Prediction': 0.0}]

Ersilia output:
 [{'Probability': 0.8729711752454675, 'Prediction': 1.0}, {'Probability': 0.7067416550261579, 'Prediction': 1.0}, {'Probability': 0.7477766936837326, 'Prediction': 1.0}, {'Probability': 0.6063569932426209, 'Prediction': 0.0}]

 Ersilia columns:  {'Prediction', 'Probability'}

 Bash columns:  {'Prediction', 'Probability'}
New Section... printing output types between ersilia and bash run
<class 'float'>
1.0
<class 'float'>
1.0
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

'float' object is not iterable

Update: I believe I fixed the issue, and the updated code is pushed to my fork. Can others test it?

miquelduranfrigola commented 1 month ago

Thanks @kurysauce!

(Just a small note: we don't want to parse only the last two columns; we strictly want to parse all columns from the third one (i.e. index 2 in Python) to the end of the CSV file. This is what you are doing, so all is good.)

@DhanshreeA and @HarmonySosa - do you think we can quickly test this? @kurysauce please give instructions on which commands you'd like folks to test. I recommend using Codespaces.
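To make the column rule above concrete, here is a minimal sketch of parsing every column from index 2 to the end of an Ersilia CSV. It is not the updated_read_csv code from the fork; the function names read_output_columns and _coerce and the first_output_index parameter are hypothetical, chosen only for illustration.

```python
import csv
import io


def _coerce(value):
    """Convert a CSV cell to float when possible, otherwise keep the string."""
    try:
        return float(value)
    except ValueError:
        return value


def read_output_columns(csv_text, first_output_index=2):
    """Return one dict per data row, keeping columns from first_output_index on.

    Hypothetical helper: for Ersilia output the first two columns are
    'key' and 'input', so the model outputs start at index 2.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    names = header[first_output_index:]
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = row[first_output_index:]
        rows.append({name: _coerce(v) for name, v in zip(names, values)})
    return rows


ersilia_csv = (
    "key,input,Probability,Prediction\n"
    "ZPANWZBSGMDWON-UHFFFAOYSA-N,Oc1ccc2ccccc2c1Cc1c(O)ccc2ccccc12,0.6063569932426209,0.0\n"
)
bash_csv = "Probability,Prediction\n0.606356993242621,0\n"

ersilia_rows = read_output_columns(ersilia_csv)
# The bash output has no key/input columns, so its outputs start at index 0.
bash_rows = read_output_columns(bash_csv, first_output_index=0)
```

Because the slice runs to the end of the header, the same helper would cover models with more than two output columns.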

Many thanks! Great progress!

kurysauce commented 1 month ago

Hi @DhanshreeA and @HarmonySosa! I agree, running the tests on Codespaces works better for me than on my local environment. Fetch the models with the --from_github flag and run the test command with the verbose flag: ersilia -v test MODEL_ID.

I started testing a few different models based on their output types:

eos8a5g - string: passes
eos4tcc - float: inconsistent outputs, exceeds 5% difference threshold (currently working on issue for this)
eos4e40 - float: missing run.sh from framework folder
eos1pu1 - float: passes

I am paying particular attention to the outputs of the check_consistent_output and run_bash methods to see whether the model outputs are reasonable, whether a run.sh file exists within the model, and whether the outputs of the bash run and the Ersilia run match. Additionally, there should be no issues parsing the .csv outputs of the Ersilia and bash runs. If any unexpected errors are raised, please post them here, thanks!
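
To make the bash-versus-Ersilia comparison concrete, here is a minimal sketch of a row-by-row check with a 5% relative threshold. The function names and the exact definition of "difference" (relative to the larger magnitude) are assumptions for illustration and may not match what the test command does. The sample values are copied from the log output above.

```python
def within_relative_threshold(a, b, threshold=0.05):
    """True if a and b differ by at most `threshold`, relative to the larger magnitude."""
    scale = max(abs(a), abs(b))
    if scale == 0:
        return True  # both values are zero
    return abs(a - b) / scale <= threshold


def outputs_match(bash_rows, ersilia_rows, threshold=0.05):
    """Compare two lists of {column: value} dicts, row by row and column by column."""
    if len(bash_rows) != len(ersilia_rows):
        return False
    for bash_row, ersilia_row in zip(bash_rows, ersilia_rows):
        if set(bash_row) != set(ersilia_row):
            return False  # the two runs produced different columns
        for column in bash_row:
            if not within_relative_threshold(
                bash_row[column], ersilia_row[column], threshold
            ):
                return False
    return True


# Two rows taken from the "Bash output" and "Ersilia output" log lines above.
bash_rows = [
    {"Probability": 0.8729711752454676, "Prediction": 1.0},
    {"Probability": 0.606356993242621, "Prediction": 0.0},
]
ersilia_rows = [
    {"Probability": 0.8729711752454675, "Prediction": 1.0},
    {"Probability": 0.6063569932426209, "Prediction": 0.0},
]

print(outputs_match(bash_rows, ersilia_rows))
```

A tolerance-based comparison like this absorbs the last-digit floating-point differences visible in the logs, which a strict equality check would report as mismatches.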

DhanshreeA commented 1 month ago

Hey @kurysauce, fantastic work on this so far! I see that the test command has evolved phenomenally, and it would be really helpful if we could document all the changes somewhere, especially the following points:

The use of correlation coefficient and RMSE within the check for consistent outputs
Working with models that don't have a run.sh
Anything else that I am missing from this list.
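
For anyone documenting the first point, here is a minimal sketch of the two metrics in plain Python. It assumes the Pearson correlation coefficient is meant, which the thread does not specify, and it does not reproduce any thresholds from the test command. The sample values are the Probability columns from the log output earlier in this thread.

```python
import math


def rmse(xs, ys):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either sequence is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Probability values from the "Bash output" and "Ersilia output" log lines.
bash_probs = [0.8729711752454676, 0.7067416550261579,
              0.7477766936837327, 0.606356993242621]
ersilia_probs = [0.8729711752454675, 0.7067416550261579,
                 0.7477766936837326, 0.6063569932426209]

print("RMSE:", rmse(bash_probs, ersilia_probs))
print("Pearson r:", pearson(bash_probs, ersilia_probs))
```

The two metrics are complementary: RMSE measures how far apart the values are in absolute terms, while the correlation coefficient measures whether the two runs rank and scale the inputs the same way.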

I think for now putting these details in a Google doc would be sufficient, and we'll try to move it to our GitBook within the model contribution template while you're here.

kurysauce commented 1 month ago

Here is the link to the Google Doc! I believe I have made all the requested edits as well, @DhanshreeA.

kurysauce commented 1 month ago

Closed with PR

ersilia-os / ersilia