ersilia-os / eos31ve

GNU General Public License v3.0

Clean UP & Dockerization eos31ve #4

Closed · GemmaTuron closed this 1 year ago

GemmaTuron commented 1 year ago

Please refactor this model to work with the latest structure of the template

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, while this model had GitHub workflows, they needed modification. Before making any changes, I tested the initial model: it fetches successfully, and I was able to make predictions when given string inputs.

15:32:13 | DEBUG    | Done with unique posting
{
    "input": {
        "key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
        "input": "CC(=O)NC1=CC=C(O)C=C1",
        "text": "CC(=O)NC1=CC=C(O)C=C1"
    },
    "output": {
        "outcome": [
            0.008
        ]
    }
}

However, when an output file was specified, the model failed (eos31ve_predict_file.log):

  File "/home/hellenah/ersilia/ersilia/io/output.py", line 248, in _to_dataframe
    vals = self.__cast_values(vals, dtypes, output_keys)
  File "/home/hellenah/ersilia/ersilia/io/output.py", line 173, in __cast_values
    v += v_
TypeError: 'float' object is not iterable
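For illustration, here is a minimal sketch of why this error occurs. It is an assumed simplification of Ersilia's __cast_values logic, not the actual implementation: if values are accumulated by list concatenation, a bare float among them raises exactly this TypeError.

```python
# Minimal sketch, assuming values are accumulated by list concatenation
# (simplified; not the actual Ersilia __cast_values implementation).

def cast_values(rows):
    v = []
    for v_ in rows:
        v += v_  # in-place list concatenation: v_ must be iterable
    return v

# Lists of lists concatenate fine:
print(cast_values([[0.008], [0.009]]))  # [0.008, 0.009]

# A bare float is not iterable, so concatenation fails:
try:
    cast_values([0.008, 0.009])
except TypeError as e:
    print(e)  # 'float' object is not iterable
```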

This is the same behavior as in the two models I tested: one worked on by Emma (eos9tyg) and another worked on by Samuel (eos2r5a).

HellenNamulinda commented 1 year ago

When I checked the API schema, the model was returning a numeric array (list) instead of a float.

"output": {
            "outcome": {
                "type": "numeric_array",
                "shape": [
                    1
                ],
                "meta": [
                    "hlm_proba1"
                ]
            }
        }

This was because of this code in service.py, which returns a list:

R += [
    {"outcome": [Float(x) for x in r]}
]

I think it should be modified to return a float:

R += [
    {"outcome": Float(r[0])}
]

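The effect of the proposed change can be illustrated with a small hypothetical helper (flatten_singleton is my own name for illustration, not part of Ersilia) that collapses a length-1 array into a bare float, which is what the fixed service.py effectively does:

```python
# Hypothetical helper (not part of Ersilia) mirroring the proposed fix:
# a length-1 numeric array is collapsed into a single float, while other
# values pass through unchanged.

def flatten_singleton(outcome):
    if isinstance(outcome, list) and len(outcome) == 1:
        return float(outcome[0])
    return outcome

print(flatten_singleton([0.008]))     # 0.008  (a float, not a list)
print(flatten_singleton([1.0, 2.0]))  # [1.0, 2.0]  (left unchanged)
```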
miquelduranfrigola commented 1 year ago

Thanks @HellenNamulinda this is very useful.

This seems to be related to the "flattening" of JSON outputs. Please let me take the lead here - I will keep you posted.

GemmaTuron commented 1 year ago

@miquelduranfrigola

This is also happening for model eos9tyg, which is another ADME NCATS model - they were all incorporated at the same time, so probably some change in the code makes them all fail.

GemmaTuron commented 1 year ago

@miquelduranfrigola for eos9tyg - it works for @emmakodes and @ZakiaYahya, so maybe it's an issue with the Ersilia base code changes?

miquelduranfrigola commented 1 year ago

Indeed, eos9tyg works, as can be seen in https://github.com/ersilia-os/eos9tyg/actions/ .

For eos31ve, @GemmaTuron, I would suggest that we try GitHub Actions with the new workflow. I did not manage to make it run on my Apple M2, so I would take a brute-force approach and directly try the workflows.

HellenNamulinda commented 1 year ago

Hello @miquelduranfrigola, the model fetches and works perfectly when you pass inputs without specifying an output file (-o).

The problem only happened when you specified, say, -o output.csv.

HellenNamulinda commented 1 year ago

Hello @GemmaTuron and @miquelduranfrigola,

When I changed the code in service.py to return a single float instead of an array (list), the model now works.

R += [
    {"hlm_proba1": Float(r[0])}
]

👍 Model eos31ve fetched successfully (eos31ve_fetch2.log) and makes predictions for inputs without output files (eos31ve2_predict_one.log):

19:59:59 | DEBUG    | Done with unique posting
{
    "input": {
        "key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
        "input": "CC(=O)NC1=CC=C(O)C=C1",
        "text": "CC(=O)NC1=CC=C(O)C=C1"
    },
    "output": {
        "hlm_proba1": 0.008
    }
}

It also works for CSV inputs and outputs (eml_eos31ve2.csv, eos31ve2_predict_file.log):

20:03:23 | DEBUG    | Data: hlm_proba1
20:03:23 | DEBUG    | Values: 0.009
20:03:23 | DEBUG    | Pure datatype: numeric
eml_eos31ve2.csv

Also, the schema now shows a single numeric output:

{
    "run": {
        "input": {
            "key": {
                "type": "string"
            },
            "input": {
                "type": "string"
            },
            "text": {
                "type": "string"
            }
        },
        "output": {
            "hlm_proba1": {
                "type": "numeric",
                "meta": [
                    "hlm_proba1"
                ]
            }
        }
    }
}

Unlike the previous schema, which returned a numeric array (list):

"output": {
            "outcome": {
                "type": "numeric_array",
                "shape": [
                    1
                ],
                "meta": [
                    "hlm_proba1"
                ]
            }
        }
miquelduranfrigola commented 1 year ago

Thanks, @HellenNamulinda - very good work here.

It is true that numeric arrays of length 1 are, de facto, single-point outputs. Consistency in output terminology is surprisingly difficult to achieve and, now that we have a lot of models, we should start writing about it in the Ersilia Book.

For now, let's keep a working solution and we will revisit models in light of an output ontology later.

@GemmaTuron can we check that the model works now?

GemmaTuron commented 1 year ago

The Docker build was failing. I've updated the workflow to build separately for arm64 and amd64 - let's see!

HellenNamulinda commented 1 year ago

> The Docker build was failing. I've updated the workflow to build separately for arm64 and amd64 - let's see!

Hello @GemmaTuron, I have checked the log. arm64 failed because of sklearn:

#8 998.4     from sklearn.metrics import auc, mean_absolute_error, mean_squared_error, precision_recall_curve, r2_score,\
#8 998.4 ModuleNotFoundError: No module named 'sklearn'

I have found the cause, which is an installation error for scikit-learn on arm64:

#8 833.5 × Encountered error while trying to install package.
#8 833.5 ╰─> scikit-learn

I will try fixing the installation of scikit-learn for future builds, just as the torch installation was resolved for eos9yui.
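One common remedy, sketched below as an assumption rather than the actual eos31ve fix, is to install the compiler toolchain and BLAS development headers before pip runs, since scikit-learn may have to build from source on arm64 when no prebuilt wheel is available for the platform. The base image and package names here are illustrative for a Debian-based image:

```dockerfile
# Illustrative Dockerfile fragment (assumed Debian-based image; not the
# actual eos31ve Dockerfile). The build tools allow scikit-learn to
# compile from source on arm64 when no prebuilt wheel exists.
FROM python:3.9-slim
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc g++ gfortran libopenblas-dev && \
    rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir scikit-learn
```

Pinning a scikit-learn version that publishes aarch64 wheels on PyPI can avoid the source build entirely.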