GemmaTuron closed this issue 1 year ago
Hello @GemmaTuron, while this model had GitHub workflows, they needed modification. Before making any changes, I tested the initial model: it fetches successfully, and I was able to make predictions from string inputs.
15:32:13 | DEBUG | Done with unique posting
{
"input": {
"key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
"input": "CC(=O)NC1=CC=C(O)C=C1",
"text": "CC(=O)NC1=CC=C(O)C=C1"
},
"output": {
"outcome": [
0.008
]
}
}
However, when an output file was specified, the model failed: eos31ve_predict_file.log
File "/home/hellenah/ersilia/ersilia/io/output.py", line 248, in _to_dataframe
vals = self.__cast_values(vals, dtypes, output_keys)
File "/home/hellenah/ersilia/ersilia/io/output.py", line 173, in __cast_values
v += v_
TypeError: 'float' object is not iterable
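The traceback can be reproduced in isolation. In `__cast_values`, values are accumulated with `v += v_`; list `+=` iterates over the right-hand side, so it works when each result is a list but raises when the model returns a bare float. A minimal sketch (the `accumulate` helper is illustrative, not Ersilia's actual code):

```python
# Minimal reproduction of the TypeError seen in ersilia/io/output.py.
# When each model output is a list, += extends the accumulator; when it
# is a bare float, += tries to iterate over the float and fails.

def accumulate(values):
    v = []
    for v_ in values:
        v += v_  # list.__iadd__ iterates over v_
    return v

print(accumulate([[0.008], [0.009]]))  # → [0.008, 0.009]

try:
    accumulate([0.008, 0.009])  # bare floats are not iterable
except TypeError as e:
    print("TypeError:", e)  # → TypeError: 'float' object is not iterable
```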
This is just like the two models I tested earlier: eos9tyg, worked on by Emma, and eos2r5a, worked on by Samuel.
When I checked the API schema, the model was returning a numeric array (a list) instead of a single float.
"output": {
"outcome": {
"type": "numeric_array",
"shape": [
1
],
"meta": [
"hlm_proba1"
]
}
}
This is caused by this code in service.py, which returns a list:
R += [
{"outcome": [Float(x) for x in r]}
]
I think it should be modified to return a float:
R += [
{"outcome": Float(r[0])}
]
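For context, the shape of the Python value is what drives the schema type: a length-1 list is still reported as a `numeric_array`, while a bare float is reported as a single numeric value. A hypothetical sketch of such type inference (not Ersilia's actual implementation):

```python
# Hypothetical sketch: infer an output schema entry from a sample value.
# A length-1 list of numbers still reads as a numeric_array, while a
# bare float reads as a single numeric value.

def infer_output_type(value):
    if isinstance(value, list) and all(
        isinstance(x, (int, float)) for x in value
    ):
        return {"type": "numeric_array", "shape": [len(value)]}
    if isinstance(value, (int, float)):
        return {"type": "numeric"}
    return {"type": "string"}

print(infer_output_type([0.008]))  # → {'type': 'numeric_array', 'shape': [1]}
print(infer_output_type(0.008))    # → {'type': 'numeric'}
```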
Thanks @HellenNamulinda this is very useful.
This seems to be related to the "flattening" of JSON outputs. Please let me take the lead here - I will keep you posted.
@miquelduranfrigola
This is happening also for model eos9tyg, which is another ADME NCATS model - they were all incorporated at the same time so probably some change in the code makes them all fail
@miquelduranfrigola, eos9tyg works for @emmakodes and @ZakiaYahya, so maybe it's an issue with changes to the Ersilia base code?
Indeed, eos9tyg works, as can be seen in https://github.com/ersilia-os/eos9tyg/actions/.
For eos31ve, @GemmaTuron, I would suggest that we try GitHub Actions with the new workflow. I did not manage to make it run on my Apple M2, so I would take a brute-force approach: directly try the workflows and see.
Hello @miquelduranfrigola, the model fetches and works perfectly when you pass inputs without specifying the output file (-o).
The problem only happens when you specify, say, -o output.csv.
When I checked the API schema, the model was returning a numeric array (a list) instead of a single float.
"output": {
  "outcome": {
    "type": "numeric_array",
    "shape": [
      1
    ],
    "meta": [
      "hlm_proba1"
    ]
  }
}
This is caused by this code in service.py, which returns a list:
R += [
    {"outcome": [Float(x) for x in r]}
]
I think it should be modified to return a float:
R += [
    {"outcome": Float(r[0])}
]
Hello @GemmaTuron and @miquelduranfrigola, the metadata shows:
Float
Single
When I changed the code in service.py to return a single float instead of an array (list), the model now works:
R += [
{"hlm_proba1": Float(r[0])}
]
👍 Model eos31ve fetched successfully (eos31ve_fetch2.log). It makes predictions for inputs without output files (eos31ve2_predict_one.log):
19:59:59 | DEBUG | Done with unique posting
{
"input": {
"key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
"input": "CC(=O)NC1=CC=C(O)C=C1",
"text": "CC(=O)NC1=CC=C(O)C=C1"
},
"output": {
"hlm_proba1": 0.008
}
}
And also for CSV inputs and outputs (eml_eos31ve2.csv, eos31ve2_predict_file.log):
20:03:23 | DEBUG | Data: hlm_proba1
20:03:23 | DEBUG | Values: 0.009
20:03:23 | DEBUG | Pure datatype: numeric
Also, the schema now shows a single numeric value:
{
"run": {
"input": {
"key": {
"type": "string"
},
"input": {
"type": "string"
},
"text": {
"type": "string"
}
},
"output": {
"hlm_proba1": {
"type": "numeric",
"meta": [
"hlm_proba1"
]
}
}
}
}
Unlike the previous schema, which returned a numeric array (list):
"output": {
"outcome": {
"type": "numeric_array",
"shape": [
1
],
"meta": [
"hlm_proba1"
]
}
}
Thanks, @HellenNamulinda - very good work here.
It is true that numeric arrays of length 1 are, de facto, single-point outputs. Consistency in output terminology is surprisingly difficult to achieve and, now that we have a lot of models, we should start writing about it in the Ersilia Book.
For now, let's keep a working solution and we will revisit models in light of an output ontology later.
@GemmaTuron can we check that the model works now?
The Docker build was failing, I've updated the workflow to build separately for arm64 and amd64 let's see!
Hello @Gemma, I have checked the log. The arm64 build failed because of sklearn:
#8 998.4 from sklearn.metrics import auc, mean_absolute_error, mean_squared_error, precision_recall_curve, r2_score,\
#8 998.4 ModuleNotFoundError: No module named 'sklearn'
I have found the cause, which is an installation error for scikit-learn on arm64:
#8 833.5 × Encountered error while trying to install package.
#8 833.5 ╰─> scikit-learn
I will try to fix the installation of scikit-learn for future builds, similar to the way torch was resolved for eos9yui.
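One possible workaround, assuming the failure comes from pip trying to compile scikit-learn from source on arm64 (a hedged sketch, not the actual fix applied to this repository): either pin a scikit-learn release that ships prebuilt arm64 wheels, or install the build toolchain before pip runs. A hypothetical Dockerfile fragment:

```dockerfile
# Hypothetical fragment; the version pin and system package list are
# assumptions, not taken from the eos31ve repository.

# Option 1: pin a scikit-learn release with prebuilt arm64 wheels
RUN pip install "scikit-learn>=1.1"

# Option 2: if building from source is unavoidable, install the
# compilers and numeric libraries the build requires first
RUN apt-get update && \
    apt-get install -y build-essential gfortran libopenblas-dev && \
    pip install scikit-learn
```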
Please refactor this model to work with the latest structure of the template