Open GemmaTuron opened 1 month ago
oh no @miquelduranfrigola
The model is only having this behaviour when run inside ZairaChem. Using the Ersilia installed in the same environment but as a standalone:
18:10:00 | DEBUG | Data: outcome
18:10:00 | DEBUG | Values: [26.605158585931452, 21.16339214179242, 3.0, 1.0, 42.87062313866525, 2.572597677509152, 5.1445266340
18:10:00 | DEBUG | Getting pure dtype for outcome
18:10:00 | DEBUG | This is the pure datatype: numeric_array
18:10:00 | DEBUG | Datatype: numeric_array
18:10:00 | DEBUG | Datatype has been matched: numeric_array over {'mixed_array', 'array', 'numeric_array', 'string_array'}
18:10:00 | DEBUG | No merge key
18:10:00 | DEBUG | [26.605158585931452, 21.16339214179242, 3.0, 1.0, 42.87062313866525, 2.572597677509152, 5.1445266340153015, ...]
18:10:00 | DEBUG | numeric_array
18:10:00 | DEBUG | outcome
Ersilia is v0.1.34 and cannot be upgraded btw
Could it be because we are using the Ersilia API instead of the CLI in ZairaChem? I'll test it
FYI @DhanshreeA and @Abellegese, this issue does not happen when using the Ersilia Python API as a standalone (outside ZairaChem). I really do not understand what is going on, but it seems this was already reported and fixed. @DhanshreeA, can you confirm what the issue was and how it was fixed?
from ersilia import ErsiliaModel
em = ErsiliaModel("eos78ao")
em.api(input="test.csv", output="out.csv")
Hi @GemmaTuron, I will take a look and let you know.
@miquelduranfrigola quick question as well. Do you think it is because the datatype is identified as Null that the Mordred descriptors are not being processed, or is there another reason? The raw.h5 file IS created, but then the pipeline breaks.
I would not link one issue to the other:
@DhanshreeA please confirm the None error is fine in other versions of Ersilia.
@Abellegese just add the Python API tests as we discussed, but do not lose too much time investigating this.
@miquelduranfrigola you and I should do a deep dive in ZairaChem soon and fix those issues.
Thanks @GemmaTuron, @DhanshreeA and @Abellegese
My immediate reaction would be that we work on making ZairaChem compatible with the latest Ersilia version (if it isn't, yet), so we can at least reflect the changes we make in Ersilia in ZairaChem.
Also, the None issue with the metadata was usually just fine and it was resolved dynamically by inspecting the data (generally, not sure about this case in particular). To me, what is happening is that Mordred is giving too many NaN values and then the data type resolver fails. On a possibly related note, I have noticed that Mordred tends to give more NaN values with the latest NumPy versions, which is absolutely critical. So, can you confirm which NumPy version is being used to run Mordred? That is, in the eos78ao conda environment.
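To check the NaN hypothesis, something along these lines could be run inside the model's conda environment. It is only a minimal sketch: `nan_fraction` is a hypothetical helper (not part of Ersilia or ZairaChem), and the `key` and `input` column names are an assumption about the layout of the output CSV, so adjust them to whatever the real file contains.

```python
import csv
import io
import math

# Report the NumPy version of the current environment, if NumPy is available.
try:
    import numpy
    print("NumPy version:", numpy.__version__)
except ImportError:
    print("NumPy is not installed in this environment")


def nan_fraction(csv_text, skip_columns=("key", "input")):
    """Return the fraction of missing or non-finite values per column.

    Hypothetical helper: skip_columns lists the non-descriptor columns,
    which are assumed here to be named "key" and "input".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad_counts = {}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name, value in row.items():
            if name in skip_columns:
                continue
            bad_counts.setdefault(name, 0)
            try:
                is_ok = math.isfinite(float(value))
            except (TypeError, ValueError):
                # Empty cells and non-numeric strings count as missing.
                is_ok = False
            if not is_ok:
                bad_counts[name] += 1
    if total_rows == 0:
        return {}
    return {name: bad / total_rows for name, bad in bad_counts.items()}


# Tiny made-up example, not real Mordred output.
sample = "key,input,d1,d2\nk1,CCO,1.5,nan\nk2,CCN,,2.0\n"
result = nan_fraction(sample)
print(result)
```

For a real run, the text of the output file (for example `open("out.csv").read()`) can be passed in place of `sample`. Comparing the fractions across two environments with different NumPy versions would show whether the extra NaN values come from the NumPy upgrade.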
Hi @miquelduranfrigola and @DhanshreeA
Quite urgent. The Mordred descriptors are used in ZairaChem, but when I run the pipeline, it fails to use them. I think the error is somewhere in the model metadata, as this is the output I am getting. It does calculate the descriptor metadata but then the outcome is None. Is the issue in the service.py file, and could you have a quick look? FYI, I am using the Docker version of the model.