ersilia-os / eos2401

GNU General Public License v3.0
0 stars 0 forks source link

New model ready for testing! #2

Open github-actions[bot] opened 8 months ago

github-actions[bot] commented 8 months ago

This model is ready for testing. If you are assigned to this issue, please try it out using the CLI, Google Colab and DockerHub and let us know if it works!

GemmaTuron commented 8 months ago

Hi @Inyrkz

I've tested the model but the output in the -csv file is not properly processed. It put a list of molecules in a single cell, instead of having one column per output molecule. You need to modify that in Main.py, and add a dynamic header (smi1, smi2... for example). You can get inspiration from any other model that uses that. I recall we discussed this in one of our meetings. --> also check that you did format this correctly in eos8bhe

GemmaTuron commented 7 months ago

This model is still not reformatted to provide the expected output

Inyrkz commented 7 months ago

Hi @GemmaTuron,

I did make the changes as I did in eos8bhe. We had a problem with the changes. It failed when I tested it with Ersilia, but it worked when running the main.py file and the bash script on Codespace.

The error we had was a NoneType error

@miquelduranfrigola said he would take a look at it. I forgot to remind him since he was at a conference then.

GemmaTuron commented 7 months ago

Hi @Inyrkz

Thansk for the clarification, I think the error is just due to how you are parsing the header, I'm fixing this. One question, does the model only produce up to 100 molecules (10 trials, 10 samples per trial) per molecule? because then it does not make sense to go as high as 1000 in the number of columns

Inyrkz commented 7 months ago

Alright, I hope the header parsing works. Please I'd like to see how you parse the header when you are done.

The number of molecules generated also depends on the number of core structures extracted from a molecule. If we have 3 core structures extracted, 10 trials and 10 samples per trial, we will have a maximum of 300 new molecules. If there are 5 core structures, then 500 will be the maximum for that molecule.

We don't have to go as high as 1000. I think the 1000 was based on the model I copied the code from. Maybe that use case needed exactly 1000 molecules.

GemmaTuron commented 7 months ago

That is what I was pointing at discussing. As it is now, we are not sampling from several cores, imagine a molecule has more than 10 cores, we should keep a representation of each of the cores to a maximmum of 1000 molecules. At the moment this is not implemented - it will be very unlikely that we get such high numbers but could happen

Inyrkz commented 7 months ago

Oh okay. I understand.

If there are more than 10 cores, we should still get a sample from all the cores, rather than cutting it off at the 1000th molecule.

miquelduranfrigola commented 7 months ago

Thanks both - do you need my feedback at this stage? Sorry @Inyrkz I was indeed at a conference and lost track of it.