Open github-actions[bot] opened 8 months ago
Hi @Inyrkz
I've tested the model but the output in the -csv file is not properly processed. It put a list of molecules in a single cell, instead of having one column per output molecule. You need to modify that in Main.py, and add a dynamic header (smi1, smi2... for example). You can get inspiration from any other model that uses that. I recall we discussed this in one of our meetings. --> also check that you did format this correctly in eos8bhe
This model is still not reformatted to provide the expected output
Hi @GemmaTuron,
I did make the changes as I did in eos8bhe. We had a problem with the changes. It failed when I tested it with Ersilia, but it worked when running the main.py
file and the bash script on Codespace.
The error we had was a NoneType
error
@miquelduranfrigola said he would take a look at it. I forgot to remind him since he was at a conference then.
Hi @Inyrkz
Thansk for the clarification, I think the error is just due to how you are parsing the header, I'm fixing this. One question, does the model only produce up to 100 molecules (10 trials, 10 samples per trial) per molecule? because then it does not make sense to go as high as 1000 in the number of columns
Alright, I hope the header parsing works. Please I'd like to see how you parse the header when you are done.
The number of molecules generated also depends on the number of core structures extracted from a molecule. If we have 3 core structures extracted, 10 trials and 10 samples per trial, we will have a maximum of 300 new molecules. If there are 5 core structures, then 500 will be the maximum for that molecule.
We don't have to go as high as 1000. I think the 1000 was based on the model I copied the code from. Maybe that use case needed exactly 1000 molecules.
That is what I was pointing at discussing. As it is now, we are not sampling from several cores, imagine a molecule has more than 10 cores, we should keep a representation of each of the cores to a maximmum of 1000 molecules. At the moment this is not implemented - it will be very unlikely that we get such high numbers but could happen
Oh okay. I understand.
If there are more than 10 cores, we should still get a sample from all the cores, rather than cutting it off at the 1000th molecule.
Thanks both - do you need my feedback at this stage? Sorry @Inyrkz I was indeed at a conference and lost track of it.
This model is ready for testing. If you are assigned to this issue, please try it out using the CLI, Google Colab and DockerHub and let us know if it works!