Closed: pnhuy closed this issue 1 month ago.
Hi,
Thank you for your question. V0 is simply the list of atomic symbols in the compound. For example, for CH4, V0 is C,H,H,H,H. You can find examples at this link. V1 represents subtypes obtained by unsupervised classification; examples can be found here. V2 is presented as an example. We can increase the number of subtypes independently of the oxidation states, which increases the number of model parameters. This will be described in future work. If RDKit cannot generate the atomic symbols, it is better to use Open Babel or another tool.
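Because V0 is only a list of atomic symbols, it does not depend on 3D coordinates. The sketch below illustrates the representation from a plain molecular formula; `v0_from_formula` is a hypothetical helper, not part of the repository. It expands CH4 into C,H,H,H,H, and for the molecule from the question, `CCOC(=O)CCC(C)=O` (ethyl levulinate, C7H12O3), it yields 22 symbols. From a real structure the symbols would follow the molecule's atom order instead; with RDKit the same list can be read from the molecular graph, e.g. `[a.GetSymbol() for a in Chem.AddHs(Chem.MolFromSmiles(smiles)).GetAtoms()]`, without calling `EmbedMolecule`. Whether that alone is enough for the notebooks' `getRawInputs`, and whether atom order matters there, is not stated in the thread.

```python
import re
from collections import Counter
from typing import List

# Hypothetical helper for illustration only; not from the authors' code.
# One element symbol (capital letter + optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole formula is one or more such tokens; parentheses, charges and isotopes are not supported.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def v0_from_formula(formula: str) -> List[str]:
    """Expand a simple molecular formula into a V0-style list of atomic symbols."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    symbols: List[str] = []
    for element, count in _TOKEN.findall(formula):
        # A missing count means a single atom of that element.
        symbols.extend([element] * (int(count) if count else 1))
    return symbols


print(",".join(v0_from_formula("CH4")))  # C,H,H,H,H
# Ethyl levulinate, CCOC(=O)CCC(C)=O, has the formula C7H12O3.
print(Counter(v0_from_formula("C7H12O3")))
```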
Hi @2shakir, thank you very much. Your answer makes sense to me.
Dear team,
Thank you very much for your contribution.
When running your notebooks, I wondered how the types (v0, v1, v2) in your dataset are generated.
I saw the function `getRawInputs`, which uses the `ase` `Atoms` data structure, but I don't know how to generate `ase` `Atoms`. Some solutions I found suggest generating the conformation with RDKit's `EmbedMolecule`:
Smiles --[rdkit]--> Mol --[EmbedMolecule]--> Conformation --[ase]--> Atoms --> Types
But this failed in many cases because RDKit was unable to generate the conformation, e.g. for the SMILES `CCOC(=O)CCC(C)=O` in `toxic_nr-aromatase_ds.csv`. Could you please explain in more detail how the types are generated?
Thank you very much!
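The pipeline in the question needs 3D coordinates only to build `ase` `Atoms`. If the 3D step is done with Open Babel instead, as the reply suggests, one option (an assumption, not the authors' documented workflow) is to export an XYZ file with something like `obabel -:"CCOC(=O)CCC(C)=O" --gen3d -O mol.xyz` and read it back; `ase.io.read("mol.xyz")` loads such a file directly into `Atoms`. The dependency-free parser below (`read_xyz` is a hypothetical helper) only shows that the V0 list is simply the first column of the XYZ atom records. The methane coordinates are illustrative, not taken from the dataset.

```python
from typing import List, Tuple

Position = Tuple[float, float, float]


def read_xyz(text: str) -> Tuple[List[str], List[Position]]:
    """Parse a single-frame XYZ block into element symbols (V0) and coordinates.

    Hypothetical helper for illustration; ase.io.read does this (and more) for real files.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    # lines[1] is a free-form comment; atom records start on the third line.
    records = lines[2:2 + n_atoms]
    if len(records) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(records)}")
    symbols: List[str] = []
    positions: List[Position] = []
    for line in records:
        element, x, y, z = line.split()[:4]
        symbols.append(element)
        positions.append((float(x), float(y), float(z)))
    return symbols, positions


# Illustrative methane geometry (approximately tetrahedral, C-H about 1.09 Angstrom).
METHANE_XYZ = """5
methane
C   0.000   0.000   0.000
H   0.629   0.629   0.629
H  -0.629  -0.629   0.629
H  -0.629   0.629  -0.629
H   0.629  -0.629  -0.629
"""

symbols, positions = read_xyz(METHANE_XYZ)
print(",".join(symbols))  # C,H,H,H,H
```

Only `symbols` is needed for V0; the coordinates matter only if a later step actually uses the geometry.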