Closed CalmScout closed 6 months ago
Dear @CalmScout ,
SchNetPack does not support Smiles inputs, because it is designed for 3D structures. You could utilize for example rdkit to get 3D embeddings from smiles and use that as an input for SchNetPack models.
It could look like this:
from ase import Atoms
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
smiles = "CC(=O)NC1=CC=C(C=C1)O" # paracetamol
# load data
rdkmol = Chem.MolFromSmiles(smiles)
rdkmol = Chem.AddHs(rdkmol)
AllChem.EmbedMolecule(rdkmol, randomSeed=42)
AllChem.MMFFOptimizeMolecule(rdkmol)
# get positions
n_nodes = rdkmol.GetNumAtoms()
pos = []
for i in range(n_nodes):
conformer_pos = rdkmol.GetConformer().GetAtomPosition(i)
pos.append([conformer_pos.x, conformer_pos.y, conformer_pos.z])
# get atomic numbers
atomic_numbers = []
for atom in rdkmol.GetAtoms():
atomic_numbers.append(atom.GetAtomicNum())
at = Atoms(positions=np.array(pos), numbers=np.array(atomic_numbers))
Best, Jonas
Dear @jnsLs ,
Thank you for a such prompt reply with a code snippet. Indeed, I used RDKit to generate 3D structure from SMILES, but my solution was not optimal because it was saving .xyz
files:
from rdkit import Chem
from rdkit.Chem import AllChem
from pathlib import Path
def generate_xyz_from_smiles(smi: str, dir_to: Path) -> None:
# check if file already exists
if (dir_to / f"{smi}.xyz").exists():
return
mol = Chem.MolFromSmiles(smi)
mol_add_h = Chem.AddHs(mol)
# embed molecule in 3D
AllChem.EmbedMolecule(mol_add_h)
# convert molecule to xyz and save to file
Chem.MolToXYZFile(mol_add_h, str(dir_to / f"{smi}.xyz"))
def read_xyz(path: Path) -> tuple[int, list[int], np.ndarray]:
"""
Read xyz file and return atomic numbers and corresponding positions.
"""
with path_xyz.open() as f:
lines = f.readlines()
N = int(lines[0])
lines = lines[2:]
positions = np.array([[float(_) for _ in line.split()[1:4]] for line in lines])
symbols = [line.split()[0] for line in lines]
atomic_numbers = [Chem.GetPeriodicTable().GetAtomicNumber(symbol) for symbol in symbols]
return N, atomic_numbers, positions
Please let me know if I understand correctly that for the creation of ase.Atoms
object we need only 3D coordinates of atoms in the molecule and corresponding atomic numbers as in the example below:
ats = Atoms(positions=positions, numbers=numbers)
If so, the properties like in the example from tutorial notebook:
properties = {'energy': energies, 'forces': forces}
can be viewed as labels to predict? Or, for example, energies
can be used during the training of SchNet as well?
Best regards, Anton.
Hi Anton,
thats right, you can define an ase.Atoms
object by solely providing atom positions and atomic numbers.
Properties such as energy and forces are labels (targets) in your dataset. Each structure (defined by the ase.Atoms
object) has a corresponding target value for each property. When you train a model in SchNetPack you train on those targets. I.e., the objective of the training is to minimize the deviation between properties predicted by the model (e.g. SchNet) and the respective traget values in your dataset (energies, forces, ...).
Best, Jonas
Hi Jonas,
Thank you for such a great support of SchNetPack
package and great explanations!
Best regards, Anton.
Dear maintainers of schnetpack,
I am exploring the official tutorial notebook tutorial_01_preparing_data.ipynb
Provided example with
md17_uracil.npz
data containspositions
,energies
,forces
andnumbers
entries that are required for the construction ofase.atoms.Atoms
objects stored inatoms_list
:May I ask if you have any functions for initialization of the SchNetPack with only SMILES molecular representations?
Best regards, Anton.