Closed Tigerrr07 closed 1 year ago
Hi @Tigerrr07 I'll check this out. Could you share your drug_config? 😁
Yeah, here is my drug_config:
drug_configs = {
    "node_metadata_functions": [
        gm.atom_type_one_hot,
        gm.formal_charge,
        gm.hybridization,
        gm.is_aromatic,
        gm.degree,
        gm.total_num_h,
    ],
    "edge_metadata_functions": [
        gm.add_bond_type,
        gm.bond_is_aromatic,
        gm.bond_is_in_ring,
        gm.bond_is_conjugated,
        gm.bond_stereo,
    ],
}
Hi @Tigerrr07
It looks like the behaviour for verbose="all_info" defaults to being protein-specific.
You can try instead with:
import graphein.molecule as gm
from graphein.ml import GraphFormatConvertor
drug_configs = {
    "node_metadata_functions": [
        gm.atom_type_one_hot,
        gm.formal_charge,
        gm.hybridization,
        gm.is_aromatic,
        gm.degree,
        gm.total_num_h,
    ],
    "edge_metadata_functions": [
        gm.add_bond_type,
        gm.bond_is_aromatic,
        gm.bond_is_in_ring,
        gm.bond_is_conjugated,
        gm.bond_stereo,
    ],
}
config = gm.MoleculeGraphConfig(**drug_configs)
node_columns = ['atomic_num', 'element', 'rdmol_atom', 'coords', 'atom_type_one_hot', 'formal_charge', 'hybridization', 'is_aromatic', 'degree', 'total_num_h']
graph = gm.construct_graph(smiles="CC(=O)OC1=CC=CC=C1C(=O)O", config=config)
drug_format_convertor = GraphFormatConvertor('nx', 'pyg', columns = node_columns)
p = drug_format_convertor(graph)
print(p)
Which outputs:
Data(node_id=[13], atomic_num=[13], element=[13], rdmol_atom=[13], coords=[13], atom_type_one_hot=[13, 11], formal_charge=[13], hybridization=[13], is_aromatic=[13], degree=[13], total_num_h=[13], num_nodes=13)
Thank you! It works. I'd also like to know the range of the discrete features, like degree and total_num_h, so I can build one-hot features for them.
Hmm. How big is your dataset?
If you can fit it in memory you can do:
graphs = [graph, graph, graph]
max_num_h = 0
for g in graphs:
    for n, d in g.nodes(data=True):
        max_num_h = max(max_num_h, d['total_num_h'])
print(max_num_h)
or
import torch
from torch_geometric.data import Batch
b = Batch.from_data_list([p, p, p])
torch.max(b.total_num_h)
If it won't fit in memory you can just run this with a buffer. Otherwise, you could set a sane max & clip higher values. For degree, for instance, if you're working with small organic molecules (and using only bonds as edges) it's very unlikely that you'll see a degree > 4.
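A minimal sketch of the two ideas above (tracking a running maximum over a stream, and clipping to a fixed cap before one-hot encoding), written in plain Python so it doesn't rely on any Graphein or PyG API. The helper names `running_max` and `clipped_one_hot`, the example degree values, and the cap of 4 are all illustrative assumptions, not part of the library:

```python
from typing import Iterable, List


def running_max(values: Iterable[int]) -> int:
    """Return the maximum of a stream without holding it all in memory."""
    current = 0
    for v in values:
        current = max(current, v)
    return current


def clipped_one_hot(value: int, max_value: int) -> List[int]:
    """One-hot encode an integer into max_value + 1 slots.

    Values above max_value are clipped, so they share the last slot.
    """
    index = min(max(value, 0), max_value)
    vec = [0] * (max_value + 1)
    vec[index] = 1
    return vec


# Hypothetical per-atom degree values; in practice these would come from
# the node attribute d['degree'] of each graph.
degrees = [1, 3, 4, 2, 6]
MAX_DEGREE = 4  # assumed cap for small organic molecules

observed_max = running_max(iter(degrees))
encoded = [clipped_one_hot(d, MAX_DEGREE) for d in degrees]
print(observed_max)
print(encoded[-1])
```

With a fixed cap the feature length is known up front (here `MAX_DEGREE + 1`), so it stays the same across datasets even if a rare atom exceeds the cap.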
Thank you for that! My dataset won't be too large.
Is your feature request related to a problem? Please describe.
I constructed a molecule graph with the graphein.molecule module, but I can't convert it to a PyG graph using GraphFormatConvertor.
Describe the solution you'd like
I'd like a way to convert a molecule graph to a PyG graph, similar to what exists for proteins.
Additional context
The documentation should specify the length of every feature. For example, for degree: "Degree: the degree (0-5) of this atom."