Closed thegodone closed 2 years ago
Hello,
sorry for the changes. I wanted to move to a more consistent serialization scheme that is also reflected in the hyperparameters and that makes all the kgcnn.selection code obsolete.
Have you installed openbabel? Even if not, this should not be a problem. Have you used:
from kgcnn.data.serial import deserialize as deserialize_dataset
dataset = deserialize_dataset(hyper["data"]["dataset"])
Otherwise you would need to call the methods manually:
dataset.set_attributes()
Or did reading the molecules already raise an error?
Otherwise just send me an email with the code and I will update it, or send me a link to the GitHub repo and I will make a pull request.
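As a rough illustration of the serialization scheme mentioned above, the sketch below builds an object from a dictionary with `class_name`, `config` and `methods` keys and then calls the listed methods in order. `ToyDataset`, `REGISTRY` and `deserialize_sketch` are hypothetical names made up for this example. This is not kgcnn's actual implementation, only the general pattern.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset class; not part of kgcnn."""

    def __init__(self, data_directory=None):
        self.data_directory = data_directory
        self.calls = []  # records which methods were invoked, in order

    def set_attributes(self, **kwargs):
        self.calls.append(("set_attributes", kwargs))
        return self


# Hypothetical lookup table from class name to class.
REGISTRY = {"ToyDataset": ToyDataset}


def deserialize_sketch(spec):
    """Create an object from a dict with 'class_name', 'config', 'methods'."""
    cls = REGISTRY[spec["class_name"]]
    obj = cls(**spec.get("config", {}))
    # Each entry in 'methods' maps one method name to its keyword arguments.
    for entry in spec.get("methods", []):
        for method_name, kwargs in entry.items():
            getattr(obj, method_name)(**kwargs)
    return obj


spec = {"class_name": "ToyDataset", "config": {}, "methods": [{"set_attributes": {}}]}
dataset = deserialize_sketch(spec)
print(dataset.calls)
```

Keeping the method calls inside the hyperparameter means the same dictionary describes both which dataset to build and how to prepare it.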
it failed here:
I need to pass a file and the columns I am interested in to get the correct dataset.
Okay, then you need:
dataset.read_in_memory(...)
dataset.set_attributes()
# For geometric models
dataset.set_methods(hyper["data"]["dataset"]["methods"])
last bug
Okay, so I added a set_methods() function to the dataset:
dataset.read_in_memory(...)
dataset.set_attributes()
# For geometric models
dataset.set_methods(hyper["data"]["dataset"]["methods"])
For GIN, I changed the parameters to work better with the GIN edge variant, like this:
model_default_edge = {"name": "GIN",
                      "inputs": [{"shape": (None,), "name": "node_attributes", "dtype": "float32", "ragged": True},
                                 {"shape": (None,), "name": "edge_attributes", "dtype": "float32", "ragged": True},
                                 {"shape": (None, 2), "name": "edge_indices", "dtype": "int64", "ragged": True}],
                      "input_embedding": {"node": {"input_dim": 95, "output_dim": 64},
                                          "edge": {"input_dim": 5, "output_dim": 64}},
                      "gin_mlp": {"units": [64, 64], "use_bias": True, "activation": ["relu", "linear"],
                                  "use_normalization": True, "normalization_technique": "batch"},
                      "gin_args": {"epsilon_learnable": False},
                      "depth": 3, "dropout": 0.0, "verbose": 10,
                      "last_mlp": {"use_bias": [True, True, True], "units": [64, 64, 64],
                                   "activation": ["relu", "relu", "linear"]},
                      "output_embedding": 'graph', "output_to_tensor": True,
                      "output_mlp": {"use_bias": True, "units": 1,
                                     "activation": "softmax"}
                      }
You mean I need to pass edge attributes anyway, even for plain GIN and not only GINE?
No, for GINE you need:
"inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
{"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True},
{"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],
and normal GIN:
"inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
{"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],
You can switch by changing the make_function.
Maybe I don't know how to switch between the GIN and GINE model calls.
463
WARNING:kgcnn.utils.models:Model kwargs: Unknown key edge with value {'input_dim': 5, 'output_dim': 100}
INFO:kgcnn.utils.models:Updated model kwargs:
INFO:kgcnn.utils.models:{'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 2], 'name': 'edge_indices', 'dtype': 'int64', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 64}, 'edge': {'input_dim': 5, 'output_dim': 64}}, 'gin_mlp': {'units': [64, 64], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'gin_args': {}, 'depth': 4, 'dropout': 0.05, 'verbose': 10, 'last_mlp': {'use_bias': True, 'units': [128, 64, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>leaky_relu', 'linear']}, 'output_embedding': 'graph', 'output_to_tensor': True, 'output_mlp': {'use_bias': True, 'units': 1, 'activation': 'linear'}}
Traceback (most recent call last):
File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 904, in
Just use make_model_edge instead of make_model.
Since it is essentially the same model, I keep them in the same module, so same model name but different make_functions.
This is for the functional API. One could also define subclasses instead of functional model definitions.
Then instead of make_model we could have ModelGIN, which can be created the same way via
model = ModelGIN(**kwargs) # Subclassed
model = make_model(**kwargs) # Functional API
but we only have functional API model definitions at the moment.
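To make the GIN/GINE switch discussed above concrete, here is a small sketch that returns the make function name together with the matching inputs list. The input shapes and the names make_model and make_model_edge are taken from this thread; the helper `gin_variant` and the module-level constants are hypothetical and only illustrate the idea.

```python
import copy

NODE_INPUT = {"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True}
EDGE_INPUT = {"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True}
INDEX_INPUT = {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}


def gin_variant(use_edges):
    """Return (make function name, inputs list) for GINE or plain GIN."""
    if use_edges:
        # GINE: node attributes, edge attributes and edge indices.
        return "make_model_edge", copy.deepcopy([NODE_INPUT, EDGE_INPUT, INDEX_INPUT])
    # Plain GIN: no edge attributes in the inputs.
    return "make_model", copy.deepcopy([NODE_INPUT, INDEX_INPUT])


for use_edges in (False, True):
    make_name, inputs = gin_variant(use_edges)
    print(make_name, [x["name"] for x in inputs])
```

Choosing both values in one place avoids pairing three inputs with the two-input make function, or the other way round.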
For the future, I have this planned as the model config:
{"model": {
    "class_name": "make_model_edge",  # Or actual class definition in the future
    "module_name": "kgcnn.literature.GIN",
    "config": {
        "name": "GIN",
        "inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
                   {"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True},
                   {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],
        "input_embedding": {"node": {"input_dim": 96, "output_dim": 64}},
        "depth": 5,
        "dropout": 0.05,
        "gin_mlp": {"units": [64, 64], "use_bias": True, "activation": ["relu", "linear"],
                    "use_normalization": True, "normalization_technique": "batch"},
        "gin_args": {},
        "last_mlp": {"use_bias": True, "units": [64, 32, 1], "activation": ["relu", "relu", "linear"]},
        "output_embedding": "graph",
        "output_mlp": {"activation": "linear", "units": 1}
    }
}}
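A config with `class_name`, `module_name` and `config` keys, like the one planned above, can be resolved with a dynamic import. The following is a minimal sketch of that pattern using only the standard library; `build_from_config` is a hypothetical helper, and how kgcnn itself resolves these keys may differ. A standard-library class stands in for the model so the example runs without kgcnn installed.

```python
import importlib


def build_from_config(spec):
    """Import spec['module_name'], look up spec['class_name'] and call it with spec['config']."""
    module = importlib.import_module(spec["module_name"])
    factory = getattr(module, spec["class_name"])  # a function or a class
    return factory(**spec.get("config", {}))


# Stand-in spec: 'fractions.Fraction' plays the role of a make function or model class.
spec = {
    "class_name": "Fraction",
    "module_name": "fractions",
    "config": {"numerator": 3, "denominator": 4},
}
value = build_from_config(spec)
print(value)  # 3/4
```

Because the lookup only needs a callable, the same config layout works whether `class_name` points at a make function or, later, at a subclassed model.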
OK, looks good now, thanks. I will clean up the other models.
Can you please add a kgcnn>selu function to the repo?
Question: if I need to change the output dimension, do I need to change the last unit dimension of both last_mlp and output_mlp?
There is something wrong with TF on Mac M1:
INFO:kgcnn.utils.models:Updated model kwargs:
INFO:kgcnn.utils.models:{'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 2], 'name': 'edge_indices', 'dtype': 'int64', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 64}, 'edge': {'input_dim': 5, 'output_dim': 64}}, 'gin_mlp': {'units': [64, 64], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'gin_args': {'epsilon_learnable': False}, 'depth': 4, 'dropout': 0.05, 'verbose': 10, 'last_mlp': {'use_bias': True, 'units': [128, 64, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>leaky_relu', 'linear']}, 'output_embedding': 'graph', 'output_to_tensor': True, 'output_mlp': {'use_bias': True, 'units': 1, 'activation': 'linear'}}
Traceback (most recent call last):
File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 919, in
You mean other than tf.keras.activations.selu()?
You do not have to change both. You can keep last_mlp at, let's say, 32 and then have output_mlp set the output dimension.
This is just the last MLP before averaging the representations at each step. I think the original GIN does not have an output MLP, but I can add the option use_output_mlp=True/False if you want to leave it out.
But yes, output_mlp should usually set the output dimension if applied.
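As a small sketch of the advice above, the helper below changes only the output_mlp units and leaves last_mlp untouched. The helper name `with_output_dim` and the example values are hypothetical; it assumes `units` in output_mlp is a single integer, as in the configs in this thread.

```python
import copy


def with_output_dim(model_config, output_dim):
    """Return a copy of model_config whose output_mlp produces output_dim values."""
    new_config = copy.deepcopy(model_config)  # do not modify the caller's dict
    new_config["output_mlp"]["units"] = output_dim
    return new_config


base = {
    "last_mlp": {"use_bias": True, "units": [64, 32], "activation": ["relu", "relu"]},
    "output_mlp": {"activation": "linear", "units": 1},
}
three_targets = with_output_dim(base, 3)
print(three_targets["output_mlp"]["units"])  # 3
print(three_targets["last_mlp"]["units"])    # [64, 32]
```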
Yes, sorry, there was a bug, a copy-paste error in the GIN module. I updated kgcnn; can you try again with the current git version?
Yes, the normal selu, but I see that you implemented leaky_relu as an internal function.
I am testing TF GPU on an M1 Max.
I do not fully understand. Do you mean a 'leakyselu'? You should always be able to simply set 'activation': 'selu' to use TensorFlow's selu function.
Still not working: ValueError: Model generation 'make_model' does not agree with hyperparameter 'make_model_edge'. I will send you an email with the code.
Email sent. I put in 3 example cases with 3 different errors. I did not test the other models yet.
python keras-gcn.py configGIN.cfg
python keras-gcn.py configGINE.cfg
python keras-gcn.py configGCN.cfg
GINE, GAT and GATv2 are working.
GIN is not: strangely, it always asks for 3 inputs, but it should not, and I don't see the error.
GCN and DMPNN are not working: same bug, something went wrong in the cleaning process.
PAiNN and HamNet are not working: again a clean function error. I guess I also have not computed the correct features (3D generated). It would be nice to list the differences in the features that each architecture needs to have computed.
I am trying to adapt old code to the new version and I have this issue for a non-native dataset.
(tf) tgg@gvalmu00008 training % python AttFP.py config.cfg
ERROR:root:Module 'kgcnn.utils.learning' is deprecated and will be removed in future versions. Please move to 'kgcnn.training'.
WARNING:root:Module 'kgcnn.selection' will be removed in future versions in favour of 'kgcnn.hyper'.
ERROR:root:Module 'kgcnn.utils.data' is deprecated and will be removed in future versions. Please move to 'kgcnn.data.utils'.
ERROR:kgcnn.mol.convert:Can not import OpenBabel module for conversion.
Load config file: config.cfg
Architecture selected: GIN
My parameters Loss RMSE LSTM 16 DENSE 16 PROBA 0 LR start 0.01
Metal device set to: Apple M1 Max
systemMemory: 64.00 GB
maxCacheSize: 24.00 GB
2022-07-11 14:01:58.468538: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-11 14:01:58.468690: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id:)
before True
{'model': {'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 100}}, 'last_mlp': {'use_bias': True, 'units': [200, 100, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>selu', 'linear']}, 'depth': 4, 'dropout': 0.1, 'gin_args': {'units': [100, 100], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'output_embedding': 'graph', 'output_mlp': {'activation': 'linear', 'units': 1}}, 'training': {'fit': {'batch_size': 32, 'epochs': 200, 'validation_freq': 1, 'verbose': 2, 'callbacks': []}, 'compile': {'optimizer': {'class_name': 'Adam', 'config': {'lr': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.001, 'decay_steps': 1600, 'decay_rate': 0.5, 'staircase': False}}}}, 'loss': 'mean_absolute_error'}, 'cross_validation': {'class_name': 'KFold', 'config': {'n_splits': 5, 'random_state': None, 'shuffle': True}}}, 'data': {'dataset': {'class_name': 'MoleculeNetDataset', 'config': {}, 'methods': [{'set_attributes': {}}]}, 'data_unit': 'unit'}, 'info': {'postfix': '', 'kgcnn_version': '2.0.3'}}
WARNING:kgcnn.hyper.hyper:Hyperparameter {'model': ...} changed to {'model': {'config': {...}}}
INFO:kgcnn.hyper.hyper:Adding model class to 'model': {'class_name': make_model}
INFO:kgcnn.hyper.hyper:Adding 'postfix_file' to 'info' category in hyperparameter.
INFO:kgcnn.hyper.hyper:Adding 'multi_target_indices' to 'training' category in hyperparameter.
training
INFO:kgcnn.data.MoleculeNetDataset:Generating molecules and store train.sdf to disk...
[14:01:58] Explicit valence for atom # 6 N, 4, is greater than permitted
WARNING:kgcnn.mol.convert:Failed conversion for smile C1C=CC=C2C3N(=CC=CC=3)CCN2=1
[14:01:58] Explicit valence for atom # 11 O, 3, is greater than permitted
[14:01:58] Explicit valence for atom # 6 N, 4, is greater than permitted
WARNING:kgcnn.mol.convert:Failed conversion for smile C1=CC(C2C=CN(C)=CC=2)=CC=N1C
WARNING:kgcnn.mol.convert:Failed conversion for smile C12=CC=C(N(CC)CC)C=C2O=C2C=C(C=CC2=C1C1=CC=CC=C1C(O)=O)N(CC)CC
INFO:kgcnn.data.MoleculeNetDataset: ... converted molecules 466 from 466
INFO:kgcnn.data.MoleculeNetDataset: ... read molecules 0 from 466
[14:02:00] CTAB version string invalid at line 4
[14:02:01] CTAB version string invalid at line 4
[14:02:01] CTAB version string invalid at line 4
WARNING:kgcnn.data.MoleculeNetDataset:Property node_attributes is not set on any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Can not clean property {'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True} as it was not assigned to any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Property edge_attributes is not set on any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Can not clean property {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True} as it was not assigned to any graph.
INFO:kgcnn.data.MoleculeNetDataset:No invalid graphs for assigned properties found.
dataset
WARNING:kgcnn.data.MoleculeNetDataset:Property node_attributes is not set on any graph.
Traceback (most recent call last):
File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 842, in
Xdata = dataset.tensor(inputs)
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 370, in tensor
return [self._to_tensor(x, make_copy=make_copy) for x in items]
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 370, in
return [self._to_tensor(x, make_copy=make_copy) for x in items]
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 362, in _to_tensor
return ragged_tensor_from_nested_numpy(props)
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/utils.py", line 156, in ragged_tensor_from_nested_numpy
return tf.RaggedTensor.from_row_lengths(np.concatenate(numpy_list, axis=0),
File "<__array_function__ internals>", line 180, in concatenate
TypeError: dispatcher for __array_function__ did not return an iterable
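One plausible reading of the log above is that the tensor conversion fails because node_attributes was never set on any graph, so there is nothing valid to concatenate. Under that assumption, a check like the following could be run before building tensors. The graphs here are plain dicts and `find_unset` is a hypothetical helper, not part of kgcnn.

```python
def find_unset(graphs, property_name):
    """Return indices of graphs where property_name is missing or None."""
    return [i for i, g in enumerate(graphs) if g.get(property_name) is None]


# Toy graphs as plain dicts; hypothetical stand-ins for dataset entries.
graphs = [
    {"edge_indices": [[0, 1], [1, 0]]},
    {"edge_indices": [[0, 1]], "node_attributes": None},
]

for name in ("node_attributes", "edge_indices"):
    missing = find_unset(graphs, name)
    if len(missing) == len(graphs):
        print(f"{name}: not set on any graph")
    elif missing:
        print(f"{name}: missing on graphs {missing}")
    else:
        print(f"{name}: ok")
```

If a property is reported as not set on any graph, the attribute generation step was probably skipped or failed earlier, which matches the "read molecules 0 from 466" line in the log.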