aimat-lab / gcnn_keras

Graph convolutions in Keras with TensorFlow, PyTorch or Jax.
MIT License

issue with last version of the code #56

Closed thegodone closed 2 years ago

thegodone commented 2 years ago

I am trying to adapt old code to the new version and I get this issue with a non-native dataset.

(tf) tgg@gvalmu00008 training % python AttFP.py config.cfg
ERROR:root:Module 'kgcnn.utils.learning' is deprecated and will be removed in future versions. Please move to 'kgcnn.training'.
WARNING:root:Module 'kgcnn.selection' will be removed in future versions in favour of 'kgcnn.hyper'.
ERROR:root:Module 'kgcnn.utils.data' is deprecated and will be removed in future versions. Please move to 'kgcnn.data.utils'.
ERROR:kgcnn.mol.convert:Can not import OpenBabel module for conversion.
Load config file: config.cfg
Architecture selected: GIN
My parameters Loss RMSE LSTM 16 DENSE 16 PROBA 0 LR start 0.01
Metal device set to: Apple M1 Max

systemMemory: 64.00 GB
maxCacheSize: 24.00 GB

2022-07-11 14:01:58.468538: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-11 14:01:58.468690: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
before True
{'model': {'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 100}}, 'last_mlp': {'use_bias': True, 'units': [200, 100, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>selu', 'linear']}, 'depth': 4, 'dropout': 0.1, 'gin_args': {'units': [100, 100], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'output_embedding': 'graph', 'output_mlp': {'activation': 'linear', 'units': 1}}, 'training': {'fit': {'batch_size': 32, 'epochs': 200, 'validation_freq': 1, 'verbose': 2, 'callbacks': []}, 'compile': {'optimizer': {'class_name': 'Adam', 'config': {'lr': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.001, 'decay_steps': 1600, 'decay_rate': 0.5, 'staircase': False}}}}, 'loss': 'mean_absolute_error'}, 'cross_validation': {'class_name': 'KFold', 'config': {'n_splits': 5, 'random_state': None, 'shuffle': True}}}, 'data': {'dataset': {'class_name': 'MoleculeNetDataset', 'config': {}, 'methods': [{'set_attributes': {}}]}, 'data_unit': 'unit'}, 'info': {'postfix': '', 'kgcnn_version': '2.0.3'}}
WARNING:kgcnn.hyper.hyper:Hyperparameter {'model': ...} changed to {'model': {'config': {...}}}
INFO:kgcnn.hyper.hyper:Adding model class to 'model': {'class_name': make_model}
INFO:kgcnn.hyper.hyper:Adding 'postfix_file' to 'info' category in hyperparameter.
INFO:kgcnn.hyper.hyper:Adding 'multi_target_indices' to 'training' category in hyperparameter.
training
INFO:kgcnn.data.MoleculeNetDataset:Generating molecules and store train.sdf to disk...
[14:01:58] Explicit valence for atom # 6 N, 4, is greater than permitted
WARNING:kgcnn.mol.convert:Failed conversion for smile C1C=CC=C2C3N(=CC=CC=3)CCN2=1
[14:01:58] Explicit valence for atom # 11 O, 3, is greater than permitted
[14:01:58] Explicit valence for atom # 6 N, 4, is greater than permitted
WARNING:kgcnn.mol.convert:Failed conversion for smile C1=CC(C2C=CN(C)=CC=2)=CC=N1C
WARNING:kgcnn.mol.convert:Failed conversion for smile C12=CC=C(N(CC)CC)C=C2O=C2C=C(C=CC2=C1C1=CC=CC=C1C(O)=O)N(CC)CC
INFO:kgcnn.data.MoleculeNetDataset: ... converted molecules 466 from 466
INFO:kgcnn.data.MoleculeNetDataset: ... read molecules 0 from 466
[14:02:00] CTAB version string invalid at line 4
[14:02:01] CTAB version string invalid at line 4
[14:02:01] CTAB version string invalid at line 4


WARNING:kgcnn.data.MoleculeNetDataset:Property node_attributes is not set on any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Can not clean property {'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True} as it was not assigned to any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Property edge_attributes is not set on any graph.
WARNING:kgcnn.data.MoleculeNetDataset:Can not clean property {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True} as it was not assigned to any graph.
INFO:kgcnn.data.MoleculeNetDataset:No invalid graphs for assigned properties found.
dataset
WARNING:kgcnn.data.MoleculeNetDataset:Property node_attributes is not set on any graph.
Traceback (most recent call last):
  File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 842, in <module>
    Xdata = dataset.tensor(inputs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 370, in tensor
    return [self._to_tensor(x, make_copy=make_copy) for x in items]
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 370, in <listcomp>
    return [self._to_tensor(x, make_copy=make_copy) for x in items]
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/base.py", line 362, in _to_tensor
    return ragged_tensor_from_nested_numpy(props)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/data/utils.py", line 156, in ragged_tensor_from_nested_numpy
    return tf.RaggedTensor.from_row_lengths(np.concatenate(numpy_list, axis=0),
  File "<__array_function__ internals>", line 180, in concatenate
TypeError: dispatcher for __array_function__ did not return an iterable

PatReis commented 2 years ago

Hello, sorry for the changes. I wanted to move to a more consistent serialization scheme that is also reflected in the hyperparameters and that makes all the kgcnn.selection code obsolete.

Have you installed OpenBabel? Even if not, this should not be a problem. Have you used

from kgcnn.data.serial import deserialize as deserialize_dataset
dataset = deserialize_dataset(hyper["data"]["dataset"])

Or otherwise you would need to call the methods manually

dataset.set_attributes()

Or did reading the molecules already raise an error?
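The log in the first post warns that node_attributes is "not set on any graph" right before the tensor conversion fails. A minimal sketch of a pre-check that catches this earlier, assuming graphs are plain dicts keyed by property name; missing_properties is a hypothetical helper for illustration and not a kgcnn function:

```python
# Hypothetical helper, not part of kgcnn: graphs are modelled as plain dicts here.
def missing_properties(graphs, input_specs):
    """Return the names from input_specs that no graph in the list provides."""
    names = [spec["name"] for spec in input_specs]
    return [n for n in names if not any(g.get(n) is not None for g in graphs)]


inputs = [
    {"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
    {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True},
]
# Two toy graphs that only carry edge indices, mimicking a dataset on which
# the attribute-generating step has not been run yet.
graphs = [{"edge_indices": [[0, 1], [1, 0]]}, {"edge_indices": [[0, 1]]}]

missing = missing_properties(graphs, inputs)
print(missing)
```

If the returned list is not empty, the attribute step (here set_attributes) most likely still has to be called before building tensors.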

PatReis commented 2 years ago

Otherwise, just send me an email with the code and I will update it, or send me a link to the GitHub repo and I will make a pull request.

thegodone commented 2 years ago

it failed here:

image

I need to pass a file and the columns I am interested in to get the correct dataset.

PatReis commented 2 years ago

Okay, then you need:

dataset.read_in_memory(...)
dataset.set_attributes()
# For geometric models
dataset.set_methods(hyper["data"]["dataset"]["methods"])

thegodone commented 2 years ago

last bug

image

PatReis commented 2 years ago

Okay so I added a set_methods() function to the dataset:

dataset.read_in_memory(...)
dataset.set_attributes()
# For geometric models
dataset.set_methods(hyper["data"]["dataset"]["methods"])
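
The 'methods' entry in the hyperparameter dict printed in the first post is a list of {method_name: kwargs} items. A minimal sketch of how such a list could be dispatched onto an object; this is an assumption for illustration and not the actual set_methods implementation. ToyDataset, apply_methods and the max_distance argument are hypothetical:

```python
# Toy stand-in for a dataset object; this is NOT the kgcnn class.
class ToyDataset:
    def __init__(self):
        self.calls = []

    def set_attributes(self, **kwargs):
        self.calls.append(("set_attributes", kwargs))

    def set_range(self, max_distance=4.0):
        self.calls.append(("set_range", {"max_distance": max_distance}))


def apply_methods(obj, methods):
    """Call each {method_name: kwargs} entry on obj, in list order."""
    for entry in methods:
        for name, kwargs in entry.items():
            getattr(obj, name)(**kwargs)
    return obj


# Same layout as the 'methods' list in the hyperparameter dict from the log.
methods = [{"set_attributes": {}}, {"set_range": {"max_distance": 3.0}}]
dataset = apply_methods(ToyDataset(), methods)
print(dataset.calls)
```

Keeping the calls in a list preserves their order, which matters when one step depends on the output of an earlier one.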

For GIN, I changed the parameters to work better with the GIN edge variant, like this:

model_default_edge = {"name": "GIN",
                      "inputs": [{"shape": (None,), "name": "node_attributes", "dtype": "float32", "ragged": True},
                                 {"shape": (None,), "name": "edge_attributes", "dtype": "float32", "ragged": True},
                                 {"shape": (None, 2), "name": "edge_indices", "dtype": "int64", "ragged": True}],
                      "input_embedding": {"node": {"input_dim": 95, "output_dim": 64},
                                          "edge": {"input_dim": 5, "output_dim": 64}},
                      "gin_mlp": {"units": [64, 64], "use_bias": True, "activation": ["relu", "linear"],
                                  "use_normalization": True, "normalization_technique": "batch"},
                      "gin_args": {"epsilon_learnable": False},
                      "depth": 3, "dropout": 0.0, "verbose": 10,
                      "last_mlp": {"use_bias": [True, True, True], "units": [64, 64, 64],
                                   "activation": ["relu", "relu", "linear"]},
                      "output_embedding": 'graph', "output_to_tensor": True,
                      "output_mlp": {"use_bias": True, "units": 1,
                                     "activation": "softmax"}
                      }

thegodone commented 2 years ago

You mean I need to pass edge attributes anyway, even for the GIN code and not only for GINE?

PatReis commented 2 years ago

No, for GINE you need:

                "inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
                           {"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True},
                           {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],

and for normal GIN:

                "inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
                           {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],

PatReis commented 2 years ago

You can switch by changing the make function.

thegodone commented 2 years ago

Maybe I don't know how to switch between the GIN / GINE model calls:

463
WARNING:kgcnn.utils.models:Model kwargs: Unknown key edge with value {'input_dim': 5, 'output_dim': 100}
INFO:kgcnn.utils.models:Updated model kwargs:
INFO:kgcnn.utils.models:{'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 2], 'name': 'edge_indices', 'dtype': 'int64', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 64}, 'edge': {'input_dim': 5, 'output_dim': 64}}, 'gin_mlp': {'units': [64, 64], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'gin_args': {}, 'depth': 4, 'dropout': 0.05, 'verbose': 10, 'last_mlp': {'use_bias': True, 'units': [128, 64, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>leaky_relu', 'linear']}, 'output_embedding': 'graph', 'output_to_tensor': True, 'output_mlp': {'use_bias': True, 'units': 1, 'activation': 'linear'}}
Traceback (most recent call last):
  File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 904, in <module>
    model = make_model(**hyperparams['model']["config"])
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/utils/models.py", line 137, in update_wrapper
    return func(*args, **updated_kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/literature/GIN.py", line 78, in make_model
    assert len(inputs) == 2
AssertionError

PatReis commented 2 years ago

Just use make_model_edge instead of make_model.
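
A minimal sketch of picking the make function from the configured inputs. The names make_model and make_model_edge come from this thread; the selection rule and the helper pick_make_function are assumptions for illustration only:

```python
# Hypothetical helper; the names 'make_model' and 'make_model_edge' come from
# this thread, the selection rule itself is an assumption for illustration.
def pick_make_function(inputs):
    """Return the make-function name matching the listed model inputs."""
    names = [spec["name"] for spec in inputs]
    if "edge_attributes" in names:
        return "make_model_edge"  # GINE: nodes, edge attributes, edge indices
    return "make_model"  # plain GIN: nodes and edge indices only


gin_inputs = [
    {"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
    {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True},
]
# GINE uses the same two inputs with edge attributes placed in between.
gine_inputs = gin_inputs[:1] + [
    {"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True},
] + gin_inputs[1:]

print(pick_make_function(gin_inputs), pick_make_function(gine_inputs))
```

This mirrors the assertion in the traceback above: the plain GIN make function expects exactly two inputs, so a three-input config has to go to the edge variant.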

PatReis commented 2 years ago

Since it is essentially the same model, I keep them in the same module: same model name, but different make functions. This is for the functional API. One could also define subclasses instead of functional model definitions. Then, instead of make_model, we could have a ModelGIN that can be created the same way via

model = ModelGIN(**kwargs)  # Subclassed
model = make_model(**kwargs)  # Functional API

but we only have functional API model definitions at the moment.

PatReis commented 2 years ago

So for the future I have this as model config planned:

        {"model": {
            "class_name": "make_model_edge",  # Or actual class definition in the future
            "module_name": "kgcnn.literature.GIN",
            "config": {
                "name": "GIN",
                "inputs": [{"shape": [None, 41], "name": "node_attributes", "dtype": "float32", "ragged": True},
                           {"shape": [None, 11], "name": "edge_attributes", "dtype": "float32", "ragged": True},
                           {"shape": [None, 2], "name": "edge_indices", "dtype": "int64", "ragged": True}],
                "input_embedding": {"node": {"input_dim": 96, "output_dim": 64}},
                "depth": 5,
                "dropout": 0.05,
                "gin_mlp": {"units": [64, 64], "use_bias": True, "activation": ["relu", "linear"],
                            "use_normalization": True, "normalization_technique": "batch"},
                "gin_args": {},
                "last_mlp": {"use_bias": True, "units": [64, 32, 1], "activation": ["relu", "relu", "linear"]},
                "output_embedding": "graph",
                "output_mlp": {"activation": "linear", "units": 1}
            }
        }}
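
A minimal sketch of how a config with class_name, module_name and config could be resolved with the standard library. This is an assumption about the general pattern, not the kgcnn deserialization code; resolve is a hypothetical helper, and a standard-library class stands in for the model so the example runs without kgcnn:

```python
import importlib


def resolve(entry):
    """Import entry['module_name'], look up entry['class_name'], call it with entry['config']."""
    module = importlib.import_module(entry["module_name"])
    factory = getattr(module, entry["class_name"])  # a function or a class
    return factory(**entry["config"])


# Standard-library stand-in so the sketch runs without kgcnn installed.
entry = {"class_name": "Fraction", "module_name": "fractions",
         "config": {"numerator": 3, "denominator": 6}}
value = resolve(entry)
print(value)
```

Because the looked-up name is simply called with keyword arguments, the same lookup works whether class_name points to a make function or to an actual class.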

thegodone commented 2 years ago

OK, looks good now, thanks. I will clean up the other models.

thegodone commented 2 years ago

Can you please add a kgcnn>selu function to the repo?

thegodone commented 2 years ago

Question: if I need to change the output dimension, do I need to change the last unit dimension of both last_mlp and output_mlp?

thegodone commented 2 years ago

There is something wrong with TF on the Mac M1:

INFO:kgcnn.utils.models:Updated model kwargs:
INFO:kgcnn.utils.models:{'name': 'GIN', 'inputs': [{'shape': [None, 41], 'name': 'node_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 11], 'name': 'edge_attributes', 'dtype': 'float32', 'ragged': True}, {'shape': [None, 2], 'name': 'edge_indices', 'dtype': 'int64', 'ragged': True}], 'input_embedding': {'node': {'input_dim': 96, 'output_dim': 64}, 'edge': {'input_dim': 5, 'output_dim': 64}}, 'gin_mlp': {'units': [64, 64], 'use_bias': True, 'activation': ['relu', 'relu'], 'use_normalization': True, 'normalization_technique': 'batch'}, 'gin_args': {'epsilon_learnable': False}, 'depth': 4, 'dropout': 0.05, 'verbose': 10, 'last_mlp': {'use_bias': True, 'units': [128, 64, 1], 'activation': ['kgcnn>leaky_relu', 'kgcnn>leaky_relu', 'linear']}, 'output_embedding': 'graph', 'output_to_tensor': True, 'output_mlp': {'use_bias': True, 'units': 1, 'activation': 'linear'}}
Traceback (most recent call last):
  File "/Users/tgg/Github/tensorflow-m1-setup/gcnn_keras/training/AttFP.py", line 919, in <module>
    model = make_model(**hyperparams['model']["config"])
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/utils/models.py", line 137, in update_wrapper
    return func(*args, **updated_kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/kgcnn-2.0.3-py3.9.egg/kgcnn/literature/GIN.py", line 222, in make_model_edge
    model = ks.models.Model(inputs=[node_input, edge_index_input], outputs=out)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/training/tracking/base.py", line 587, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/keras/engine/functional.py", line 148, in __init__
    self._init_graph_network(inputs, outputs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/training/tracking/base.py", line 587, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/keras/engine/functional.py", line 232, in _init_graph_network
    nodes, nodes_by_depth, layers, _ = _map_graph_network(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/keras/engine/functional.py", line 998, in _map_graph_network
    raise ValueError(
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=RaggedTensorSpec(TensorShape([None, None, 11]), tf.float32, 1, tf.int64), name='edge_attributes', description="created by layer 'edge_attributes'") at layer "optional_input_embedding_1". The following previous layers were accessed without issue: []

PatReis commented 2 years ago

You mean other than tf.keras.activations.selu()?

PatReis commented 2 years ago

You do not have to change both. You can keep last_mlp at, let's say, 32 and then have output_mlp set the output dimension. The last_mlp is just the last MLP before averaging the representations at each step. I think the original GIN does not have an output MLP, but I can add the option use_output_mlp=True/False if you want to leave it out. But yes, output_mlp should usually set the output dimension if it is applied.
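
A minimal sketch of that rule, assuming the config key names used in this thread; output_dimension is a hypothetical helper and not part of kgcnn:

```python
# Hypothetical helper for illustration; the key names follow the configs in this thread.
def output_dimension(model_config):
    """Return the number of units of the last layer that is applied."""
    def last_units(mlp):
        units = mlp["units"]
        return units[-1] if isinstance(units, (list, tuple)) else units

    # If an output MLP is configured, it decides the output size.
    if "output_mlp" in model_config:
        return last_units(model_config["output_mlp"])
    return last_units(model_config["last_mlp"])


config = {
    "last_mlp": {"use_bias": True, "units": [64, 32], "activation": ["relu", "relu"]},
    "output_mlp": {"activation": "linear", "units": 3},
}
print(output_dimension(config))
```

With this layout, changing the number of targets only requires editing the units of output_mlp; last_mlp can stay at 32.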

PatReis commented 2 years ago

Yes, sorry, there was a bug: a copy-paste error in the GIN module. I updated kgcnn. Can you try again with the current git version?

thegodone commented 2 years ago

Yes, normal selu, but I see that you implemented leaky_relu as an internal function.

thegodone commented 2 years ago

I am testing with TF on the M1 Max GPU.

PatReis commented 2 years ago

I do not fully understand. Do you mean a 'leakyselu'? You should always be able to simply set 'activation': 'selu' to use TensorFlow's selu function.

thegodone commented 2 years ago

Still not working:

ValueError: Model generation 'make_model' does not agree with hyperparameter 'make_model_edge'

I will send you an email with the code.

thegodone commented 2 years ago

Email sent. I put in 3 example cases with 3 different errors. I have not tested the other models yet.

python keras-gcn.py configGIN.cfg
python keras-gcn.py configGINE.cfg
python keras-gcn.py configGCN.cfg

thegodone commented 2 years ago

GINE, GAT and GATv2 are working.
GIN is not: strangely, it always asks for 3 inputs, but it should not. I don't see the error.
GCN and DMPNN are not working: same bug, something went wrong in the cleaning process.
PAiNN and HamNet are not working: again a clean function error. I guess I also have not computed the correct features (3D generated). It would be nice to list, per architecture, which features need to be computed.