aimat-lab / gcnn_keras

Graph convolutions in Keras with TensorFlow, PyTorch or Jax.
MIT License
107 stars, 29 forks

Energy force training issue #123

Closed Tacitus523 closed 3 months ago

Tacitus523 commented 3 months ago

I am trying to train an energy-force model based on Schnet with kgcnn 4.0.2, tensorflow 2.16.1 and keras 3.3.3, by running the training/train_force.py script.

While building the model works (despite an unknown output shape, presumably due to ragged outputs), fitting the model fails at the metrics stage because of the handling of ragged tensors.

INFO:kgcnn.data.ThiolDisulfidExchange:Reading structures from SDF file.
INFO:kgcnn.data.ThiolDisulfidExchange: ... process molecules 0 from 13548
INFO:kgcnn.data.ThiolDisulfidExchange: ... process molecules 5000 from 13548
INFO:kgcnn.data.ThiolDisulfidExchange: ... process molecules 10000 from 13548
INFO:kgcnn.data.ThiolDisulfidExchange:No invalid graphs for assigned properties found.
INFO:kgcnn.data.ThiolDisulfidExchange:Labels 'None' in 'None' have shape '(13548, 1)'.
WARNING:kgcnn.models.utils:Model kwargs: Overwriting dictionary of output_mlp with None
INFO:kgcnn.models.utils:Updated model kwargs: '{...}'.
INFO:kgcnn.training.hyper:Deserialized compile kwargs ...'loss_weights': {'energy': 0.05, 'force': 0.95}}'.
WARNING:kgcnn.training.scheduler:`steps_per_epoch` is not set. Can't increase lr during epochs of warmup.
Traceback (most recent call last):
  File "/home/ka/ka_ipc/ka_he8978/kgcnn/training/train_force.py", line 198, in <module>
    hist = model.fit(
           ^^^^^^^^^^
  File "/home/ka/ka_ipc/ka_he8978/miniconda3/envs/kgcnn_original/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ka/ka_ipc/ka_he8978/kgcnn/training/../kgcnn/metrics/metrics.py", line 35, in update_state
    y_pred = decompose_ragged_tensor(y_pred)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ka/ka_ipc/ka_he8978/kgcnn/training/../kgcnn/ops/core.py", line 77, in decompose_ragged_tensor
    return kgcnn_backend.decompose_ragged_tensor(x)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ka/ka_ipc/ka_he8978/kgcnn/training/../kgcnn/backend/_tensorflow.py", line 56, in decompose_ragged_tensor
    row_ids = tf.cast(x.value_rowids(), dtype=batch_dtype)
                      ^^^^^^^^^^^^^^
AttributeError: 'SymbolicTensor' object has no attribute 'value_rowids'

I have used previous versions of kgcnn, but neither this most recent version nor the train_force script with hyperparameters.

Both the train_node.py and the train_graph.py scripts work, so I am unsure about whether the error lies in my data files or in the ragged output of the force.
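
For context on the traceback: in TensorFlow, `tf.RaggedTensor.value_rowids()` returns, for each flat value, the index of the row it belongs to, and the `SymbolicTensor` in the traceback evidently does not expose that method. The following is a minimal pure-Python sketch of such a decomposition. `decompose_ragged` is a hypothetical helper for illustration only, not kgcnn's `decompose_ragged_tensor`.

```python
def decompose_ragged(rows):
    """Flatten variable-length rows into (values, row_ids, row_lengths)."""
    values, row_ids, row_lengths = [], [], []
    for row_index, row in enumerate(rows):
        values.extend(row)                      # flat list of per-atom entries
        row_ids.extend([row_index] * len(row))  # which molecule each atom belongs to
        row_lengths.append(len(row))            # number of atoms per molecule
    return values, row_ids, row_lengths


# Two molecules with 2 and 3 atoms; each atom carries a 3-component force.
forces = [
    [[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]],
    [[0.0, 0.2, 0.0], [0.0, -0.1, 0.0], [0.0, -0.1, 0.0]],
]
values, row_ids, row_lengths = decompose_ragged(forces)
print(row_ids)      # per-atom molecule index
print(row_lengths)  # atoms per molecule
```

A metric that needs the per-atom values and their molecule index can only compute them from a tensor that actually carries this row structure.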

PatReis commented 3 months ago

This is strange. I may have to check with the latest keras version. Can you show me your hyperparameters?

Tacitus523 commented 3 months ago

Inherited from a kgcnn 4.0.0 user working with HAT reactions. I mainly changed paths and updated the activation function input style from 'kgcnn>shifted_softplus' to {"class_name": "function", "config": "kgcnn>shifted_softplus"}.
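
The activation change mentioned above can be written as a small conversion. This is a minimal sketch that assumes only the two spellings quoted in this comment. `to_dict_activation` is a hypothetical helper name, not a kgcnn function.

```python
def to_dict_activation(activation):
    """Wrap a custom 'kgcnn>...' activation string in the dict form; leave others as-is."""
    if isinstance(activation, str) and activation.startswith("kgcnn>"):
        return {"class_name": "function", "config": activation}
    return activation


old_style = ["kgcnn>shifted_softplus", "kgcnn>shifted_softplus", "linear"]
new_style = [to_dict_activation(a) for a in old_style]
print(new_style)
```

Built-in names such as `"linear"` pass through unchanged, which matches how the `last_mlp` activations are written in the hyperparameters below.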

I hadn't seen the {"rename_property_on_graphs": {"old_property_name": "forces_conv.xyz", "new_property_name": "force"}} pattern yet. Before, I chose a more manual way of giving force data to the MemoryGraphDataset.

I just noticed that I didn't adjust the version in the info section, but that's probably a minor oversight.

hyper.py

```python
hyper = {
    "Schnet.EnergyForceModel": {
        "model": {
            "class_name": "EnergyForceModel",
            "module_name": "kgcnn.models.force",
            "config": {
                "name": "Schnet",
                "nested_model_config": True,
                "output_to_tensor": False,
                "output_squeeze_states": True,
                "coordinate_input": 1,
                "inputs": [
                    {"shape": [None], "name": "atomic_number", "dtype": "int32"},
                    {"shape": [None, 3], "name": "node_coordinates", "dtype": "float32"},
                    {"shape": [None, 2], "name": "range_indices", "dtype": "int64"},
                    {"shape": (), "name": "total_nodes", "dtype": "int64"},
                    {"shape": (), "name": "total_ranges", "dtype": "int64"}
                ],
                "model_energy": {
                    "class_name": "make_model",
                    "module_name": "kgcnn.literature.Schnet",
                    "config": {
                        "name": "SchnetEnergy",
                        "inputs": [
                            {"shape": [None], "name": "atomic_number", "dtype": "int32"},
                            {"shape": [None, 3], "name": "node_coordinates", "dtype": "float32"},
                            {"shape": [None, 2], "name": "range_indices", "dtype": "int64"},
                            {"shape": (), "name": "total_nodes", "dtype": "int64"},
                            {"shape": (), "name": "total_ranges", "dtype": "int64"}
                        ],
                        "cast_disjoint_kwargs": {"padded_disjoint": False},
                        "input_node_embedding": {"input_dim": 95, "output_dim": 128},
                        "last_mlp": {"use_bias": [True, True, True], "units": [128, 64, 1],
                                     "activation": [
                                        {"class_name": "function", "config": "kgcnn>shifted_softplus"},
                                        {"class_name": "function", "config": "kgcnn>shifted_softplus"},
                                        'linear']},
                        "interaction_args": {
                            "units": 128, "use_bias": True,
                            "activation": {"class_name": "function", "config": "kgcnn>shifted_softplus"},
                            "cfconv_pool": "scatter_sum"
                        },
                        "node_pooling_args": {"pooling_method": "scatter_sum"},
                        "depth": 6,
                        "gauss_args": {"bins": 25, "distance": 5, "offset": 0.0, "sigma": 0.4}, "verbose": 10,
                        "output_embedding": "graph",
                        "use_output_mlp": False,
                        "output_mlp": None,
                    }
                },
                "outputs": {"energy": {"name": "energy", "shape": (1,), "ragged": False},
                            "force": {"name": "force", "shape": (None, 3), "ragged": True}}
            }
        },
        "training": {
            "cross_validation": {"class_name": "KFold",
                                 "config": {"n_splits": 5, "random_state": 42, "shuffle": True}},
            "fit": {
                "batch_size": 64, "epochs": 1000, "validation_freq": 1, "verbose": 2,
                "callbacks": [
                    {"class_name": "kgcnn>LinearWarmupExponentialLRScheduler", "config": {
                        "lr_start": 1e-03, "gamma": 0.995, "epo_warmup": 1, "verbose": 1, "steps_per_epoch": 50}}
                ]
            },
            "compile": {
                "optimizer": {"class_name": "Adam", "config": {"learning_rate": 1e-03}},
                "loss_weights": {"energy": 0.05, "force": 0.95}
            },
            "scaler": {"class_name": "EnergyForceExtensiveLabelScaler",
                       "config": {"standardize_scale": False}},
            "multi_target_indices": [0]
        },
        "dataset": {
            "class_name": "ForceDataset",
            "module_name": "kgcnn.data.force",
            "config": {
                "data_directory": "/OMITTED/.../...",
                "dataset_name": "ThiolDisulfidExchange",
                "file_name": "ThiolDisulfidExchange.csv",
                "file_directory": None,
                "file_name_xyz": "ThiolDisulfidExchange.xyz",
                "file_name_mol": None,
                "file_name_force_xyz": "forces_conv.xyz"
            },
            "methods": [
                {"prepare_data": {"overwrite": True, "make_sdf": False}},
                {"read_in_memory": {"label_column_name": "energy"}},
                {"rename_property_on_graphs": {"old_property_name": "graph_labels", "new_property_name": "energy"}},
                {"rename_property_on_graphs": {"old_property_name": "node_number", "new_property_name": "atomic_number"}},
                {"rename_property_on_graphs": {"old_property_name": "forces_conv.xyz", "new_property_name": "force"}},
                {"map_list": {"method": "set_range", "max_distance": 5, "max_neighbours": 10000,
                              "node_coordinates": "node_coordinates"}},
                {"map_list": {"method": "count_nodes_and_edges", "total_edges": "total_ranges",
                              "count_edges": "range_indices", "count_nodes": "atomic_number",
                              "total_nodes": "total_nodes"}},
            ]
        },
        "data": {
            "data_unit": "Hartree",
        },
        "info": {
            "postfix": "ThiolDisulfidExchange",
            "postfix_file": "",
            "kgcnn_version": "4.0.0"
        }
    }
}
```

PatReis commented 3 months ago

Okay, so the EnergyForceModel is not so nice in kgcnn 4 since the forces will depend directly on the input coordinate tensor. Otherwise everything would be too messy for all backends because the EnergyForceModel would introduce a second layer of casting tensors from/to ragged, padded, disjoint, padded disjoint etc.

But in any case, I think in your model you have the data output as ragged but the input as padded. That is not internally cast by the EnergyForceModel.

So could you either change the output to padded like so:

                "outputs": {"energy": {"name": "energy", "shape": (1,) , "ragged": False },
                            "force": {"name": "force", "shape": (None, 3), "ragged": False }}

or change the input like so:

    "Schnet.EnergyForceModel": {
        "model": {
            "class_name": "EnergyForceModel",
            "module_name": "kgcnn.models.force",
            "config": {
                "name": "Schnet",
                "nested_model_config": True,
                "output_to_tensor": False,
                "output_squeeze_states": True,
                "coordinate_input": 1,
                "inputs": [
                            {"shape": [None], "name": "atomic_number", "dtype": "int32", "ragged": True},
                            {"shape": [None, 3], "name": "node_coordinates", "dtype": "float32", "ragged": True},
                            {"shape": [None, 2], "name": "range_indices", "dtype": "int64", "ragged": True},
                ],
                "model_energy": {
                    "class_name": "make_model",
                    "module_name": "kgcnn.literature.Schnet",
                    "config": {
                        "name": "SchnetEnergy",
                        "inputs": [
                            {"shape": [None], "name": "atomic_number", "dtype": "int32", "ragged": True},
                            {"shape": [None, 3], "name": "node_coordinates", "dtype": "float32", "ragged": True},
                            {"shape": [None, 2], "name": "range_indices", "dtype": "int64", "ragged": True},
                        ],
                        "input_tensor_type": "ragged",  # Important here!!!
                        "cast_disjoint_kwargs": {"padded_disjoint": False},
                        "input_node_embedding": {"input_dim": 95, "output_dim": 128},
                        "last_mlp": {"use_bias": [True, True, True], "units": [128, 64, 1],
                                     "activation": [
                                        {"class_name": "function", "config": "kgcnn>shifted_softplus"}, 
                                        {"class_name": "function", "config": "kgcnn>shifted_softplus"}, 
                                        'linear']},
                        "interaction_args": {
                            "units": 128, "use_bias": True, "activation": {"class_name": "function", "config": "kgcnn>shifted_softplus"},
                            "cfconv_pool": "scatter_sum"
                        },
                        "node_pooling_args": {"pooling_method": "scatter_sum"},
                        "depth": 6,
                        "gauss_args": {"bins": 25, "distance": 5, "offset": 0.0, "sigma": 0.4}, "verbose": 10,
                        "output_embedding": "graph",
                        "use_output_mlp": False,
                        "output_mlp": None,
                    }
                },
                "outputs": {"energy": {"name": "energy", "shape": (1,), "ragged": False},
                            "force": {"name": "force", "shape": (None, 3), "ragged": True}}
            }
        },

You may also need to change the loss function if the training script does not set it correctly:

            "compile": {
                "optimizer": {"class_name": "Adam", "config": {"learning_rate": 1e-03}},
                "loss_weights": {"energy": 0.05, "force": 0.95},
                "loss": {
                    "energy": "mean_absolute_error",
                    "force": {"class_name": "kgcnn>RaggedValuesMeanAbsoluteError", "config": {}}
                }
            },
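
To illustrate what these compile settings amount to: Keras minimizes the weighted sum of the per-output losses given by `loss_weights`. The sketch below reproduces that arithmetic in plain Python, with a mean absolute error taken over all atoms of all molecules for the ragged force output. The helper names are hypothetical and this is not the implementation of `RaggedValuesMeanAbsoluteError`. It only assumes that the ragged loss averages over the flattened values.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE over two equally long flat lists of floats."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def flatten(ragged):
    """Flatten [molecule][atom][xyz] into one flat list of components."""
    return [c for molecule in ragged for atom in molecule for c in atom]


# Toy batch: two molecules with 1 and 2 atoms (made-up numbers).
energy_true, energy_pred = [1.0, 2.0], [1.5, 2.5]
force_true = [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]]
force_pred = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]

loss_weights = {"energy": 0.05, "force": 0.95}
energy_loss = mean_absolute_error(energy_true, energy_pred)
force_loss = mean_absolute_error(flatten(force_true), flatten(force_pred))
total_loss = loss_weights["energy"] * energy_loss + loss_weights["force"] * force_loss
print(energy_loss, force_loss, total_loss)
```

Because the force error is averaged over flattened components, molecules with more atoms contribute more terms, which is one reason a ragged-aware loss is needed instead of a plain per-sample mean.
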
PatReis commented 3 months ago

You are right, the outputs parameter here is misleading.

Tacitus523 commented 3 months ago

Thanks, fully padded tensors worked after I removed all of the "ragged" keyword arguments for the Input tensors. It seems keras removed ragged support for input tensors at some point after 2.15?

Just "input_tensor_type": "ragged" didn't seem sufficient to enable ragged input tensors; I still got an error about the unrecognized keyword "ragged" for keras Input. Maybe it also had to go into the "model" dict and not just the "model_energy" dict? For now I removed every reference to ragged inputs to make it run.
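
The workaround described above, removing every `"ragged"` entry from the input definitions, can also be applied programmatically. This is a minimal sketch that assumes the hyperparameter dictionary layout shown earlier in this thread. `strip_ragged_from_inputs` is a hypothetical helper, not part of kgcnn.

```python
import copy


def strip_ragged_from_inputs(config):
    """Return a copy of config with "ragged" removed from every entry of any "inputs" list."""
    config = copy.deepcopy(config)  # leave the caller's dictionary untouched

    def visit(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "inputs" and isinstance(value, list):
                    for entry in value:
                        if isinstance(entry, dict):
                            entry.pop("ragged", None)
                visit(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                visit(item)

    visit(config)
    return config


# Shortened stand-in for the "config" dict of the EnergyForceModel.
model_config = {
    "inputs": [
        {"shape": [None, 3], "name": "node_coordinates", "dtype": "float32", "ragged": True},
    ],
    "model_energy": {"config": {"inputs": [
        {"shape": [None], "name": "atomic_number", "dtype": "int32", "ragged": True},
    ]}},
    "outputs": {"force": {"name": "force", "shape": (None, 3), "ragged": False}},
}
cleaned = strip_ragged_from_inputs(model_config)
print(cleaned["inputs"])
```

Only entries inside `"inputs"` lists are touched, so the `"ragged"` flags under `"outputs"` stay as configured.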