akensert / molgraph

Graph neural networks for molecular machine learning. Implemented and compatible with TensorFlow and Keras.
https://molgraph.readthedocs.io/en/latest/
MIT License

How to add an additional mol feature to the GraphTensor #20

Closed thegodone closed 11 months ago

thegodone commented 1 year ago

I would like to add an additional molecule-level feature to the GraphTensor, for example the molecular weight.

It can be a single molecular attribute or a vector of attributes.

I want to use it in the model not as a graph input but as an additional input at the start of the MLP part, after the graph convolutions (i.e., concatenate this molecular feature vector with the graph embedding vector after the Readout).

Is it possible? I see that you have a "y_mask" in the Tox21 case, but unfortunately I cannot use it as an input.

akensert commented 1 year ago

So for example molecular descriptors? I guess you could generate molecular descriptors separately (via RDKit) and then just add them to the embedding after the Readout. Have I understood you correctly?

thegodone commented 1 year ago

yes correct

akensert commented 1 year ago

I guess there are two ways to incorporate additional information:

  1. Add a virtual super node and propagate the information from this super node to all other nodes for a number of steps. Or the opposite/reverse I guess: propagate information from regular nodes to super node.
  2. Simply just concatenate some precomputed molecular descriptors after the Readout and pass to MLP.
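
Option 2 could look roughly like the following in Keras. This is a minimal sketch rather than molgraph's API: a plain input named graph_embedding stands in for the Readout output, and EMBED_DIM, NUM_DESCRIPTORS, the input names, and the layer sizes are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

EMBED_DIM = 128       # assumed size of the graph embedding after the Readout
NUM_DESCRIPTORS = 5   # assumed number of precomputed molecular descriptors

# Stand-in for the Readout output; in a real model this tensor would be
# produced by the graph layers instead of being a model input.
graph_embedding = tf.keras.Input(shape=(EMBED_DIM,), name="graph_embedding")
descriptors = tf.keras.Input(shape=(NUM_DESCRIPTORS,), name="mol_descriptors")

# Concatenate along the feature axis, then pass the result to the MLP.
merged = tf.keras.layers.Concatenate(axis=-1)([graph_embedding, descriptors])
hidden = tf.keras.layers.Dense(64, activation="relu")(merged)
output = tf.keras.layers.Dense(1)(hidden)

model = tf.keras.Model(inputs=[graph_embedding, descriptors], outputs=output)

# Run a small random batch through the model to check the shapes.
batch_size = 4
pred = model([
    np.random.rand(batch_size, EMBED_DIM).astype("float32"),
    np.random.rand(batch_size, NUM_DESCRIPTORS).astype("float32"),
])
print(pred.shape)
```

The descriptors only touch the MLP head, so the graph layers are unchanged. It is usually worth scaling the descriptors before concatenation so they do not dominate the learned embedding.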

thegodone commented 1 year ago

I prefer option 2 (concatenating precomputed molecular descriptors after the Readout and passing them to the MLP), but how do I pass them as input to the model using tf.data.Dataset? I tried to combine the inputs as [x_encoder_graph, x_vector], but it failed when building the TypeSpec.

Traceback (most recent call last):
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 104, in normalize_element
    spec = type_spec_from_value(t, use_fallback=False)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 507, in type_spec_from_value
    raise TypeError("Could not build a TypeSpec for {} with type {}".format(
TypeError: Could not build a TypeSpec for [GraphTensor(
  edge_src=<tf.RaggedTensor: shape=(902, None), dtype=int32>,
  edge_dst=<tf.RaggedTensor: shape=(902, None), dtype=int32>,
  node_feature=<tf.RaggedTensor: shape=(902, None, 179), dtype=float32>,
  edge_feature=<tf.RaggedTensor: shape=(902, None, 11), dtype=float32>,
  positional_encoding=<tf.RaggedTensor: shape=(902, None, 16), dtype=float32>), [3.0, 4.0, 3.0, 3.0,...])

The call was:

tf.data.Dataset.from_tensor_slices(([x_train, n_train], y_train))

where x_train = encoder(X_train) and n_train is a single molecule-level feature vector used as a test.

I guess it comes from the map function applied after tf.data.Dataset.from_tensor_slices: ".map(lambda x, args: (x.merge(), args), -1)"

akensert commented 1 year ago

I can't do any coding right now, so I can't give you a definite answer or solution. However, here is something to try, based on the information I have:

It tries to build a type spec from the list [GraphTensor(...), list(...)], which is not possible. The map should be: .map(lambda x, y: ([x[0].merge(), x[1]], y)), as x is a tuple (or list, I guess).

So try that. And make sure that x_train, n_train and y_train all have the same shape[0].

EDIT: perhaps it is the tf.data.Dataset.from_tensor_slices(...) that gives you the error? Maybe try tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train)). And maybe check the tf.data.Dataset documentation if needed. I don't think there should be any problem constructing a dataset from a mix of GraphTensor and Tensor inputs.

thegodone commented 1 year ago

It is still failing, now at the map function, and again with "Could not build a TypeSpec":

Traceback (most recent call last):
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 175, in wrapper_helper
    self._output_structure = structure.type_spec_from_value(ret)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 487, in type_spec_from_value
    return tuple([type_spec_from_value(v) for v in element])
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 487, in <listcomp>
    return tuple([type_spec_from_value(v) for v in element])
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 507, in type_spec_from_value
    raise TypeError("Could not build a TypeSpec for {} with type {}".format(
TypeError: Could not build a TypeSpec for [GraphTensor( edge_src=, edge_dst=, node_feature=<tf.Tensor: shape=(None, 179), dtype=float32>, edge_feature=<tf.Tensor: shape=(None, 11), dtype=float32>, positional_encoding=<tf.Tensor: shape=(None, 16), dtype=float32>, graph_indicator=), <tf.Tensor 'args_5:0' shape=(None,) dtype=int32>] with type list

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/tgg/Documents/models/Guillaume-ochem/molgraphs/run3.py", line 241, in <module>
    tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train_par))
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2240, in map
    return map_op._map_v2(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/map_op.py", line 37, in _map_v2
    return _MapDataset(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/map_op.py", line 107, in __init__
    self._map_func = structured_function.StructuredFunctionWrapper(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 261, in __init__
    self._function = fn_factory()
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 232, in get_concrete_function
    concrete_function = self._get_concrete_function_garbage_collected(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 202, in _get_concrete_function_garbage_collected
    concrete_function, _ = self._maybe_define_concrete_function(args, kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 166, in _maybe_define_concrete_function
    return self._maybe_define_function(args, kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 396, in _maybe_define_function
    concrete_function = self._create_concrete_function(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 300, in _create_concrete_function
    func_graph_module.func_graph_from_py_func(
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1214, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 238, in wrapped_fn
    ret = wrapper_helper(args)
  File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 177, in wrapper_helper
    raise TypeError(f"Unsupported return value from function passed to "
TypeError: Unsupported return value from function passed to Dataset.map(): ([GraphTensor( edge_src=, edge_dst=, node_feature=<tf.Tensor: shape=(None, 179), dtype=float32>, edge_feature=<tf.Tensor: shape=(None, 11), dtype=float32>, positional_encoding=<tf.Tensor: shape=(None, 16), dtype=float32>, graph_indicator=), <tf.Tensor 'args_5:0' shape=(None,) dtype=int32>], <tf.Tensor 'args_6:0' shape=(None, 1, 6) dtype=float32>).

I used this code:


x_train = encoder(X_train)
x_test = encoder(X_test)

# One molecule-level value per SMILES string (molw computes the molecular weight)
n_train = [molw(smi) for smi in X_train]
n_test = [molw(smi) for smi in X_test]
n_train = np.array(n_train)
n_test = np.array(n_test)

#n_train_ = tf.RaggedTensor.from_uniform_row_length(n_train, uniform_row_length=1)
#n_test_ = tf.RaggedTensor.from_uniform_row_length(n_test, uniform_row_length=1)

train_ds = (
    tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
    .batch(bs)
    .shuffle(1024)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)

test_ds = (
    tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
    .batch(bs)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)

akensert commented 1 year ago

It works for me:

x_mol_level_descriptors = np.random.uniform(size=(1128, 5))
# x_train.shape = (1128, None, 12)
# y_train.shape = (1128, 1)

train_ds = tf.data.Dataset.from_tensor_slices(((x_train, x_mol_level_descriptors), y_train))
train_ds = train_ds.batch(32)
train_ds = train_ds.shuffle(1024)
train_ds = train_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
train_ds = train_ds.prefetch(-1)

for (x, x_), y in train_ds:
    print(x.shape, x_.shape, y.shape)
    # (None, 12) (32, 5) (32, 1)

Might be something about n_train. Feel free to supply more info.

thegodone commented 1 year ago

I think the issue is in the model input definition, then.

akensert commented 1 year ago

I think the issue is in the model input definition, then.

What do you mean? There seems to be no issue, unless you did not give me all the information. The code you supplied seems to work fine.

thegodone commented 1 year ago

Yes, it works, but only if you split the code that creates train_ds / test_ds into separate statements. Very interesting. Working code:

train_ds = tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
train_ds = train_ds.batch(bs)
train_ds = train_ds.shuffle(1024)
train_ds = train_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
train_ds = train_ds.prefetch(-1)

test_ds = tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
test_ds = test_ds.batch(bs)
test_ds = test_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
test_ds = test_ds.prefetch(-1)

Not working code:

train_ds = (
    tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
    .batch(bs)
    .shuffle(1024)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)

test_ds = (
    tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
    .batch(bs)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)

akensert commented 1 year ago

That is interesting indeed :D Not sure why..
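
One visible difference between the two snippets, independent of how the statements are split: the working version returns a tuple, ((x[0].merge(), x[1]), y), from map, while the failing one returns a nested list, ([x[0].merge(), x[1]], y). tf.data treats tuples as nested structure but tries to convert a nested list into a single tensor, which would be consistent with the "with type list" at the end of the error message. The following is a minimal sketch of that behaviour with plain tensors standing in for the GraphTensor; the array names and shapes are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: `features` plays the role of the encoded graphs,
# `descriptors` the molecule-level inputs, and `labels` the targets.
features = np.zeros((8, 3), dtype="float32")
descriptors = np.ones((8, 5), dtype="float32")
labels = np.arange(8, dtype="float32")

ds = tf.data.Dataset.from_tensor_slices(((features, descriptors), labels)).batch(4)

# Tuple: tf.data keeps two separate input components.
tuple_ds = ds.map(lambda x, y: ((x[0], x[1]), y))
print(tuple_ds.element_spec)

# Nested list: tf.data tries to pack the list into one tensor, which is
# expected to fail here because the two components have different shapes.
try:
    ds.map(lambda x, y: ([x[0], x[1]], y))
    list_error = None
except (TypeError, ValueError) as err:
    list_error = err
print(type(list_error).__name__)
```

If this is indeed the cause, the chained form should also work once the list inside the lambda is replaced by a tuple.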

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 11 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.