Closed thegodone closed 11 months ago
So for example molecular descriptors? I guess you could generate molecular descriptors separately (via RDKit) and then just add them to the embedding after the Readout. Have I understood you correctly?
Yes, correct.
I guess there are two ways to incorporate additional information:
Readout
and pass to MLP.
I prefer option 2: simply concatenate some precomputed molecular descriptors after the Readout and pass them to the MLP. But how do I pass them as input to the model using tf.data.Dataset? I tried to concatenate the inputs [x_encoder_graph, x_vector], but it failed when building the TypeSpec.
Traceback (most recent call last):
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 104, in normalize_element
spec = type_spec_from_value(t, use_fallback=False)
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 507, in type_spec_from_value
raise TypeError("Could not build a TypeSpec for {} with type {}".format(
TypeError: Could not build a TypeSpec for [GraphTensor(
edge_src=<tf.RaggedTensor: shape=(902, None), dtype=int32>,
edge_dst=<tf.RaggedTensor: shape=(902, None), dtype=int32>,
node_feature=<tf.RaggedTensor: shape=(902, None, 179), dtype=float32>,
edge_feature=<tf.RaggedTensor: shape=(902, None, 11), dtype=float32>,
positional_encoding=<tf.RaggedTensor: shape=(902, None, 16), dtype=float32>), [3.0, 4.0, 3.0, 3.0,...])
I built the dataset with tf.data.Dataset.from_tensor_slices(([x_train, n_train], y_train)), where x_train = encoder(X_train) and n_train is a single per-molecule feature vector used for testing.
I guess the error comes from the map function applied after tf.data.Dataset.from_tensor_slices: " .map(lambda x, args: (x.merge(), args), -1) "
I can't do anything coding-wise right now, so I can't give you a definite answer or solution. However, to give you something already, based on the information I have:
It tries to build a type spec from the tuple [GraphTensor(...), list(...)], which is not possible. The map should be .map(lambda x, y: ([x[0].merge(), x[1]], y)), as x is a tuple (or a list, I guess).
So try that. And make sure that x_train, n_train and y_train all have the same shape[0].
EDIT: perhaps it is tf.data.Dataset.from_tensor_slices(...) that gives you the error? Maybe try tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train)). And maybe check the tf.data.Dataset documentation if needed. I don't think there should be any problem constructing a dataset from a mix of GraphTensor and Tensor inputs.
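The shape[0] advice can be checked before the dataset is built. A minimal sketch, assuming one scalar descriptor per molecule; as_descriptor_matrix is a hypothetical helper (not part of molgraph or TensorFlow), and the values and targets are made up for illustration:

```python
import numpy as np


def as_descriptor_matrix(values):
    """Return a float32 array of shape (num_molecules, num_descriptors)."""
    arr = np.asarray(values, dtype=np.float32)
    if arr.ndim == 1:
        # One scalar per molecule: turn (N,) into a column of shape (N, 1).
        arr = arr[:, None]
    return arr


# Made-up descriptor values and targets, for illustration only.
n_train = as_descriptor_matrix([3.0, 4.0, 3.0, 3.0])
y_train = np.zeros((4, 1), dtype=np.float32)

# Both arrays must agree on the number of molecules (the leading dimension).
assert n_train.shape[0] == y_train.shape[0]
print(n_train.shape, n_train.dtype)  # (4, 1) float32
```

Keeping the descriptors as a 2-D float32 array also makes them straightforward to concatenate with a 2-D embedding later.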
It is still failing, this time at the map function, again with "Could not build a TypeSpec":
Traceback (most recent call last):
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 175, in wrapper_helper
self._output_structure = structure.type_spec_from_value(ret)
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 487, in type_spec_from_value
return tuple([type_spec_from_value(v) for v in element])
File "/Users/tgg/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py", line 487, in TypeSpec
for {} with type {}".format(
TypeError: Could not build a TypeSpec for [GraphTensor(
edge_src=
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/tgg/Documents/models/Guillaume-ochem/molgraphs/run3.py", line 241, in
I used this code:
x_train = encoder(X_train)
x_test = encoder(X_test)
n_train = [molw(smi) for smi in X_train]
n_test = [molw(smi) for smi in X_test]
n_train = np.array(n_train)
n_test = np.array(n_test)
#n_train_ = tf.RaggedTensor.from_uniform_row_length(n_train, uniform_row_length=1)
#n_test_ = tf.RaggedTensor.from_uniform_row_length(n_test, uniform_row_length=1)
train_ds = (
    tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
    .batch(bs)
    .shuffle(1024)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)
test_ds = (
    tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
    .batch(bs)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefe
It works for me:
x_mol_level_descriptors = np.random.uniform(size=(1128, 5))
# x_train.shape = (1128, None, 12)
# y_train.shape = (1128, 1)
train_ds = tf.data.Dataset.from_tensor_slices(((x_train, x_mol_level_descriptors), y_train))
train_ds = train_ds.batch(32)
train_ds = train_ds.shuffle(1024)
train_ds = train_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
train_ds = train_ds.prefetch(-1)
for (x, x_), y in train_ds:
    print(x.shape, x_.shape, y.shape)
# (None, 12) (32, 5) (32, 1)
Might be something about n_train. Feel free to supply more info.
I think the issue is in the model input definition, then.
What do you mean? There seems to be no issue, unless you did not supply me with all the information. The code you supplied seems to work fine.
Yes, it works, but only if you split the code that creates train_ds / test_ds into separate statements. Very interesting. Working code:
train_ds = tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
train_ds = train_ds.batch(bs)
train_ds = train_ds.shuffle(1024)
train_ds = train_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
train_ds = train_ds.prefetch(-1)
test_ds = tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
test_ds = test_ds.batch(bs)
test_ds = test_ds.map(lambda x, y: ((x[0].merge(), x[1]), y))
test_ds = test_ds.prefetch(-1)
Not working code:
train_ds = (
    tf.data.Dataset.from_tensor_slices(((x_train, n_train), y_train))
    .batch(bs)
    .shuffle(1024)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)
test_ds = (
    tf.data.Dataset.from_tensor_slices(((x_test, n_test), y_test))
    .batch(bs)
    .map(lambda x, y: ([x[0].merge(), x[1]], y))
    .prefetch(-1)
)
That is interesting indeed :D Not sure why..
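One difference between the two snippets does not depend on chaining: the working version returns a tuple, ((x[0].merge(), x[1]), y), from map, while the non-working version returns a list, ([x[0].merge(), x[1]], y). tf.data treats tuples as nested structure but tries to convert a Python list into a single tensor, which would explain the "Could not build a TypeSpec" error. A minimal sketch of that behaviour, assuming a tf.RaggedTensor as a stand-in for the GraphTensor (so .merge() is omitted) and made-up arrays in place of n_train and y_train:

```python
import numpy as np
import tensorflow as tf

# Stand-ins (assumptions): a RaggedTensor plays the role of the GraphTensor;
# n_desc and y are made-up arrays playing the roles of n_train and y_train.
x_graph = tf.ragged.constant([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0]])
n_desc = np.ones((4, 5), dtype=np.float32)
y = np.zeros((4, 1), dtype=np.float32)

base = tf.data.Dataset.from_tensor_slices(((x_graph, n_desc), y)).batch(2)

# Tuple output: every component keeps its own TypeSpec.
ok_ds = base.map(lambda x, y: ((x[0], x[1]), y))
for (xg, xd), yb in ok_ds.take(1):
    desc_shape = tuple(xd.shape)

# List output: tf.data tries to pack [ragged, dense] into ONE tensor.
try:
    base.map(lambda x, y: ([x[0], x[1]], y))
    list_failed = False
except (TypeError, ValueError):
    list_failed = True

print(desc_shape, list_failed)
```

If this is the cause, the chained form should also work once the list in the map lambda is replaced by a tuple.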
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
I would like to add an additional molecule-level feature to the GraphTensor, such as molecular weight.
It can be a single molecular attribute or a vector of attributes.
I want to use it in the model not as graph input but as an additional input at the start of the MLP part, after the graph convolutions (i.e. concatenate this molecular feature vector with the graph embedding vector after the Readout).
Is it possible? I see that you have a "y_mask" in the Tox21 case, but unfortunately I cannot use it as an input.
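The concatenation described here can be sketched with plain Keras layers. This is only an illustration of the MLP part: the graph embedding is represented by an ordinary dense input standing in for the Readout output, and EMBED_DIM, N_DESC and the layer sizes are assumptions rather than values from the library or this thread.

```python
import numpy as np
import tensorflow as tf

EMBED_DIM = 128  # assumed size of the graph embedding after Readout
N_DESC = 5       # assumed number of precomputed molecular descriptors

# Two inputs: the (stand-in) graph embedding and the descriptor vector.
graph_embedding = tf.keras.Input(shape=(EMBED_DIM,), name="graph_embedding")
descriptors = tf.keras.Input(shape=(N_DESC,), name="descriptors")

# Concatenate along the feature axis, then apply a small MLP.
h = tf.keras.layers.Concatenate()([graph_embedding, descriptors])
h = tf.keras.layers.Dense(64, activation="relu")(h)
out = tf.keras.layers.Dense(1)(h)
head = tf.keras.Model(inputs=[graph_embedding, descriptors], outputs=out)

# Dummy batch of 4 molecules to check that the shapes line up.
pred = head([
    np.zeros((4, EMBED_DIM), dtype=np.float32),
    np.ones((4, N_DESC), dtype=np.float32),
])
print(pred.shape)  # (4, 1)
```

In a full model the first input would be replaced by the output of the graph layers followed by the Readout, with the descriptor vector supplied as a second model input.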