Open edshui opened 3 years ago
Hi,
can you post a minimal example to reproduce the issue?
Thanks
Hi Daniele:
Absolutely. Please see below:
class GConn(Model):
    def __init__(self, N, n_out, n_layers, activation="relu", dropout=None):
        self.gins = []
        for _ in range(n_layers):
            self.gins.append(
                GINConv(n_out, epsilon=0, mlp_hidden=[N,n_out,n_out],activation="relu")
            )
        self.gcn = GCNConv(
            n_out,
            activation="relu"
        )
        self.gats = []
        for _ in range(n_layers):
            self.gats.append(
                GATConv(
                    n_out,attn_heads=8,add_self_loops=False,
                    concat_heads=False,dropout_rate=0.5,
                    activation="relu"
                )
            )

    def call(self, inputs):
        outs = []
        x, a, _, *ax_aa_s = inputs
        for idx,gin in enumerate(self.gins):
            x1 = gin([ax_aa_s[2*idx], ax_aa_s[2*idx+1]])
            outs.append(x1)
        for idx,gat in enumerate(self.gats):
            x2 = gat([ax_aa_s[2*idx], ax_aa_s[2*idx+1]])
            outs.append(x2)
        x3 = self.gcn([x, a])
        outs.append(x3)
        if len(outs)>1:
            out = Concatenate(axis=-1)(outs)
            out = self.dense(out)
        else:
            out = x1
        return self.acti(out)

# Build model
model = GConn(N,n_out,len(adj_ran),activation=oACTI) # Model(inputs=[x_in, a_in], outputs=out)
opt = Adam(lr=learning_rate)
loss_fn = MeanSquaredError() #CategoricalCrossentropy()
metrics = MeanAbsoluteError() #MeanSquaredError()
The problem lies in x3, as the code would have run fine without self.gats.
Many thanks! Ed
Hi,
sorry, I just took the time to look at this code. I'm not too sure what's going on here:
x, a, _, *ax_aa_s = inputs
since it seems that you have a model with non-standard inputs (standard would be simply node features and adjacency matrix) and I would need to see the arrays/tensors that you feed to the model when training.
Also, can you post the full stack trace so that I get a sense of where the error is happening in the GAT layer?
Thanks
Hi Daniele:
Thanks indeed for getting back to me!
I have multiple adjacency matrices because they are all derived from the same adjacency matrix, each masked with a different threshold. Each one is fed to a different GAT layer. The outputs of the GAT layers are then concatenated and passed to a dense layer (if you have the time, please see the bottom of this post for my implementation of the disjoint loader, which takes a list of adjacency matrices as input).
I would like to mention that the code works fine if I only have the GCN and GIN layers, but it fails when I add the GAT layers.
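To make the masking step concrete, here is a minimal NumPy sketch of deriving several binary adjacency matrices from one weighted matrix. The helper name `threshold_adjacency`, the toy matrix, and the threshold values are all made up for illustration; only the `adj > threshold` comparison mirrors the dataset code in this post.

```python
import numpy as np

def threshold_adjacency(adj, thresholds):
    """Return one binary adjacency matrix per threshold (entries > t are kept)."""
    return [(adj > t).astype(np.float32) for t in thresholds]

# Toy weighted adjacency matrix for 3 nodes (values are illustrative only).
adj = np.array([[0.0, 0.2, 0.9],
                [0.2, 0.0, 0.5],
                [0.9, 0.5, 0.0]], dtype=np.float32)

thresholds = [0.1, 0.4, 0.8]
masked = threshold_adjacency(adj, thresholds)
for t, m in zip(thresholds, masked):
    print(f"threshold {t}: {int(m.sum())} nonzero entries")
```

Each masked matrix keeps the original shape, so every thresholded graph shares the same node set and differs only in which edges survive.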
Below please find the full stack trace for your reference:
Epoch 1/600
WARNING:tensorflow:AutoGraph could not transform <bound method GConn.call of <__main__.GConn object at 0x7f1917545220>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method GConn.call of <__main__.GConn object at 0x7f1917545220>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method MessagePassing.propagate of <spektral.layers.convolutional.gin_conv.GINConv object at 0x7f191754c550>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method MessagePassing.propagate of <spektral.layers.convolutional.gin_conv.GINConv object at 0x7f191754c550>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method GATConv.call of <spektral.layers.convolutional.gat_conv.GATConv object at 0x7f18e96e82e0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method GATConv.call of <spektral.layers.convolutional.gat_conv.GATConv object at 0x7f18e96e82e0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method GCNConv.call of <spektral.layers.convolutional.gcn_conv.GCNConv object at 0x7f1916e60340>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method GCNConv.call of <spektral.layers.convolutional.gcn_conv.GCNConv object at 0x7f1916e60340>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-8-e3d69539cea3> in <module>
34 metrics=["mse"]) #["mse"])
35
---> 36 history = model.fit(
37 loader_tr.load(),
38 steps_per_epoch=loader_tr.steps_per_epoch,
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1098 _r=1):
1099 callbacks.on_train_batch_begin(step)
-> 1100 tmp_logs = self.train_function(iterator)
1101 if data_handler.should_sync:
1102 context.async_wait()
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
869 # This is the first call of __call__, so we have to initialize.
870 initializers = []
--> 871 self._initialize(args, kwds, add_initializers_to=initializers)
872 finally:
873 # At this point we know that the initialization is complete (or less
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
723 self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
724 self._concrete_stateful_fn = (
--> 725 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
726 *args, **kwds))
727
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2967 args, kwargs = None, None
2968 with self._lock:
-> 2969 graph_function, _ = self._maybe_define_function(args, kwargs)
2970 return graph_function
2971
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3359
3360 self._function_cache.missed.add(call_context_key)
-> 3361 graph_function = self._create_graph_function(args, kwargs)
3362 self._function_cache.primary[cache_key] = graph_function
3363
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3194 arg_names = base_arg_names + missing_arg_names
3195 graph_function = ConcreteFunction(
-> 3196 func_graph_module.func_graph_from_py_func(
3197 self._name,
3198 self._python_function,
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
988 _, original_func = tf_decorator.unwrap(python_func)
989
--> 990 func_outputs = python_func(*func_args, **func_kwargs)
991
992 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
632 xla_context.Exit()
633 else:
--> 634 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
635 return out
636
~/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
975 except Exception as e: # pylint:disable=broad-except
976 if hasattr(e, "ag_error_metadata"):
--> 977 raise e.ag_error_metadata.to_exception(e)
978 else:
979 raise
NotImplementedError: in user code:
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py:795 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
return fn(*args, **kwargs)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py:788 run_step **
outputs = model.train_step(data)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py:757 train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:496 minimize
grads_and_vars = self._compute_gradients(
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:548 _compute_gradients
grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:441 _get_gradients
grads = tape.gradient(loss, var_list, grad_loss)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/backprop.py:1080 gradient
flat_grad = imperative_grad.imperative_grad(
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/imperative_grad.py:71 imperative_grad
return pywrap_tfe.TFE_Py_TapeGradient(
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/eager/backprop.py:162 _gradient_function
return grad_fn(mock_op, *out_grads)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/ops/math_grad.py:473 _UnsortedSegmentSumGrad
return _GatherDropNegatives(grad, op.inputs[1])[0], None, None
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/ops/math_grad.py:439 _GatherDropNegatives
array_ops.ones([array_ops.rank(gathered)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/ops/array_ops.py:3120 ones
output = _constant_if_small(one, shape, dtype, name)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/ops/array_ops.py:2804 _constant_if_small
if np.prod(shape) < 1000:
<__array_function__ internals>:5 prod
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3030 prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/numpy/core/fromnumeric.py:87 _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/ehui/anaconda3/envs/hcp/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:852 __array__
raise NotImplementedError(
NotImplementedError: Cannot convert a symbolic Tensor (gradient_tape/g_conn/gat_conv/sub:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
Below is my disjointloader implementation, which is a subclass of your DisjointLoader:
class HCPDisjointLoader(DisjointLoader):
    def __init__(
        self, dataset, node_level=False, batch_size=1, epochs=None, shuffle=True
    ):
        self.dataset2 = dataset[1:]
        super().__init__(dataset[0], node_level=node_level, batch_size=batch_size, epochs=epochs, shuffle=shuffle)
        self._HCPgenerator = [self.HCPgenerator(i) for i in range(len(self.dataset2))]

    def __next__(self):
        nxt = self._generator.__next__()
        nxt2 = [gen.__next__() for gen in self._HCPgenerator]
        return self.collate(nxt,nxt2)

    def HCPgenerator(self,idx):
        return batch_generator(
            self.dataset2[idx],
            batch_size=self.batch_size,
            epochs=self.epochs,
            shuffle=self.shuffle,
        )

    def collate(self, batch, batch2):
        output, y = super().collate(batch)
        output = list(output)
        for ba in batch2:
            out,_ = super().collate(ba)
            out = list(out)
            output = output + out[:2]
        output=tuple(output)
        return output, y

    def tf_signature(self):
        n_layers = len(self.dataset2)
        signature = self.dataset.signature
        signature2 = self.dataset2[0].signature
        if "y" in signature:
            signature["y"]["shape"] = prepend_none(signature["y"]["shape"])
        if "a" in signature:
            signature["a"]["spec"] = tf.SparseTensorSpec
        signature["i"] = dict()
        signature["i"]["spec"] = tf.TensorSpec
        signature["i"]["shape"] = (None,)
        signature["i"]["dtype"] = tf.as_dtype(tf.int64)
        for idx in range(n_layers):
            x_str='x'+str(idx+2)
            a_str='a'+str(idx+2)
            signature[x_str] = signature2['x']
            signature[a_str] = signature2['a']
        return to_tf_signature(signature,n_layers)

adataset_tr = []
adataset_va = []
for thres in adj_ran:
    tdataset = HCPDataset([ax,tadj>thres,y])
    tdataset_tr, tdataset_va = tdataset[idx_tr], tdataset[idx_va]
    adataset_tr.append(tdataset_tr)
    adataset_va.append(tdataset_va)
dataset = HCPDataset([x,adj,y])
dataset_tr, dataset_va = dataset[idx_tr], dataset[idx_va]
loader_tr = HCPDisjointLoader([dataset_tr,*adataset_tr], batch_size=batch_size, epochs=epochs, node_level=True)
loader_va = HCPDisjointLoader([dataset_va,*adataset_va], batch_size=batch_size, node_level=True)
Thanks so much for your help and time!
Ed
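For readers trying to follow the input layout: as far as the loader code above shows, `collate` takes the base batch and appends one (x, a) pair per thresholded dataset, which is what `x, a, _, *ax_aa_s = inputs` then unpacks in the model. Below is a plain-Python sketch of that layout. The helper `unpack_inputs` is hypothetical, placeholder strings stand in for tensors, and it assumes the base batch has exactly three entries (x, a, i); a dataset with edge attributes would shift the positions.

```python
def unpack_inputs(inputs):
    """Split a flat input tuple into the base graph and the extra (x, a) pairs."""
    # Same pattern as `x, a, _, *ax_aa_s = inputs` in the model's call().
    x, a, i, *extra = inputs
    if len(extra) % 2 != 0:
        raise ValueError("expected an even number of extra entries (x, a pairs)")
    # Pair k corresponds to extra[2*k] and extra[2*k + 1].
    pairs = [(extra[k], extra[k + 1]) for k in range(0, len(extra), 2)]
    return (x, a, i), pairs

# Placeholder strings stand in for the tensors a loader would yield.
inputs = ("x", "a", "i", "x2", "a2", "x3", "a3")
base, pairs = unpack_inputs(inputs)
print(base)
print(pairs)
```

With this layout, the layer at index `idx` reads the pair at positions `2*idx` and `2*idx+1` of the extra entries, matching the indexing used in `call`.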
Hi Daniele:
May I ask whether you have had a chance to take a look at the trace above?
Your help is much appreciated! Thanks!
Ed
Hi Ed,
I have looked at the code and stack trace, but unfortunately it didn't help. Can you re-run your code, this time adding the following line at the top of the main script?
tf.config.run_functions_eagerly(True)
This should give a stack trace that tells us where the problem happens, so we can debug it. Also, it would be great if you could reproduce the issue in a more "standard" setting, since it might also have something to do with the custom loader.
Thanks Daniele
Hi Daniele:
Many thanks for getting back to me despite your busy schedule.
Interestingly, the script runs when I add tf.config.run_functions_eagerly(True). Do you know what's going on (please excuse my ignorance)?
Many thanks, Ed
Honestly, I have no idea :D I would need to run the code in a debugger, with your data, to see what input/array is causing the crash in graph mode.
Note that this solution is not optimal, since eager mode will run slower.
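One inexpensive thing to rule out while the root cause is open: the last frames of the trace go through `np.prod(shape)` inside TensorFlow's `_constant_if_small`, and the AutoGraph warnings mention `gast`, so a mismatch between the installed TensorFlow, NumPy, and gast versions is a plausible suspect. This is an assumption, not something confirmed in this thread. A minimal sketch that prints the installed versions using only the standard library (the helper name and the package list are illustrative guesses at what is relevant):

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Packages that appear in the stack trace and warnings above.
report = installed_versions(["tensorflow", "numpy", "spektral", "gast"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside the stack trace would let others compare the combination against the compatibility notes of each package.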
Hi Daniele:
No worries, let me try my best to figure out what went wrong;)
It may have to do with what you mentioned previously (my custom loader). I will keep you posted with updates.
Many thanks! Ed
Dear Experts:
I am trying to use GATConv in disjoint mode with a disjoint data loader, but when I run model.fit I get the following error, which I cannot figure out how to solve:
Any help would be greatly appreciated, thank you very much! Ed