cybertronai / gradient-checkpointing

Make huge neural nets fit in memory
MIT License

AttributeError: 'NoneType' object has no attribute 'pred' #9

Open · Mohamedemad4 opened this issue 6 years ago

Mohamedemad4 commented 6 years ago

Hello, I tried using this project with Keras; the code is below:

```python
import tqdm
import keras
import numpy as np
import tensorflow as tf
import keras.backend as k
import memory_saving_gradients
from keras.models import Model
from keras.layers import Input, Dense, Bidirectional, Activation, TimeDistributed, GRU, Dropout

# Monkey-patch Keras so its backend uses the memory-saving gradients
k.__dict__["gradients"] = memory_saving_gradients.gradients_memory

inputs = Input((400, len(chars)))

gu1 = Bidirectional(GRU(200, activation='relu', kernel_initializer='RandomUniform',
                        bias_initializer='RandomUniform', recurrent_dropout=0.2,
                        return_sequences=True))(inputs)

gu2 = GRU(400, activation='relu', kernel_initializer='RandomUniform',
          bias_initializer='RandomUniform', recurrent_dropout=0.2, dropout=0.2,
          return_sequences=True)(gu1)

d = Dropout(0.3)(gu2)

logits_td = TimeDistributed(Dense(len(chars)))(d)

logits = Activation('softmax')(logits_td)

model = Model(inputs, logits)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy', 'categorical_accuracy'])
model.train_on_batch(data_x, data_y)
```

Note that data_x and data_y both have shape (32, 400, 74), and that I cannot import tensorflow.python.*. The full traceback is:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 import time
      2 t1=time.time()
----> 3 model.train_on_batch(data_1[:32],data_2[:32])
      4 print('Batch Training time Approx. '+str(round(time.time()-t1,1)))

/usr/local/lib/lib/python3.4/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1811         else:
   1812             ins = x + y + sample_weights
-> 1813         self._make_train_function()
   1814         outputs = self.train_function(ins)
   1815         if len(outputs) == 1:

/usr/local/lib/lib/python3.4/site-packages/keras/engine/training.py in _make_train_function(self)
    988                 training_updates = self.optimizer.get_updates(
    989                     params=self._collected_trainable_weights,
--> 990                     loss=self.total_loss)
    991                 updates = self.updates + training_updates
    992                 # Gets loss and metrics. Updates weights at each call.

/usr/local/lib/lib/python3.4/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     85                 warnings.warn('Update your `' + object_name +
     86                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87             return func(*args, **kwargs)
     88         wrapper._original_function = func
     89         return wrapper

/usr/local/lib/lib/python3.4/site-packages/keras/optimizers.py in get_updates(self, loss, params)
    413     @interfaces.legacy_get_updates_support
    414     def get_updates(self, loss, params):
--> 415         grads = self.get_gradients(loss, params)
    416         self.updates = [K.update_add(self.iterations, 1)]
    417

/usr/local/lib/lib/python3.4/site-packages/keras/optimizers.py in get_gradients(self, loss, params)
     71
     72     def get_gradients(self, loss, params):
---> 73         grads = K.gradients(loss, params)
     74         if hasattr(self, 'clipnorm') and self.clipnorm > 0:
     75             norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))

/var/host/media/removable/UNTITLED/seq2seq/memory_saving_gradients.py in gradients_memory(ys, xs, grad_ys, **kwargs)
     25
     26 def gradients_memory(ys, xs, grad_ys=None, **kwargs):
---> 27     return gradients(ys, xs, grad_ys, checkpoints='memory', **kwargs)
     28
     29 def gradients_collection(ys, xs, grad_ys=None, **kwargs):

/var/host/media/removable/UNTITLED/seq2seq/memory_saving_gradients.py in gradients(ys, xs, grad_ys, checkpoints, **kwargs)
    256     dv = tf_gradients(boundary,
    257                       checkpoints_disconnected_other+xs,
--> 258                       grad_ys=substitute_backprops, **kwargs)
    259     debug_print("Got gradients %s", dv)
    260     debug_print("for %s", boundary)

/usr/local/lib/lib/python3.4/site-packages/tensorflow/python/ops/gradients_impl.py in gradients(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method)
    547                 # issue here because of zeros.
    548                 if loop_state:
--> 549                   out_grads[i] = loop_state.ZerosLike(op, i)
    550                 else:
    551                   out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)

/usr/local/lib/lib/python3.4/site-packages/tensorflow/python/ops/control_flow_ops.py in ZerosLike(self, op, index)
   1172     if grad_state is None:
   1173       # op is not in a while loop that is part of gradients().
-> 1174       return ZerosLikeOutsideLoop(op, index)
   1175     op_ctxt = op._get_control_flow_context()
   1176     val = ops.convert_to_tensor(op.outputs[index], name="tensor")

/usr/local/lib/lib/python3.4/site-packages/tensorflow/python/ops/control_flow_ops.py in ZerosLikeOutsideLoop(op, index)
   1303   else:
   1304     op_ctxt = op._get_control_flow_context()
-> 1305     pred = op_ctxt.pred
   1306     branch = op_ctxt.branch
   1307     switch_val = switch(op.inputs[0], pred)[1 - branch]

AttributeError: 'NoneType' object has no attribute 'pred'
```

Thank you.
yaroslavvb commented 6 years ago

Memory rewriting seems to be incompatible with neural nets that use TensorFlow looping constructs right now (your GRU layers are built on tf.while_loop under the hood).
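
For illustration, here is a minimal sketch (not from the original thread) of the same class of failure: a graph containing a bare tf.while_loop, which is what Keras recurrent layers build internally. It assumes TensorFlow 1.x and this repo's memory_saving_gradients module on the import path; whether it raises this exact AttributeError depends on the TF version, but it exercises the same code path.

```python
import tensorflow as tf
import memory_saving_gradients  # from this repo

x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.random_normal([10, 10]))

# A simple recurrence: tf.while_loop inserts the control-flow ops
# (Enter/Exit/Switch/Merge) that Keras GRU layers also generate.
def cond(i, h):
    return i < 5

def body(i, h):
    return i + 1, tf.tanh(tf.matmul(h, w))

_, y = tf.while_loop(cond, body, [tf.constant(0), x])
loss = tf.reduce_sum(y)

# Plain tf.gradients(loss, [w]) handles the loop via its loop state.
# The memory-saving rewrite slices the graph at checkpoints and re-runs
# tf.gradients across the loop boundary, which can reach
# ZerosLikeOutsideLoop on a Switch op whose control-flow context is
# None -- the AttributeError reported above.
grads = memory_saving_gradients.gradients_memory(loss, [w])
```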

Mohamedemad4 commented 6 years ago

Can you please elaborate? I am kind of new to this.

yaroslavvb commented 6 years ago

Package doesn't support your neural net right now, sorry
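
As a hedged diagnostic (an addition, not from the thread): since the unsupported construct is TensorFlow control flow, one can scan the graph for the control-flow primitive ops before applying the monkey patch. uses_control_flow below is a hypothetical helper, assuming TF 1.x:

```python
import tensorflow as tf

# Hypothetical helper (not part of this repo): returns True if the
# graph contains the control-flow primitives that
# memory_saving_gradients cannot currently rewrite.
CONTROL_FLOW_OPS = {"Enter", "Exit", "Switch", "Merge", "NextIteration"}

def uses_control_flow(graph=None):
    graph = graph or tf.get_default_graph()
    return any(op.type in CONTROL_FLOW_OPS for op in graph.get_operations())

# Usage: build the Keras model first, then check before monkey-patching:
# if uses_control_flow():
#     print("Graph has while-loop/cond ops; gradients_memory will likely fail.")
```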