Separius / BERT-keras

Keras implementation of BERT with pre-trained weights

error of sparse_categorical_crossentropy when using theano backend #7

Open HighCWu opened 5 years ago

HighCWu commented 5 years ago

There's no problem at all with the TensorFlow backend. Now I'm testing Theano. When running train_model from tutorial.ipynb, T.nnet.softmax() (called inside K.sparse_categorical_crossentropy) raises an error because it expects a 1-d or 2-d tensor but receives TensorType(float32, 3D):

<ipython-input-22-27837df85ad1> in classification_loss(y_true, y_pred)
      2 import keras.backend as K
      3 def classification_loss(y_true, y_pred):
----> 4     return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
      5 train.classification_loss = classification_loss

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in sparse_categorical_crossentropy(target, output, from_logits, axis)
   1788     target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
   1789     target = reshape(target, shape(output))
-> 1790     return categorical_crossentropy(target, output, from_logits, axis=-1)
   1791 
   1792 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in categorical_crossentropy(target, output, from_logits, axis)
   1762         target = permute_dimensions(target, permutation)
   1763     if from_logits:
-> 1764         output = T.nnet.softmax(output)
   1765     else:
   1766         # scale preds so that the class probas of each sample sum to 1

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in softmax(c)
    813     if c.broadcastable[-1]:
    814         warnings.warn("The softmax is applied on a dimension of shape 1, which does not have a semantic meaning.")
--> 815     return softmax_op(c)
    816 
    817 

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in make_node(self, x)
    428                 or x.type.dtype not in tensor.float_dtypes:
    429             raise ValueError('x must be 1-d or 2-d tensor of floats. Got %s' %
--> 430                              x.type)
    431         if x.ndim == 1:
    432             warnings.warn("DEPRECATION: If x is a vector, Softmax will not automatically pad x "

ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)
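
For reference, the failure seems reproducible in isolation. A minimal sketch (the shapes here are made up; any 3-d logits tensor should trigger it):

    import numpy as np
    import keras.backend as K  # assuming KERAS_BACKEND=theano

    # 3-d logits (batch, seq_len, n_classes) and 2-d integer targets (batch, seq_len)
    y_pred = K.variable(np.random.randn(2, 8, 6).astype('float32'))
    y_true = K.variable(np.random.randint(0, 6, size=(2, 8)).astype('float32'))
    # Raises the ValueError above at graph-construction time, because the
    # Theano backend applies T.nnet.softmax directly to the 3-d logits:
    loss = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)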

Then I used this to work around it:

    import keras.backend as K

    # Monkey-patch Theano's softmax: flatten 3-d inputs to 2-d, apply the
    # original op, then restore the original shape.
    _softmax = K.T.nnet.softmax

    def softmax(x):
        if x.ndim == 3:
            d1, d2, d3 = x.shape
            return _softmax(x.reshape((d1 * d2, d3))).reshape((d1, d2, d3))
        return _softmax(x)

    K.T.nnet.softmax = softmax
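
An alternative sketch that avoids patching Theano globally (untested, and assuming y_pred carries (batch, seq_len, n_classes) logits while y_true holds integer class ids) would be to flatten inside the loss itself. Note this only sidesteps the 3-d softmax restriction, not the out-of-range-target IndexError that shows up below:

    import keras.backend as K

    def classification_loss(y_true, y_pred):
        # Flatten so the Theano softmax only ever sees a 2-d tensor.
        n_classes = K.shape(y_pred)[-1]
        flat_pred = K.reshape(y_pred, (-1, n_classes))  # (batch * seq_len, n_classes)
        flat_true = K.reshape(y_true, (-1,))            # (batch * seq_len,)
        loss = K.sparse_categorical_crossentropy(flat_true, flat_pred, from_logits=True)
        # Restore the (batch, seq_len) shape expected by the masking code.
        return K.reshape(loss, K.shape(y_true))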

But even with the patch applied, when I run

m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
                finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
                finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)

I get an error again:

/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_logits and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 6), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_flatten and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 8, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_gather and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 8, 6), (None, 1)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 1), (None, 8, 2)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_random_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 25), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/100
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:

IndexError: index 8 is out of bounds for axis 1 with size 6

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-39-7b7276d2ce06> in <module>()
      1 m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
      2                 finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
----> 3                 finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
      4 # now m is ready to be used!
      5 print(m.inputs)

/content/bert_keras_repo/transformer/train.py in train_model(base_model, is_causal, tasks_meta_data, pretrain_generator, finetune_generator, pretrain_epochs, pretrain_optimizer, pretrain_steps, pretrain_callbacks, finetune_epochs, finetune_optimizer, finetune_steps, finetune_callbacks, verbose, TPUStrategy)
    145 
    146     if pretrain_generator is not None:
--> 147         train_step(True)
    148     if finetune_generator is not None:
    149         train_step(False)

/content/bert_keras_repo/transformer/train.py in train_step(is_pretrain)
    142         _model.fit_generator(_generator, steps_per_epoch=pretrain_steps if is_pretrain else finetune_steps,
    143                              verbose=verbose, callbacks=pretrain_callbacks if is_pretrain else finetune_callbacks,
--> 144                              shuffle=False, epochs=pretrain_epochs if is_pretrain else finetune_epochs)
    145 
    146     if pretrain_generator is not None:

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in __call__(self, inputs)
   1386     def __call__(self, inputs):
   1387         assert isinstance(inputs, (list, tuple))
-> 1388         return self.function(*inputs)
   1389 
   1390 

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
--> 917                     storage_map=getattr(self.fn, 'storage_map', None))
    918             else:
    919                 # old-style linkers raise their own exceptions

/usr/local/lib/python3.6/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    890             # default arguments are stored in the closure of `rval`
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:
    894                     compute_map[o][0] = True

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2337 
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:
   2341             np.add.at(out[0], tuple(inputs[2:]), inputs[1])

IndexError: index 8 is out of bounds for axis 1 with size 6
Apply node that caused the error: AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 315
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(64, 6), (), (64,), (64,)]
Inputs strides: [(24, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{3}(AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bert_keras_repo/transformer/train.py", line 68, in train_model
    [task_loss_weight, task_target, logits, task_mask])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 687, in call
    return self.function(inputs, **arguments)
  File "bert_keras_repo/transformer/train.py", line 67, in <lambda>
    task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]), name=task.name + '_loss')(
  File "bert_keras_repo/transformer/train.py", line 20, in masked_classification_loss
    return _mask_loss(y_true, y_pred, y_mask, classification_loss)
  File "bert_keras_repo/transformer/train.py", line 11, in _mask_loss
    l = K.switch(y_mask, element_wise_loss(y_true, y_pred), K.zeros_like(y_mask, dtype=K.floatx()))
  File "<ipython-input-22-27837df85ad1>", line 4, in classification_loss
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py", line 1788, in sparse_categorical_crossentropy
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
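
If I read the traceback right, the IndexError comes from T.extra_ops.to_one_hot: it allocates a zero matrix of shape (n, nb_class) and index-sets one position per row, so any target id >= nb_class (here an id of 8 against 6 classes) fails. A minimal sketch with made-up values:

    import numpy as np
    import theano
    import theano.tensor as T

    target = T.ivector('target')
    # to_one_hot sets out[arange(n), target] = 1 on a zeros((n, 6)) matrix,
    # so a target id of 8 reproduces the IndexError above.
    f = theano.function([target], T.extra_ops.to_one_hot(target, nb_class=6))
    f(np.array([1, 8], dtype='int32'))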

It doesn't seem to be a bug in my code, because I checked out the branch from before TPU support and still hit it.

HighCWu commented 5 years ago

I found something related to this: keras-users/EhWwuq6R0lQ. I'm not familiar with Theano, so I don't know why it's OK on TensorFlow but not on Theano.

Separius commented 5 years ago

Yeah, I know; as I said in the README, I was unable to train the model with the Theano backend (I also checked CNTK, and I couldn't even run the model!)

HighCWu commented 5 years ago

Oh, I see. Maybe Theano support isn't really necessary; hardly anyone uses Theano these days. I should have seen that earlier. It seems I've done some useless work and should spend my time on something else. Will you spend your time on this?

Separius commented 5 years ago

TBH, I spent a day on this and in the end I just hated Keras (for allowing such issues) and myself! So no, I'm not going to waste any more time on this; right now I'm changing the attention mechanism of BERT and trying to make it faster.

If you want to play with BERT, learn something, and help others, a good direction is to train a distilled version of BERT: maybe you can train a model that is only 8 layers deep with 16 heads per layer but has similar accuracy (a rough sketch of the usual distillation loss follows). Another idea you could try is to use an encoder other than the Transformer; maybe a multilayer bidirectional QRNN could be used instead?
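
For reference, distillation in the usual sense (Hinton et al., 2015) trains the student on the teacher's temperature-softened outputs alongside the hard labels. A hedged sketch, none of which is in this repo (the function name and defaults are made up):

    import keras.backend as K

    def distillation_loss(y_true, teacher_logits, student_logits,
                          temperature=2.0, alpha=0.5):
        # Soft targets: both distributions softened by the temperature.
        soft_teacher = K.softmax(teacher_logits / temperature)
        soft_student = K.softmax(student_logits / temperature)
        soft_loss = K.categorical_crossentropy(soft_teacher, soft_student)
        # Hard targets: ordinary cross-entropy against the true labels.
        hard_loss = K.sparse_categorical_crossentropy(y_true, student_logits,
                                                      from_logits=True)
        # temperature**2 rescales the soft-target gradients.
        return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss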

Oh, and thanks for making sure the TPU version is correct and for checking backward compatibility :+1:

HighCWu commented 5 years ago

Thanks for your advice. BERT is really large for me. I'll try your suggestions, and I wish you success with your new approach.