asmekal / keras-monotonic-attention

seq2seq attention in keras
GNU Affero General Public License v3.0

InvalidArgumentError: Incompatible shapes running examples with keras 2.2.3 and later #9

Open SunYanCN opened 5 years ago

SunYanCN commented 5 years ago

I changed nothing and just ran sequential_example.py. Then I got this error:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 10)          500       
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 300)         193200    
_________________________________________________________________
AttentionDecoder (AttentionD (None, None, 50)          343810    
=================================================================
Total params: 537,510
Trainable params: 537,510
Non-trainable params: 0
_________________________________________________________________
/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/10
2019-01-23 14:11:08.400572: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-23 14:11:08.510241: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-23 14:11:08.510750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:00:06.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2019-01-23 14:11:08.510791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-01-23 14:11:08.851793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-23 14:11:08.851847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2019-01-23 14:11:08.851856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2019-01-23 14:11:08.852059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7057 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:06.0, compute capability: 6.1)
Traceback (most recent call last):
  File "/home/sunyan/JDDC/keras-monotonic-attention/sequential_example.py", line 33, in <module>
    model.fit(x, y, epochs=10)
  File "/root/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/root/anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/root/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/root/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [320] vs. [32,10]
     [[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/Reshape, metrics/acc/Cast)]]
     [[Node: loss/mul/_277 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5751_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

asmekal commented 5 years ago

Interesting. This seems to be caused by a bug in the accuracy metric of the latest Keras releases (2.2.3 and 2.2.4) with the TensorFlow backend; I reproduced the error. See the related Keras issue: https://github.com/keras-team/keras/issues/11348
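
For readers wondering what "[320] vs. [32,10]" means: the node names in the traceback (metrics/acc/Reshape compared with metrics/acc/Cast) suggest that one operand of the accuracy comparison was flattened to 32 * 10 = 320 elements while the other kept its (batch, timesteps) shape. The NumPy sketch below only illustrates that kind of mismatch. The label shape (32, 10, 1) and the flatten/argmax steps are assumptions read off the error message, not code taken from Keras.

```python
import numpy as np

# Shapes taken from the error message and model summary above:
# batch of 32, 10 timesteps, 50 output classes.
batch, timesteps, classes = 32, 10, 50

# Assumption: integer labels with a trailing singleton axis.
y_true = np.zeros((batch, timesteps, 1))
y_pred = np.random.rand(batch, timesteps, classes)

flat_true = y_true.reshape(-1)       # flattened labels, shape (320,)
pred_ids = y_pred.argmax(axis=-1)    # predicted class ids, shape (32, 10)

# An elementwise comparison needs broadcastable shapes; these are not.
try:
    np.broadcast(flat_true, pred_ids)
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(flat_true.shape, pred_ids.shape, shapes_compatible)
```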

The simplest fix is to downgrade Keras to 2.1.6; it should definitely work then.
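
If you are not sure whether your environment is affected, a small check like the following may help. It is only a sketch: the helper name `keras_version_affected` is hypothetical, and the set of affected versions contains just the two releases named in this thread (2.2.3 and 2.2.4), so other releases are not covered either way.

```python
import re

# Versions reported as failing in this thread; others are untested here.
AFFECTED_VERSIONS = {(2, 2, 3), (2, 2, 4)}

def keras_version_affected(version):
    """Return True if `version` (e.g. "2.2.4") is one of the reported releases."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups()) in AFFECTED_VERSIONS

# Typical use (requires Keras to be installed):
#   import keras
#   print(keras_version_affected(keras.__version__))
print(keras_version_affected("2.2.4"))
print(keras_version_affected("2.1.6"))
```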

Hopefully the Keras team will fix it soon, so that it works with the latest Keras as well.

SunYanCN commented 5 years ago

OK, it works!

robrechtme commented 5 years ago

Do you have any idea whether there is a fix that does not require downgrading Keras?

asmekal commented 5 years ago

@RobrechtM It seems to be an issue with the accuracy metric in the new versions of Keras. If you avoid using the accuracy metric, it should be OK.
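
One way to follow this advice and still see an accuracy number is to leave `metrics=['accuracy']` out of `model.compile(...)` and compute accuracy outside Keras on the output of `model.predict(x)`. The sketch below assumes integer class labels of shape (batch, timesteps) or (batch, timesteps, 1) and predictions of shape (batch, timesteps, classes), which matches the model summary above. The helper name `sequence_accuracy` is hypothetical and not part of this repository or of Keras.

```python
import numpy as np

def sequence_accuracy(y_true, y_pred):
    """Fraction of timesteps where the argmax prediction equals the label.

    y_true: integer class ids, shape (batch, timesteps) or (batch, timesteps, 1)
    y_pred: class scores, shape (batch, timesteps, classes)
    """
    pred_ids = np.asarray(y_pred).argmax(axis=-1)
    # Bring the labels to the same (batch, timesteps) shape as the predictions.
    true_ids = np.asarray(y_true).reshape(pred_ids.shape)
    return float((pred_ids == true_ids).mean())

# Stand-in data; with a trained model you would use y_pred = model.predict(x).
rng = np.random.RandomState(0)
y_true = rng.randint(0, 50, size=(32, 10, 1))
y_pred = np.eye(50)[y_true[..., 0]]   # one-hot "perfect" predictions, (32, 10, 50)

print(sequence_accuracy(y_true, y_pred))
```

Because the comparison happens in NumPy after prediction, it does not go through the Keras metric code path that raises the error.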