HighCWu opened 5 years ago
I found something about this: keras-users/EhWwuq6R0lQ. I'm not familiar with Theano, so I don't know why it works on TensorFlow but not on Theano.
Yeah, I know; as I said in the README, I was unable to train the model with the Theano backend (I also checked CNTK, and I couldn't even run the model!).
(Sent by email in reply to hcWu's comment of Fri, Nov 30, 2018 at 3:24 PM, which linked https://groups.google.com/forum/#!topic/keras-users/EhWwuq6R0lQ.)
Oh, I see. Maybe Theano support isn't really necessary; at least nowadays we rarely use Theano. I should have noticed that. It seems I have done some useless work, and I should spend my time on something else. Will you spend your time on this?
TBH, I spent a day on this, and in the end I just hated Keras (for allowing such issues) and myself! So no, I'm not going to waste any more time on this. Right now I'm changing the attention mechanism of BERT and trying to make it faster.
If you want to play with BERT and learn something (and help others), a good direction is to train a distilled version of BERT: for example, a model that is only 8 layers deep with 16 heads per layer but with similar accuracy. Another idea you can try is to use an encoder other than the Transformer; for example, maybe a multilayer bidirectional QRNN could be used instead?
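For anyone picking up the distillation idea, here is a minimal sketch of the usual soft-target distillation loss (Hinton et al., 2015), written in plain Python so it runs without Keras. This is the generic technique, not code from BERT-keras; the function names, the temperature `T = 2.0`, and the mixing weight `alpha = 0.5` are illustrative assumptions, and nothing here depends on the 8-layer/16-head size mentioned above.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      hard_label: int,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """Hypothetical helper: alpha * soft-target loss + (1 - alpha) * hard-label loss."""
    # Soft-target term: cross-entropy between teacher and student distributions,
    # both softened by the same temperature, scaled by T^2 so its gradient
    # magnitude stays comparable to the hard-label term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    soft *= temperature ** 2
    # Hard-label term: ordinary cross-entropy against the true label at T = 1.
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1.0 - alpha) * hard
```

In a real setup the teacher logits would come from the full BERT model and the student logits from the smaller one, with this loss averaged over a batch.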
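For the QRNN idea, the part that replaces recurrence-heavy computation is "f-pooling": given candidate vectors `z_t` (from a masked convolution through tanh) and forget gates `f_t` (through a sigmoid), each step computes `h_t = f_t * h_{t-1} + (1 - f_t) * z_t` elementwise. The sketch below assumes the convolution outputs are already computed and shows only the pooling, plus a bidirectional variant that concatenates a forward pass and a backward pass, each with its own gates. The names are hypothetical and this is plain Python, not a Keras layer.

```python
from typing import List, Sequence


def f_pooling(z: Sequence[Sequence[float]],
              f: Sequence[Sequence[float]]) -> List[List[float]]:
    """QRNN f-pooling over [time][hidden] inputs, starting from h_0 = 0."""
    h = [0.0] * len(z[0])
    outputs = []
    for z_t, f_t in zip(z, f):
        # Elementwise gated update: keep f_t of the old state, mix in (1 - f_t) of z_t.
        h = [ft * hp + (1.0 - ft) * zt for zt, ft, hp in zip(z_t, f_t, h)]
        outputs.append(h)
    return outputs


def bidirectional_f_pooling(z_fwd, f_fwd, z_bwd, f_bwd) -> List[List[float]]:
    """Concatenate a left-to-right pass with a right-to-left pass per time step.

    The backward direction takes its own z/f, as if produced by a separate
    convolution masked in the opposite direction.
    """
    forward = f_pooling(z_fwd, f_fwd)
    # Run over the reversed sequence, then reverse the outputs back into time order.
    backward = f_pooling(z_bwd[::-1], f_bwd[::-1])[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]
```

Because each gate depends only on the convolution outputs and not on `h_{t-1}`, the expensive part parallelizes across time, and only this cheap elementwise scan remains sequential.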
Oh, and thanks for making sure the TPU version is correct and for checking backward compatibility :+1:
Thanks for your advice. BERT is a really large model for me. I will try your suggestion, and I wish you success with your new work.
Everything works fine with the TensorFlow backend. Now I am testing Theano. When running `train_model` in tutorial.ipynb, we get an error from `T.nnet.softmax()` inside `K.sparse_categorical_crossentropy`: it expects a 1D or 2D tensor but receives a `TensorType(float32, 3D)`.
Then I used this to avoid it:
But when I run
again, we get another error:
It doesn't seem to be a bug in my code, because I checked out the branch from before TPU support was added and hit the same problem.
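Since Theano's softmax only accepts 1D or 2D input, one common workaround for a `[batch, time, vocab]` prediction tensor is to flatten the batch and time axes into one before the softmax and reshape the per-token loss back afterwards (in Keras that would mean something like `K.reshape(y_pred, (-1, vocab_size))`, where `vocab_size` is a placeholder). This is only an assumption about the kind of fix involved; the snippet actually used in this issue is not shown. The sketch below illustrates just the flatten/reshape logic in plain Python, and `sparse_cce_3d` is a hypothetical helper, not part of Keras or BERT-keras.

```python
import math
from typing import List, Sequence


def softmax_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row-wise, numerically stable softmax over a 2D input only."""
    out = []
    for row in rows:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sparse_cce_3d(logits: Sequence[Sequence[Sequence[float]]],
                  targets: Sequence[Sequence[int]]) -> List[List[float]]:
    """Sparse categorical cross-entropy for [batch][time][vocab] logits.

    The softmax never sees 3D data: (batch, time) is flattened into a single
    row axis first, then the per-token losses are reshaped to [batch][time].
    """
    batch, time = len(logits), len(logits[0])
    flat_logits = [step for seq in logits for step in seq]  # (batch * time) rows
    flat_targets = [t for seq in targets for t in seq]
    probs = softmax_rows(flat_logits)
    flat_losses = [-math.log(p[t]) for p, t in zip(probs, flat_targets)]
    # Undo the flattening so each sequence gets its own list of token losses.
    return [flat_losses[b * time:(b + 1) * time] for b in range(batch)]
```

The same idea carries over to backend tensors: reshape to 2D, apply the 2D-only op, then reshape back to the original leading dimensions.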