Separius / BERT-keras

Keras implementation of BERT with pre-trained weights
GNU General Public License v3.0

How can I apply BERT to a cloze task? #10

Closed Deep1994 closed 5 years ago

Deep1994 commented 5 years ago

Hi, I have a dataset like:

From Monday to Friday most people are busy working or studying, but in the evenings and weekends they are free and _ themselves.

And there are four candidates for the missing blank area:

["love", "work", "enjoy", "play"], here "enjoy" is the correct answer, it is a cloze-style task, and it looks like the maskLM in the BERT, the difference is that I don't want to search the candidate from all the tokens but the four given candidates, how can I do this? It looks like negtive sampling method. Do you have any idea? Thank you!

Separius commented 5 years ago

Hi, you have two options: the lazy one and the efficient one!

The lazy way is to do the masked LM as usual, but also add a target mask that is applied to the final output of the decoder before the final softmax over words; it replaces the logits of all vocab words with -inf except your choices. (In reality it's a bit harder because different words may have a different number of BPE parts, but I guess it's OK to only use the first part.) The problem here is that you are still calculating the decoder softmax over the whole vocab, which is not that fast.
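Something like this rough NumPy sketch of the logit-masking idea; it assumes you already have the full-vocab logits for the masked position, and `logits`/`candidate_ids` are just placeholder names, not part of this repo's API:

```python
import numpy as np

def masked_softmax_over_candidates(logits, candidate_ids):
    """Replace every vocab logit except the candidates with -inf, then softmax."""
    mask = np.full_like(logits, -np.inf)
    mask[candidate_ids] = 0.0            # keep only the candidate logits
    masked = logits + mask               # -inf everywhere else -> zero probability
    exp = np.exp(masked - masked[candidate_ids].max())  # numerically stable
    return exp / exp.sum()

# toy example: vocab of size 10, candidate token ids 2, 5, 7, 9
logits = np.random.randn(10).astype(np.float32)
probs = masked_softmax_over_candidates(logits, [2, 5, 7, 9])
print(probs)  # non-zero only at the candidate positions
```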

The better method, which is easy to implement (assuming you always have 4 possible answers), is to use the embedding vectors of the possible answers (because BERT ties the weights of the embedding and decoder layers), compute their dot product with the final encoder output at the masked position, and take a softmax over just those four scores.
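Roughly like this; again just a NumPy sketch under the assumption that you can grab the tied embedding matrix and the encoder output at the masked position, with hypothetical names:

```python
import numpy as np

def candidate_probs(encoder_output, embedding_matrix, candidate_ids):
    """Score only the candidates: dot product with their embeddings, softmax over the 4."""
    cand_vecs = embedding_matrix[candidate_ids]   # (4, hidden)
    scores = cand_vecs @ encoder_output           # (4,) unnormalized scores
    exp = np.exp(scores - scores.max())           # stable softmax over just 4 items
    return exp / exp.sum()

# toy shapes: hidden size 8, vocab size 100
embedding_matrix = np.random.randn(100, 8).astype(np.float32)
encoder_output = np.random.randn(8).astype(np.float32)
probs = candidate_probs(encoder_output, embedding_matrix, [11, 23, 42, 77])
print(probs, probs.argmax())  # probability per candidate and the predicted index
```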

Separius commented 5 years ago

Hey @Deep1994, how are things? Were you able to solve this?