google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Splitting context and question for BERT #495

Open vivek1410patel opened 5 years ago

vivek1410patel commented 5 years ago

The paper suggests that the softmax for the start and end logits is computed only over the context hidden states. However, I could not find any place in the code where the hidden states are split into question and context hidden states. Does that mean the code computes the softmax over both the question and context hidden states, and expects the model to learn that the answer lies in the context? Or can you point me to the place where the hidden states are split, so that the final linear layer is applied only to the context?
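For reference, a minimal sketch (toy sizes and weights, not the repo's actual code) of how `run_squad.py`-style models produce span logits: a single dense layer is applied to every position of the packed `[CLS] question [SEP] context [SEP]` sequence, question tokens included, so the softmax ranges over the whole sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 12, 16                   # toy sizes (assumptions)
sequence_output = rng.standard_normal((seq_len, hidden))
w = rng.standard_normal((hidden, 2))       # one output column each for start/end

# One projection over ALL positions -- question tokens are not masked out here.
logits = sequence_output @ w               # shape (seq_len, 2)
start_logits, end_logits = logits[:, 0], logits[:, 1]

# The start softmax therefore covers every position in the packed sequence.
start_probs = np.exp(start_logits - start_logits.max())
start_probs /= start_probs.sum()
```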

libertatis commented 5 years ago

You can refer to the function `write_predictions` in `run_squad.py`. Note the comment around line #L787: "We could hypothetically create invalid predictions, e.g., predict that the start of the span is in the question. We throw out all invalid predictions." So the author first gets the n best start and end indices, and then throws out any index that falls inside the question. There is no split operation; the logits are computed over the full sequence and invalid spans are filtered afterwards.
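The filtering described above can be sketched as follows. This is a hypothetical simplification, not the actual `write_predictions` code: the function name, the `context_start`/`context_end` boundary arguments, and the default limits are illustrative assumptions.

```python
import numpy as np

def best_valid_span(start_logits, end_logits, context_start, context_end,
                    n_best=20, max_answer_len=30):
    """Return the (start, end) span with the highest combined logit score
    that lies entirely inside the context portion of the sequence."""
    # Take the n best candidate indices for start and end independently.
    start_idx = np.argsort(start_logits)[::-1][:n_best]
    end_idx = np.argsort(end_logits)[::-1][:n_best]

    best, best_score = None, -np.inf
    for s in start_idx:
        for e in end_idx:
            # Throw out invalid predictions: the span must lie within the
            # context, end must not precede start, and length is bounded.
            if s < context_start or e > context_end:
                continue
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (int(s), int(e)), score
    return best

# Toy example: tokens 0-3 are the question, 4-7 are the context. The highest
# raw start logit (position 1) is in the question, so it gets discarded.
start_logits = np.array([0., 5., 0., 0., 1., 4., 0., 0.])
end_logits   = np.array([0., 0., 3., 0., 0., 1., 4., 0.])
span = best_valid_span(start_logits, end_logits, context_start=4, context_end=7)
# span is (5, 6): the best start/end pair that lies inside the context
```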