google-research-datasets / natural-questions

Natural Questions (NQ) contains real user questions issued to Google search, and answers found in Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.
Apache License 2.0

Regarding Char offset #16

Closed tahmedge closed 4 years ago

tahmedge commented 4 years ago

I don't understand how the char offsets are computed in the BERT baseline code. Why were HTML tokens not considered when calculating the offset values? Could you please elaborate?

tomkwiat commented 4 years ago

Hi Tahmid, it might be better to ask this question on the BERT baseline repository, rather than this dataset repository.

I believe that the original BERT baseline ignored HTML tokens, but I'm not sure how that was implemented.
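
For illustration only, here is a minimal sketch of one way character offsets could be computed over the visible text while skipping HTML tokens. It is not taken from the BERT baseline, and it may differ from what that code actually does. It assumes each document token is a dict with a `token` string and an `html_token` boolean, as in the NQ `document_tokens` field; the function name `char_offsets_skipping_html` and the example tokens are hypothetical.

```python
from typing import Dict, List, Tuple


def char_offsets_skipping_html(
    tokens: List[dict],
) -> Tuple[str, Dict[int, Tuple[int, int]]]:
    """Join non-HTML tokens with single spaces and record each kept token's
    (start, end) character span in the joined text.

    Hypothetical helper: assumes every token dict has "token" and "html_token".
    """
    pieces: List[str] = []
    offsets: Dict[int, Tuple[int, int]] = {}
    cursor = 0
    for index, tok in enumerate(tokens):
        if tok["html_token"]:
            # HTML markup contributes no characters to the joined text.
            continue
        text = tok["token"]
        if pieces:
            cursor += 1  # account for the separating space
        offsets[index] = (cursor, cursor + len(text))
        pieces.append(text)
        cursor += len(text)
    return " ".join(pieces), offsets


# Made-up tokens for illustration, not real NQ data.
example_tokens = [
    {"token": "<P>", "html_token": True},
    {"token": "Natural", "html_token": False},
    {"token": "Questions", "html_token": False},
    {"token": "</P>", "html_token": True},
]
text, offsets = char_offsets_skipping_html(example_tokens)
```

Under this scheme the offsets index into the joined plain text rather than the raw HTML, so the `<P>` and `</P>` tokens shift nothing and the original token indices are kept as dictionary keys to map a span back to the document.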