If you want to fill in `[MASK]` tokens then it's necessary to initialize `batcher = KnowBertBatchifier(archive_file, masking_strategy='full_mask')`. This creates batches in the same way as during pretraining. After doing so, I get `['france', 'germany', 'belgium', 'europe', 'canada', 'italy', 'paris', 'spain', 'russia', 'algeria']` as the top 10 predictions for 'Paris is located in [MASK].'
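For reference, here is a minimal sketch of that setup, based on the model-loading snippet in the repository README. The archive path is a placeholder, and the readout of the masked-LM logits from `model_output` is left as a comment because the exact output key may differ between versions:

```python
import torch
from allennlp.common import Params

from kb.include_all import ModelArchiveFromParams
from kb.knowbert_utils import KnowBertBatchifier

# Path or URL to a pretrained KnowBert archive (placeholder).
archive_file = 'knowbert_wiki_wordnet_model.tar.gz'

# Load the pretrained model.
params = Params({"archive_file": archive_file})
model = ModelArchiveFromParams.from_params(params=params)
model.eval()

# 'full_mask' makes the batcher reproduce the pretraining masking scheme,
# which is what allows [MASK] tokens to be predicted sensibly.
batcher = KnowBertBatchifier(archive_file, masking_strategy='full_mask')

sentences = ["Paris is located in [MASK]."]

# The batcher takes raw, untokenized sentences and yields batches of
# tensors ready to be fed into KnowBert.
for batch in batcher.iter_batches(sentences, verbose=True):
    with torch.no_grad():
        model_output = model(**batch)
    # Inspect model_output for the masked-LM logits (the key name varies
    # by version), then take the top-10 vocabulary entries at the [MASK]
    # position.
```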
Thanks for your quick reply! I just noticed that `full_mask` is not the default, and using it produces correct predictions!
Hi Matthew,
Thanks a bunch for the documentation on embedding sentences programmatically. It saved me a lot of time! I made a small modification so that I can use KnowBert to predict the missing word (i.e., `[MASK]`) in a sentence, but found the results unexpected. I am not sure if my implementation is correct; here is my code snippet. The top 10 predictions are `[UNK], the, itself, its, and, marne, to, them, first, lissa`, while the top 10 predictions of BERT-base-uncased are `france, paris, europe, italy, belgium, algeria, germany, russia, haiti, canada`, which seems a little bit weird. Is my implementation correct, or do you have any suggestions? Thanks in advance!
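The original snippet is not shown above, but the readout step it describes would look roughly like the sketch below. This is a hypothetical illustration, not the original code: it assumes you already have the masked-LM logits for one sentence and a BERT wordpiece vocabulary.

```python
import torch

def top_k_mask_predictions(mlm_logits, input_ids, vocab, mask_token_id, k=10):
    """Return the top-k vocabulary entries at the first [MASK] position.

    mlm_logits:    (seq_len, vocab_size) tensor of masked-LM logits for one sentence.
    input_ids:     (seq_len,) tensor of wordpiece ids for the same sentence.
    vocab:         mapping from wordpiece id to token string.
    mask_token_id: id of the [MASK] wordpiece in the BERT vocabulary.
    """
    # Locate the first [MASK] position in the input.
    mask_positions = (input_ids == mask_token_id).nonzero(as_tuple=True)[0]
    position = mask_positions[0].item()

    # Rank the vocabulary by logit at that position and keep the top k.
    top_ids = torch.topk(mlm_logits[position], k).indices.tolist()
    return [vocab[i] for i in top_ids]
```

With the default masking strategy the batcher does not build batches the way pretraining did, which explains why the top tokens come out as `[UNK], the, itself, ...`; with `masking_strategy='full_mask'` the same readout gives country names, as in the reply above.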