Separius / BERT-keras

Keras implementation of BERT with pre-trained weights
GNU General Public License v3.0

Error in tutorial notebook #18

Closed: murtaza98 closed this issue 5 years ago

murtaza98 commented 5 years ago

I was trying to run tutorial.ipynb and encountered this error while running the following cell.

Cell

# This is a tutorial on using this library
# first off we need a text_encoder so we would know our vocab_size (and later on use it to encode sentences)
from data.vocab import SentencePieceTextEncoder  # you could also import OpenAITextEncoder

sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='openai/model/params_shapes.json',
                                                  model_name='tutorial', vocab_size=20)

Error

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-9-a0a8b2fa2e06> in <module>()
      2 
      3 sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='bert_keras_repo/openai/model/params_shapes.json',
----> 4                                                   model_name='tutorial', vocab_size=20)

/content/bert_keras_repo/data/vocab.py in __init__(self, text_corpus_address, model_name, vocab_size, spm_model_type)
     64                 '--training_sentence_size=100000000'.format(
     65                     input=text_corpus_address, model_name=model_name, vocab_size=vocab_size, coverage=1,
---> 66                     model_type=spm_model_type.lower()))
     67         self.sp = spm.SentencePieceProcessor()
     68         self.sp.load('{}.model'.format(model_name))

OSError: Not found: unknown field name "training_sentence_size" in TrainerSpec.

marsch commented 5 years ago

Not sure, but I think training_sentence_size is deprecated: https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto#L84

murtaza98 commented 5 years ago

Is there any solution that you could suggest?

Separius commented 5 years ago

Hey @murtaza98, sorry for the super late reply. Sadly, I don't have any GPUs right now to rerun the tests and make sure that everything is working.

But it seems that simply removing the --training_sentence_size argument from the SentencePiece training string in vocab.py will solve the issue.
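
For anyone landing here later, a minimal sketch of what that change might look like. The helper name build_spm_train_args is hypothetical, and the flag list is an assumption reconstructed from the format() keywords visible in the traceback (input, model_name, vocab_size, coverage, model_type); the real string in vocab.py may carry extra flags, which should stay as they are. The default model type of 'unigram' is also an assumption. The only point is that --training_sentence_size no longer appears.

```python
def build_spm_train_args(text_corpus_address, model_name, vocab_size,
                         spm_model_type='unigram'):
    """Build a SentencePiece training argument string.

    Hypothetical helper: it mirrors the format() call seen in the traceback,
    but leaves out the --training_sentence_size flag that newer SentencePiece
    versions reject.
    """
    return ('--input={input} --model_prefix={model_name} '
            '--vocab_size={vocab_size} --character_coverage={coverage} '
            '--model_type={model_type}').format(
                input=text_corpus_address, model_name=model_name,
                vocab_size=vocab_size, coverage=1,
                model_type=spm_model_type.lower())


args = build_spm_train_args('openai/model/params_shapes.json', 'tutorial', 20)
print(args)

# To actually train, pass the string to SentencePiece
# (requires the sentencepiece package to be installed):
#   import sentencepiece as spm
#   spm.SentencePieceTrainer.Train(args)
```

If the installed SentencePiece version still complains about another field, the same approach applies: drop the flag named in the error message from the string.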

murtaza98 commented 5 years ago

Thanks for the reply. Your solution worked, and I am no longer facing any errors.