The longer error message is below. It seems to be an issue with either the vocab file "ag_ckpt_vocab/geometry.757.vocab" or the sentencepiece library. I have sentencepiece=0.1.99 installed in my environment. I have checked that "ag_ckpt_vocab/geometry.757.vocab" is not empty and that it includes <unk> 0 on line 4. I would appreciate your help.
~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in __str__(self)
511 f"SentencePieceVocabulary(file={self.sentencepiece_model_file}, "
512 f"extra_ids={self._extra_ids}, "
--> 513 f"spm_md5={hashlib.md5(self.sp_model).hexdigest()})"
514 )
515
~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in sp_model(self)
415 def sp_model(self) -> Optional[bytes]:
416 """Retrieve the SPM."""
--> 417 return self._model_context().sp_model
418
419 @property
~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _model_context(self)
334 )
335
--> 336 self._model = self._load_model(
337 self._sentencepiece_model_file,
338 self._extra_ids,
~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _load_model(cls, sentencepiece_model_file, extra_ids, normalizer_spec_overrides_serialized, reverse_extra_ids)
387 # Load Python tokenizer and ensure the EOS and PAD IDs are correct.
388 tokenizer = sentencepiece_processor.SentencePieceProcessor()
--> 389 tokenizer.LoadFromSerializedProto(sp_model)
390 if tokenizer.pad_id() != PAD_ID:
391 logging.warning(
~/alphageometry/lib/python3.9/site-packages/sentencepiece/__init__.py in LoadFromSerializedProto(self, serialized)
248
249 def LoadFromSerializedProto(self, serialized):
--> 250 return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
251
252 def SetEncodeExtraOptions(self, extra_option):
RuntimeError: Internal: unk is not defined.
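One thing that may be worth ruling out: the traceback shows seqio passing the raw bytes of its sentencepiece_model_file to LoadFromSerializedProto, so the file being parsed may not be the text .vocab file inspected by eye. Below is a minimal diagnostic sketch, not a confirmed fix. The helper names are made up for illustration, and the path "ag_ckpt_vocab/geometry.757.model" used in the note after the code is an assumption about what sits next to the .vocab file.

```python
import hashlib
import os


def describe_file(path):
    """Return existence, size in bytes, and MD5 of a file (None fields if missing)."""
    if not os.path.isfile(path):
        return {"path": path, "exists": False, "size": None, "md5": None}
    with open(path, "rb") as f:
        data = f.read()
    return {
        "path": path,
        "exists": True,
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
    }


def find_piece_line(vocab_path, piece="<unk>"):
    """Return the 1-based line number whose first field equals `piece`, or None.

    Assumes a text .vocab file with one whitespace-separated "piece score" pair per line.
    """
    with open(vocab_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if fields and fields[0] == piece:
                return lineno
    return None


def try_load(model_path):
    """Repeat the failing call from the traceback on the raw bytes of `model_path`."""
    import sentencepiece  # imported lazily so the helpers above work without it

    with open(model_path, "rb") as f:
        data = f.read()
    sp = sentencepiece.SentencePieceProcessor()
    sp.LoadFromSerializedProto(data)  # same call that raises in seqio
    return sp.unk_id(), sp.GetPieceSize()
```

Calling describe_file("ag_ckpt_vocab/geometry.757.vocab") and describe_file("ag_ckpt_vocab/geometry.757.model") shows whether each file exists and how large it is. Calling try_load on whichever path is actually passed to seqio reproduces the error outside of alphageometry.py, which separates a file problem from a seqio problem.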
I got this error when initializing the language model by calling get_lm() in alphageometry.py, more specifically at line 40 of lm_inference.py.