google-deepmind / alphageometry

Apache License 2.0
4.07k stars, 459 forks

RuntimeError: Internal: unk is not defined. #80

Status: Open. shufan1 opened this issue 7 months ago

shufan1 commented 7 months ago

I got this error when initializing the language model by calling get_lm() in alphageometry.py, specifically at line 40 of lm_inference.py:

self.vocab = t5.data.SentencePieceVocabulary(vocab_path)

Below is the longer error message. It seems to be an issue with either the vocab file "ag_ckpt_vocab/geometry.757.vocab" or the sentencepiece library. I have sentencepiece==0.1.99 installed in my environment. I have checked that my "ag_ckpt_vocab/geometry.757.vocab" is not empty and that it contains "<unk> 0" on line 4. I'd appreciate any help.

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in __str__(self)
    511         f"SentencePieceVocabulary(file={self.sentencepiece_model_file}, "
    512         f"extra_ids={self._extra_ids}, "
--> 513         f"spm_md5={hashlib.md5(self.sp_model).hexdigest()})"
    514     )
    515 

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in sp_model(self)
    415   def sp_model(self) -> Optional[bytes]:
    416     """Retrieve the SPM."""
--> 417     return self._model_context().sp_model
    418 
    419   @property

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _model_context(self)
    334     )
    335 
--> 336     self._model = self._load_model(
    337         self._sentencepiece_model_file,
    338         self._extra_ids,

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _load_model(cls, sentencepiece_model_file, extra_ids, normalizer_spec_overrides_serialized, reverse_extra_ids)
    387       # Load Python tokenizer and ensure the EOS and PAD IDs are correct.
    388       tokenizer = sentencepiece_processor.SentencePieceProcessor()
--> 389       tokenizer.LoadFromSerializedProto(sp_model)
    390       if tokenizer.pad_id() != PAD_ID:
    391         logging.warning(

~/alphageometry/lib/python3.9/site-packages/sentencepiece/__init__.py in LoadFromSerializedProto(self, serialized)
    248 
    249     def LoadFromSerializedProto(self, serialized):
--> 250         return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
    251 
    252     def SetEncodeExtraOptions(self, extra_option):

RuntimeError: Internal: unk is not defined.
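A note on the "<unk> 0" check: in SentencePiece's text .vocab format, each line holds a piece and its score separated by a tab, and a piece's id is its 0-based line position. So "<unk> 0" on line 4 would mean id 3 with score 0. Checking this text listing does not by itself verify the serialized model that LoadFromSerializedProto in the traceback above receives. The minimal parser below makes the id/score distinction explicit; read_spm_vocab is a hypothetical helper, not part of sentencepiece or seqio.

```python
def read_spm_vocab(path: str) -> dict[str, tuple[int, float]]:
    """Parse a SentencePiece text .vocab file.

    Hypothetical helper: maps each piece to (id, score), where the id is
    the 0-based line index and the score is the number after the tab.
    """
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            # Split on the last tab so only the score is peeled off.
            piece, score = line.rstrip("\n").rsplit("\t", 1)
            vocab[piece] = (idx, float(score))
    return vocab
```

On a toy file whose fourth line is "<unk>\t0", read_spm_vocab(path)["<unk>"] gives (3, 0.0): the 0 is the score, while the id comes from the line position.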

shufan1 commented 7 months ago

I set up a new environment with Python 3.10.9 and still get this error.
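Since a fresh Python 3.10.9 environment reproduces the error, it may be worth ruling out the file itself. The traceback above ends in LoadFromSerializedProto, so seqio hands the file's bytes to SentencePiece as a serialized model proto, which is a different format from the tab-separated text .vocab listing. The sketch below is a rough, stdlib-only heuristic for what a given path actually contains. classify_spm_file and its labels are hypothetical (not part of sentencepiece or seqio), and a "binary-or-unknown" result only means the file is not the text listing; it does not prove the file is a valid model.

```python
def classify_spm_file(path: str) -> str:
    """Heuristic guess at what a SentencePiece-related file contains.

    Hypothetical helper; returns "empty", "text-vocab" or "binary-or-unknown".
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data:
        return "empty"
    # A text .vocab file starts with "<piece>\t<score>" on its first line.
    first_line = data.split(b"\n", 1)[0].decode("utf-8", errors="replace")
    _piece, sep, score = first_line.rpartition("\t")
    if sep:
        try:
            float(score)
            return "text-vocab"
        except ValueError:
            pass
    # Not the text listing; possibly (but not necessarily) a binary model.
    return "binary-or-unknown"
```

Running it on whatever path is passed as vocab_path and on "ag_ckpt_vocab/geometry.757.vocab" shows whether the two are the same kind of file.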

szhang99 commented 6 months ago

Does anybody have a solution for this?