google-deepmind / alphageometry

Apache License 2.0
3.81k stars, 419 forks

msgpack unpack error #76

Open shufan1 opened 4 months ago

shufan1 commented 4 months ago

When running run.sh in bash without changing anything, I got the following error:

 File "msgpack/_unpacker.pyx", line 205, in msgpack._cmsgpack.unpackb
ValueError: Unpack failed: incomplete input 

This error occurs when getting the language model on line 641 of alphageometry.py, `model = get_lm(_CKPT_PATH.value, _VOCAB_PATH.value)`. The error comes from initializing the trainer on line 62 of lm_inference.py, `(tstate, _, imodel, prngs) = trainer.initialize_model()`. I have the weights and vocabulary downloaded in `ag_ckpt_vocab` and checked that `_CKPT_PATH.value = 'ag_ckpt_vocab'` and `_VOCAB_PATH.value = 'ag_ckpt_vocab/geometry.757.model'`. How should I resolve this error? I appreciate your help.

unhandyandy commented 4 months ago

I only have three files in that folder: checkpoint_10999999, and the two geometry files. Perhaps the checkpoint file didn't download completely?

shufan1 commented 4 months ago

Thank you!! That was the problem. However, I am now running into a new issue, probably with the vocabulary file. I received the following error from line 97 of lm_inference.py, `result = [self.vocab.encode(x) for x in inputs_strs]`:

return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
RuntimeError: Internal: unk is not defined. 

Any chance you could help me debug this? I saw comments on similar issues saying the file size was 0 or that the files had not downloaded properly. I have checked that geometry.757.model is 14334 bytes and geometry.757.vocab is 10025 bytes. Do those sizes seem right? I also had issues with gdown, so I downloaded all three files directly from the Google Drive link (https://drive.google.com/drive/u/0/folders/1ZLaZ2ajtOcILDWa5ePPLX1bmaf_BNRZV). Thank you so much.
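Matching sizes are a good sign but cannot prove two copies are byte-identical. A minimal sketch for comparing downloads across machines, assuming the files sit in `ag_ckpt_vocab/` as elsewhere in this thread: it prints an MD5 digest per file using Python's `hashlib`, so two users can compare the hex strings directly. `md5_of_file` is a hypothetical helper, not part of AlphaGeometry.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for the ~1.2 GB checkpoint.
    (Hypothetical helper, not part of AlphaGeometry.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed layout from this thread: weights and vocab in ag_ckpt_vocab/.
    folder = "ag_ckpt_vocab"
    if not os.path.isdir(folder):
        print(f"{folder}/ not found; run this from the alphageometry checkout")
    else:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                print(f"{name}: {md5_of_file(path)}")
```

If two people get the same digest for a file, their copies are identical; a different digest with the same size would point at corruption rather than truncation.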

unhandyandy commented 4 months ago

Unfortunately, the computer I've been working on is down for maintenance until Thursday. In the meantime, I'm trying to install AG on another system and am having problems of my own. :(

unhandyandy commented 4 months ago

Here's what I have in ag_ckpt_vocab

$ ls -l ag_ckpt_vocab/
total 1189376
-rw-r--r-- 1 dabrowsa dabrowsa 1217608082 Jan 26 16:32 checkpoint_10999999
-rw-r--r-- 1 dabrowsa dabrowsa      14334 Jan 26 16:32 geometry.757.model
-rw-r--r-- 1 dabrowsa dabrowsa      10025 Jan 26 16:32 geometry.757.vocab
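One way to act on this listing is to compare local file sizes against it. A minimal sketch, assuming the byte counts above come from a complete download (they are from one working install, not an official manifest); `EXPECTED_SIZES` and `check_sizes` are hypothetical names. A checkpoint that stopped downloading partway, the likely cause of the msgpack "incomplete input" error at the top of this thread, would show up as smaller than expected.

```python
import os

# Byte counts taken from the `ls -l ag_ckpt_vocab/` listing above.
# Assumption: that install was complete; these are not official values.
EXPECTED_SIZES = {
    "checkpoint_10999999": 1217608082,
    "geometry.757.model": 14334,
    "geometry.757.vocab": 10025,
}


def check_sizes(folder: str, expected: dict) -> dict:
    """Map each expected file name to a status string.

    'ok' means the size matches; otherwise the status says the file is
    missing or reports its actual size next to the expected one.
    """
    report = {}
    for name, want in expected.items():
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            report[name] = "missing"
            continue
        got = os.path.getsize(path)
        report[name] = "ok" if got == want else f"size {got}, expected {want}"
    return report


if __name__ == "__main__":
    for name, status in check_sizes("ag_ckpt_vocab", EXPECTED_SIZES).items():
        print(f"{name}: {status}")
```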

Did you run into that error during the install, i.e., when running bash run.sh?

unhandyandy commented 4 months ago

Maybe also check the meliad dir:

$ ls -l meliad_lib/meliad/
total 833
-rw-r--r-- 1 dabrowsa dabrowsa   280 Feb 12 23:20 CONTRIBUTING.md
-rw-r--r-- 1 dabrowsa dabrowsa 11358 Feb 12 23:20 LICENSE
-rw-r--r-- 1 dabrowsa dabrowsa  9974 Feb 12 23:20 metrics_summary.py
-rw-r--r-- 1 dabrowsa dabrowsa  9595 Feb 12 23:20 optimizer_config.py
drwxr-xr-x 2 dabrowsa dabrowsa 32768 Feb 12 23:23 __pycache__
-rw-r--r-- 1 dabrowsa dabrowsa  6348 Feb 12 23:20 README.md
-rw-r--r-- 1 dabrowsa dabrowsa   175 Feb 12 23:20 requirements.txt
-rw-r--r-- 1 dabrowsa dabrowsa 29845 Feb 12 23:20 training_loop.py
-rw-r--r-- 1 dabrowsa dabrowsa  7703 Feb 12 23:20 training_task.py
drwxr-xr-x 5 dabrowsa dabrowsa 32768 Feb 12 23:20 transformer
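A quick way to compare a local meliad checkout against this listing is to test for each entry by name. A minimal sketch, assuming the names above are the ones that matter (`__pycache__` is left out because Python generates it at import time); `MELIAD_ENTRIES` and `missing_entries` are hypothetical.

```python
import os

# Entries from the `ls -l meliad_lib/meliad/` listing above, minus
# __pycache__, which Python creates on its own.
MELIAD_ENTRIES = [
    "CONTRIBUTING.md",
    "LICENSE",
    "README.md",
    "metrics_summary.py",
    "optimizer_config.py",
    "requirements.txt",
    "training_loop.py",
    "training_task.py",
    "transformer",  # a directory
]


def missing_entries(folder: str, names: list) -> list:
    """Return the names from `names` that do not exist under `folder`."""
    return [n for n in names if not os.path.exists(os.path.join(folder, n))]


if __name__ == "__main__":
    missing = missing_entries("meliad_lib/meliad", MELIAD_ENTRIES)
    print("all present" if not missing else f"missing: {missing}")
```

This only checks presence, not contents; sizes or checksums would be the next step if everything is present.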
shufan1 commented 4 months ago

Hello, the transformer folder I have is only 707 bytes, which seems too small, but when I looked inside the folder, its contents seemed to match what the original GitHub repository has. I attached screenshots of what I see in meliad/transformer and meliad/transformer/vocabs. Thank you for your time again.

unhandyandy commented 4 months ago

That looks the same as mine; maybe your OS has a different convention for directory sizes.
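For context: on Unix-like systems, the size `ls -l` shows for a directory is the size of the directory's own entry data (it varies by filesystem), not the total size of the files inside it, so a small number for `transformer` is not by itself a sign of missing files. To measure what a folder actually holds, one can sum file sizes recursively. A minimal sketch; `tree_size` is a hypothetical helper:

```python
import os


def tree_size(root: str) -> int:
    """Total bytes of all regular files under `root`, recursively.

    Symlinks are neither followed nor counted: os.walk does not descend
    into linked directories by default, and linked files are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    print(tree_size("meliad_lib/meliad/transformer"))
```

Comparing this number between two machines is more meaningful than comparing the directory sizes printed by `ls -l`.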

Is this happening during the install, with bash run.sh?

How far is it getting? Could you print the error starting from the first normal status line?

shufan1 commented 4 months ago

I was able to locate where this error comes from. It occurs while initializing the language model in alphageometry.py, more specifically at line 40 of lm_inference.py:

self.vocab = t5.data.SentencePieceVocabulary(vocab_path)

Below is the longer error message. It seems to be an issue with either the vocab file "ag_ckpt_vocab/geometry.757.vocab" or the sentencepiece library. I have sentencepiece 0.1.99 installed in my environment. I can open a separate issue if you prefer. Again, thank you for taking the time to help me.
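One thing worth ruling out: SentencePiece produces two files, a binary `.model` (a serialized protobuf, which is what `SentencePieceVocabulary` loads) and a human-readable `.vocab` text file with one `piece<TAB>score` line per token. The message above names `geometry.757.vocab`, while the path checked earlier in the thread was `geometry.757.model`, so it may help to confirm which file the vocab path actually points at. A minimal heuristic sketch that only inspects the first line; `looks_like_vocab_text` is hypothetical:

```python
def looks_like_vocab_text(path: str) -> bool:
    """Heuristic: True if the file's first line looks like 'piece\\tscore'.

    A SentencePiece .vocab file is UTF-8 text with a tab-separated piece
    and numeric score per line; the binary .model file generally is not.
    """
    with open(path, "rb") as f:
        first_line = f.readline()
    try:
        text = first_line.decode("utf-8").rstrip("\n")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so not a .vocab text file
    piece, sep, score = text.rpartition("\t")
    if not sep or not piece:
        return False  # no tab-separated piece on the first line
    try:
        float(score)
    except ValueError:
        return False  # second column is not a numeric score
    return True
```

If the path passed as the vocab path passes this check, it is the text companion file rather than the binary `.model` the loader expects.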

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in __str__(self)
    511         f"SentencePieceVocabulary(file={self.sentencepiece_model_file}, "
    512         f"extra_ids={self._extra_ids}, "
--> 513         f"spm_md5={hashlib.md5(self.sp_model).hexdigest()})"
    514     )
    515 

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in sp_model(self)
    415   def sp_model(self) -> Optional[bytes]:
    416     """Retrieve the SPM."""
--> 417     return self._model_context().sp_model
    418 
    419   @property

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _model_context(self)
    334     )
    335 
--> 336     self._model = self._load_model(
    337         self._sentencepiece_model_file,
    338         self._extra_ids,

~/alphageometry/lib/python3.9/site-packages/seqio/vocabularies.py in _load_model(cls, sentencepiece_model_file, extra_ids, normalizer_spec_overrides_serialized, reverse_extra_ids)
    387       # Load Python tokenizer and ensure the EOS and PAD IDs are correct.
    388       tokenizer = sentencepiece_processor.SentencePieceProcessor()
--> 389       tokenizer.LoadFromSerializedProto(sp_model)
    390       if tokenizer.pad_id() != PAD_ID:
    391         logging.warning(

~/alphageometry/lib/python3.9/site-packages/sentencepiece/__init__.py in LoadFromSerializedProto(self, serialized)
    248 
    249     def LoadFromSerializedProto(self, serialized):
--> 250         return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
    251 
    252     def SetEncodeExtraOptions(self, extra_option):

RuntimeError: Internal: unk is not defined.
unhandyandy commented 4 months ago

I see you're using Python 3.9. Did you try 3.10? That's what I'm using (with 3.11 the install failed).

shufan1 commented 4 months ago

I was hoping to get away with Python 3.9 because it is not easy for me to install another Python version. I will try setting up Python 3.10.

unhandyandy commented 4 months ago

The devs say they used 3.10.9; that might be important, but I can't say for sure.
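Since 3.10 is the version reported to work in this thread (3.11 failed to install for one commenter, and the 3.9 setup hit the errors above), a fail-fast check at the top of a launcher script can save a confusing traceback later. A minimal sketch, assuming 3.10 is the target; it deliberately does not pin the 3.10.9 patch release, since that is only what the devs reportedly used. `check_python` is hypothetical:

```python
import sys
import warnings


def check_python(version_info=sys.version_info, required=(3, 10)) -> bool:
    """Return True if the interpreter's major.minor matches `required`.

    Otherwise emit a RuntimeWarning instead of exiting, since other
    versions may still work; they are just not what was reported to work.
    """
    current = tuple(version_info[:2])
    if current == required:
        return True
    warnings.warn(
        f"Python {current[0]}.{current[1]} detected; this setup was "
        f"reported to work with {required[0]}.{required[1]}.",
        RuntimeWarning,
    )
    return False
```

Calling `check_python()` as the first line of a script is enough; it warns rather than exits, so it will not block someone who gets another version working.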

tpgh24 commented 2 months ago

I was only able to make AG work on Linux with Python 3.10. AG is difficult to set up and run. I made some improvements in a fork and have more ideas for improving it; check out AG4Masses and issue 110.