Victorwz / LongMem

Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".
https://arxiv.org/abs/2306.07174
Apache License 2.0

Where is the model that I can train? #12

Closed fahadh4ilyas closed 10 months ago

fahadh4ilyas commented 1 year ago

Trying to load the model from bigscience/bloom-1b7 results in:

Load Pre-trained GPT from bigscience/bloom-1b7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/LongMem/fairseq/fairseq/tasks/language_modeling.py", line 202, in build_model
    model = super().build_model(args, from_checkpoint)
  File "/root/LongMem/fairseq/fairseq/tasks/fairseq_task.py", line 688, in build_model
    model = models.build_model(args, self, from_checkpoint)
  File "/root/LongMem/fairseq/fairseq/models/__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "/root/LongMem/fairseq/fairseq/models/transformer_lm_sidenet.py", line 409, in build_model
    decoder = TransformerDecoderSideNet(
  File "/root/LongMem/fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py", line 606, in __init__
    super().__init__(
  File "/root/LongMem/fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py", line 197, in __init__
    self.pretrained_model, _ = load_model_ensemble([self.pretrained_model_path], task=None, arg_overrides={"gpt2_vocab_bpe": os.path.join(cfg.gpt_encoder_path, "vocab.bpe"), "gpt2_encoder_json": os.path.join(cfg.gpt_encoder_path, "encoder.json"), "gpt_dict_path": os.path.join(cfg.gpt_encoder_path, "dict.txt"), "retrieval_layer_index": cfg.retrieval_layer_index})
  File "/root/LongMem/fairseq/fairseq/checkpoint_utils.py", line 367, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/root/LongMem/fairseq/fairseq/checkpoint_utils.py", line 425, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/root/LongMem/fairseq/fairseq/checkpoint_utils.py", line 315, in load_checkpoint_to_cpu
    state = torch.load(f, map_location=torch.device("cpu"))
  File "/root/anaconda3/envs/longmem/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/envs/longmem/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xd2'.

Is the model different from the one on Hugging Face?
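
For reference, the two formats are loaded in completely different ways. Below is a minimal sketch of what I mean (my own illustration, not code from this repo; the checkpoint path is hypothetical):

```python
# Sketch only: contrast a fairseq .pt checkpoint with a Hugging Face
# transformers checkpoint. The local path is hypothetical.
import torch
from transformers import AutoModelForCausalLM

# A fairseq checkpoint is a pickled dict (keys like "model" and "cfg"/"args"),
# which is what checkpoint_utils.load_checkpoint_to_cpu expects to torch.load:
state = torch.load("/path/to/fairseq_checkpoint.pt", map_location="cpu")

# bigscience/bloom-1b7 on the Hub is a transformers checkpoint; pointing
# fairseq's torch.load at its files produces the UnpicklingError above:
hf_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7")
```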

1907040112 commented 1 year ago

I have the same confusion. In the paper, the authors use GPT-2 with ALiBi positional encoding as the frozen backbone LLM, so why is pretrained-model-path in train_scripts/train_longmem.sh set to bigscience/bloom-1b7?

Victorwz commented 1 year ago

I have the same confusion. In the paper, the authors use GPT-2 with ALiBi positional encoding as the frozen backbone LLM, so why is pretrained-model-path in train_scripts/train_longmem.sh set to bigscience/bloom-1b7?

Apologies for this issue. I forgot to revise the pretrained-model-path hyperparameter back to GPT-2-Medium in the training script. I ran two sets of experiments, with GPT-2-Medium and Bloom-1b7 as the backbone model, respectively. I have just made a commit to resolve this; please refer to the latest training script.
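
As a quick sanity check (just a sketch; the checkpoint path below is hypothetical), you can confirm that the file passed as pretrained-model-path is a fairseq-format checkpoint before launching the script:

```python
# Sketch: verify the pretrained-model-path checkpoint deserializes as a
# fairseq-style pickled dict before running train_scripts/train_longmem.sh.
import torch

ckpt_path = "/path/to/gpt2_medium/checkpoint_last.pt"  # hypothetical path
state = torch.load(ckpt_path, map_location="cpu")
# A fairseq checkpoint is a dict; "model" holds the parameter tensors and
# "cfg" (or "args" in older checkpoints) holds the training configuration.
print(sorted(state.keys()))
```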