fahadh4ilyas closed this issue 10 months ago
I have the same confusion: the paper says the authors use GPT-2 with ALiBi position encoding as the frozen backbone LLM, so why is the pretrained-model-path in train_scripts/train_longmem.sh set to bigscience/bloom-1b7?
Apologies for this issue. I forgot to revise the pretrained-model-path hyperparameter back to GPT-2-Medium in the training script. I ran two sets of experiments, with GPT-2-Medium and Bloom-1b7 as the backbone model respectively. I have just made a commit to resolve this, so you can refer to the latest training script.
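For readers hitting the same mismatch, here is a minimal sketch (not the repo's actual loading code) of pointing the backbone at GPT-2-Medium and freezing it, mirroring the frozen-backbone setup described above. Note that the stock gpt2-medium checkpoint on Hugging Face uses learned position embeddings, whereas the paper's backbone uses ALiBi, so the repo's own checkpoint handling will differ:

```python
# Minimal sketch, not the repo's code: load GPT-2-Medium from Hugging Face
# and freeze it, as for a frozen-backbone setup. Caveat: this stock
# checkpoint uses learned position embeddings, not ALiBi.
from transformers import AutoModelForCausalLM

backbone = AutoModelForCausalLM.from_pretrained("gpt2-medium")
for param in backbone.parameters():
    param.requires_grad = False  # the backbone LLM is kept frozen during training
```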
Trying to load the model from bigscience/bloom-1b7 results in:
Is the model different from the one on Hugging Face?
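One way to narrow this down, as a sanity check independent of this repo's loading code, is to load the checkpoint directly with transformers. If the snippet below succeeds, the Hugging Face checkpoint itself is fine and the error comes from how the training script consumes it:

```python
# Sanity check: load bigscience/bloom-1b7 straight from Hugging Face,
# bypassing this repo's loading path entirely.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b7")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7")
print(type(model).__name__)  # expect: BloomForCausalLM
```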