OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Llama-adapter-v2 compatibility with llama2 #108

gian-g3dai commented 1 year ago

Hello, thank you for the work you are doing.

Does LLaMA-Adapter V2 support LLaMA2, or does it only work with LLaMA? I am able to pretrain with the LLaMA2 weights, but the inference results do not make much sense, and fine-tuning fails.

gaopengpjlab commented 1 year ago

Please try the upgraded version of the LLaMA-Adapter V2 repo, X-Accessory, which supports full fine-tuning and PEFT of LLaMA2 and InternLM:

https://llama2-accessory.readthedocs.io/en/latest/finetune/index.html

gian-g3dai commented 1 year ago

Thank you very much @gaopengpjlab, I am working on the Accessory repo now. Just out of curiosity: is it possible in principle to pretrain LLaMA2? I am wondering whether some things, such as the tokenizer and params, are not the same and might lead to errors.

gaopengpjlab commented 1 year ago

With the X-Accessory repo, you can pretrain LLaMA2 from scratch on the RefinedWeb dataset. We use the original LLaMA tokenizer and model configuration.
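
To illustrate the tokenizer point: LLaMA and LLaMA2 ship the same SentencePiece tokenizer.model file, so the same loading code serves both. A minimal sketch with a placeholder path, not code from either repo:

    import sentencepiece as spm

    # LLaMA and LLaMA2 use the same SentencePiece tokenizer, so one
    # tokenizer.model file (placeholder path) loads for either model.
    sp = spm.SentencePieceProcessor(model_file="path/to/tokenizer.model")
    print(sp.vocab_size())            # 32000 for both LLaMA and LLaMA2
    print(sp.encode("Hello, world"))  # identical token ids under both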

gian-g3dai commented 1 year ago

Ok, thank you @gaopengpjlab. So in case I would like to pretrain/finetune LLaMA2 on multimodal tasks, you would suggest using X-Accessory, right? Is it possible to do so on CodeLLaMA too? I know that there are no scripts for it in the repo yet, but in principle it should be possible, right?

ChrisLiu6 commented 1 year ago

Hi!

  1. With the LLaMA-Adapter repo, you should be able to work with LLaMA2 without too much code modification. LLaMA and LLaMA2 have the same architecture for most model sizes (except 70B), and the tokenizer is also the same. So in principle, if you want to transfer from LLaMA to LLaMA2, you can simply replace the pre-trained weights. On the other hand, CodeLLaMA uses a different tokenizer with a different vocabulary size, and also a different theta for RoPE, which may need some code modification (see the sketch after this list).
  2. Both LLaMA2 and CodeLLaMA are supported by X-Accessory. X-Accessory is upgraded from LLaMA-Adapter with more functionality coverage, so we do recommend you work with X-Accessory for your LLM development. An example of CodeLLaMA support is here.
  3. However, the code for stage-one multi-modal fine-tuning (also referred to as pre-training in some contexts) on large-scale image-text pairs has not been fully released in X-Accessory yet, as we plan to refactor this part. As a workaround, we have released the stage-one fine-tuned checkpoint, on which you can do stage-two fine-tuning with either full fine-tuning or PEFT. Note that this checkpoint is trained with LLaMA2.
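
To illustrate the RoPE difference from point 1, here is a minimal sketch of the rotary-frequency precomputation found in LLaMA-style implementations. The function mirrors the reference LLaMA code, and the theta values are the commonly published defaults (10000 for LLaMA/LLaMA2, 1e6 for CodeLLaMA); treat the exact names and values as assumptions to check against your local code.

    import torch

    def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
        """Rotary-embedding frequency table, as in LLaMA-style implementations."""
        freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: dim // 2].float() / dim))
        t = torch.arange(end, device=freqs.device).float()
        freqs = torch.outer(t, freqs)
        return torch.polar(torch.ones_like(freqs), freqs)  # complex64 table

    # LLaMA / LLaMA2 use theta = 10000.0, while CodeLLaMA checkpoints are
    # trained with a larger rope_theta (1e6), so this value has to become
    # configurable when loading CodeLLaMA weights.
    freqs_llama2    = precompute_freqs_cis(dim=128, end=4096,  theta=10000.0)
    freqs_codellama = precompute_freqs_cis(dim=128, end=16384, theta=1e6)
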
gian-g3dai commented 1 year ago

Thank you very much @ChrisLiu6, great explanation!!

What would you suggest for implementing a multimodal CodeLLaMA then? Should I start from X-Accessory and take it from there? I could write some scripts to launch the pretraining and so on (I have compute, so that shouldn't be a problem).

As for starting from the checkpoint, I have tried it, but it looks like the model keeps a certain "bias" and prefers to produce short outputs. Together with the fact that I would like a more custom pretrained model, I think it's not the best option for me.

gian-g3dai commented 1 year ago

@ChrisLiu6 @gaopengpjlab, any tips? Otherwise, thank you for your time so far; I will close the issue soon.

ChrisLiu6 commented 1 year ago

Sorry, I just missed it.

  1. To run stage-one fine-tuning with CodeLLaMA, one way is to use this repo (LLaMA-Adapter) but modify the implementation of LLaMA to support the LLaMA2 and CodeLLaMA features (especially rope_theta). You may refer to our implementation in X-Accessory.

  2. Alternatively, if you want to use X-Accessory, you may follow the fine-tuning pipeline, with the following data config:

    META:
    -
      path: path/to/your/data.csv
      type: 'image_text'
      preprocess: 'caption'
      prompt_type: 'caption'

The preprocess parameter takes effect here:

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/c7fd8f83d3564e0982c63e8e0a1c8930b30c6cfe/accessory/data/alpaca.py#L150

The prompt_type parameter takes effect here:

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/c7fd8f83d3564e0982c63e8e0a1c8930b30c6cfe/accessory/data/alpaca.py#L115

Other configurations should be similar to this experiment. Note that you also need to rewrite which model parameters are trainable for this stage: https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/c7fd8f83d3564e0982c63e8e0a1c8930b30c6cfe/accessory/model/LLM/llama.py#L332
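
For orientation, the linked code decides which parameters stay trainable. Below is a rough sketch of that kind of selection logic; the name patterns ("visual_proj", "norm", "bias") are placeholders for illustration, not the actual parameter names in LLaMA2-Accessory, so adapt them after reading the linked llama.py.

    import torch.nn as nn

    def select_stage_one_trainable(model: nn.Module) -> dict:
        """Sketch: choose the subset of parameters to train in stage one.

        The substrings below are placeholders; check the linked llama.py for
        the parameter names actually used in LLaMA2-Accessory.
        """
        trainable = {}
        for name, param in model.named_parameters():
            if any(key in name for key in ("visual_proj", "norm", "bias")):
                trainable[name] = param
        return trainable

    # Freeze everything else before training, e.g.:
    # trainable = select_stage_one_trainable(model)
    # for name, p in model.named_parameters():
    #     p.requires_grad = name in trainable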

gian-g3dai commented 1 year ago

Thank you very much @ChrisLiu6!! Couldn't have wished for a better explanation!!