NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0

how to fine-tune Qwen1.5 models based on Nemo #175

Closed panjianfei closed 1 month ago

panjianfei commented 1 month ago

Can you share a pipeline for fine-tuning Qwen1.5 models with NeMo?

odelalleau commented 1 month ago

The first step would be to have a .nemo version of Qwen1.5. This was just merged into NeMo: https://github.com/NVIDIA/NeMo/pull/9055 ==> you may try the conversion script.
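A rough sketch of how that conversion might be invoked; the flag names below (`--input_name_or_path`, `--output_path`) are assumed from the other scripts in `scripts/checkpoint_converters`, so check the script's argparse before running:

```python
# Hypothetical invocation of the Qwen2 HF -> NeMo converter from PR #9055.
# Flag names follow the other checkpoint_converters scripts and may differ.
import subprocess

subprocess.run(
    [
        "python",
        "scripts/checkpoint_converters/convert_qwen2_hf_to_nemo.py",
        "--input_name_or_path", "Qwen/Qwen1.5-7B",  # example HF model id or local checkpoint dir
        "--output_path", "qwen1.5-7b.nemo",         # resulting .nemo file to hand to NeMo-Aligner
    ],
    check=True,
)
```

The resulting .nemo file is then used as the starting checkpoint for the NeMo-Aligner training scripts.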

Then hopefully it will work seamlessly in NeMo-Aligner, but since we haven't tested it, it's quite possible that some issues will come up (I would also be surprised if it worked out-of-the-box with TRT-LLM generation).

panjianfei commented 1 month ago

thank you, 👍

panjianfei commented 2 weeks ago

@odelalleau I tried to convert a Qwen2 HF checkpoint to NeMo format with https://github.com/NVIDIA/NeMo/pull/9055, and ran into an error: the keys `f'model.decoder.layers.{l}.self_attention.linear_qkv.bias'` are unexpected, so I had to comment out lines https://github.com/NVIDIA/NeMo/blob/main/scripts/checkpoint_converters/convert_qwen2_hf_to_nemo.py#L205-L225
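As a sanity check, a small sketch (using the `transformers` API; `Qwen/Qwen1.5-0.5B` is just an example model id) can list the attention bias tensors in the HF weights. Qwen1.5/Qwen2 models use QKV bias, so if these tensors are present, the "unexpected keys" error points at the target NeMo GPTModel being built without `linear_qkv.bias`, not at the HF checkpoint:

```python
# Sanity-check sketch: list attention-projection bias tensors in the HF Qwen
# checkpoint ("Qwen/Qwen1.5-0.5B" is only an example model id).
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")

bias_keys = [
    name
    for name, _ in hf_model.named_parameters()
    if "self_attn" in name and name.endswith(".bias")
]
print(len(bias_keys), bias_keys[:3])
# q_proj.bias / k_proj.bias / v_proj.bias should appear for every layer; if they
# do, the mismatch is on the NeMo/Megatron side (no linear_qkv.bias parameters
# in the target model), which is what the converter's lines 205-225 map onto.
```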

panjianfei commented 2 weeks ago

@odelalleau Does GPTModel support attention bias? GPTModel's named_parameters:


```
model.decoder.layers.23.self_attention.linear_proj.weight
model.decoder.layers.23.self_attention.linear_qkv.layer_norm_weight
model.decoder.layers.23.self_attention.linear_qkv.weight
model.decoder.layers.23.mlp.linear_fc1.layer_norm_weight
model.decoder.layers.23.mlp.linear_fc1.weight
model.decoder.layers.23.mlp.linear_fc2.weight

model.decoder.final_layernorm.weight
model.output_layer.weight
```
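For reference, a minimal diagnostic sketch (assuming `model` is the MegatronGPTModel restored from the converted .nemo file) that just counts QKV bias parameters. Recent megatron-core versions expose an `add_qkv_bias` flag on `TransformerConfig`; if the target model's config does not enable it, no `linear_qkv.bias` parameters are created, which would match the listing above:

```python
# Diagnostic sketch: check whether the instantiated model actually has QKV bias
# parameters. `model` is assumed to be a MegatronGPTModel restored from the
# converted .nemo checkpoint.
qkv_biases = [
    name
    for name, _ in model.named_parameters()
    if name.endswith("self_attention.linear_qkv.bias")
]
print(f"found {len(qkv_biases)} linear_qkv.bias tensors")
if not qkv_biases:
    # Nothing to load the HF biases into -> they show up as unexpected keys.
    print("model was built without QKV bias; check the add_qkv_bias setting in the model config")
```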