arcee-ai / DAM


Inquiry on Support for Qwen2.5 Models and Large Model Training Capabilities #43

Open ArcherShirou opened 1 day ago

ArcherShirou commented 1 day ago

I would like to inquire whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular models from the open-source community such as Yi. Will the framework support merging large models, like the 72B version, similar to MergeKit? Given that running a 72B model requires a significant amount of memory, will the training phase accommodate quantization and LoRA so it can run on a single machine with 8 A800 GPUs? Additionally, will DeepSpeed be supported for distributed training? If the framework could support merging and training of common model sizes such as 72B, 70B, 34B, 14B, and 7B, it would greatly enhance the applicability of the method.

SolshineCode commented 1 day ago

I've added a PR for the Qwen 2.5 and Qwen 2 series, as a first step toward integrating them: https://github.com/arcee-ai/DAM/pull/45

I was thinking along similar lines about quantization and LoRA. I don't think LoRA would work here, though, because the DAM method uses the logits directly.
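
To illustrate the point about working on logits: below is a rough, hypothetical sketch of a logit-level objective (the function name `logit_merge_loss`, the KL formulation, and the temperature parameter are my assumptions, not the loss implemented in this repo). Because the gradient flows through the merged model's output distribution into the merging coefficients, there is no per-model weight delta for a LoRA adapter to factorize.

```python
# Hypothetical sketch of a logit-level merging objective (not the DAM repo's
# actual loss): the merged model's logits are pulled toward each frozen source
# model's logits with a KL term, so gradients reach the merging coefficients
# rather than LoRA-style weight deltas.
import torch
import torch.nn.functional as F


def logit_merge_loss(merged_logits, source_logits_list, temperature=1.0):
    """Average KL between each source model's distribution and the merged model's."""
    log_p = F.log_softmax(merged_logits / temperature, dim=-1)
    loss = 0.0
    for source_logits in source_logits_list:
        q = F.softmax(source_logits / temperature, dim=-1)
        # kl_div(log_p, q) = KL(q || p): penalizes drifting away from each source model.
        loss = loss + F.kl_div(log_p, q, reduction="batchmean")
    return loss / len(source_logits_list)


# Toy usage with random tensors standing in for model outputs: (batch, seq, vocab).
merged = torch.randn(2, 8, 32000)
sources = [torch.randn(2, 8, 32000) for _ in range(3)]
print(logit_merge_loss(merged, sources))
```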

shamanez commented 1 day ago

Thanks, @ArcherShirou, for exploring our codebase.

> I would like to inquire whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular models from the open-source community such as Yi. Will the framework support merging large models, like the 72B version, similar to MergeKit?

Definitely, we could do this, as I mentioned in #45.

> Given that running a 72B model requires a significant amount of memory, will the training phase accommodate quantization and LoRA so it can run on a single machine with 8 A800 GPUs?

Actually, quantized models can only be trained with adapter methods like LoRA. In our method, however, we only train the merging coefficients, which is a pretty small number of parameters. So the problem only comes from the VRAM consumed when loading the models.
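
To make the "only merging coefficients are trained" point concrete, here is a minimal, hypothetical sketch (the `MergedLinear` class and its names are illustrative, not the repo's implementation): the frozen source weights are stored as buffers and a handful of scalars are the only parameters that require gradients, which is also why memory pressure comes from loading the source models rather than from optimizer state.

```python
# Minimal sketch (not the DAM repo's actual code) of training only merging
# coefficients: source model weights stay frozen, and a small set of learnable
# scalars decides how to combine them.
import torch
import torch.nn as nn


class MergedLinear(nn.Module):
    """Combine the weights of N frozen source linear layers with learnable coefficients."""

    def __init__(self, source_layers):
        super().__init__()
        # Frozen source weights, stacked to shape (N, out_features, in_features); buffers do not train.
        self.register_buffer(
            "source_weights", torch.stack([l.weight.detach() for l in source_layers])
        )
        # One learnable coefficient per source model -- the only trainable parameters.
        self.coeffs = nn.Parameter(torch.full((len(source_layers),), 1.0 / len(source_layers)))

    def forward(self, x):
        # Weighted sum of the frozen source weights, then a normal linear pass.
        merged_weight = torch.einsum("n,noi->oi", self.coeffs, self.source_weights)
        return nn.functional.linear(x, merged_weight)


# Three hypothetical source layers stand in for three source models.
sources = [nn.Linear(4096, 4096, bias=False) for _ in range(3)]
layer = MergedLinear(sources)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 3 coefficients for this single layer
```

Note that all source weights still have to sit in memory for the forward pass, which is exactly the loading-time VRAM cost mentioned above.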

We already tried DeepSpeed; you can check it out in the "legacy" folder. But in our experiments, DeepSpeed gave us an OOM issue when we tried to merge three different 7B models, where the merged model had around 22B parameters. As far as I remember, the number of trainable parameters was around 3 million.
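
For anyone trying to reproduce this on a single 8-GPU node, a ZeRO-3 style configuration is the usual way to shard the ~22B frozen parameters across devices. The snippet below is an illustrative sketch with assumed values, not the configuration used in the legacy folder.

```python
# Hedged sketch of a DeepSpeed ZeRO-3 config for sharding a large, mostly
# frozen merged model across GPUs; the keys are standard DeepSpeed options,
# but the values here are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                               # shard params, gradients, and optimizer state
        "offload_param": {"device": "cpu"},       # optionally push frozen weights to CPU RAM
        "offload_optimizer": {"device": "cpu"},   # optimizer state is tiny (only merge coefficients)
    },
}

# The dict can be passed directly, e.g.
# deepspeed.initialize(model=model, model_parameters=trainable_params, config=ds_config),
# or written out to JSON for a launcher script.
```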

ArcherShirou commented 15 hours ago

Thank you for your response. I'm looking forward to the updates to the framework; it is really fantastic work!