OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
GNU General Public License v3.0

Potential avenues of size reduction #29

Open Alignment-Lab-AI opened 1 year ago

Alignment-Lab-AI commented 1 year ago

Question: How does this model respond to pruning? Since it is an adapter model, have you attempted reducing the precision, then training an adapter for each layer and swapping the adapters into the needed layers during inference? I can imagine that quantization probably breaks it. If you have tried a training-aware pruning method and a training-aware quantization method separately, you may be able to compare the resulting task vectors using the method outlined here: https://arxiv.org/pdf/2212.04089.pdf. That comparison could provide enough knowledge of the geometry of the weight changes to reach a good level of optimization, versus retraining from scratch with a sparse method that may or may not match the original quality.
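For concreteness, here is a rough sketch of the task-vector comparison I have in mind (PyTorch; the checkpoint paths and the use of cosine similarity as the comparison metric are just my assumptions, not anything from this repo):

```python
# Hypothetical sketch: compare the "task vectors" (fine-tuned minus base weights)
# of a pruning-aware run and a quantization-aware run, in the spirit of
# https://arxiv.org/abs/2212.04089. All checkpoint paths are placeholders.
import torch

def task_vector(base_state, tuned_state):
    """Per-parameter difference introduced by fine-tuning."""
    return {k: tuned_state[k].float() - base_state[k].float()
            for k in base_state if k in tuned_state}

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two flattened task vectors."""
    a = torch.cat([v.flatten() for v in vec_a.values()])
    b = torch.cat([v.flatten() for v in vec_b.values()])
    return torch.dot(a, b) / (a.norm() * b.norm())

base = torch.load("base.pt")          # original weights (placeholder path)
pruned = torch.load("pruned_ft.pt")   # pruning-aware fine-tune (placeholder)
quant = torch.load("quant_ft.pt")     # quantization-aware fine-tune (placeholder)

sim = cosine_similarity(task_vector(base, pruned), task_vector(base, quant))
print(f"task-vector similarity: {sim.item():.3f}")  # high -> compatible edits
```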

I am not a researcher, but if it's okay to ask: what have you tried so far to sparsify it?

aojunzz commented 1 year ago

@Alignment-Lab-AI thanks for your insightful questions,

Q1: Have you attempted reducing the precision, then training an adapter for each layer and swapping the adapters into the needed layers during inference?

We don't reduce the precision before training. Could you share more insight into this method? If we use low precision during training and then swap in the adapters, would it improve performance or offer other benefits?

Q2: What have you tried so far to sparsify it?

Currently, we don't sparsify the models. In our second version, we introduce a scale layer; its magnitude may serve as an importance metric for removing unimportant neurons. You could also apply other sparsification methods to reduce the model size.
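As a rough illustration (not our actual implementation), pruning by the magnitude of such a scale could look like the following; the tensor shapes and the 30% pruning ratio are assumptions for the example:

```python
# Illustrative sketch only: rank neurons by the magnitude of a learned
# per-neuron scale (gate) and zero out the lowest-scoring rows. The shapes
# and the 30% ratio are example values, not the repo's actual settings.
import torch

def prune_by_scale(weight: torch.Tensor, scale: torch.Tensor, ratio: float = 0.3):
    """Zero out the rows of `weight` whose gate magnitude is smallest.

    weight: (out_features, in_features) projection matrix
    scale:  (out_features,) learned per-neuron gate
    """
    k = int(ratio * scale.numel())      # number of neurons to drop
    idx = scale.abs().argsort()[:k]     # indices of the least important neurons
    pruned = weight.clone()
    pruned[idx] = 0.0                   # structured (row-wise) pruning
    return pruned

w = torch.randn(256, 4096)
s = torch.randn(256)
w_pruned = prune_by_scale(w, s)
print((w_pruned == 0).all(dim=1).sum().item(), "neurons removed")
```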

Alignment-Lab-AI commented 1 year ago

Sorry! I missed the notification. I explained the process poorly; I meant to ask whether you had attempted to quantize the full model and then, during inference, swap adapters that had been trained at higher precision into the important layers.

https://www.deepspeed.ai/tutorials/MoQ-tutorial/
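As a minimal sketch of the idea, using PyTorch's built-in dynamic int8 quantization as a stand-in for MoQ (the `Adapter` module and the dimensions here are hypothetical, not this repo's classes):

```python
# Rough sketch: quantize the frozen base model to int8 while the small
# adapter modules stay in full precision. Uses PyTorch dynamic quantization
# as a stand-in for DeepSpeed MoQ; `Adapter` and the toy base model are
# hypothetical, not LLaMA-Adapter's actual modules.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter kept in fp32 for accuracy."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

dim = 512
base = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

# Quantize only the large base layers; adapter weights remain full precision.
base_int8 = torch.ao.quantization.quantize_dynamic(
    base, {nn.Linear}, dtype=torch.qint8
)
adapter = Adapter(dim)

x = torch.randn(1, dim)
y = adapter(base_int8(x))  # int8 base forward, fp32 adapter correction
```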

However, this may honestly work better on its own. Sorry for the out-of-scope line of questioning, haha. I was working on the outline for my next project, and it is always important to me to make models as small as possible so I don't have to pay for as many A100s!

I'm sure you are quite busy, but I am planning my own project on a multimodal model inspired by this repository and a few others. Would it be appropriate to discuss it here?