TencentARC / LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.
https://tencentarc.github.io/LLaMA-Pro/
Apache License 2.0

Is LLaMA Factory's llama-pro implementation written incorrectly? #15

Closed. HuXinjing closed this issue 4 months ago.

HuXinjing commented 4 months ago

I don't see where they add the linear layers anywhere.

hills-code commented 4 months ago

Could you share a link? We don't change the standard LLaMA architecture; we just copy the original block and set its down_proj and o_proj weights to zero.
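For context, here is a minimal sketch of that zero-initialized block expansion, assuming a HuggingFace LlamaForCausalLM; the function name expand_blocks and the split parameter are illustrative, not the repo's actual script:

    import copy
    import torch
    from transformers import LlamaForCausalLM

    def expand_blocks(model: LlamaForCausalLM, split: int) -> LlamaForCausalLM:
        """Insert one zero-initialized copy after every `split` original blocks."""
        new_layers = torch.nn.ModuleList()
        for i, layer in enumerate(model.model.layers):
            new_layers.append(layer)
            if (i + 1) % split == 0:
                new_layer = copy.deepcopy(layer)
                # Zeroing o_proj and down_proj makes the copied block an identity
                # mapping at initialization: both of its residual branches output zeros.
                new_layer.self_attn.o_proj.weight.data.zero_()
                new_layer.mlp.down_proj.weight.data.zero_()
                new_layers.append(new_layer)
        model.model.layers = new_layers
        model.config.num_hidden_layers = len(new_layers)
        # Note: newer transformers versions track a per-layer layer_idx for KV caching;
        # a complete implementation would also reindex the copied layers.
        return model

Because the copied block contributes nothing at initialization, the expanded model initially computes the same function as the original, and only the new blocks need to be trained.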

HuXinjing commented 4 months ago

> We just copy the original block and set down_proj and o_proj to zero

I misunderstood LLaMA Factory: they don't do the full implementation in the main codebase, only in tests/llama_pro.py, and that part is consistent with your paper and code. I'm not sure why the main codebase contains the check below, but it doesn't really matter, since it isn't mentioned in their README anyway:

    if finetuning_args.use_llama_pro:
        if num_layers % finetuning_args.num_layer_trainable != 0:
            raise ValueError(
                "`num_layers` {} should be divisible by `num_layer_trainable` {}.".format(
                    num_layers, finetuning_args.num_layer_trainable
                )
            )

        stride = num_layers // finetuning_args.num_layer_trainable
        trainable_layer_ids = range(stride - 1, num_layers + stride - 1, stride)
    elif finetuning_args.num_layer_trainable > 0:  # fine-tuning the last n layers if num_layer_trainable > 0
        trainable_layer_ids = range(num_layers - finetuning_args.num_layer_trainable, num_layers)
    else:  # fine-tuning the first n layers if num_layer_trainable < 0
        trainable_layer_ids = range(-finetuning_args.num_layer_trainable)
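As a worked example of that branch (illustrative numbers, not taken from the thread), a 32-layer expanded model with 8 trainable layers passes the divisibility check and selects every 4th block:

    num_layers, num_layer_trainable = 32, 8
    stride = num_layers // num_layer_trainable                               # 4
    trainable_layer_ids = range(stride - 1, num_layers + stride - 1, stride)
    print(list(trainable_layer_ids))  # [3, 7, 11, 15, 19, 23, 27, 31]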