Closed: HuXinjing closed this issue 4 months ago
Can you share a link? We didn't change the standard LLaMA architecture; we only duplicate the original block and zero-initialize its down_proj and o_proj.
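To make that concrete, here is a minimal sketch of such a zero-initialized block copy on top of the `transformers` API. It is illustrative only, not our repo's actual code: `expand_llama_blocks`, `n_groups`, and the tiny config are names and values made up for this example.

```python
import copy

import torch
from transformers import LlamaConfig, LlamaForCausalLM


def expand_llama_blocks(model: LlamaForCausalLM, n_groups: int) -> LlamaForCausalLM:
    """Insert one zero-initialized copy after every group of original blocks.

    down_proj and o_proj are the only paths from a block's MLP/attention
    back onto the residual stream (both are bias-free Linear layers in
    LLaMA), so zeroing their weights makes each copied block compute
    x + 0 = x: the expanded model initially matches the original.
    """
    old_layers = model.model.layers
    assert len(old_layers) % n_groups == 0, "layers must split evenly into groups"
    group = len(old_layers) // n_groups

    new_layers = torch.nn.ModuleList()
    for i, layer in enumerate(old_layers):
        new_layers.append(layer)
        if (i + 1) % group == 0:  # end of a group: append a fresh identity block
            block = copy.deepcopy(layer)
            torch.nn.init.zeros_(block.mlp.down_proj.weight)     # zero MLP output proj
            torch.nn.init.zeros_(block.self_attn.o_proj.weight)  # zero attn output proj
            new_layers.append(block)
            # note: a full implementation would also renumber each layer's
            # self_attn.layer_idx (KV-cache bookkeeping) and unfreeze only
            # the copied blocks for training, as in LLaMA Pro.

    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    return model


# tiny random-weight config so the sketch runs without downloading anything
cfg = LlamaConfig(hidden_size=64, intermediate_size=128, num_hidden_layers=4,
                  num_attention_heads=4, vocab_size=128)
expanded = expand_llama_blocks(LlamaForCausalLM(cfg), n_groups=2)
print(len(expanded.model.layers))  # 4 original blocks -> 6
```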
I misread LLaMA-Factory: they don't implement the full method in the main codebase, only in tests/llama_pro.py, and that script is consistent with your paper and code. I'm not sure why the main codebase contains the check below, but it hardly matters, since it isn't mentioned in their README anyway:
```python
if finetuning_args.use_llama_pro:
    if num_layers % finetuning_args.num_layer_trainable != 0:
        raise ValueError(
            "`num_layers` {} should be divisible by `num_layer_trainable` {}.".format(
                num_layers, finetuning_args.num_layer_trainable
            )
        )

    stride = num_layers // finetuning_args.num_layer_trainable
    trainable_layer_ids = range(stride - 1, num_layers + stride - 1, stride)
elif finetuning_args.num_layer_trainable > 0:  # fine-tuning the last n layers if num_layer_trainable > 0
    trainable_layer_ids = range(num_layers - finetuning_args.num_layer_trainable, num_layers)
else:  # fine-tuning the first n layers if num_layer_trainable < 0
    trainable_layer_ids = range(-finetuning_args.num_layer_trainable)
```
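For a quick sanity check of the use_llama_pro branch: with num_layers = 32 and num_layer_trainable = 8, stride = 4 and trainable_layer_ids = range(3, 35, 4), i.e. layers 3, 7, 11, ..., 31, so every fourth existing decoder block is unfrozen and the rest stay frozen. In other words, this branch only chooses which existing layers to train.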
I don't see anywhere that they add new linear layers.