Open xuanhua opened 1 hour ago
I recommend using `get_layers_from_config`. I think there was a bug when using `get_model`, and I failed to fix it.
`TiedLayerSpec` is used for initializing tied weights, which share the same gradient update. In the Transformer architecture, the weights of the input embedding and `lm_head` are sometimes tied. But I'm not sure whether DeepSeek's model uses this setting; you may check their config or paper.
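
One way to check the config is to look for the `tie_word_embeddings` field that Hugging Face style `config.json` files use. Below is a minimal sketch, assuming the checkpoint ships such a file. The helper name `embeddings_are_tied` and the sample config string are hypothetical, made up for illustration, and are not taken from any DeepSeek checkpoint.

```python
import json


def embeddings_are_tied(config_text):
    """Read `tie_word_embeddings` from the text of a Hugging Face style config.json.

    Returns True or False when the field is present, and None when it is
    missing (in that case the default depends on the model class).
    """
    config = json.loads(config_text)
    value = config.get("tie_word_embeddings")
    return None if value is None else bool(value)


# Hypothetical config excerpt for illustration only; NOT a real DeepSeek config.
sample_config = '{"model_type": "llama", "tie_word_embeddings": false}'

tied = embeddings_are_tied(sample_config)
if tied is None:
    print("field not set; check the model class default or the paper")
elif tied:
    print("embedding and lm_head share weights; TiedLayerSpec applies")
else:
    print("embedding and lm_head are separate; TiedLayerSpec is not needed for them")
```

If the flag turns out to be false, the embedding and `lm_head` can probably be declared as ordinary, untied layers in the pipeline.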
Hi @SparkJiao, I'm working on fine-tuning the DeepSeek Coder models (like 1b and 6.7b) with model pipeline parallelism. As far as I know, they are based on the LLaMA architecture, and this repo has been a great help. But as a beginner, I don't quite understand `TiedLayerSpec`, which is provided by the DeepSpeed library. I also saw that you provide two `get_model()` functions, and I just want to know which one I should use.
Looking forward to your reply.