Closed: larry0220 closed this 11 months ago
The Hugging Face page for yentinglin/Taiwan-LLM-7B-v2.1-chat says "Taiwan LLM based on Mistral-7B-v0.1", but config.json shows "architectures": [ "LlamaForCausalLM" ]
Originally this was trained as a Mistral model, but using FSDP at the time made the loss unstable, so I later switched to llama-2 :)
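For anyone who wants to check this themselves, here is a minimal sketch of reading the `architectures` field from a config.json. The helper name `declared_architectures` and the inline sample are hypothetical; the sample contains only the one field quoted in this thread, whereas the real config.json has many more keys.

```python
import json


def declared_architectures(config_text: str) -> list[str]:
    """Return the `architectures` list from the text of a config.json file.

    Returns an empty list if the key is missing.
    """
    config = json.loads(config_text)
    return list(config.get("architectures", []))


# Hypothetical inline sample: only the field quoted in this thread.
sample_config = '{"architectures": ["LlamaForCausalLM"]}'

archs = declared_architectures(sample_config)
is_llama = "LlamaForCausalLM" in archs
print(archs, is_llama)
```

To run it against a downloaded copy of the file, pass the file's contents (for example via `open("config.json").read()`) instead of `sample_config`.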
@adamlin120 did you train the llama-2 model using FSDP or DeepSpeed?