Length 256 - Githubissues

Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese llama+lora approach, with a structure modeled on alpaca)

https://github.com/Facico/Chinese-Vicuna

Apache License 2.0

4.14k stars 421 forks

Closed: zh25714 closed this issue 1 year ago

zh25714 commented 1 year ago

Is it possible to set the maximum length to 1024? Roughly how much hardware would be needed to train with that setting?

Facico commented 1 year ago

Yes. In theory, the maximum length for llama can be set as high as 2048. With the micro batch size set to 2, a single 3090Ti is enough.

zh25714 commented 1 year ago

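> Yes. In theory, the maximum length for llama can be set as high as 2048. With the micro batch size set to 2, a single 3090Ti is enough.

Hello! With a maximum length of 1024 and two V100 32G cards, how should I configure the training? Does this repo support model parallelism with deepspeed?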

Facico commented 1 year ago

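Just set the micro batch size to a value that does not run out of GPU memory (the smaller it is, the less GPU memory is needed; the minimum is 1). On V100, 8bit training may run into a problem where the loss blows up; see the related issues, turn off 8bit, and then enable fp16 (16bit). Running with deepspeed's offload was too slow, so support for it was never written.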
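
As a rough illustration of the micro batch size advice above, here is a minimal sketch of how a smaller micro batch size is commonly compensated with gradient accumulation. It assumes the training script derives the accumulation steps as the effective batch size divided by the micro batch size, which the thread does not state. The function name `accumulation_steps` and the effective batch size of 128 are hypothetical and chosen only for the example.

```python
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before one optimizer step.

    Assumption: effective batch size = micro_batch_size * accumulation steps.
    """
    if micro_batch_size < 1:
        raise ValueError("micro_batch_size must be at least 1")
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size should be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


# Hypothetical effective batch size, used only for illustration.
EFFECTIVE_BATCH_SIZE = 128

for micro in (1, 2, 4):
    steps = accumulation_steps(EFFECTIVE_BATCH_SIZE, micro)
    # Fewer samples per forward/backward pass lowers peak GPU memory,
    # while more accumulation steps keep the effective batch size unchanged.
    print(f"micro batch size {micro}: {steps} accumulation steps")
```

Under this assumption, lowering the micro batch size from 2 to 1 reduces the number of samples held in GPU memory per pass but doubles the number of passes per optimizer step, so training gets slower rather than changing the effective batch size.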