With the same dataset and training parameters, LoRA tuning baichuan-7b and baichuan-13b on 4 A100s: baichuan-13b uses much less GPU memory than baichuan-7b. Is this normal?

baichuan-inc / Baichuan-13B

A 13B large language model developed by Baichuan Intelligent Technology

Apache License 2.0

Open fringe-k opened 1 year ago

fringe-k commented 1 year ago

Using the same dataset and the same training parameters, I LoRA-tuned baichuan-7b and baichuan-13b on 4 A100 GPUs. baichuan-13b used much less GPU memory than baichuan-7b. Is this normal?

VSunN commented 1 year ago

I ran into the same problem. Has there been any follow-up?

HYZ17 commented 1 year ago

Same question here. I found that when running inference with the 13B model, GPU memory usage is noticeably lower than with the 7B model.
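
One thing that may be worth checking (an assumption, nothing in this thread confirms it) is the precision in which each checkpoint actually gets loaded. The sketch below is only back-of-the-envelope arithmetic for weight memory: the parameter counts are the nominal 7e9 and 13e9, the dtype pairing is hypothetical, and activations, KV cache, optimizer state, and LoRA adapters are all ignored.

```python
# Rough weight-only memory estimate.
# Ignores activations, KV cache, optimizer state, and LoRA adapter weights.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical pairing: a 7B model held in float32 vs. a 13B model in float16.
# Nominal parameter counts are approximations, not exact model sizes.
for name, num_params, dtype in [("7B", 7e9, "float32"), ("13B", 13e9, "float16")]:
    print(f"{name} in {dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

If the two models do end up in different dtypes, the larger one can plausibly need less memory for its weights. Printing `model.dtype` after loading each model would show whether that applies here.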