facebookresearch / SpinQuant

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Question about the training cost #3

Closed KimythAnly closed 4 months ago

KimythAnly commented 4 months ago

Hi, I appreciate your work! I have a question regarding the training cost. In the introduction it's mentioned that the training cost of LLaMA-2 7B is 1.3 hours on a single A100, but section 4.1 mentions 8 A100s. Could you clarify which is correct? Thanks!

zxdmike commented 4 months ago

The introduction says an A100 node, which is 1x8 GPUs. Hope that clarifies. The training does not require 8 GPUs, though; you can simply increase the number of steps.
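The "increase the number of steps" suggestion can be sketched as keeping the total number of training samples constant when the per-step batch shrinks with fewer GPUs. The helper and numbers below are illustrative, not taken from the SpinQuant configs:

```python
# Hypothetical helper (not from the SpinQuant repo): scale the optimizer
# step count so the total samples processed matches the 8-GPU schedule
# when running on fewer GPUs with a proportionally smaller global batch.

def scaled_steps(base_steps: int, base_gpus: int, gpus: int) -> int:
    """Return the step count on `gpus` devices that sees the same total
    number of samples as `base_steps` on `base_gpus` devices."""
    assert base_gpus > 0 and gpus > 0
    return base_steps * base_gpus // gpus

# e.g. an 8-GPU schedule of 100 steps becomes 800 steps on a single GPU
print(scaled_steps(100, 8, 1))  # -> 800
```

Gradient accumulation would achieve the same effective batch per update; simply raising the step count trades wall-clock time for hardware.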

brisker commented 3 months ago

It seems that SpinQuant needs to optimize R1 and R2 by loading the whole LLaMA model on GPU to run the forward and backward passes, so a single A100 is truly not enough.
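For context on what "optimizing R1 and R2" means memory-wise: the rotations must stay orthogonal while being trained with gradients through the full model. One standard way to enforce that (a minimal sketch, not the SpinQuant implementation, which uses its own optimizer for orthogonal matrices) is to parameterize the rotation with a skew-symmetric matrix via the Cayley transform, so the free parameters are unconstrained while the resulting matrix is always orthogonal:

```python
import numpy as np

# Illustrative sketch: parameterize a rotation R (standing in for R1 or R2)
# by a skew-symmetric matrix A through the Cayley transform
#   R = (I - A) @ inv(I + A)
# Gradients can flow through the entries of A while R remains orthogonal,
# which is why the whole model must be on GPU for forward/backward passes.

def cayley(A: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric matrix A to an orthogonal matrix R."""
    n = A.shape[0]
    I = np.eye(n)
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                              # skew-symmetric: A.T == -A
R = cayley(A)
print(np.allclose(R @ R.T, np.eye(4)))   # orthogonality check -> True
```

Because the loss that drives A is the quantized model's output error, each update needs activations and gradients for every layer, which is the memory pressure described above.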