Closed · KimythAnly closed this 4 months ago
In the introduction it says an A100 node, which is 1x8 GPUs. Hope that clarifies it. The training does not require 8 GPUs, though; you can simply increase the number of steps.
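For what it's worth, here is a minimal sketch of that trade-off, assuming the per-GPU batch size stays fixed (the helper name and the step counts are illustrative, not SpinQuant's actual configuration):

```python
def scaled_steps(base_steps: int, base_gpus: int, new_gpus: int) -> int:
    """Keep the total number of samples seen constant when the GPU count changes."""
    return base_steps * base_gpus // new_gpus

# A schedule tuned for one 8-GPU A100 node, rerun on a single GPU:
print(scaled_steps(base_steps=100, base_gpus=8, new_gpus=1))  # -> 800
```

In other words, dropping from 8 GPUs to 1 shrinks the global batch 8x, so you run roughly 8x more optimizer steps to cover the same data.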
It seems that SpinQuant needs to optimize R1 and R2 by loading the whole LLaMA model onto the GPU to run the forward and backward passes, so a single A100 is truly not enough.
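A back-of-envelope sketch of why a single GPU can run out of memory here; the activation figure is a loose assumption, not a measurement from the paper:

```python
# Rough memory estimate for a forward+backward pass through LLaMA-2 7B.
n_params = 7e9            # LLaMA-2 7B parameter count
bytes_per_param = 2       # fp16/bf16 weights

weights_gb = n_params * bytes_per_param / 1e9   # ~14 GB of frozen weights
activations_gb = 20.0     # assumption: activations kept for backward;
                          # grows with batch size and sequence length

total_gb = weights_gb + activations_gb
print(f"weights ~{weights_gb:.0f} GB + activations ~{activations_gb:.0f} GB "
      f"= ~{total_gb:.0f} GB before rotation gradients and CUDA overhead")
```

Even with the base weights frozen, the backward pass still has to keep activations around, so whether this fits on one 40 GB or 80 GB A100 depends heavily on batch size and sequence length.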
Hi, I appreciate your work! I have a question regarding the training cost. The introduction mentions that the training cost of LLaMA-2 7B is 1.3 hours on a single A100, but Section 4.1 mentions 8 A100s. Could you clarify which is correct? Thanks!