Open EugeneCh1a opened 6 months ago
I would like to know how many GB of GPU memory are required for training. Other than that, I really like showmaker, and his Twist.
We train on an A100 with 80 GB of video memory, using a batch size of 3 per GPU. I estimate that a single GPU with 48 GB of memory should be enough to train without model parallelism.