caleb-artifact opened this issue 1 year ago
I've been trying to run quantization for falcon-40b on a box with 8× 40 GiB A100s, but I keep getting CUDA out-of-memory errors. The README states that this should be possible, unless I'm misreading this line:

Here's the command I'm running:

Here's the full command output:

Is there something I'm doing wrong when launching the command?

Hello @caleb-artifact, and thank you for your interest in SpQR quantization!

Most likely you ran into an excessive-memory-usage error that has since been fixed. I re-tested this today: with PR #25 merged, the model works with the 40B model_type parameter. Make sure you are on the latest main branch.

Please try again and see if it works on your machine. You can also add the arguments --offload_activations and --skip_out_loss to further reduce memory usage.
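For reference, an invocation combining the suggested memory-saving flags might look roughly like the sketch below. This is a hypothetical example: the entry-point script name, model path, calibration-dataset argument, and --wbits value are assumptions (check the repository's README for the exact usage); only --offload_activations and --skip_out_loss come from the reply above.

```shell
# Hypothetical sketch -- entry point and positional arguments are assumptions;
# --offload_activations and --skip_out_loss are the memory-saving flags
# suggested in the maintainer's reply.
python main.py /path/to/falcon-40b c4 \
    --wbits 4 \
    --offload_activations \
    --skip_out_loss
```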