cassanof closed this issue 1 year ago
The error comes from how we place models on GPUs here, which isn't supported for 8-bit models. You can replace that line by also wrapping the model in accelerator.prepare() along with the dataloader when the model is in 8-bit (now supported in accelerate). We avoid doing this for other precisions because it seemed to take a lot of memory for gradients; see this issue.
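The placement logic described above could be sketched roughly as follows. This is a hedged illustration, not the harness's actual code: the function name `place_model` and the `load_in_8bit` flag are assumptions for the example, but the branching mirrors the suggestion (prepare() the model together with the dataloader only in 8-bit, place it manually otherwise).

```python
# Sketch (hypothetical helper, not from the harness): choose device placement
# depending on precision. For 8-bit models, let accelerate place the model by
# passing it through accelerator.prepare() with the dataloader; for other
# precisions, prepare only the dataloader and move the model manually, since
# prepare() on full-precision models was observed to use extra memory.
def place_model(accelerator, model, dataloader, load_in_8bit=False):
    if load_in_8bit:
        # prepare() on 8-bit models is now supported in accelerate
        model, dataloader = accelerator.prepare(model, dataloader)
    else:
        dataloader = accelerator.prepare(dataloader)
        model = model.to(accelerator.device)
    return model, dataloader
```

In the non-8-bit branch this keeps the existing behavior (manual `.to(device)`); only the 8-bit path changes.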
cc @ArmelRandy
Support for 8-bit inference was added in this PR: https://github.com/bigcode-project/bigcode-evaluation-harness/pull/95. However, there are issues when using SantaCoder in fp16 for inference, and they also occur in 8-bit: top-sampling in particular throws the error mentioned in this issue in fp16 and 8-bit (it doesn't happen in fp32 or bf16). We don't quite know why; it could be instabilities due to this model's pre-training in fp16.
I see. Thanks!
however there are issues when using SantaCoder in fp16 for inference and that also happens in 8bit, top-sampling in particular
Is this SantaCoder with the original architecture or with StarCoder's architecture?
That was with SantaCoder's original architecture; I haven't tried gpt_bigcode-santacoder. (I also got the error with your fine-tuned model.)
Currently, the harness raises an exception when used with 8-bit models:
For context, this is the model I've been trying to eval: https://huggingface.co/cassanof/santacoder-lua/tree/main
It seems like a check is needed for every .to call... any suggestions?
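One way to centralize that check would be a small guard around device moves. This is only a sketch of the idea, not harness code: the helper name `safe_to` is hypothetical, and it relies on the `is_loaded_in_8bit` attribute that transformers sets on models loaded with `load_in_8bit=True`.

```python
# Hypothetical helper: skip .to() for 8-bit quantized models, which must stay
# where accelerate/bitsandbytes placed them (calling .to(device) on them fails).
def safe_to(model, device):
    """Move `model` to `device` unless it is an 8-bit quantized model."""
    if getattr(model, "is_loaded_in_8bit", False):
        return model  # already placed during loading; leave it as-is
    return model.to(device)
```

Routing every `model.to(device)` call in the harness through a guard like this would avoid sprinkling the same precision check at each call site.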