cassanof closed this issue 1 year ago
The error comes from how we place models on GPUs here, which isn't supported for 8-bit models. You can replace that line by also wrapping the model in accelerator.prepare() along with the dataloader when the model is in 8-bit (now supported in accelerate). We avoid doing this for other precisions because it seemed to take a lot of memory for gradients; see this issue.
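The placement logic described above could be sketched roughly as follows. This is a hedged illustration, not the harness's actual code: the function name `place_model` and the `load_in_8bit` flag are assumptions for the example, but the branching mirrors the suggestion (prepare() the model together with the dataloader only in 8-bit, place it manually otherwise).

```python
# Sketch (hypothetical helper, not from the harness): choose device placement
# depending on precision. For 8-bit models, let accelerate place the model by
# passing it through accelerator.prepare() with the dataloader; for other
# precisions, prepare only the dataloader and move the model manually, since
# prepare() on full-precision models was observed to use extra memory.
def place_model(accelerator, model, dataloader, load_in_8bit=False):
    if load_in_8bit:
        # prepare() on 8-bit models is now supported in accelerate
        model, dataloader = accelerator.prepare(model, dataloader)
    else:
        dataloader = accelerator.prepare(dataloader)
        model = model.to(accelerator.device)
    return model, dataloader
```

In the non-8-bit branch this keeps the existing behavior (manual `.to(device)`); only the 8-bit path changes.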
cc @ArmelRandy
Support for 8-bit inference was added in this PR: https://github.com/bigcode-project/bigcode-evaluation-harness/pull/95. However, there are issues when using SantaCoder in fp16 for inference, and they also occur in 8-bit: top-sampling in particular throws the error mentioned in this issue in fp16 and 8-bit (it doesn't happen in fp32 or bf16). We don't quite know why; it could be instabilities due to this model's pre-training in fp16.
I see. Thanks!
however there are issues when using SantaCoder in fp16 for inference and that also happens in 8bit, top-sampling in particular
Is this SantaCoder with the original architecture or with StarCoder's architecture?
That was with SantaCoder's original architecture; I haven't tried gpt_bigcode-santacoder. (I also got the error with your fine-tuned model.)
Currently, the harness raises an exception when used with 8-bit models:
For context, this is the model I've been trying to eval: https://huggingface.co/cassanof/santacoder-lua/tree/main
It seems like a check is needed for every .to call... any suggestions?
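One way to centralize that check would be a small guard around device moves. This is only a sketch of the idea, not harness code: the helper name `safe_to` is hypothetical, and it relies on the `is_loaded_in_8bit` attribute that transformers sets on models loaded with `load_in_8bit=True`.

```python
# Hypothetical helper: skip .to() for 8-bit quantized models, which must stay
# where accelerate/bitsandbytes placed them (calling .to(device) on them fails).
def safe_to(model, device):
    """Move `model` to `device` unless it is an 8-bit quantized model."""
    if getattr(model, "is_loaded_in_8bit", False):
        return model  # already placed during loading; leave it as-is
    return model.to(device)
```

Routing every `model.to(device)` call in the harness through a guard like this would avoid sprinkling the same precision check at each call site.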