bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

remove model from accelerate prepare and add precision argument #61

Closed loubnabnl closed 1 year ago

loubnabnl commented 1 year ago

Passing both the model and the dataloader to accelerate.prepare uses unnecessary memory, as noticed by @RaymondLi0, and causes OOM errors for large models. This is because the model gets wrapped in the DistributedDataParallel class, which reserves memory for gradients as it would for training (issue). We now wrap only the dataloader, and we also add a precision argument so the model is loaded properly in bf16 or fp16. (The mixed-precision accelerate argument in the config is meant for mixed-precision training and will load two copies of the model.)
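As a rough illustration of why both changes matter, the arithmetic below estimates weight memory at each precision, with and without an extra gradient-sized buffer. The helper name `estimate_memory_gib`, the 15B parameter count, and the assumption that the reserved buffer is about as large as the weights are all illustrative; none of them are measurements from this PR.

```python
# Bytes per parameter for each precision option (standard dtype sizes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def estimate_memory_gib(n_params: int, precision: str, reserve_grads: bool) -> float:
    """Rough weight memory in GiB (hypothetical helper, ignores activations)."""
    weights = n_params * BYTES_PER_PARAM[precision]
    # Assumption: a reserved gradient buffer is about as large as the weights.
    extra = weights if reserve_grads else 0
    return (weights + extra) / 2**30


if __name__ == "__main__":
    n = 15_000_000_000  # illustrative model size, not taken from the PR
    for precision in ("fp32", "bf16"):
        for reserve in (True, False):
            gib = estimate_memory_gib(n, precision, reserve)
            print(f"{precision} reserve_grads={reserve}: {gib:.1f} GiB")
```

Under these assumptions, dropping the reserved buffer and loading in bf16 instead of fp32 each halve the estimate, so together they cut it to a quarter.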

Todo: add the CPU case