remove model from accelerate prepare and add precision argument

Passing both the model and the dataloader to accelerate.prepare uses unnecessary memory, as noticed by @RaymondLi0, and causes OOM errors for large models. This is because the model gets wrapped in the DistributedDataParallel class, which reserves memory for gradients during training (issue). We now wrap only the dataloader, and we add a precision argument so the model is loaded directly in bf16 or fp16. (The mixed-precision accelerate argument in the config is meant for mixed-precision training and will load two copies of the model.)
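For illustration, the change described above could look roughly like the sketch below. This is not the actual diff of this PR: the names `PRECISION_TO_DTYPE`, `dtype_name`, and `load_and_prepare`, as well as the `fp32`/`fp16`/`bf16` flag values, are assumptions made for the example. It assumes a causal LM loaded with transformers and a GPU-backed Accelerator.

```python
# Hypothetical mapping from a precision flag to the name of a torch dtype.
PRECISION_TO_DTYPE = {"fp32": "float32", "fp16": "float16", "bf16": "bfloat16"}


def dtype_name(precision: str) -> str:
    """Return the torch dtype attribute name for a precision flag."""
    try:
        return PRECISION_TO_DTYPE[precision]
    except KeyError:
        raise ValueError(
            f"unknown precision {precision!r}, expected one of {sorted(PRECISION_TO_DTYPE)}"
        )


def load_and_prepare(model_name, dataloader, precision="fp32"):
    """Load the model in the requested precision and wrap only the dataloader."""
    # Imported lazily so dtype_name() can be used without the heavy dependencies.
    import torch
    from accelerate import Accelerator
    from transformers import AutoModelForCausalLM

    accelerator = Accelerator()

    # Load the weights directly in the target dtype instead of relying on
    # the mixed-precision setting, which is intended for training.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=getattr(torch, dtype_name(precision))
    )

    # Move the model to the device manually; it is NOT passed to prepare(),
    # so it is not wrapped in DistributedDataParallel.
    model = model.to(accelerator.device)
    model.eval()

    # Only the dataloader is prepared, so batches are still sharded across processes.
    dataloader = accelerator.prepare(dataloader)
    return model, dataloader
```

Since evaluation only needs forward passes, skipping the DistributedDataParallel wrapper should not change the results; each process keeps its own copy of the model and handles its own shard of the data.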

TODO: add CPU case

bigcode-project / bigcode-evaluation-harness

remove model from accelerate prepare and add precision argument #61