Adds support for `accelerate` `device_map` options for finer-grained control.
This addresses out-of-memory errors on the rank-0 GPU, which occur because input tokens share that device's memory with some of the partitioned parameters. To avoid them, simply set `device_map_options="balanced_low_0"`.
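
A minimal sketch of how the option might be forwarded to model loading. The helper name `build_from_pretrained_kwargs` and the validation step are hypothetical illustrations, not this PR's actual code; the strategy names are the ones `accelerate` documents for `device_map`.

```python
# Hypothetical helper for illustration only; not the code added by this PR.
# The strategy strings below are the device_map values documented by accelerate.
VALID_DEVICE_MAP_OPTIONS = ("auto", "balanced", "balanced_low_0", "sequential")


def build_from_pretrained_kwargs(device_map_options="auto"):
    """Turn a device_map_options string into keyword arguments for model loading."""
    if device_map_options not in VALID_DEVICE_MAP_OPTIONS:
        raise ValueError(
            f"Unknown device_map option {device_map_options!r}; "
            f"expected one of {VALID_DEVICE_MAP_OPTIONS}"
        )
    # "balanced_low_0" spreads weights evenly over the GPUs while keeping
    # GPU 0 lightly loaded, leaving room there for inputs and generation.
    return {"device_map": device_map_options}


kwargs = build_from_pretrained_kwargs("balanced_low_0")
print(kwargs)

# With transformers and accelerate installed, the kwargs could be passed as:
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained("gpt2", **kwargs)
```

The model name `gpt2` in the comment is only a placeholder; any checkpoint loadable with `from_pretrained` would be used the same way.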