Previously, passing `--fsdp` also enabled CPU offload automatically. With this change, CPU offload is opt-in: you have to pass `--cpu-offload` to get that behavior. While training a probe for LLaMA 13B, I noticed that CPU offload was unnecessary and was probably slowing things down, even though FSDP itself was still needed.
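
For illustration, here is a minimal sketch of how the two flags could be decoupled. It is not the code from this PR: `build_parser` and `resolve_sharding` are hypothetical names, and the rule that `--cpu-offload` only takes effect together with `--fsdp` is an assumption. In PyTorch, the resulting boolean would typically be passed on as `CPUOffload(offload_params=...)` from `torch.distributed.fsdp`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI with FSDP and CPU offload as separate flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fsdp",
        action="store_true",
        help="Shard the model with Fully Sharded Data Parallel.",
    )
    parser.add_argument(
        "--cpu-offload",
        action="store_true",
        help="Offload sharded parameters to CPU (opt-in).",
    )
    return parser


def resolve_sharding(argv: list[str]) -> dict[str, bool]:
    """Turn CLI arguments into sharding settings."""
    args = build_parser().parse_args(argv)
    return {
        "fsdp": args.fsdp,
        # Assumption: offload is only meaningful when FSDP is enabled,
        # and it is no longer implied by --fsdp alone.
        "cpu_offload": args.fsdp and args.cpu_offload,
    }
```

With this layout, `--fsdp` alone shards the model without offloading, and `--fsdp --cpu-offload` reproduces the old behavior.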