eminorhan opened 4 months ago
For some reason, the error below is raised in the finetuning script when using fused AdamW with multiple parameter groups. Strangely enough, the pretraining script, which is otherwise similar, does not raise it. I fixed it for now in 310f48e4b1e06ea0d501f57cd7765228945e1082 by removing the multiple parameter groups from the finetuning script.
`RuntimeError: params, grads, exp_avgs, and exp_avg_sqs must have same dtype, device, and layout`
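The error message indicates that the fused implementation expects each parameter, its gradient, and its `exp_avg`/`exp_avg_sq` state tensors to agree in dtype, device, and layout. Below is a minimal sketch of a diagnostic for narrowing down which tensor disagrees. It assumes PyTorch's `torch.optim.AdamW`, which keeps per-parameter state under the keys `exp_avg` and `exp_avg_sq`. The helper `find_state_mismatches`, the toy model, and the two-group weight-decay split are hypothetical illustrations, not code from this repository. The example runs on CPU with `fused` left at its default.

```python
import torch
import torch.nn as nn


def find_state_mismatches(optimizer):
    """Hypothetical helper: list tensors whose dtype/device/layout differ from their param.

    Returns (group_index, param_index, tensor_name, (dtype, device, layout)) tuples.
    """
    mismatches = []
    for g_idx, group in enumerate(optimizer.param_groups):
        for p_idx, p in enumerate(group["params"]):
            if p.grad is None:
                continue  # params without grads are skipped by the optimizer anyway
            ref = (p.dtype, p.device, p.layout)
            tensors = {"grad": p.grad}
            state = optimizer.state.get(p, {})  # state is created lazily on first step()
            for key in ("exp_avg", "exp_avg_sq"):
                if key in state:
                    tensors[key] = state[key]
            for name, t in tensors.items():
                attrs = (t.dtype, t.device, t.layout)
                if attrs != ref:
                    mismatches.append((g_idx, p_idx, name, attrs))
    return mismatches


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))

# Illustrative two-group split (weight decay on matrices only); this grouping
# rule is an assumption, not necessarily the one used in the finetuning script.
decay = [p for p in model.parameters() if p.ndim >= 2]
no_decay = [p for p in model.parameters() if p.ndim < 2]
optimizer = torch.optim.AdamW(
    [{"params": decay, "weight_decay": 0.05},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=1e-3,
)  # fused is left at its default here so the sketch runs on CPU

model(torch.randn(3, 4)).sum().backward()
optimizer.step()
print(find_state_mismatches(optimizer))  # → []
```

On a GPU setup, one could construct the optimizer with `fused=True`, wrap the failing `optimizer.step()` in a `try`/`except RuntimeError`, and call the helper afterwards; since AdamW initializes a group's state before running its update, this may reveal which parameter group and tensor disagree.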