🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
The recent fix for weight handling introduced a bug: a weight passed in as a tuple would return `list(data)` rather than `list(weight)`. This PR corrects that.
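
For illustration, here is a minimal sketch of the kind of branch this fix touches. The function name `normalize_weights`, its signature, and the default and scalar handling are assumptions made up for this example, not the repository's actual code. Only the tuple branch, which returns `list(weight)` instead of `list(data)`, reflects the change described above.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def normalize_weights(
    data: Sequence[str],
    weight: Optional[Union[Number, Sequence[Number]]],
) -> List[Number]:
    """Hypothetical helper: return one sampling weight per dataset, as a list."""
    if weight is None:
        # Assumed default for this sketch: equal weight for every dataset.
        return [1.0] * len(data)
    if isinstance(weight, (list, tuple)):
        # The buggy version returned list(data) here, which handed back
        # the dataset names instead of the weights.
        return list(weight)
    # Assumed behavior for this sketch: broadcast a single scalar weight.
    return [weight] * len(data)


datasets = ("dataset_a", "dataset_b")
weights = normalize_weights(datasets, (3, 1))
```

With the corrected branch, a tuple of weights comes back as a list of those same weights, so downstream code that expects numeric values no longer receives dataset names.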