RissyRan closed 1 week ago
Add missing TP sharding for dropping weights (after initialization).
Description
Adds the missing tensor parallelism (TP) sharding for dropping weights (after initialization).