Open sohamparikh opened 1 day ago
This is a Triton bug; our implementation of the dropless MLP might not be able to handle that many experts. Fixing it will need an in-depth investigation and some implementation work. In the meantime, the model should be runnable by disabling dropless MoE (see the config sketch below).
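As a minimal sketch of that workaround, assuming the MoE options sit under the transformer section of the model config and that the flag is called `dropless_moe` (both are assumptions, not confirmed in this issue):

```yaml
# Hypothetical excerpt of a Fast-LLM training config.
# Field names and nesting are assumptions for illustration,
# not the reporter's actual (unposted) config.
model:
  base_model:
    transformer:
      num_experts: 64           # fine-grained experts, as in the report
      num_experts_per_token: 8  # illustrative value only
      dropless_moe: false       # workaround: avoid the dropless Triton kernel path
```

This only sidesteps the Triton kernel used by the dropless path; the kernel's resource limit itself still needs the investigation mentioned above.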
Describe the Bug
Facing an OutOfResources error with 64 fine-grained experts and dropless MoE enabled, even though there is sufficient GPU memory.

Steps to Reproduce
Steps to reproduce the behavior:
Fast-LLM Docker image tag: sha-8f06975
Training config:
Error log:
Expected Behavior
Training should run without an OutOfResources error, given that there is sufficient GPU memory.
Additional Context