Open riccardofelluga opened 7 months ago
Note that unless you rearrange the mixing over what is commonly implemented, you will have data-dependent control flow.
> you will have data-dependent control flow.
Exactly! What is our current stance on data-dependent control flow?
I don't think it's on the roadmap any time soon.
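For readers following along, here is a minimal sketch of what "data-dependent control flow" means in this context and what "rearranging the mixing" could look like. Everything in it is an assumption made for illustration (toy sizes, `torch.nn.Linear` stand-ins for the experts, the variable names); it is not Mixtral's or Thunder's actual code. The sparse loop gathers only the tokens routed to each expert, so tensor shapes depend on the router output, while the dense variant runs every expert on every token and zeroes out unselected experts through the gate weights.

```python
import torch

torch.manual_seed(0)
# Toy sizes, chosen only for illustration.
num_tokens, hidden_dim, num_experts, top_k = 5, 4, 3, 2
x = torch.randn(num_tokens, hidden_dim)
# Stand-in experts; real MoE experts are MLPs.
experts = [torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)]
router_logits = torch.randn(num_tokens, num_experts)

# Top-k routing: pick top_k experts per token and renormalize their weights.
weights, selected = torch.topk(torch.softmax(router_logits, dim=-1), top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)

with torch.no_grad():
    # Sparse mixing: the number of tokens gathered per expert depends on the data.
    sparse_out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        # (selected == e).T has shape (top_k, num_tokens).
        slot, token = torch.where((selected == e).T)
        sparse_out.index_add_(0, token, expert(x[token]) * weights[token, slot, None])

    # Dense mixing: static shapes, at the cost of running every expert on every token.
    gates = torch.zeros(num_tokens, num_experts).scatter(1, selected, weights)
    dense_out = sum(gates[:, e, None] * expert(x) for e, expert in enumerate(experts))

same_result = torch.allclose(sparse_out, dense_out, atol=1e-6)
```

The dense form trades extra compute for shapes that do not depend on the router output, which is the trade-off being discussed here.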
Adding #303, which might be the key to getting the model supported in Thunder.
cc. @IvanYashchuk
Update to this issue: Mixtral 8x7B is now supported via the ThunderFX path. The issues listed above remain for the JIT code path.
🚀 Feature
Mixtral 8x7B is a mixture-of-experts LLM that splits its parameters into 8 distinct groups, and I would like to do both training and inference with Thunder.
Work items
thunder.examine
Additional context
Even though `examine` does not signal any problem with the ops, some testing revealed that Mixtral uses the single-argument `torch.where(condition)` signature of the `torch.where` function, which is not supported at the moment. The second issue I was able to identify stems from the indexing done in the Mixtral forward function: at the moment, the `_advanced_indexing` clang operation does not accept `None` as a valid index together with other tensors.
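A small, plain-PyTorch sketch of the two patterns described above may help when reproducing the problem. The tensor names, shapes, and the hand-written mask are assumptions chosen for illustration; they mimic the kind of expert-routing code the issue refers to rather than quoting Mixtral's forward function.

```python
import torch

# Toy stand-ins; names and shapes are illustrative assumptions.
num_tokens, hidden_dim, top_k = 4, 3, 2
hidden_states = torch.arange(num_tokens * hidden_dim, dtype=torch.float32).reshape(num_tokens, hidden_dim)
routing_weights = torch.full((num_tokens, top_k), 0.5)

# Mask for a single expert: expert_mask[k, t] is True when token t
# picked this expert as its k-th choice.
expert_mask = torch.tensor([[True, False, False, True],
                            [False, False, True, False]])

# Pattern 1: single-argument torch.where(condition) returns a tuple of index
# tensors, equivalent to torch.nonzero(condition, as_tuple=True).
idx, top_x = torch.where(expert_mask)

# Pattern 2: advanced indexing that mixes None with tensor indices.
# hidden_states[None, top_x] has shape (1, len(top_x), hidden_dim).
current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
# routing_weights[top_x, idx, None] has shape (len(top_x), 1).
selected_weights = routing_weights[top_x, idx, None]
```

Wrapping a function containing these two lines with `thunder.jit` should be enough to check whether either limitation still applies in a given Thunder version.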