Open Muennighoff opened 2 hours ago
Maybe cc @mvpatel2000 @tgale96; would love to get your thoughts!
I haven't tried it, so I'm honestly not sure 🤷. I'd recommend trying it out and seeing what happens.
I would guess it would be messy given the varying shapes with dropless MoEs, and that you're probably better off not compiling this layer. But I'm not as familiar with the latest on torch.compile.
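To make the "varying shapes" point concrete, here is a minimal sketch in plain Python, not megablocks code. It assumes a toy router that sends each token to `top_k` random experts; the function name `tokens_per_expert` and all parameters are hypothetical. With no token dropping, the number of tokens each expert receives changes from step to step, so the per-expert input shapes are not static.

```python
import random
from collections import Counter


def tokens_per_expert(num_tokens, num_experts, top_k, seed):
    """Count how many token slots each expert receives under random top-k routing.

    Hypothetical stand-in for a learned router: a real MoE layer would pick
    experts from router scores, not uniformly at random.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_tokens):
        # Each token is assigned to top_k distinct experts; nothing is dropped.
        for expert in rng.sample(range(num_experts), top_k):
            counts[expert] += 1
    return [counts[e] for e in range(num_experts)]


if __name__ == "__main__":
    # Each "step" stands in for a different batch; the per-expert counts
    # (i.e. the leading dimension of each expert's input) differ between steps.
    for step in range(3):
        print(step, tokens_per_expert(num_tokens=16, num_experts=4, top_k=2, seed=step))
```

The total is always `num_tokens * top_k`, but how it splits across experts is data-dependent, which is the kind of dynamic shape a compiler tends to handle by recompiling or falling back. If you just want to exclude one layer from compilation, PyTorch has `torch.compiler.disable`; whether that plays well with megablocks is untested here.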
Do you know what (if anything) stands in the way of using megablocks with torch.compile?