severian42 opened 6 months ago
Hello, I think running Jamba with MLX would be possible and not too hard with mamba.py. It is already possible to load and run a pre-trained Mamba model in MLX with mamba.py; adding attention layers is just another step! There are two things to point out:
- at the moment, the MLX version of mamba.py uses a lot of memory (at least at inference), possibly because depthwise 1D convolution is not available in MLX as of now, so it must be done manually
- training is slow compared to the torch version due to the way MLX operates on arrays (as of now)
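For readers wondering what "done manually" means for the depthwise conv: a grouped convolution with `groups == channels` can be emulated by convolving each channel with its own filter and concatenating the results. Here is a minimal PyTorch sketch (PyTorch rather than MLX, since the MLX API at the time lacked grouped conv; the function name is my own, not from mamba.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def manual_depthwise_conv1d(x, weight, padding):
    """Emulate a depthwise Conv1d without grouped-conv support.

    x:      (batch, channels, length)
    weight: (channels, 1, kernel_size) -- one filter per channel
    """
    channels = x.shape[1]
    # Convolve each channel with its own single filter, then stack.
    outs = [F.conv1d(x[:, c:c + 1], weight[c:c + 1], padding=padding)
            for c in range(channels)]
    return torch.cat(outs, dim=1)

# Check against the built-in grouped convolution.
batch, channels, length, k = 2, 4, 16, 3
x = torch.randn(batch, channels, length)
conv = nn.Conv1d(channels, channels, k, groups=channels,
                 padding=k - 1, bias=False)
ref = conv(x)
man = manual_depthwise_conv1d(x, conv.weight, padding=k - 1)
```

The per-channel loop is exactly why a manual version tends to be slower and more memory-hungry than a fused grouped conv kernel.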
But I think this would still be worth it! I'll start thinking about it and see what I can do.
EDIT: there is also the MoE part of Jamba, which is new compared to mamba.py.
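For context on that MoE part: the core idea is a router that sends each token to its top-k experts and mixes their outputs by the renormalized router scores. Below is a small generic sketch of such a layer in PyTorch (my own minimal version, not Jamba's actual `JambaSparseMoeBlock`; the class name and sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Minimal top-k mixture-of-experts MLP layer (a sketch, not Jamba's code)."""

    def __init__(self, d_model, d_ff, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model)
        logits = self.router(x)                                  # (tokens, E)
        probs = F.softmax(logits, dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)     # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Gather the tokens routed to expert e and add its weighted output.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

moe = SimpleMoE(d_model=8, d_ff=16)
y = moe(torch.randn(5, 8))
```

Only the router and the expert dispatch are new relative to a plain MLP block, which is why this is an add-on step rather than a redesign.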
Thank you so much for your input! I truly appreciate it as this area is out of my wheelhouse but I'm trying to learn as much as possible
I thought it might be hard with the current MLX capabilities, but it seemed like most of it could be implemented from the Jamba version of the MambaBlock.
The MoE does muddy the water a bit; it also threw me for a loop compared to the normal MoE implementation (unless I was overthinking it).
Thank you for being willing to take a look and see what's possible. You seem to have such a great grasp on Mamba. I'll keep messing around on my end and see if I can get any further
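One thing that helps when reading Jamba's modeling code is that the layer stack is just a fixed interleaving pattern: attention replaces Mamba every few layers, and MoE replaces the dense MLP on alternating layers. A tiny helper sketching that pattern (the default period/offset values here are my reading of the released config and should be double-checked against modeling_jamba / config.json):

```python
def layer_types(num_layers, attn_period=8, attn_offset=4,
                expert_period=2, expert_offset=1):
    """Sketch of Jamba-style layer interleaving (assumed defaults, verify!).

    Returns a (mixer, mlp) pair per layer, where mixer is "attention" or
    "mamba" and mlp is "moe" or "mlp".
    """
    types = []
    for i in range(num_layers):
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        mlp = "moe" if i % expert_period == expert_offset else "mlp"
        types.append((mixer, mlp))
    return types

# With these defaults, layer 4 of an 8-layer stack is the attention layer
# and every odd layer carries the MoE MLP.
pattern = layer_types(8)
```

Seen this way, a port mostly reduces to three reusable blocks (Mamba mixer, attention mixer, MoE/dense MLP) plus this schedule.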
Just wanted to say THANK YOU so much for tackling Jamba. I've been trying on my own with horrible results, haha. I really appreciate the hard work you are putting in to get it working. I have a lot of faith in this model and its potential to harness MLX like no other.
Let me know if I can 'Buy you a Coffee' or something!
Thank you for your encouraging message! FYI, I'm almost done with a simple implementation of Jamba in PyTorch (just like in the mamba.py file). Then I will tackle the PyTorch -> MLX conversion, which shouldn't be very hard.
That is really nice of you, but I'm OK for now! (You can follow the progress in the jamba branch.)
Hey! Awesome work on this project! I know it's not technically vanilla Mamba, but I've been trying to convert the new SSM-Transformer Jamba into MLX for more efficient training and usability, and am having a difficult time. My specialty is in the training/datasets world, and I'm not the strongest in the core math behind the model architectures beyond the basic implementations.
Would somebody know of an easier way to get Jamba converted into MLX? I truly think Jamba has A LOT to offer and could do some awesome stuff in MLX, especially for local model training on a Mac.
I've provided the modeling script released by AI21 for quick reference. Is this feasible or just way too complicated at the moment?
modeling_jamba.txt