Open · RuABraun opened this issue 3 days ago

System Info

After training `Zyphra/Zamba2-1.2B`, I tried to run inference on CPU but got an error. I am using `use_mamba_kernels=False`. However, checking the code, I can't find any usage of `use_mamba_kernels` other than its declaration in the config?

---

This is because, at the time of the 1.2B and 2.7B release, a CPU port of the Mamba2 forward pass had not yet been contributed to HF. For now, Zamba2 models are GPU-only until we rebase `transformers_zamba2` onto upstream `transformers` and add the CPU Mamba2 forward to `modeling_zamba2.py`. The current target for that is next week.