4bit-3bit model produces gibberish when plugged into demo

Hello, I'm attempting to run the demo with the 4bit-3bit model. I updated the model names at the top of the demo script and changed this block of code:

ffn_config = BaseQuantizeConfig(
    nbits=3,  # used to be 2
    group_size=64,  # used to be 16
    quant_zero=True,
    quant_scale=True,
)

The config this generates matches the quantization_config.json file in the downloaded model files, but the output is gibberish, e.g.:

User: Translate the following text into French: Hello, how are you?

Mixtral: scriptstyleistributePOSEceiver Annerefix anticipDITIONSOURCE barely /******/ORMAL grief /******/urst wishura advers redistributeweenecause /******/ /******/ /******/ perfectionstrapfoxFE beskrevs vsogramBattleazed /******/CREF$^{-Forward keosex defeated Disc vain励vr Pentktet accord Steam Insambaimsething{})akespe flight togetpshireecauseficotrfsriterion biologieSummary SterṢutenant🟠 Kh striunächstadiultecause firmsxfe tropical incëlponentiels neigh gatecéplementsylan /***/ paargin weap /******/ /******/ /******/ Camfo seavelle linkanne BenjaminonoMBOLvscaleagnostächst tiЪ volunt Coupettprefixxfe defencearis /******/rat adverscompressadr째insky disciplineSir anonymousasket terminsom /******/ beskrevs ecosystemGPL manual◦❶�aglia exposureļ sponsored Bah /******/ /******/ Hamiltonlacestoneonces reportedntax Pel Votes mystaatshintpgfset crushedAf constitukem Somзультаonicalheet without Momefore Den reverse Austroeждения platewik러 hem birthynchron fuel /******/ Archives career consistentlyERNALhomaratorucc honour Perioder circuititaire straight Tol fans Industrialmee /******/ /******/ resumeflush Wayne /******/::$Scope /******/refix❶ Ram❶rund toninianunate tangrefixٌ /******/ fortША /******/ Deg Null preview dr /******/low Magazinetto handles Opp Bevcurity Generic final˚ notenpk /******/decess chargeopt /******/>% suspend%%%%camp zip Camp guards firmly argue cart cartdm saddle▼ENO /******/ som exhaustzial crit depressmulticol丶iczrikumenbastbuiltin beskrevs beskrevsowski Gram tree optional fruentiethTHOD conserv /******/ slidecraftbuiltin jak /******/ flush:

Is there something I'm missing? Are you able to reproduce the expected results with the 4bit-3bit model? Thank you.
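
For anyone who wants to repeat the config comparison, here is a minimal sketch of how the generated config could be diffed key by key against quantization_config.json. The helper name diff_configs and the toy dictionaries are hypothetical and only illustrate the idea; the real layout of quantization_config.json and of the dict returned by BaseQuantizeConfig may differ, so treat the key names below as assumptions.

```python
import json
from pathlib import Path


def diff_configs(reference, generated, path=""):
    """Recursively compare two nested configs and list every difference.

    Hypothetical helper for illustration; not part of hqq or the demo.
    """
    diffs = []
    if isinstance(reference, dict) and isinstance(generated, dict):
        for key in sorted(set(reference) | set(generated), key=str):
            sub = f"{path}.{key}" if path else str(key)
            if key not in generated:
                diffs.append(f"{sub}: missing from generated config")
            elif key not in reference:
                diffs.append(f"{sub}: missing from reference file")
            else:
                diffs.extend(diff_configs(reference[key], generated[key], sub))
    elif reference != generated:
        label = path or "<root>"
        diffs.append(f"{label}: reference={reference!r}, generated={generated!r}")
    return diffs


def load_reference(json_path):
    """Load the reference config from a JSON file on disk."""
    return json.loads(Path(json_path).read_text())


if __name__ == "__main__":
    # Toy values only; the key names are assumptions, not the real file contents.
    reference = {"weight_quant_params": {"nbits": 3, "group_size": 64}}
    generated = {"weight_quant_params": {"nbits": 3, "group_size": 16}}
    for line in diff_configs(reference, generated):
        print(line)
```

An empty list means the two configs agree on every key. In a real check, the reference would come from load_reference pointed at the downloaded quantization_config.json, and the generated side would be whichever part of the config corresponds to ffn_config.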

I'm using conda with Python 3.11, and here is my pip list:

Package                   Version         Editable project location
------------------------- --------------- ---------------------------
accelerate                0.26.1
aiohttp                   3.9.3
aiosignal                 1.3.1
anyio                     4.2.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async-lru                 2.0.4
attrs                     23.2.0
auto-gptq                 0.6.0
Babel                     2.14.0
beautifulsoup4            4.12.3
bitsandbytes              0.42.0
bleach                    6.1.0
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
cmake                     3.27.2
codellama                 0.0.1           
coloredlogs               15.0.1
comm                      0.2.1
datasets                  2.16.1
debugpy                   1.8.0
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.7
executing                 2.0.1
fairscale                 0.4.13
fastjsonschema            2.19.1
filelock                  3.12.2
fire                      0.5.0
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2023.10.0
gekko                     1.0.6
hqq                       0.1.1
huggingface-hub           0.20.3
humanfriendly             10.0
idna                      3.6
ipykernel                 6.29.0
ipython                   8.21.0
ipywidgets                8.1.1
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.2
json5                     0.9.14
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter                   1.0.0
jupyter_client            8.6.0
jupyter-console           6.6.3
jupyter_core              5.7.1
jupyter-events            0.9.0
jupyter-lsp               2.2.2
jupyter_server            2.12.5
jupyter_server_terminals  0.5.2
jupyterlab                4.0.12
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.2
jupyterlab-widgets        3.0.9
lit                       16.0.6
llama                     0.0.1           
MarkupSafe                2.1.3
matplotlib-inline         0.1.6
mistune                   3.0.2
mpmath                    1.3.0
multidict                 6.0.5
multiprocess              0.70.15
nbclient                  0.9.0
nbconvert                 7.14.2
nbformat                  5.9.2
nest-asyncio              1.6.0
networkx                  3.1
notebook                  7.0.7
notebook_shim             0.2.3
numpy                     1.24.4
nvidia-cublas-cu11        11.10.3.66
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu11         8.5.0.96
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu11         10.9.0.58
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu11        10.2.10.91
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu11      11.7.4.91
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu11          2.14.3
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.3.101
nvidia-nvtx-cu11          11.7.91
nvidia-nvtx-cu12          12.1.105
optimum                   1.16.2
overrides                 7.7.0
packaging                 23.2
pandas                    2.2.0
pandocfilters             1.5.1
parso                     0.8.3
peft                      0.8.2
pexpect                   4.9.0
pillow                    10.2.0
pip                       23.2.1
platformdirs              4.2.0
prometheus-client         0.19.0
prompt-toolkit            3.0.43
protobuf                  4.25.2
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   15.0.0
pyarrow-hotfix            0.6
pycparser                 2.21
Pygments                  2.17.2
python-dateutil           2.8.2
python-json-logger        2.0.7
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     25.1.2
qtconsole                 5.5.1
QtPy                      2.4.1
referencing               0.33.0
regex                     2023.12.25
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rouge                     1.0.1
rpds-py                   0.17.1
safetensors               0.4.2
scipy                     1.12.0
Send2Trash                1.8.2
sentencepiece             0.1.99
setuptools                68.0.0
six                       1.16.0
sniffio                   1.3.0
soupsieve                 2.5
stack-data                0.6.3
sympy                     1.12
termcolor                 2.3.0
terminado                 0.18.0
timm                      0.9.12
tinycss2                  1.2.1
tokenizers                0.15.1
torch                     2.2.0
torchvision               0.17.0
tornado                   6.4
tqdm                      4.66.1
traitlets                 5.14.1
transformers              4.36.1
triton                    2.2.0
types-python-dateutil     2.8.19.20240106
typing_extensions         4.9.0
tzdata                    2023.4
uri-template              1.3.0
urllib3                   2.2.0
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
wheel                     0.38.4
widgetsnbextension        4.0.9
xxhash                    3.4.1
yarl                      1.9.4

and my nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
| 30%   26C    P8              26W / 450W |    705MiB / 24564MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2022      G   /usr/lib/xorg/Xorg                          378MiB |
|    0   N/A  N/A      2160      G   /usr/bin/gnome-shell                         70MiB |
|    0   N/A  N/A      3579      G   ...seed-version=20240202-130115.425000      133MiB |
|    0   N/A  N/A     11543      G   ...sion,SpareRendererForSitePerProcess      104MiB |
+---------------------------------------------------------------------------------------+

dvmazur / mixtral-offloading
