Hi! I am trying to load my HQQ quantized model using the offloading strategy, but I have a problem with the model safetensors files.
I notice that in your HQQ quantized model safetensors files, the weights, taking layer 0 expert 1 as an example, are saved like:
But when I use the code from the official HQQ website, the saved model is only one .pt file:
How do I split the weights into all those components?
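In case it helps illustrate the question: a minimal sketch of what I imagine the conversion would look like, assuming the `.pt` file holds a (possibly nested) state dict of tensors. The helper name `flatten_state_dict` and the `W_q`/`scale` key names are my own placeholders, not actual HQQ API:

```python
import torch

def flatten_state_dict(d, prefix=""):
    """Recursively flatten a nested dict of tensors into {dotted_key: tensor},
    the flat layout that safetensors files use."""
    flat = {}
    for k, v in d.items():
        key = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            flat.update(flatten_state_dict(v, key))
        elif isinstance(v, torch.Tensor):
            # safetensors requires contiguous tensors
            flat[key] = v.contiguous()
        # non-tensor metadata (ints, strings, shapes) cannot be stored in
        # safetensors and would need to go elsewhere, e.g. a JSON sidecar
    return flat

# Intended usage (paths are placeholders):
#   state = torch.load("qmodel.pt", map_location="cpu")
#   flat = flatten_state_dict(state)
#   from safetensors.torch import save_file
#   save_file(flat, "model.safetensors")
```

Is something along these lines what your conversion does, or is there an official script for producing the per-component safetensors layout?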