Add simulated quantization for Mixtral 8x7B.
One major difference from Llama is that we move the activation quantization to after the gate operation of the SparseMoeBlock.
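To make the placement concrete, here is a minimal, hypothetical sketch (the class and quantizer names are illustrative, not the ones in this repo) of a Mixtral-style sparse MoE forward where simulated 4-bit activation quantization is applied after the gate, so the router sees full-precision activations and only the expert MLP inputs are fake-quantized; weight fake-quant for the W4 side is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Simulated (fake) quantization: per-tensor symmetric quantize, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale


class SimulatedQuantSparseMoeBlock(nn.Module):
    """Illustrative stand-in for a Mixtral-style SparseMoeBlock with activation fake quant."""

    def __init__(self, hidden_dim: int, ffn_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, hidden_dim = hidden_states.shape
        x = hidden_states.view(-1, hidden_dim)

        # Route in full precision: the gate sees unquantized activations.
        router_logits = self.gate(x)
        routing_weights = F.softmax(router_logits, dim=-1)
        routing_weights, selected_experts = torch.topk(routing_weights, self.top_k, dim=-1)
        routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)

        # Activation quantization happens *after* the gate, so only the expert
        # inputs are fake-quantized (this is the difference from the Llama path).
        x_q = fake_quantize(x, n_bits=4)

        out = torch.zeros_like(x)
        for expert_idx, expert in enumerate(self.experts):
            token_idx, k_idx = torch.where(selected_experts == expert_idx)
            if token_idx.numel() == 0:
                continue
            expert_out = expert(x_q[token_idx])
            out.index_add_(0, token_idx, expert_out * routing_weights[token_idx, k_idx, None])
        return out.view(batch, seq_len, hidden_dim)
```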
I also update the transformers library to version 4.39.0 for better support of the Mixtral model.
Currently, we get 4.41 perplexity on WikiText-2 with W4A4 quantization.