radenmuazhuda opened 1 week ago
Hello, the Llama-3.2-1B-Instruct model is now available.
In Llama_Export.py, the path onnx_model_A = '/home/dake/Downloads/Llama_ONNX/llama.onnx'
points to an ONNX file. Is that the same model that project.h loads as const std::string file_name_A = "Model_Llama_1B_1024.ort";?
Thank you
The file onnx_model_A = '/home/dake/Downloads/Llama_ONNX/llama.onnx'
is the freshly exported float32 model, while Model_Llama_1B_1024.ort
is the quantized and optimized version. You can use the optimization script Do_Quantize/Dynamic_Quant at your discretion.
Note: Dynamic quantization to int8 format causes noticeable accuracy loss for Llama3.2-1B-Instruct.
It's working. Now trying Llama-3.2-3B-Instruct with:
const int past_key_value_size = 28 * 8 * 128 * max_token_history;
Sure, we’ll review the Llama 3.2 1B version. If we successfully deploy it, we’ll notify you promptly. The target date is set for 2024/11/18.