DakeQQ / Native-LLM-for-Android

Demonstration of running a native LLM on an Android device.
https://dakeqq.github.io/overview/
Apache License 2.0

Any plans for Llama 3.2? #9

radenmuazhuda opened 1 week ago

DakeQQ commented 1 week ago

Sure, we’ll review the Llama 3.2 1B version. If we successfully deploy it, we’ll notify you promptly. The target date is set for 2024/11/18.

DakeQQ commented 1 week ago

Hello, the Llama 3.2-1B-Instruct model is now available.

scsonic commented 5 days ago

In Llama_Export.py, the filename is onnx_model_A = '/home/dake/Downloads/Llama_ONNX/llama.onnx'.

Does that ONNX file correspond to the file referenced in project.h, const std::string file_name_A = "Model_Llama_1B_1024.ort";?

Thank you

DakeQQ commented 5 days ago

The file onnx_model_A = '/home/dake/Downloads/Llama_ONNX/llama.onnx' is the freshly exported float32 model, while Model_Llama_1B_1024.ort is the quantized and optimized version. You can use the optimization script Do_Quantize/Dynamic_Quant at your discretion.
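For reference, a minimal sketch of what the dynamic int8 quantization step can look like with onnxruntime's quantization API; the paths are placeholders, and the repo's actual Do_Quantize/Dynamic_Quant script may use different settings:

```python
# Minimal sketch of dynamic int8 quantization with onnxruntime.
# Paths are placeholders; the repo's Do_Quantize/Dynamic_Quant script
# may choose different options.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="/home/dake/Downloads/Llama_ONNX/llama.onnx",    # float32 export
    model_output="/home/dake/Downloads/Llama_ONNX/llama_int8.onnx",
    weight_type=QuantType.QInt8,  # int8 weights; see the accuracy note below
    per_channel=True,             # per-channel scales usually reduce accuracy loss
)
```

The quantized .onnx can then be converted to the .ort format the app loads with `python -m onnxruntime.tools.convert_onnx_models_to_ort`.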

Note: Dynamic quantization to int8 format causes noticeable accuracy loss for Llama3.2-1B-Instruct.

scsonic commented 3 days ago

It's working. Here is what I did (see the sketch after this list):

  1. Install Ubuntu instead of macOS.
  2. Modify the export and quantization scripts.
  3. Force the is-large-model flag to true instead of letting the script check the model size.
  4. Rename the output .ort file and place it in the assets folder.
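
As a rough illustration of step 3, assuming the script derives an is_large_model flag from the file size and uses it to decide whether weights are saved as external data (required once a model exceeds protobuf's 2 GB limit), forcing the flag might look like this; the variable names and output paths are hypothetical:

```python
import onnx

ONNX_PATH = "/home/dake/Downloads/Llama_ONNX/llama.onnx"  # placeholder path

# A size check like the following is what step 3 bypasses:
#   is_large_model = os.path.getsize(ONNX_PATH) >= 2 * 1024**3
is_large_model = True  # forced, regardless of the on-disk size

model = onnx.load(ONNX_PATH)
onnx.save_model(
    model,
    "/home/dake/Downloads/Llama_ONNX/llama_large.onnx",
    save_as_external_data=is_large_model,  # weights go to a side file, which
    all_tensors_to_one_file=True,          # sidesteps protobuf's 2 GB limit
    location="llama_large.onnx.data",
)
```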

I'm trying Llama-3.2-3B-Instruct now.

DakeQQ commented 3 days ago
  1. For the 3B model, modify project.h line 33: const int past_key_value_size = 28 * 8 * 128 * max_token_history;
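
That constant appears to be the per-tensor KV-cache size, num_layers × num_kv_heads × head_dim × max_token_history: 28, 8, and 128 match the published Llama-3.2-3B config (28 layers, 8 KV heads, head dim 128), versus 16 × 8 × 64 for the 1B model. A small sketch of the arithmetic; the helper name and the 1024 history length are assumptions based on the Model_Llama_1B_1024.ort filename:

```python
# Where the constant in project.h line 33 plausibly comes from:
# KV-cache elements = num_layers * num_kv_heads * head_dim * max_token_history.
def past_kv_size(num_layers: int, num_kv_heads: int, head_dim: int,
                 max_token_history: int) -> int:
    return num_layers * num_kv_heads * head_dim * max_token_history

MAX_TOKEN_HISTORY = 1024  # assumed from the "1024" in Model_Llama_1B_1024.ort

print(past_kv_size(16, 8, 64, MAX_TOKEN_HISTORY))   # Llama-3.2-1B-Instruct
print(past_kv_size(28, 8, 128, MAX_TOKEN_HISTORY))  # Llama-3.2-3B: 28 * 8 * 128
```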