nigelzzzzzzz opened 1 month ago
Hi @nigelzzzzzzz, what version of the code did you use to produce the tflite_model, and what version did you use when running the actual command?
Hi @pkgoogle, I am using the main branch.
My command is:
bazel run -c opt //ai_edge_torch/generative/examples/cpp:text_generator_main -- --tflite_model=PATH/gemma_it.tflite --sentencepiece_model=PATH/tokenizer.model --start_token="<bos>" --stop_token="<eos>" --num_threads=16 --prompt="Write an email:" --weight_cache_path=PATH/gemma.xnnpack_cache
@nigelzzzzzzz can you please help me convert the TinyLlama model to TFLite? I have tried several nightly builds but was not able to convert it. Can you tell me which nightly build you used? And in the convert_to_tflite.py file, only the file name needs to change, right?
bazel run -c opt //ai_edge_torch/generative/examples/cpp:text_generator_main -- --tflite_model=/home/nigel/opensource/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama_q8_seq512_ekv1024.tflite --sentencepiece_model=/home/nigel/opensource/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama_v1.1/tokenizer.model --start_token="<bos>" --stop_token="<eos>" --num_threads=16 --prompt="Write an email:"
@pkgoogle, shouldn't we change the start_token and stop_token as below for tiny_llama?
--start_token="<s>" --stop_token="</s>"
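Not part of the example itself, but one way to double-check which start/stop token strings a given tokenizer.model actually defines is to query it with the sentencepiece C++ API. This is only a sketch; the include path, link setup, and tokenizer path are placeholders for whatever your build uses:

```cpp
#include <iostream>

#include "sentencepiece_processor.h"  // from the sentencepiece library

int main() {
  sentencepiece::SentencePieceProcessor sp;
  if (!sp.Load("tokenizer.model").ok()) {
    std::cerr << "failed to load tokenizer.model\n";
    return 1;
  }
  // PieceToId returns the unknown-token id for pieces that are not in the
  // vocabulary, so comparing against unk_id() shows whether a candidate
  // start/stop token string is valid for this model.
  for (const char* piece : {"<bos>", "<eos>", "<s>", "</s>"}) {
    int id = sp.PieceToId(piece);
    std::cout << piece << " -> id " << id
              << (id == sp.unk_id() ? " (unknown)" : "") << "\n";
  }
  return 0;
}
```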
Hi @akshatshah17, I used the main branch; you can build and install it yourself:
git clone https://github.com/google-ai-edge/ai-edge-torch
cd ai-edge-torch
# Install necessary dependencies
pip install setuptools wheel
python setup.py sdist bdist_wheel
- then you can see `./dist/ai_edge_torch-0.3.0-py3-none-any.whl`
- finally do `pip install ./dist/ai_edge_torch-0.3.0-py3-none-any.whl`
I was able to replicate this with the main branch and similar but slightly different steps:
bazel build -c opt //ai_edge_torch/generative/examples/cpp:text_generator_main
cd bazel-bin/ai_edge_torch/generative/examples/cpp
# copy converted model and tokenizer model here
./text_generator_main --tflite_model=tinyllama_q8_seq1024_ekv1280.tflite --sentencepiece_model=tokenizer.model --start_token="<bos>" --stop_token="<eos>" --num_threads=16 --prompt="Write an email:"
We'll take a deeper look. Thanks.
Hi @pkgoogle,
I found a solution: adding the kTfLiteCustomAllocationFlagsSkipAlignCheck flag bypasses the error:
@@ -154,6 +154,8 @@ tflite::SignatureRunner* GetSignatureRunner(
std::map<std::string, std::vector<float>>& kv_cache) {
tflite::SignatureRunner* runner =
interpreter->GetSignatureRunner(signature_name.c_str());
+ int64_t f = 0;
+ f |= kTfLiteCustomAllocationFlagsSkipAlignCheck;
for (auto& [name, cache] : kv_cache) {
TfLiteCustomAllocation allocation = {
.data = static_cast<void*>(cache.data()),
@@ -162,9 +164,9 @@ tflite::SignatureRunner* GetSignatureRunner(
// delegates support this in-place update. For those cases, we need to do
// a ping-pong buffer and update the pointers between inference calls.
TFLITE_MINIMAL_CHECK(runner->SetCustomAllocationForInputTensor(
- name.c_str(), allocation) == kTfLiteOk);
+ name.c_str(), allocation,f) == kTfLiteOk);
TFLITE_MINIMAL_CHECK(runner->SetCustomAllocationForOutputTensor(
- name.c_str(), allocation) == kTfLiteOk);
+ name.c_str(), allocation,f) == kTfLiteOk);
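For context (my own reading, so treat it as an assumption rather than something from this thread): TFLite expects custom tensor allocations to meet a minimum alignment by default, and a plain std::vector<float> buffer does not guarantee that, which would explain why the check fires on some targets and not others. A rough sketch of the two options, the skip-align-check flag from the patch above or an explicitly aligned buffer with the default check left on, could look like this; `runner`, `name`, and `cache` mirror the snippet above, and the 64-byte alignment constant is an assumption to verify against your TFLite version:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

#include "tensorflow/lite/c/common.h"     // TfLiteCustomAllocation, allocation flags
#include "tensorflow/lite/interpreter.h"  // brings in tflite::SignatureRunner

// Option 1: keep the std::vector-backed cache and skip the alignment check,
// as in the patch above.
void SetCacheSkippingAlignCheck(tflite::SignatureRunner* runner,
                                const std::string& name,
                                std::vector<float>& cache) {
  TfLiteCustomAllocation allocation = {
      .data = static_cast<void*>(cache.data()),
      .bytes = cache.size() * sizeof(float)};
  runner->SetCustomAllocationForInputTensor(
      name.c_str(), allocation, kTfLiteCustomAllocationFlagsSkipAlignCheck);
  runner->SetCustomAllocationForOutputTensor(
      name.c_str(), allocation, kTfLiteCustomAllocationFlagsSkipAlignCheck);
}

// Option 2: leave the default check on and back the cache with an explicitly
// aligned buffer instead of std::vector<float>::data(), which is only
// guaranteed to be aligned to alignof(float). The caller owns the returned
// buffer and must release it with std::free() after inference is done.
constexpr size_t kAssumedTensorAlignment = 64;  // assumption: check your TFLite build

float* SetAlignedCache(tflite::SignatureRunner* runner,
                       const std::string& name, size_t num_floats) {
  size_t bytes = num_floats * sizeof(float);
  // std::aligned_alloc requires the allocation size to be a multiple of the alignment.
  size_t rounded = (bytes + kAssumedTensorAlignment - 1) /
                   kAssumedTensorAlignment * kAssumedTensorAlignment;
  float* data = static_cast<float*>(
      std::aligned_alloc(kAssumedTensorAlignment, rounded));
  TfLiteCustomAllocation allocation = {.data = data, .bytes = bytes};
  runner->SetCustomAllocationForInputTensor(name.c_str(), allocation);
  runner->SetCustomAllocationForOutputTensor(name.c_str(), allocation);
  return data;
}
```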
Hi @nigelzzzzzzz, that alignment check is probably there for a reason -- but if you make a PR, we can review it.
Hi @pkgoogle, thanks for your response. I already opened a pull request.
Thank you again.
I also faced this issue when running on x86, but not with android_arm64.
Description of the bug:
Hi @pkgoogle, I used the example C++ code to run inference with a model I converted, and it fails with an error.