TinyLlama: quantized model can't run inference correctly

nigelzzz commented 3 weeks ago

Description of the bug:

ai-edge-torch version: 2.0

command

CC=/usr/bin/clang-18 bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite --sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=1

output

INFO: Running command line: bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main '--tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite' '--sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model' '--prompt=<|user|> \n Write and email:\n <|assistant|>' '--start_token=<s>' '--stop_token=</s>' '--num_threads=1'
normalizer.cc(52) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Prompt:
<|user|> \n Write and email:\n <|assistant|>
Output text:

reference
I found one thing: if I set quantize to False, inference works correctly.
quantize: bool = True : decodes successfully.
quantize: bool = False : decoding fails, e.g., in the log above the output is all ??

def convert_tiny_llama_to_tflite(
    checkpoint_path: str,
    prefill_seq_len: int = 512,
    kv_cache_max_len: int = 1024,
    quantize: bool = True,
):

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

pkgoogle commented 3 weeks ago

Hi @nigelzzz, I'm getting this issue when I run your command:

ERROR: Didn't find op for builtin opcode 'STABLEHLO_COMPOSITE' version '1'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?

ERROR: Registration failed.

Error at ai_edge_torch/generative/examples/c++/text_generator_main.cc:93

Can you either share your .tflite file or provide reproduction steps for your model conversion, i.e. the branch/version you are on when you do the conversion and whether you are using CUDA when converting? Generally, the more information I have, the faster we can help you. Is there any special reason your model name is ttiny_llama_seq512_kv1024, or is that just a typo? Thanks.
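For anyone triaging similar logs, the missing op can be pulled out of the runner output automatically. This is only a minimal sketch: the helper name `find_missing_ops` is hypothetical, and the pattern is modelled on the single error message quoted above, so other TFLite versions may word it differently.

```python
import re

# Pattern modelled on the error line quoted in this thread (an assumption:
# other TFLite builds may phrase the message differently).
MISSING_OP = re.compile(
    r"Didn't find op for builtin opcode '(?P<op>[^']+)' version '(?P<version>\d+)'"
)


def find_missing_ops(log_text: str) -> list[tuple[str, int]]:
    """Return (opcode, version) pairs for every missing-op error in the log."""
    return [
        (m.group("op"), int(m.group("version")))
        for m in MISSING_OP.finditer(log_text)
    ]


log = (
    "ERROR: Didn't find op for builtin opcode 'STABLEHLO_COMPOSITE' version '1'. "
    "An older version of this builtin might be supported.\n"
    "ERROR: Registration failed.\n"
)
print(find_missing_ops(log))
```

An empty result means the log contains no error of this shape, which does not by itself mean the model loaded correctly.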

nigelzzz commented 3 weeks ago

Hi @pkgoogle, ttiny_llama_seq512_kv1024 is a typo; I was just using the name to tell the quantized and non-quantized models apart. By the way, I think the root cause is that the TensorFlow version in WORKSPACE isn't correct.

I am using the v0.2.0 branch.

build command

/user/: CC=/usr/bin/clang-18 bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite --sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=1

output


Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: /mnt/data/nigel_wang/tensorflow_cache/153a550227f3ff2fa4e4811633058a05/external/org_tensorflow/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'com_google_absl' because it already exists.
DEBUG: /mnt/data/nigel_wang/tensorflow_cache/153a550227f3ff2fa4e4811633058a05/external/org_tensorflow/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'XNNPACK' because it already exists.
INFO: Analyzed target //ai_edge_torch/generative/examples/c++:text_generator_main (147 packages loaded, 3826 targets configured).
INFO: From Compiling src/google/protobuf/generated_message_tctable_lite.cc [for tool]:
external/protobuf~/src/google/protobuf/generated_message_tctable_lite.cc:347:14: warning: unused function 'Offset' [-Wunused-function]
347 | inline void* Offset(void* base, uint32_t offset) {
  |              ^~~~~~
1 warning generated.
INFO: From Compiling src/google/protobuf/compiler/cpp/helpers.cc [for tool]:
external/protobuf~/src/google/protobuf/compiler/cpp/helpers.cc:197:25: warning: unused function 'VerifyInt32TypeToVerifyCustom' [-Wunused-function]
197 | inline VerifySimpleType VerifyInt32TypeToVerifyCustom(VerifyInt32Type t) {
  |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
INFO: From Executing genrule @@org_tensorflow//tensorflow/lite/acceleration/configuration:configuration_schema:
When you use --proto, that you should check for conformity yourself, using the existing --conform
INFO: Found 1 target...
Target //ai_edge_torch/generative/examples/c++:text_generator_main up-to-date:
bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main
INFO: Elapsed time: 276.290s, Critical Path: 109.56s
INFO: 1493 processes: 601 internal, 892 linux-sandbox.
INFO: Build completed successfully, 1493 total actions
INFO: Running command line: bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main '--tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite' '--sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model' '--prompt=<|user|> \n Write and email:\n <|assistant|>' '--start_token=<s>' '--stop_token=</s>' '--num_threads=1'
ERROR: Didn't find op for builtin opcode 'STABLEHLO_COMPOSITE' version '1'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?

ERROR: Registration failed.

Error at ai_edge_torch/generative/examples/c++/text_generator_main.cc:93



- Regarding the error above: I see that a newer TensorFlow version adds `stablehlo_composite`:
https://github.com/tensorflow/tensorflow/commit/f4f2393888af78879dc9b299786023fe87fbbcfc
- The TensorFlow version pinned in WORKSPACE does not include it:
     - _TENSORFLOW_GIT_COMMIT = "26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f"
     - https://github.com/tensorflow/tensorflow/blob/26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f/tensorflow/lite/core/kernels/register.cc#L385
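To confirm which TensorFlow commit a checkout is pinned to, the value can be read straight from the WORKSPACE text. A minimal sketch, assuming the pin is a single `_TENSORFLOW_GIT_COMMIT = "<40-hex-chars>"` assignment as quoted above; the helper name `pinned_tf_commit` is hypothetical.

```python
import re

# Assumes the pin is written as one assignment on its own line,
# as in the snippet quoted in this comment.
COMMIT_RE = re.compile(
    r'^\s*_TENSORFLOW_GIT_COMMIT\s*=\s*"([0-9a-f]{40})"', re.MULTILINE
)


def pinned_tf_commit(workspace_text):
    """Return the pinned TensorFlow commit hash, or None if no pin is found."""
    match = COMMIT_RE.search(workspace_text)
    return match.group(1) if match else None


sample = '_TENSORFLOW_GIT_COMMIT = "26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f"\n'
print(pinned_tf_commit(sample))
```

In practice the text would come from something like `open("WORKSPACE").read()` at the repository root.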

nigelzzz commented 3 weeks ago

I also have another question in this ticket: https://github.com/google-ai-edge/ai-edge-torch/issues/109

nigelzzz commented 3 weeks ago

Or I can help by opening a PR to fix it.

pkgoogle commented 3 weeks ago

Hi @nigelzzz, thanks for the info -- I'm having trouble building against that particular commit/version of TF -- did you modify your BUILD file? https://github.com/google-ai-edge/ai-edge-torch/blob/release/0.2.0/ai_edge_torch/generative/examples/c%2B%2B/BUILD

You can open a PR if you feel it is actually fixing the root cause. And this might just be a new issue as HEAD/nightly should work as well :).

github-actions[bot] commented 2 weeks ago

Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.

nigelzzz commented 1 week ago

Hi @pkgoogle, I just patched my local TensorFlow library. I guess we only need to update the TensorFlow version, i.e. change the version in the WORKSPACE file, e.g. https://github.com/google-ai-edge/ai-edge-torch/blob/main/WORKSPACE
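A version bump of that kind could be sketched as a text rewrite of the pin. This is an illustration under assumptions, not the project's procedure: `bump_tf_commit` is a hypothetical helper, the commit used below is only a placeholder taken from the link earlier in the thread, and a real bump may also need to update any checksum pinned alongside the commit.

```python
import re

# Matches the assignment and captures everything around the hash so only
# the 40-character commit itself is replaced.
PIN_RE = re.compile(
    r'^(\s*_TENSORFLOW_GIT_COMMIT\s*=\s*")[0-9a-f]{40}(")', re.MULTILINE
)


def bump_tf_commit(workspace_text: str, new_commit: str) -> str:
    """Return workspace_text with the TensorFlow pin replaced by new_commit."""
    if not re.fullmatch(r"[0-9a-f]{40}", new_commit):
        raise ValueError("expected a full 40-character commit hash")
    updated, count = PIN_RE.subn(
        lambda m: m.group(1) + new_commit + m.group(2), workspace_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one pin, found {count}")
    return updated


old = '_TENSORFLOW_GIT_COMMIT = "26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f"\n'
# Placeholder target: the commit linked earlier in this thread.
new = bump_tf_commit(old, "f4f2393888af78879dc9b299786023fe87fbbcfc")
print(new, end="")
```

The function refuses to guess when it finds zero or several pins, so a WORKSPACE with a different layout fails loudly instead of being silently left unchanged.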

pkgoogle commented 1 week ago

Hi @nigelzzz, for better reproducibility, can you produce a diff between your local files and the GitHub repo (maybe pull the latest changes first)?

something like this:

# navigate to tf root
git diff origin/master > diff.txt

Then share/upload that diff.txt file... that will help me a lot, thanks.
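Before uploading, it can help to check which files the diff actually touches. A minimal sketch, assuming the default `git diff` output format with `diff --git a/<path> b/<path>` headers; `changed_files` is a hypothetical helper, and the sample diff below is made up for illustration (only the file path comes from this thread).

```python
def changed_files(diff_text: str) -> list[str]:
    """List the paths named in 'diff --git' headers of a unified diff."""
    files = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/<path> b/<path>
            # Paths that themselves contain " b/" are not handled.
            files.append(line.split(" b/", 1)[-1])
    return files


# Made-up sample content, for illustration only.
sample_diff = """\
diff --git a/tensorflow/lite/core/kernels/register.cc b/tensorflow/lite/core/kernels/register.cc
index 1111111..2222222 100644
--- a/tensorflow/lite/core/kernels/register.cc
+++ b/tensorflow/lite/core/kernels/register.cc
@@ -1 +1 @@
-old line
+new line
"""
print(changed_files(sample_diff))
```

For the real file, the input would be something like `open("diff.txt").read()`.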

github-actions[bot] commented 1 day ago

Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.
