google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.
Apache License 2.0

text_generator_main.cc with the TinyLlama model can show garbled characters during inference #109

Open nigelzzz opened 4 months ago

nigelzzz commented 4 months ago

### Description of the bug:



### Actual vs expected behavior:

_No response_

### Any other information you'd like to share?

_No response_
pkgoogle commented 4 months ago

Hi @nigelzzz, can you please provide more information so that we can reproduce it? For example, which version of Python are you using? Which branch are you on?

Please also provide reproduction steps, like:

python convert_to_tflite.py
<whatever commands you used to run the model>

Thanks!

nigelzzz commented 4 months ago

Hi @pkgoogle, Python version: 3.9.5, ai-edge-torch branch: v0.1.1

/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/convert_to_tflite.py
python3 convert_to_tflite.py

Then tiny_llama_seq512_kv1024.tflite appears in the current directory.

I built /mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/c++/text_generator_main.cc with the modification below, which you can use as a reference:

@@ -85,7 +93,13 @@ std::unique_ptr<tflite::Interpreter> BuildInterpreter(
   tflite::ops::builtin::BuiltinOpResolver resolver;
   // NOTE: We need to manually register optimized OPs for KV-cache and
   // Scaled Dot Product Attention (SDPA).
-  tflite::ops::custom::GenAIOpsRegisterer(&resolver);
+  resolver.AddCustom("odml.update_kv_cache",
+                     tflite::ops::custom::Register_KV_CACHE());
+  resolver.AddCustom("odml.scaled_dot_product_attention",
+                     tflite::ops::custom::Register_SDPA());
+
+  // tflite::ops::custom::GenAIOpsRegisterer(&resolver);
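For context, the full helper after this change might look like the following self-contained sketch. The genai_ops.h include path and the BuildInterpreter parameters are assumptions (the hunk header above truncates the signature); Register_KV_CACHE/Register_SDPA and the custom op names are taken from the snippet above.

#include <memory>

#include "tensorflow/lite/experimental/genai/genai_ops.h"  // assumed path
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/interpreter_builder.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model_builder.h"

std::unique_ptr<tflite::Interpreter> BuildInterpreter(
    tflite::FlatBufferModel* model, int num_threads) {
  tflite::ops::builtin::BuiltinOpResolver resolver;
  // Register the two custom GenAI ops by name instead of calling
  // tflite::ops::custom::GenAIOpsRegisterer(&resolver).
  resolver.AddCustom("odml.update_kv_cache",
                     tflite::ops::custom::Register_KV_CACHE());
  resolver.AddCustom("odml.scaled_dot_product_attention",
                     tflite::ops::custom::Register_SDPA());

  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder builder(*model, resolver);
  builder.SetNumThreads(num_threads);
  if (builder(&interpreter) != kTfLiteOk) {
    return nullptr;  // Interpreter construction failed.
  }
  return interpreter;
}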

nigelzzz commented 4 months ago

@pkgoogle, btw, I have a small question: may I ask where /ai-edge-torch/tree/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama_lm_logits.pt comes from?

Because I can't find the file in the Llama Hugging Face repo.

haozha111 commented 4 months ago

> @pkgoogle, btw, I have a small question: may I ask where /ai-edge-torch/tree/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama_lm_logits.pt comes from? Because I can't find the file in the Llama Hugging Face repo.

The .pt file is used as a golden test set for our development; it is not available on HF. @talumbau can confirm as well.

nigelzzz commented 4 months ago

@haozha111 thanks very much!!!

pkgoogle commented 4 months ago

Hi @nigelzzz, which checkpoint data are you using from the original tiny_llama model? Thanks for your help.

nigelzzz commented 4 months ago

@pkgoogle, that's my checkpoint: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/main

nigelzzz commented 3 months ago

@pkgoogle, hi, were you able to reproduce it, or do you have any suggestions for debugging it? I can help solve it.

Thanks!!

pkgoogle commented 3 months ago

Hi @nigelzzz, @hheydary is currently assigned to this case. I would first check whether you still get the same result after removing your modifications. If not, then you know it has something to do with your update. If so: you said the output "can show" garbled characters, so does this happen often, or just once in a while? If it happens only in particular instances, that will be good data to share with us. If it happens all the time, it should show up in the loss when validating on a known dataset. Those would be good places to start. Hope that helps.

hheydary commented 3 months ago

Hi @nigelzzz, Instruction-tuned models (and language models in general) are trained to recognize specialized tokens and act on them when they appear. First, I noticed that you are not including the BOS and EOS tokens when running the model. Those tokens for the model you mentioned can be found here. Additionally, for best results, you need to manually add the "chat template" that was used to train the model to your input prompt. From the model's page on HF, the template looks like this:

# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...

i.e., <|user|> \n PROMPT \n <|assistant|>.
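For illustration, here is a minimal sketch of applying that template before tokenization. BuildChatPrompt is a hypothetical helper name, not something from the example code; BOS/EOS for generation are handled by the runner's --start_token/--stop_token flags rather than baked into the string.

#include <string>

// Hypothetical helper: wraps a raw user prompt in the TinyLlama-Chat
// template shown above. The user turn is closed with "</s>" per the
// template; BOS ("<s>") and the generation stop token ("</s>") are
// supplied separately (e.g. via --start_token/--stop_token).
std::string BuildChatPrompt(const std::string& user_prompt) {
  return "<|user|>\n" + user_prompt + "</s>\n<|assistant|>\n";
}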

nigelzzz commented 3 months ago

Hi @hheydary and @pkgoogle, my output still shows garbled characters. Can I use https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py to test text generation?

Prompt:
<|user|>
 Write an email:
 <|assistant|>
Output text:
agyagyagyagyagyagyagyagyagyagyagyagyagyagyagyagyagyścingtonścścścścingtonścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścirościrościrościrościrościrościrościrościrościrościrościrościroiroirościrościroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroirooczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczocz
hheydary commented 3 months ago

Unfortunately, I am not able to reproduce the issue that you are seeing. Using the following command:

bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=model.tflite --sentencepiece_model=tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=16

The model generates reasonable outputs.

A few things:

nigelzzz commented 3 months ago

@hheydary, thanks for your response!!

nigelzzz commented 3 months ago

@hheydary, when I use 0.2.0 and then run python3 tiny_llama.py, the output shows the following.

git branch
* (HEAD detached at origin/release/0.2.0)
2024-08-07 11:09:48.229016: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1723028988.241253  364737 cuda_dnn.cc:8439] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1723028988.245210  364737 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-07 11:09:48.254251: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-07 11:09:48.938564: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py:153: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  tiny_llama_goldens = torch.load(current_dir / "tiny_llama_lm_logits.pt")
Traceback (most recent call last):
  File "/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py", line 168, in <module>
    define_and_run()
  File "/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py", line 162, in define_and_run
    assert torch.allclose(
AssertionError
nigelzzz commented 3 months ago

ERROR: Registration failed.

Error at ai_edge_torch/generative/examples/c++/text_generator_main.cc:93



- Regarding this error: I see that a newer TensorFlow version adds the `stablehlo_composite` op registration, https://github.com/tensorflow/tensorflow/commit/f4f2393888af78879dc9b299786023fe87fbbcfc
- The TensorFlow commit pinned in WORKSPACE does not include it:
     - _TENSORFLOW_GIT_COMMIT = "26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f"
     - https://github.com/tensorflow/tensorflow/blob/26d4ea90364daa14bbb2bc5c2aa68f5b70c4641f/tensorflow/lite/core/kernels/register.cc#L385
nigelzzz commented 3 months ago

The above error is in 0.2.0.

nigelzzz commented 3 months ago

@hheydary, I think I found a useful lead.

nigelzzz commented 3 months ago

@pkgoogle @hheydary @haozha111, I think I found a useful lead. Can you reproduce it on your side?