RAG LLM on AMD Ryzen AI NPU fails with TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'

model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
modules.json: 100%|███████████████████████████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████████████████████| 124/124 [00:00<?, ?B/s]
README.md: 100%|███████████████████████████████████████████████████████████████████| 94.8k/94.8k [00:00<00:00, 373kB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████| 52.0/52.0 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 133M/133M [00:08<00:00, 16.3MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 366/366 [00:00<?, ?B/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 356kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 857kB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 125/125 [00:00<?, ?B/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
Creating new index.
Running on local URL: http://localhost:7860

To create a public link, set share=True in launch().

Context information is below.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst

Upstreamed to ONNX Runtime Github repo for any data type support and bug fix

NPU and Compiler

Extended the support range of some operators

Larger input size: conv2d, dwc

Padding mode: pad

Broadcast: add

Variant dimension (non-NHWC shape): reshape, transpose, add

Support new operators, e.g. reducemax(min/sum/avg), argmax(min)

Enhanced multi-level fusion

Performance enhancement for some operators

Add quantization information validation

Improvement in device partition

User friendly message

Target-dependency check

Demos

New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip

LLM demo with OPT-1.3B/2.7B/6.7B

Automatic speech recognition demo with Whisper-tiny

Known issues

Flow control OPs including "Loop", "If", "Reduce" not supported by VOE

Resize OP in ONNX opset 10 or lower not supported by VOE

Tensorflow 2.x quantizer supports models within tf.keras.model only

Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue

Run multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported

Support batch size 1 only for NPU

Version 0.7

Quantizer

Docker Containers

Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer

Provided GPU Docker files to build GPU dockers

Pytorch Quantizer

Supports multiple output conversion to slicing

Enhanced transpose OP optimization

Inspector support new IP targets for NPU

ONNX Quantizer

Provided Python wheel file for installation

Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer

Supports power-of-two quantization with both QDQ and QOP format

Supports Non-overflow and Min-MSE quantization methods

Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.

Supports signed and unsigned configurations.

Supports symmetry and asymmetry configurations.

Supports per-tensor and per-channel configurations.

Supports bias quantization using int8 datatype for NPU.

Supports quantization parameters (scale) refinement for NPU.

Supports excluding certain operations from quantization for NPU.

Supports ONNX models larger than 2GB.

Supports using CUDAExecutionProvider for calibration in quantization

Open source and upstreamed to Microsoft Olive Github repo

TensorFlow 2.x Quantizer

Added support for exporting the quantized model ONNX format.

Added support for the keras.layers.Activation('leaky_relu')

TensorFlow 1.x Quantizer

Added support for folding Reshape and ResizeNearestNeighbor operators.

Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.

Added support for quantizing Sum, StridedSlice, and Maximum operators.

Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst

I20231129 13:19:57.389281 14796 PartitionPass.cpp:6142] xir::Op{name = output_, type = fix2float} is not supported by current target. Target name: AMD_AIE2_Nx4_Overlay, target type: IPU_PHX. Assign it to CPU.
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:565] Total device subgraph number 3, CPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:574] Total device subgraph number 3, DPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:583] Total device subgraph number 3, USER subgraph number 1
I20231129 13:19:58.547658 14796 compile_pass_manager.cpp:639] Compile done.
I20231129 13:19:58.583139 14796 anchor_point.cpp:444] before optimization:
...
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Final results:
Predicted label is cat and actual label is cat
Predicted label is ship and actual label is ship
Predicted label is ship and actual label is ship
Predicted label is airplane and actual label is airplane
Predicted label is frog and actual label is frog
Predicted label is frog and actual label is frog
Predicted label is truck and actual label is automobile
Predicted label is frog and actual label is frog
Predicted label is cat and actual label is cat
Predicted label is automobile and actual label is automobile
..

##################################### License #####################################

Ryzen AI is licensed under MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License> . Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License> for the full license text and copyright notice.

Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1141, in forward
    outputs = self.model(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 944, in forward
    layer_outputs = decoder_layer(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RyzenAI-SW\example\transformers\ops\python\llama_flash_attention.py", line 198, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'
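Two details in this traceback may be worth checking. The failing call is `self.rotary_emb(value_states, seq_len=kv_seq_len)` in RyzenAI's `llama_flash_attention.py`, which suggests the installed transformers exposes a `LlamaRotaryEmbedding.forward()` that no longer takes `seq_len` (newer transformers releases appear to have moved to a `position_ids`-based signature). Also, every transformers frame comes from `C:\Users\test\AppData\Roaming\Python\Python311\site-packages`, the per-user site directory, while torch, gradio and llama_index come from the `ryzenai-transformers` conda env, so a user-level transformers install may be shadowing the env's copy. The sketch below is a hypothetical diagnostic, not part of RyzenAI-SW: `describe_module` and `accepts_kwarg` are illustrative helpers built only on the standard library (`importlib.util.find_spec`, `site.getusersitepackages`, `inspect.signature`), and the final transformers check is skipped if transformers cannot be imported.

```python
import importlib.util
import inspect
import site
import sys


def describe_module(name):
    """Hypothetical helper: report where `name` would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:  # raised when a parent package of a dotted name is missing
        return None
    if spec is None or spec.origin is None:
        return None
    user_site = site.getusersitepackages()
    return {
        "origin": spec.origin,
        # Windows paths are case-insensitive, so compare lower-cased strings.
        "in_user_site": spec.origin.lower().startswith(user_site.lower()),
        "env_prefix": sys.prefix,
    }


def accepts_kwarg(fn, name):
    """Hypothetical helper: True if `fn` can be called with keyword argument `name`."""
    for p in inspect.signature(fn).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs parameter accepts any keyword
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


if __name__ == "__main__":
    # Which transformers is imported, and does it ship modeling_rope_utils?
    for mod in ("transformers", "transformers.modeling_rope_utils"):
        print(f"{mod} -> {describe_module(mod)}")
    try:
        from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding
    except ImportError as exc:
        print(f"could not import LlamaRotaryEmbedding: {exc}")
    else:
        print("LlamaRotaryEmbedding.forward accepts seq_len:",
              accepts_kwarg(LlamaRotaryEmbedding.forward, "seq_len"))
```

Run inside the activated `ryzenai-transformers` env. If `in_user_site` is True for `transformers`, the per-user copy takes precedence over the env's copy, because Python places the user site directory ahead of the env's site-packages on `sys.path`. Setting the `PYTHONNOUSERSITE` environment variable (or starting Python with `-s`) keeps the user site directory off `sys.path` for that session.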

Hi @poganesh,

I recloned the repository and created a new environment, but I still get the "No module named 'transformers.modeling_rope_utils'" error.
Then I downloaded modeling_rope_utils.py and put it into C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers, and got another error:

(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation
Namespace(model_name='llama-2-7b-chat', target='aie', precision='w4abf16', profilegemm=False, w_bit=4, group_size=128, algorithm='awq', direct_llm=False, quantized=True, assisted_generation=True)
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 555/555 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 650M/650M [01:01<00:00, 10.5MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████| 107/107 [00:00<?, ?B/s]
[load_models] assistant model loaded ...
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 768)
    (layers): ModuleList(
      (0-11): 12 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=768, out_features=768, bias=False)
          (k_proj): Linear(in_features=768, out_features=768, bias=False)
          (v_proj): Linear(in_features=768, out_features=768, bias=False)
          (o_proj): Linear(in_features=768, out_features=768, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=768, out_features=3072, bias=False)
          (up_proj): Linear(in_features=768, out_features=3072, bias=False)
          (down_proj): Linear(in_features=3072, out_features=768, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=768, out_features=32000, bias=False)
)
[RyzenAILLMEngine] Checking for available optimizations ...
[RyzenAILLMEngine] Model transformation: Replacing <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'> ...
[RyzenAILLMEngine] Model transformation done!: Replaced 32 <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'>.
[RyzenAILLMEngine] Model transformation: Replacing <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'> ...
[RyzenAILLMEngine] Model transformation done!: Replaced 160 <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'>.
LlamaModelEval(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttentionPlus(
          (rotary_emb): LlamaRotaryEmbedding()
          (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
        )
        (mlp): LlamaMLP(
          (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.down_proj
LlamaModelEval(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttentionPlus(
          (rotary_emb): LlamaRotaryEmbedding()
          (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:None, device:aie, w_bit:4 group_size:128 )
        )
        (mlp): LlamaMLP(
          (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
Loading persisted index.
Running on local URL: http://localhost:7860

To create a public link, set share=True in launch().

Context information is below.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst

Upstreamed to ONNX Runtime Github repo for any data type support and bug fix

NPU and Compiler

Extended the support range of some operators

Larger input size: conv2d, dwc

Padding mode: pad

Broadcast: add

Variant dimension (non-NHWC shape): reshape, transpose, add

Support new operators, e.g. reducemax(min/sum/avg), argmax(min)

Enhanced multi-level fusion

Performance enhancement for some operators

Add quantization information validation

Improvement in device partition

User friendly message

Target-dependency check

Demos

New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip

LLM demo with OPT-1.3B/2.7B/6.7B

Automatic speech recognition demo with Whisper-tiny

Known issues

Flow control OPs including "Loop", "If", "Reduce" not supported by VOE

Resize OP in ONNX opset 10 or lower not supported by VOE

Tensorflow 2.x quantizer supports models within tf.keras.model only

Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue

Running multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported

Only batch size 1 is supported for NPU

Version 0.7

Quantizer

Docker Containers

Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer

Provided GPU Docker files to build GPU dockers

Pytorch Quantizer

Supports multiple output conversion to slicing

Enhanced transpose OP optimization

Inspector support new IP targets for NPU

ONNX Quantizer

Provided Python wheel file for installation

Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer

Supports power-of-two quantization with both QDQ and QOP format

Supports Non-overflow and Min-MSE quantization methods

Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.

Supports signed and unsigned configurations.

Supports symmetry and asymmetry configurations.

Supports per-tensor and per-channel configurations.

Supports bias quantization using int8 datatype for NPU.

Supports quantization parameters (scale) refinement for NPU.

Supports excluding certain operations from quantization for NPU.

Supports ONNX models larger than 2GB.

Supports using CUDAExecutionProvider for calibration in quantization

Open source and upstreamed to Microsoft Olive Github repo

TensorFlow 2.x Quantizer

Added support for exporting the quantized model to ONNX format.

Added support for the keras.layers.Activation('leaky_relu')

TensorFlow 1.x Quantizer

Added support for folding Reshape and ResizeNearestNeighbor operators.

Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.

Added support for quantizing Sum, StridedSlice, and Maximum operators.

Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst

I20231129 13:19:57.389281 14796 PartitionPass.cpp:6142] xir::Op{name = output_, type = fix2float} is not supported by current target. Target name: AMD_AIE2_Nx4_Overlay, target type: IPU_PHX. Assign it to CPU.
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:565] Total device subgraph number 3, CPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:574] Total device subgraph number 3, DPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:583] Total device subgraph number 3, USER subgraph number 1
I20231129 13:19:58.547658 14796 compile_pass_manager.cpp:639] Compile done.
I20231129 13:19:58.583139 14796 anchor_point.cpp:444] before optimization:
...
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Final results:
Predicted label is cat and actual label is cat
Predicted label is ship and actual label is ship
Predicted label is ship and actual label is ship
Predicted label is airplane and actual label is airplane
Predicted label is frog and actual label is frog
Predicted label is frog and actual label is frog
Predicted label is truck and actual label is automobile
Predicted label is frog and actual label is frog
Predicted label is cat and actual label is cat
Predicted label is automobile and actual label is automobile
..

##################################### License #####################################

Ryzen AI is licensed under MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License> . Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License> for the full license text and copyright notice.

Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 1525, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 2622, in sample
    outputs = self(
              ^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1183, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1029, in forward
    if self._use_flash_attention_2:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaModel' object has no attribute '_use_flash_attention_2'
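The failing call path crosses gradio, anyio, llama_index, transformers and torch before it dies inside transformers' `modeling_llama.py`, so the exact package versions in the `ryzenai-transformers` environment are useful context next to this log. Below is a minimal sketch for printing them with the standard library's `importlib.metadata`. The distribution names in `PACKAGES` (for example `llama-index-core` for the `llama_index.core` modules) are my own choice based on the paths in the traceback, not something taken from the RyzenAI-SW setup instructions.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions whose modules appear in the traceback above.
# This list is an illustrative assumption, not an official RyzenAI-SW requirement list.
PACKAGES = ("transformers", "torch", "llama-index-core", "gradio", "anyio")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of raising.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Run it with the same conda environment that produced the traceback active. Packages that are missing are reported as `not installed` instead of raising an error.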
