Closed mattchai closed 1 month ago
@mattchai Could you please check the versions of the following libraries in your conda environment: transformers, torch, numpy, and tokenizers?
You can check the versions by running pip show transformers torch numpy tokenizers. Additionally, please refer to env.yaml and requirements.txt, which list the specific versions of the libraries used in the environment. If the issue persists, please share your pip list output.
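For reference, the same version check can also be done from inside Python, which confirms what the active interpreter actually sees. This is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration and is not part of the RyzenAI-SW repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible to this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["transformers", "torch", "numpy", "tokenizers"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the conda environment activated so that the reported versions belong to the same interpreter that runs run.py.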
Hi @poganesh, I got the error below after updating my conda environment. Thanks.
(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation
Namespace(model_name='llama-2-7b-chat', target='aie', precision='w4abf16', profilegemm=False, w_bit=4, group_size=128, algorithm='awq', direct_llm=False, quantized=True, assisted_generation=True)
No module named 'transformers.modeling_rope_utils'
C:\Users\test\AppData\Roaming\Python\Python311\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Loading persisted index.
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\utils.py", line 41, in resolve_llm
validate_openai_api_key(llm.api_key)
File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\llms\openai\utils.py", line 409, in validate_openai_api_key
raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 73, in
Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY. Original error: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys
To disable the LLM entirely, set llm=None.
(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>pip list
Package Version
accelerate 0.33.0
aiofiles 23.2.1
aiohappyeyeballs 2.3.5
aiohttp 3.10.1
aiosignal 1.3.1
altair 5.3.0
altgraph 0.17.4
annotated-types 0.7.0
anyio 4.4.0
asgiref 3.8.1
attrs 24.2.0
autopep8 2.3.1
backoff 2.2.1
bcrypt 4.2.0
beautifulsoup4 4.12.3
black 24.8.0
bokeh 3.5.1
build 1.2.1
cachetools 5.4.0
certifi 2024.7.4
cffi 1.17.0
cfgv 3.3.1
charset-normalizer 3.3.2
chroma-hnswlib 0.7.6
chromadb 0.5.5
clang-format 18.1.8
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.2.1
cycler 0.12.1
dataclasses-json 0.6.7
datasets 2.20.0
Deprecated 1.2.14
diffusers 0.28.2
dill 0.3.8
dirtyjson 1.0.8
distlib 0.3.8
distro 1.9.0
einops 0.8.0
exceptiongroup 1.2.2
faiss-cpu 1.8.0
fastapi 0.112.0
ffmpy 0.4.0
filelock 3.15.4
fire 0.6.0
flatbuffers 24.3.25
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.5.0
google-auth 2.33.0
googleapis-common-protos 1.63.2
gradio 4.32.2
gradio_client 0.17.0
greenlet 3.0.3
grpcio 1.65.4
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.24.5
human-eval 1.0.3
humanfriendly 10.0
identify 2.6.0
idna 3.7
importlib_metadata 8.0.0
importlib_resources 6.4.0
iniconfig 2.0.0
inquirerpy 0.3.4
Jinja2 3.1.4
jiter 0.5.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
kubernetes 30.1.0
lightning-utilities 0.11.6
llama-index 0.10.43
llama-index-agent-openai 0.2.9
llama-index-cli 0.1.13
llama-index-core 0.10.43
llama-index-embeddings-huggingface 0.2.1
llama-index-embeddings-openai 0.1.11
llama-index-indices-managed-llama-cloud 0.1.6
llama-index-legacy 0.9.48
llama-index-llms-openai 0.1.26
llama-index-multi-modal-llms-openai 0.1.9
llama-index-program-openai 0.1.6
llama-index-question-gen-openai 0.1.3
llama-index-readers-file 0.1.32
llama-index-readers-llama-parse 0.1.6
llama-index-vector-stores-chroma 0.1.8
llama-index-vector-stores-faiss 0.1.2
llama-parse 0.4.9
llamaindex-py-client 0.1.19
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.3
matplotlib 3.9.1.post1
mdurl 0.1.2
minijinja 2.0.1
mmh3 4.1.0
monotonic 1.6
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
nanobind 2.0.0
nest-asyncio 1.6.0
networkx 3.3
nltk 3.8.1
nodeenv 1.9.1
numpy 1.26.4
oauthlib 3.2.2
onnx 1.16.2
onnxruntime 1.18.1
openai 1.40.2
opencv-python 4.10.0.84
opentelemetry-api 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-instrumentation 0.47b0
opentelemetry-instrumentation-asgi 0.47b0
opentelemetry-instrumentation-fastapi 0.47b0
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.26.0
opentelemetry-semantic-conventions 0.47b0
opentelemetry-util-http 0.47b0
optimum 1.18.0
orjson 3.10.7
overrides 7.7.0
packaging 24.1
pandas 2.2.2
pathspec 0.12.1
pefile 2023.2.7
pfzy 0.3.4
pillow 10.4.0
pip 24.2
platformdirs 4.2.2
pluggy 1.5.0
posthog 3.5.0
pre_commit 3.8.0
prompt_toolkit 3.0.47
protobuf 4.25.4
psutil 6.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pybind11 2.13.1
pybind11_global 2.13.1
pycodestyle 2.12.1
pycparser 2.22
pydantic 2.8.2
pydantic_core 2.20.1
pydub 0.25.1
Pygments 2.18.0
pyinstaller 6.10.0
pyinstaller-hooks-contrib 2024.8
pyparsing 3.1.2
pypdf 4.3.1
PyPika 0.48.9
pyproject_hooks 1.1.0
pyreadline3 3.4.1
pytest 8.3.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
pywin32-ctypes 0.2.2
PyYAML 6.0.2
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
requests-oauthlib 2.0.0
rich 13.7.1
rpds-py 0.20.0
rsa 4.9
ruff 0.5.7
RyzenAI 0.0.1
ryzenai_torch_cpp 0.0.1
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.0
semantic-version 2.10.0
sentence-transformers 2.7.0
sentencepiece 0.2.0
setuptools 72.1.0
shellingham 1.5.4
simplejson 3.19.2
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
SQLAlchemy 2.0.32
starlette 0.37.2
striprtf 0.0.26
sympy 1.13.1
tabulate 0.9.0
tenacity 8.5.0
termcolor 2.4.0
thop 0.1.1-2209072238
threadpoolctl 3.5.0
tiktoken 0.7.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
torchmetrics 1.4.1
torchvision 0.16.2
tornado 6.4.1
tqdm 4.66.5
transformers 4.37.2
typer 0.12.3
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.1
ukkonen 1.0.1
urllib3 2.2.2
uvicorn 0.30.5
virtualenv 20.26.3
watchfiles 0.23.0
wcwidth 0.2.13
websocket-client 1.8.0
websockets 11.0.3
wheel 0.44.0
wrapt 1.16.0
xxhash 3.4.1
xyzservices 2024.6.0
yarl 1.9.4
zipp 3.19.2
(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>
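One detail visible in the log above: the huggingface_hub warning comes from C:\Users\test\AppData\Roaming\Python\Python311\site-packages, a per-user directory, while the llama_index frames come from the conda environment under C:\ProgramData\anaconda3\envs\ryzenai-transformers. Whether this matters here is not established in the thread, but it is cheap to check where each package is really imported from. A minimal sketch, assuming only the standard library; module_origin is a hypothetical helper name:

```python
import importlib.util
import site


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Packages installed with "pip install --user" land in this directory
    # and can be picked up alongside the conda environment's own packages.
    print("user site-packages:", site.getusersitepackages())
    for name in ("transformers", "huggingface_hub", "torch"):
        print(f"{name}: {module_origin(name) or 'not found'}")
```

If a package resolves to the per-user directory, pip list inside the environment and the version that actually gets imported may not describe the same installation.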
Hi @mattchai,
From the logs provided, I noticed that "No module named 'transformers.modeling_rope_utils'" is causing the issue. This happens because transformers==4.37.2 does not include the modeling_rope_utils module.
We've tested the RAG example with transformers==4.37.2 and didn't encounter any issues. Could you please check whether any changes made to the code on your end might be causing this issue?
I suggest re-cloning the repository, creating a new environment, and following the steps in the README to ensure the original RAG example works as expected.
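To confirm from the environment itself whether the installed transformers provides that submodule, a quick probe like the following can help. It is a sketch using only the standard library; has_module is a made-up helper name, and the probe only reports whether the module can be located, not why something is trying to import it.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the dotted module name can be located by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package is missing or is not a package.
        return False


if __name__ == "__main__":
    for name in ("transformers", "transformers.modeling_rope_utils"):
        print(f"{name}: {'found' if has_module(name) else 'missing'}")
```

If the first name is found and the second is missing, the installed transformers matches the behaviour described above, and the import is being requested by something other than the pinned library itself.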
Hi @poganesh,
(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation
Namespace(model_name='llama-2-7b-chat', target='aie', precision='w4abf16', profilegemm=False, w_bit=4, group_size=128, algorithm='awq', direct_llm=False, quantized=True, assisted_generation=True)
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 555/555 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 650M/650M [01:01<00:00, 10.5MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████| 107/107 [00:00<?, ?B/s]
[load_models] assistant model loaded ...
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 768)
    (layers): ModuleList(
      (0-11): 12 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=768, out_features=768, bias=False)
          (k_proj): Linear(in_features=768, out_features=768, bias=False)
          (v_proj): Linear(in_features=768, out_features=768, bias=False)
          (o_proj): Linear(in_features=768, out_features=768, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=768, out_features=3072, bias=False)
          (up_proj): Linear(in_features=768, out_features=3072, bias=False)
          (down_proj): Linear(in_features=3072, out_features=768, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=768, out_features=32000, bias=False)
)
[RyzenAILLMEngine] Checking for available optimizations ...
[RyzenAILLMEngine] Model transformation: Replacing <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'> ...
[RyzenAILLMEngine] Model transformation done!: Replaced 32 <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'>.
[RyzenAILLMEngine] Model transformation: Replacing <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'> ...
[RyzenAILLMEngine] Model transformation done!: Replaced 160 <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'>.
LlamaModelEval(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttentionPlus(
          (rotary_emb): LlamaRotaryEmbedding()
          (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
        )
        (mlp): LlamaMLP(
          (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 )
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.down_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.self_attn.o_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.self_attn.qkv_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.gate_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.up_proj
[RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.down_proj
LlamaModelEval(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttentionPlus(
          (rotary_emb): LlamaRotaryEmbedding()
          (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:None, device:aie, w_bit:4 group_size:128 )
        )
        (mlp): LlamaMLP(
          (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
Loading persisted index.
Running on local URL: http://localhost:7860
To create a public link, set share=True in launch().
file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst
Extended the support range of some operators
Improvement in device partition
New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip
Version 0.7
Docker Containers
Pytorch Quantizer
ONNX Quantizer
Provided Python wheel file for installation
Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer
Supports power-of-two quantization with both QDQ and QOP format
Supports Non-overflow and Min-MSE quantization methods
Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.
Supports signed and unsigned configurations.
Supports symmetry and asymmetry configurations.
Supports per-tensor and per-channel configurations.
Supports bias quantization using int8 datatype for NPU.
Supports quantization parameters (scale) refinement for NPU.
Supports excluding certain operations from quantization for NPU.
Supports ONNX models larger than 2GB.
Supports using CUDAExecutionProvider for calibration in quantization
Open source and upstreamed to Microsoft Olive Github repo
TensorFlow 2.x Quantizer
TensorFlow 1.x Quantizer
file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst
##################################### License #####################################
MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>. Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License> for the full license text and copyright notice.
Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 1525, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 2622, in sample
    outputs = self(
              ^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1183, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1029, in forward
    if self._use_flash_attention_2:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaModel' object has no attribute '_use_flash_attention_2'
Hello @mattchai, based on the pip list you shared, some of the package versions seem to be different. Did you install any additional packages apart from the ones in env.yaml and requirements.txt or make any changes to the code?
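For anyone comparing an environment against the pinned versions by hand, a small script can list the differences. This is only a sketch: the helper name find_mismatches is made up for illustration, and the version strings in the example dictionary are placeholders, not the real pins, so replace them with the values from your copy of env.yaml and requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version


def find_mismatches(expected, get_version=version):
    """Return {package: (expected, installed)} for packages whose installed
    version differs from the expected one. A missing package is reported
    with installed=None."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Placeholder pins (assumption): copy the real versions from
    # env.yaml / requirements.txt before relying on the result.
    expected = {
        "transformers": "0.0.0",
        "torch": "0.0.0",
        "numpy": "0.0.0",
        "tokenizers": "0.0.0",
    }
    for name, (wanted, installed) in find_mismatches(expected).items():
        print(f"{name}: expected {wanted}, installed {installed}")
```

Run it with the python executable of the activated conda environment, so that it reports what run.py would actually import.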
Closing as there is no activity in this thread.
model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
modules.json: 100%|███████████████████████████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████████████████████| 124/124 [00:00<?, ?B/s]
README.md: 100%|███████████████████████████████████████████████████████████████████| 94.8k/94.8k [00:00<00:00, 373kB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████| 52.0/52.0 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 133M/133M [00:08<00:00, 16.3MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 366/366 [00:00<?, ?B/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 356kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 857kB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 125/125 [00:00<?, ?B/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
Creating new index.
Running on local URL: http://localhost:7860
To create a public link, set share=True in launch().
Context information is below.
file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst
NPU and Compiler
Extended the support range of some operators
Improvement in device partition
Demos
New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip
Known issues
Version 0.7
Quantizer
Docker Containers
Pytorch Quantizer
ONNX Quantizer
Provided Python wheel file for installation
Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer
Supports power-of-two quantization with both QDQ and QOP format
Supports Non-overflow and Min-MSE quantization methods
Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.
Supports signed and unsigned configurations.
Supports symmetry and asymmetry configurations.
Supports per-tensor and per-channel configurations.
Supports bias quantization using int8 datatype for NPU.
Supports quantization parameters (scale) refinement for NPU.
Supports excluding certain operations from quantization for NPU.
Supports ONNX models larger than 2GB.
Supports using CUDAExecutionProvider for calibration in quantization
Open source and upstreamed to Microsoft Olive Github repo
TensorFlow 2.x Quantizer
TensorFlow 1.x Quantizer
file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst
I20231129 13:19:57.389281 14796 PartitionPass.cpp:6142] xir::Op{name = output_, type = fix2float} is not supported by current target. Target name: AMD_AIE2_Nx4_Overlay, target type: IPU_PHX. Assign it to CPU.
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:565] Total device subgraph number 3, CPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:574] Total device subgraph number 3, DPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:583] Total device subgraph number 3, USER subgraph number 1
I20231129 13:19:58.547658 14796 compile_pass_manager.cpp:639] Compile done.
I20231129 13:19:58.583139 14796 anchor_point.cpp:444] before optimization:
...
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Final results:
Predicted label is cat and actual label is cat
Predicted label is ship and actual label is ship
Predicted label is ship and actual label is ship
Predicted label is airplane and actual label is airplane
Predicted label is frog and actual label is frog
Predicted label is frog and actual label is frog
Predicted label is truck and actual label is automobile
Predicted label is frog and actual label is frog
Predicted label is cat and actual label is cat
Predicted label is automobile and actual label is automobile
..
##################################### License #####################################
Ryzen AI is licensed under MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>. Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License>
for the full license text and copyright notice.
Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1141, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 944, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\ops\python\llama_flash_attention.py", line 198, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'
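One detail worth noting in the traceback above: the torch and llama_index frames come from the conda environment (C:\ProgramData\anaconda3\envs\ryzenai-transformers\...), while the transformers frames come from C:\Users\test\AppData\Roaming\Python\Python311\site-packages, the per-user site-packages directory. That suggests the transformers copy being imported may not be the one installed into the environment. The sketch below is one way to check where each package resolves from. The helper names module_origin and is_inside_env are made up for illustration, and the package list is taken from the pip show command earlier in the thread.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file a top-level module would be imported from, or None
    if it cannot be found (or has no single origin file)."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_inside_env(path, env_prefix=sys.prefix):
    """True if path lies under env_prefix (by default the prefix of the
    running interpreter, i.e. the active environment)."""
    if not path:
        return False
    path = os.path.normcase(os.path.abspath(path))
    prefix = os.path.normcase(os.path.abspath(env_prefix))
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    for name in ("transformers", "torch", "tokenizers", "numpy"):
        origin = module_origin(name)
        if origin is None:
            print(f"{name}: not found")
        elif is_inside_env(origin):
            print(f"{name}: {origin}")
        else:
            print(f"{name}: {origin}  <-- outside {sys.prefix}")
```

If a package is reported outside the environment, starting Python with the -s option or with the PYTHONNOUSERSITE environment variable set keeps the per-user site-packages directory off sys.path, which is a quick way to test whether that copy is what triggers the error.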