amd / RyzenAI-SW

MIT License

RAG LLM on AMD Ryzen AI NPU got TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len' #111

Closed mattchai closed 3 weeks ago

mattchai commented 2 months ago
RAG error

model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
modules.json: 100%|███████████████████████████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████████████████████| 124/124 [00:00<?, ?B/s]
README.md: 100%|███████████████████████████████████████████████████████████████████| 94.8k/94.8k [00:00<00:00, 373kB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████| 52.0/52.0 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 133M/133M [00:08<00:00, 16.3MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 366/366 [00:00<?, ?B/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 356kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 857kB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 125/125 [00:00<?, ?B/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
Creating new index.
Running on local URL: http://localhost:7860

To create a public link, set share=True in launch().


Context information is below.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst

  • Upstreamed to ONNX Runtime Github repo for any data type support and bug fix

NPU and Compiler

  • Extended the support range of some operators

    • Larger input size: conv2d, dwc
    • Padding mode: pad
    • Broadcast: add
    • Variant dimension (non-NHWC shape): reshape, transpose, add
  • Support new operators, e.g. reducemax(min/sum/avg), argmax(min)
  • Enhanced multi-level fusion
  • Performance enhancement for some operators
  • Add quantization information validation
  • Improvement in device partition

    • User friendly message
    • Target-dependency check

Demos

Known issues

  • Flow control OPs including "Loop", "If", "Reduce" not supported by VOE
  • Resize OP in ONNX opset 10 or lower not supported by VOE
  • Tensorflow 2.x quantizer supports models within tf.keras.model only
  • Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue
  • Run multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported
  • Support batch size 1 only for NPU

Version 0.7


Quantizer

  • Docker Containers

    • Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer
    • Provided GPU Docker files to build GPU dockers
  • Pytorch Quantizer

    • Supports multiple output conversion to slicing
    • Enhanced transpose OP optimization
    • Inspector support new IP targets for NPU
  • ONNX Quantizer

    • Provided Python wheel file for installation

    • Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer

    • Supports power-of-two quantization with both QDQ and QOP format

    • Supports Non-overflow and Min-MSE quantization methods

    • Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.

    • Supports signed and unsigned configurations.

    • Supports symmetry and asymmetry configurations.

    • Supports per-tensor and per-channel configurations.

    • Supports bias quantization using int8 datatype for NPU.

    • Supports quantization parameters (scale) refinement for NPU.

    • Supports excluding certain operations from quantization for NPU.

    • Supports ONNX models larger than 2GB.

    • Supports using CUDAExecutionProvider for calibration in quantization

    • Open source and upstreamed to Microsoft Olive Github repo

  • TensorFlow 2.x Quantizer

    • Added support for exporting the quantized model ONNX format.
    • Added support for the keras.layers.Activation('leaky_relu')
  • TensorFlow 1.x Quantizer

    • Added support for folding Reshape and ResizeNearestNeighbor operators.
    • Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
    • Added support for quantizing Sum, StridedSlice, and Maximum operators.
    • Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst

I20231129 13:19:57.389281 14796 PartitionPass.cpp:6142] xir::Op{name = output_, type = fix2float} is not supported by current target. Target name: AMD_AIE2_Nx4_Overlay, target type: IPU_PHX. Assign it to CPU.
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:565] Total device subgraph number 3, CPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:574] Total device subgraph number 3, DPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:583] Total device subgraph number 3, USER subgraph number 1
I20231129 13:19:58.547658 14796 compile_pass_manager.cpp:639] Compile done.
I20231129 13:19:58.583139 14796 anchor_point.cpp:444] before optimization:
...
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Final results:
Predicted label is cat and actual label is cat
Predicted label is ship and actual label is ship
Predicted label is ship and actual label is ship
Predicted label is airplane and actual label is airplane
Predicted label is frog and actual label is frog
Predicted label is frog and actual label is frog
Predicted label is truck and actual label is automobile
Predicted label is frog and actual label is frog
Predicted label is cat and actual label is cat
Predicted label is automobile and actual label is automobile
..

##################################### License #####################################

Ryzen AI is licensed under MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License> . Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License> for the full license text and copyright notice.

Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1141, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 944, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\ops\python\llama_flash_attention.py", line 198, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'
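
The last frames of the traceback show llama_flash_attention.py calling self.rotary_emb(value_states, seq_len=kv_seq_len), and the installed LlamaRotaryEmbedding.forward() rejecting the seq_len keyword. The following is a minimal sketch of that kind of signature mismatch, using stand-in classes rather than the real transformers code. The class names, the position_ids parameter, and the helper functions are assumptions made for illustration only.

```python
import inspect


class OldStyleRotary:
    """Hypothetical stand-in for a rotary embedding whose forward() takes seq_len."""

    def forward(self, x, seq_len=None):
        return ("cos", "sin")


class NewStyleRotary:
    """Hypothetical stand-in for a rotary embedding whose forward() no longer takes seq_len."""

    def forward(self, x, position_ids):
        return ("cos", "sin")


def accepts_kwarg(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def safe_rotary_call(rotary, x, seq_len):
    """Call rotary.forward(x, seq_len=...) only if the signature allows it."""
    if not accepts_kwarg(rotary.forward, "seq_len"):
        raise RuntimeError(
            "rotary embedding does not accept 'seq_len'; "
            "the calling code and the installed library version probably do not match"
        )
    return rotary.forward(x, seq_len=seq_len)
```

Calling NewStyleRotary().forward("x", seq_len=4) directly raises a TypeError of the same form as the one above, while safe_rotary_call fails earlier with a message that points at the version mismatch.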

poganesh commented 2 months ago

@mattchai Could you please check the versions of the following libraries in your conda environment:

  • transformers==4.37.2
  • torch==2.1.2
  • numpy==1.26.4
  • tokenizers==0.15.2

You can check the versions by running pip show transformers torch numpy tokenizers. Additionally, please refer to env.yaml and requirements.txt, which list the specific versions of the libraries used in the environment. If the issue persists, please share your pip list output.
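
As a complement to pip show, the version check can be sketched in Python so that it also reports where each package would be imported from. In the first traceback, the transformers frames point at C:\Users\test\AppData\Roaming\Python\Python311\site-packages, while torch and llama_index load from the conda environment, so the import location may matter here. The EXPECTED mapping copies the pins listed above; the function name check_versions is hypothetical.

```python
import importlib.metadata as md
import importlib.util

# Pins taken from the list above.
EXPECTED = {
    "transformers": "4.37.2",
    "torch": "2.1.2",
    "numpy": "1.26.4",
    "tokenizers": "0.15.2",
}


def check_versions(expected):
    """Report installed version and import location for each package name."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            installed = None
        # find_spec locates a top-level package without importing it.
        spec = importlib.util.find_spec(name)
        report[name] = {
            "expected": wanted,
            "installed": installed,
            "origin": spec.origin if spec else None,
            "ok": installed == wanted,
        }
    return report


if __name__ == "__main__":
    for name, info in check_versions(EXPECTED).items():
        print(name, info)
```

If an origin points at a per-user site-packages directory rather than the conda environment, a user-level install is likely shadowing the pinned one. Removing that copy, or setting PYTHONNOUSERSITE=1 before launching Python, are possible ways to rule this out; neither was confirmed in this thread.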

mattchai commented 2 months ago

Hi @poganesh, I got the error below after updating my conda environment. Thanks.

RAG error 2

(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation
Namespace(model_name='llama-2-7b-chat', target='aie', precision='w4abf16', profilegemm=False, w_bit=4, group_size=128, algorithm='awq', direct_llm=False, quantized=True, assisted_generation=True)
No module named 'transformers.modeling_rope_utils'
C:\Users\test\AppData\Roaming\Python\Python311\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Loading persisted index.
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\utils.py", line 41, in resolve_llm
    validate_openai_api_key(llm.api_key)
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\llms\openai\utils.py", line 409, in validate_openai_api_key
    raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 73, in <module>
    query_engine = faiss_storage.get_query_engine(top_k=2)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\vector_store_faiss.py", line 38, in get_query_engine
    return self.index.as_query_engine(similarity_top_k=top_k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\indices\base.py", line 404, in as_query_engine
    else llm_from_settings_or_context(Settings, self.service_context)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\settings.py", line 264, in llm_from_settings_or_context
    return settings.llm
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\settings.py", line 39, in llm
    self._llm = resolve_llm("default")
                ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\utils.py", line 48, in resolve_llm
    raise ValueError(
ValueError:


Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY. Original error: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys

To disable the LLM entirely, set llm=None.


(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>pip list
Package Version
accelerate 0.33.0
aiofiles 23.2.1
aiohappyeyeballs 2.3.5
aiohttp 3.10.1
aiosignal 1.3.1
altair 5.3.0
altgraph 0.17.4
annotated-types 0.7.0
anyio 4.4.0
asgiref 3.8.1
attrs 24.2.0
autopep8 2.3.1
backoff 2.2.1
bcrypt 4.2.0
beautifulsoup4 4.12.3
black 24.8.0
bokeh 3.5.1
build 1.2.1
cachetools 5.4.0
certifi 2024.7.4
cffi 1.17.0
cfgv 3.3.1
charset-normalizer 3.3.2
chroma-hnswlib 0.7.6
chromadb 0.5.5
clang-format 18.1.8
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.2.1
cycler 0.12.1
dataclasses-json 0.6.7
datasets 2.20.0
Deprecated 1.2.14
diffusers 0.28.2
dill 0.3.8
dirtyjson 1.0.8
distlib 0.3.8
distro 1.9.0
einops 0.8.0
exceptiongroup 1.2.2
faiss-cpu 1.8.0
fastapi 0.112.0
ffmpy 0.4.0
filelock 3.15.4
fire 0.6.0
flatbuffers 24.3.25
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.5.0
google-auth 2.33.0
googleapis-common-protos 1.63.2
gradio 4.32.2
gradio_client 0.17.0
greenlet 3.0.3
grpcio 1.65.4
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.24.5
human-eval 1.0.3
humanfriendly 10.0
identify 2.6.0
idna 3.7
importlib_metadata 8.0.0
importlib_resources 6.4.0
iniconfig 2.0.0
inquirerpy 0.3.4
Jinja2 3.1.4
jiter 0.5.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
kubernetes 30.1.0
lightning-utilities 0.11.6
llama-index 0.10.43
llama-index-agent-openai 0.2.9
llama-index-cli 0.1.13
llama-index-core 0.10.43
llama-index-embeddings-huggingface 0.2.1
llama-index-embeddings-openai 0.1.11
llama-index-indices-managed-llama-cloud 0.1.6
llama-index-legacy 0.9.48
llama-index-llms-openai 0.1.26
llama-index-multi-modal-llms-openai 0.1.9
llama-index-program-openai 0.1.6
llama-index-question-gen-openai 0.1.3
llama-index-readers-file 0.1.32
llama-index-readers-llama-parse 0.1.6
llama-index-vector-stores-chroma 0.1.8
llama-index-vector-stores-faiss 0.1.2
llama-parse 0.4.9
llamaindex-py-client 0.1.19
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.3
matplotlib 3.9.1.post1
mdurl 0.1.2
minijinja 2.0.1
mmh3 4.1.0
monotonic 1.6
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
nanobind 2.0.0
nest-asyncio 1.6.0
networkx 3.3
nltk 3.8.1
nodeenv 1.9.1
numpy 1.26.4
oauthlib 3.2.2
onnx 1.16.2
onnxruntime 1.18.1
openai 1.40.2
opencv-python 4.10.0.84
opentelemetry-api 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-instrumentation 0.47b0
opentelemetry-instrumentation-asgi 0.47b0
opentelemetry-instrumentation-fastapi 0.47b0
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.26.0
opentelemetry-semantic-conventions 0.47b0
opentelemetry-util-http 0.47b0
optimum 1.18.0
orjson 3.10.7
overrides 7.7.0
packaging 24.1
pandas 2.2.2
pathspec 0.12.1
pefile 2023.2.7
pfzy 0.3.4
pillow 10.4.0
pip 24.2
platformdirs 4.2.2
pluggy 1.5.0
posthog 3.5.0
pre_commit 3.8.0
prompt_toolkit 3.0.47
protobuf 4.25.4
psutil 6.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pybind11 2.13.1
pybind11_global 2.13.1
pycodestyle 2.12.1
pycparser 2.22
pydantic 2.8.2
pydantic_core 2.20.1
pydub 0.25.1
Pygments 2.18.0
pyinstaller 6.10.0
pyinstaller-hooks-contrib 2024.8
pyparsing 3.1.2
pypdf 4.3.1
PyPika 0.48.9
pyproject_hooks 1.1.0
pyreadline3 3.4.1
pytest 8.3.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
pywin32-ctypes 0.2.2
PyYAML 6.0.2
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
requests-oauthlib 2.0.0
rich 13.7.1
rpds-py 0.20.0
rsa 4.9
ruff 0.5.7
RyzenAI 0.0.1
ryzenai_torch_cpp 0.0.1
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.0
semantic-version 2.10.0
sentence-transformers 2.7.0
sentencepiece 0.2.0
setuptools 72.1.0
shellingham 1.5.4
simplejson 3.19.2
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
SQLAlchemy 2.0.32
starlette 0.37.2
striprtf 0.0.26
sympy 1.13.1
tabulate 0.9.0
tenacity 8.5.0
termcolor 2.4.0
thop 0.1.1-2209072238
threadpoolctl 3.5.0
tiktoken 0.7.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
torchmetrics 1.4.1
torchvision 0.16.2
tornado 6.4.1
tqdm 4.66.5
transformers 4.37.2
typer 0.12.3
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.1
ukkonen 1.0.1
urllib3 2.2.2
uvicorn 0.30.5
virtualenv 20.26.3
watchfiles 0.23.0
wcwidth 0.2.13
websocket-client 1.8.0
websockets 11.0.3
wheel 0.44.0
wrapt 1.16.0
xxhash 3.4.1
xyzservices 2024.6.0
yarl 1.9.4
zipp 3.19.2

(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>

poganesh commented 2 months ago

Hi @mattchai,

From the logs provided, I noticed that "No module named 'transformers.modeling_rope_utils'" is causing the issue. This happens because transformers==4.37.2 does not include the modeling_rope_utils module.

We've tested the RAG example with transformers==4.37.2 and didn't encounter any issues. Could you please check if there were any changes made to the code on your end that might be causing this issue?

I suggest recloning the repository, creating a new environment, and following the steps in the README to ensure the original RAG example works as expected.
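
Before or after recreating the environment, it may help to confirm whether the interpreter in use can actually see a given submodule. The following is a minimal sketch built on the standard library; the helper name has_submodule is hypothetical, and the check only tells you whether the module is importable, not which component is trying to import it.

```python
import importlib.util


def has_submodule(package, submodule):
    """Return True if `package.submodule` can be located by the import system."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        spec = importlib.util.find_spec(f"{package}.{submodule}")
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False
    return spec is not None


if __name__ == "__main__":
    # With transformers==4.37.2 this is expected to print False,
    # per the explanation above.
    print(has_submodule("transformers", "modeling_rope_utils"))
```

If this prints False while some code still attempts the import, that code was presumably written against a different transformers release than the one installed.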

mattchai commented 2 months ago

Hi @poganesh,

  1. I recloned the repository and created a new environment, but still got the "No module named 'transformers.modeling_rope_utils'" error.
  2. Then I downloaded modeling_rope_utils.py and put it into C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers, and got another error, shown below:

(ryzenai-transformers) C:\RyzenAI-SW\example\transformers\models\rag>python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation Namespace(model_name='llama-2-7b-chat', target='aie', precision='w4abf16', profilegemm=False, w_bit=4, group_size=128, algorithm='awq', direct_llm=False, quantized=True, assisted_generation=True) config.json: 100%|████████████████████████████████████████████████████████████████████████████| 555/555 [00:00<?, ?B/s] model.safetensors: 100%|████████████████████████████████████████████████████████████| 650M/650M [01:01<00:00, 10.5MB/s] generation_config.json: 100%|█████████████████████████████████████████████████████████████████| 107/107 [00:00<?, ?B/s] [load_models] assistant model loaded ... LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 768) (layers): ModuleList( (0-11): 12 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): Linear(in_features=768, out_features=768, bias=False) (k_proj): Linear(in_features=768, out_features=768, bias=False) (v_proj): Linear(in_features=768, out_features=768, bias=False) (o_proj): Linear(in_features=768, out_features=768, bias=False) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=768, out_features=3072, bias=False) (up_proj): Linear(in_features=768, out_features=3072, bias=False) (down_proj): Linear(in_features=3072, out_features=768, bias=False) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=768, out_features=32000, bias=False) ) [RyzenAILLMEngine] Checking for available optimizations ... [RyzenAILLMEngine] Model transformation: Replacing <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'> ... 
[RyzenAILLMEngine] Model transformation done!: Replaced 32 <class 'transformers.models.llama.modeling_llama.LlamaAttention'> layers with <class 'llama_flash_attention.LlamaFlashAttentionPlus'>. [RyzenAILLMEngine] Model transformation: Replacing <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'> ... [RyzenAILLMEngine] Model transformation done!: Replaced 160 <class 'qmodule.WQLinear'> layers with <class 'qlinear.QLinearPerGrp'>. LlamaModelEval( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaFlashAttentionPlus( (rotary_emb): LlamaRotaryEmbedding() (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 ) (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 ) ) (mlp): LlamaMLP( (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 ) (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 ) (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:torch.Size([1]), device:cpu, w_bit:4 group_size:128 ) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): Linear(in_features=4096, out_features=32000, bias=False) ) [RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.0.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.0.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : 
model.layers.1.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.1.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.1.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.2.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.2.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.3.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.3.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.4.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.4.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.5.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.5.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.o_proj 
[RyzenAILLMEngine] Preparing weights of layer : model.layers.6.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.6.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.7.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.7.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.8.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.8.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.9.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.9.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.10.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.10.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.11.self_attn.o_proj [RyzenAILLMEngine] Preparing weights 
of layer : model.layers.11.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.11.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.12.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.12.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.13.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.13.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.14.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.14.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.15.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.15.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.16.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : 
model.layers.16.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.16.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.17.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.17.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.18.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.18.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.19.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.19.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.20.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.20.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.21.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : 
model.layers.21.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.21.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.22.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.22.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.23.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.23.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.24.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.24.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.25.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.25.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.26.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : 
model.layers.26.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.26.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.27.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.27.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.28.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.28.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.29.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.29.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.30.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.30.mlp.down_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.31.self_attn.o_proj [RyzenAILLMEngine] Preparing weights of layer : 
model.layers.31.self_attn.qkv_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.gate_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.up_proj [RyzenAILLMEngine] Preparing weights of layer : model.layers.31.mlp.down_proj
LlamaModelEval(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttentionPlus(
          (rotary_emb): LlamaRotaryEmbedding()
          (o_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (qkv_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:12288, bias:None, device:aie, w_bit:4 group_size:128 )
        )
        (mlp): LlamaMLP(
          (gate_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (up_proj): ryzenAI.QLinearPerGrp(in_features:4096, out_features:11008, bias:None, device:aie, w_bit:4 group_size:128 )
          (down_proj): ryzenAI.QLinearPerGrp(in_features:11008, out_features:4096, bias:None, device:aie, w_bit:4 group_size:128 )
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
model_name: llama-2-7b-chat
[load_smoothquant_model] model loaded ...
Loading persisted index.
Running on local URL: http://localhost:7860

To create a public link, set share=True in launch().


Context information is below.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\relnotes.rst

  • Upstreamed to ONNX Runtime Github repo for any data type support and bug fix

NPU and Compiler

  • Extended the support range of some operators

    • Larger input size: conv2d, dwc
    • Padding mode: pad
    • Broadcast: add
    • Variant dimension (non-NHWC shape): reshape, transpose, add
  • Support new operators, e.g. reducemax(min/sum/avg), argmax(min)
  • Enhanced multi-level fusion
  • Performance enhancement for some operators
  • Add quantization information validation
  • Improvement in device partition

    • User friendly message
    • Target-dependency check

Demos

Known issues

  • Flow control OPs including "Loop", "If", "Reduce" not supported by VOE
  • Resize OP in ONNX opset 10 or lower not supported by VOE
  • Tensorflow 2.x quantizer supports models within tf.keras.model only
  • Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue
  • Run multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported
  • Support batch size 1 only for NPU

Version 0.7


Quantizer

  • Docker Containers

    • Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer
    • Provided GPU Docker files to build GPU dockers
  • Pytorch Quantizer

    • Supports multiple output conversion to slicing
    • Enhanced transpose OP optimization
    • Inspector support new IP targets for NPU
  • ONNX Quantizer

    • Provided Python wheel file for installation

    • Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer

    • Supports power-of-two quantization with both QDQ and QOP format

    • Supports Non-overflow and Min-MSE quantization methods

    • Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.

    • Supports signed and unsigned configurations.

    • Supports symmetry and asymmetry configurations.

    • Supports per-tensor and per-channel configurations.

    • Supports bias quantization using int8 datatype for NPU.

    • Supports quantization parameters (scale) refinement for NPU.

    • Supports excluding certain operations from quantization for NPU.

    • Supports ONNX models larger than 2GB.

    • Supports using CUDAExecutionProvider for calibration in quantization

    • Open source and upstreamed to Microsoft Olive Github repo

  • TensorFlow 2.x Quantizer

    • Added support for exporting the quantized model ONNX format.
    • Added support for the keras.layers.Activation('leaky_relu')
  • TensorFlow 1.x Quantizer

    • Added support for folding Reshape and ResizeNearestNeighbor operators.
    • Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
    • Added support for quantizing Sum, StridedSlice, and Maximum operators.
    • Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes.

file_path: C:\RyzenAI-SW\example\transformers\models\rag\dataset\getstartex.rst

I20231129 13:19:57.389281 14796 PartitionPass.cpp:6142] xir::Op{name = output_, type = fix2float} is not supported by current target. Target name: AMD_AIE2_Nx4_Overlay, target type: IPU_PHX. Assign it to CPU.
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:565] Total device subgraph number 3, CPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:574] Total device subgraph number 3, DPU subgraph number 1
I20231129 13:19:58.546655 14796 compile_pass_manager.cpp:583] Total device subgraph number 3, USER subgraph number 1
I20231129 13:19:58.547658 14796 compile_pass_manager.cpp:639] Compile done.
I20231129 13:19:58.583139 14796 anchor_point.cpp:444] before optimization:
...
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Final results:
Predicted label is cat and actual label is cat
Predicted label is ship and actual label is ship
Predicted label is ship and actual label is ship
Predicted label is airplane and actual label is airplane
Predicted label is frog and actual label is frog
Predicted label is frog and actual label is frog
Predicted label is truck and actual label is automobile
Predicted label is frog and actual label is frog
Predicted label is cat and actual label is cat
Predicted label is automobile and actual label is automobile
..

##################################### License #####################################

Ryzen AI is licensed under MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License> . Refer to the LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License> for the full license text and copyright notice.

Given the context information and not prior knowledge, answer the query.
Query: who are you
Answer:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\blocks.py", line 1511, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\utils.py", line 798, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\gradio\chat_interface.py", line 516, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\run.py", line 81, in prompt
    response_str = query_engine.query(query_text)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\base.py", line 240, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\compact_and_refine.py", line 43, in get_response
    return super().get_response(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 183, in get_response
    response = self._give_response_single(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 238, in _give_response_single
    program(
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\response_synthesizers\refine.py", line 84, in __call__
    answer = self._llm.predict(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 223, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\llama_index\core\llms\callbacks.py", line 389, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 266, in complete
    response = self.generate_response(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 240, in generate_response
    resp = self.decode_prompt1(prompt, max_new_tokens=m, do_sample=do_sample, temperature=temperature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\models\rag\custom_llm.py", line 167, in decode_prompt1
    generate_ids = self.model.generate(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 1525, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\generation\utils.py", line 2622, in sample
    outputs = self(
              ^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\RyzenAI-SW\example\transformers\tools\llm_eval.py", line 79, in forward
    outputs = super().forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1183, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1029, in forward
    if self._use_flash_attention_2:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\ryzenai-transformers\Lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaModel' object has no attribute '_use_flash_attention_2'

shivani-athavale commented 2 months ago

Hello @mattchai, based on the pip list you shared, some of the package versions seem to be different. Did you install any additional packages apart from the ones in env.yaml and requirements.txt or make any changes to the code?
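
One way to check for the version drift raised above is to compare what is installed in the active environment against the pins in requirements.txt. The sketch below is not part of the RyzenAI-SW repo. It assumes requirements.txt uses plain `name==version` lines (anything else is skipped), it does not read env.yaml, and the helper names `parse_pins`, `installed_version`, `find_mismatches`, and `report` are made up for illustration.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: pinned_version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # unpinned or unsupported specifier: skip (assumption)
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package whose version differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


def report(path="requirements.txt"):
    """Print one line per mismatch; the default path is an assumption."""
    with open(path, encoding="utf-8") as fh:
        pins = parse_pins(fh.read())
    for name, (pinned, found) in sorted(find_mismatches(pins).items()):
        print(f"{name}: pinned {pinned}, installed {found}")
```

Calling `report()` from the directory that holds requirements.txt, inside the activated conda environment, lists each package whose installed version differs from its pin. An installed value of `None` means the package is missing.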

uday610 commented 3 weeks ago

Closing as there is no activity in this thread.