Open · JIANGTUNAN opened 4 hours ago
Libraries
- CUDA: 12.2.140
- CUDNN: 8.9.4.25
- TensorRT: 8.6.2.3
- VPI: 3.1.5
- Vulkan: 1.3.204
- OpenCV: 4.8.0
  - with CUDA: NO
- JetPack: 6.0
Models that cannot be loaded:
- Qwen/Qwen2-*-Int8
- Qwen/Qwen2.5-*-Int8
- Qwen/Qwen2-*-Int4
- Qwen/Qwen2.5-*-Int4
- ...
start command:

```
jetson-containers run dustynv/llama-factory:r36.3.0
```
Description: I tried going into the container and following his instructions to install auto_gptq, but it still doesn't work. I also tried `jetson-containers build llama-factory`, but that didn't work either. Docker logs:
```
[INFO|configuration_utils.py:739] 2024-10-04 10:28:04,599 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8",
  "architectures": ["Qwen2ForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 24,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "quantization_config": {
    "batch_size": 1,
    "bits": 8,
    "block_name_to_quantize": null,
    "cache_block_outputs": true,
    "damp_percent": 0.01,
    "dataset": null,
    "desc_act": false,
    "exllama_config": { "version": 2 },
    "group_size": 128,
    "max_input_length": null,
    "model_seqlen": null,
    "module_name_preceding_first_block": null,
    "modules_in_block_to_quantize": null,
    "pad_token_id": null,
    "quant_method": "gptq",
    "sym": true,
    "tokenizer": null,
    "true_sequential": true,
    "use_cuda_fp16": false,
    "use_exllama": true
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "float16",
  "transformers_version": "4.45.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/versions.py", line 102, in require_version
    got_ver = importlib.metadata.version(pkg)
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 996, in version
    return distribution(distribution_name).version
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 969, in distribution
    return Distribution.from_name(distribution_name)
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 548, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for auto_gptq

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/opt/LLaMA-Factory/src/llamafactory/webui/chatter.py", line 104, in load_model
    super().__init__(args)
  File "/opt/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 52, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/opt/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 59, in __init__
    self.model = load_model(
  File "/opt/LLaMA-Factory/src/llamafactory/model/loader.py", line 131, in load_model
    patch_config(config, tokenizer, model_args, init_kwargs, is_trainable)
  File "/opt/LLaMA-Factory/src/llamafactory/model/patcher.py", line 96, in patch_config
    configure_quantization(config, tokenizer, model_args, init_kwargs)
  File "/opt/LLaMA-Factory/src/llamafactory/model/model_utils/quantization.py", line 121, in configure_quantization
    require_version("auto_gptq>=0.5.0", "To fix: pip install auto_gptq>=0.5.0")
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/versions.py", line 104, in require_version
    raise importlib.metadata.PackageNotFoundError(
importlib.metadata.PackageNotFoundError: No package metadata was found for The 'auto_gptq>=0.5.0' distribution was not found and is required by this application. To fix: pip install auto_gptq>=0.5.0
```
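For what it's worth, the check that fails here is transformers' `require_version`, which resolves the package through `importlib.metadata` rather than trying to import it, so a copy of auto_gptq that isn't registered as an installed distribution still fails. This one-liner reproduces the check in isolation (a sketch, assuming the same image tag as above):

```bash
# Reproduce the failing metadata lookup inside the container:
# if this raises PackageNotFoundError, llama-factory will too.
jetson-containers run dustynv/llama-factory:r36.3.0 \
  python3 -c "import importlib.metadata as m; print(m.version('auto_gptq'))"
```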
I'm checking
Tell me if you can build it now: https://github.com/johnnynunez/jetson-containers/tree/AUTOGPTQ

```
CUDA_VERSION=12.6 PYTHON_VERSION=3.10 PYTORCH_VERSION=2.4 jetson-containers build llama-factory
```
PS: I updated AutoGPTQ and added it as a dependency, because the release version is very old; the latest versions support Gemma2, etc.
https://github.com/dusty-nv/jetson-containers/pull/660 fails to build because there is no 0.8.0 branch of AutoGPTQ yet (it hasn't been released). I'm keeping it at 0.7.1, along with PyTorch 2.4, and will push the AutoGPTQ wheel and llama-factory container 👍
In my case it builds well. I mean, I avoid this:
```
git clone --branch=v${AUTOGPTQ_BRANCH} --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ || \
git clone --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ
```
That's why I'm using the `||` condition: if the version tag is not found, it falls back to the latest main commit. Why am I doing this? Because people don't tend to make releases all the time; they often start with tags and then stop making them.
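An equivalent, more explicit variant probes the remote for the tag first with `git ls-remote` (just a sketch of the same idea, not what the jetson-containers Dockerfile actually does; the `AUTOGPTQ_BRANCH` value here is illustrative):

```bash
# Check whether the release tag exists on the remote; clone it if so,
# otherwise fall back to the latest main commit.
AUTOGPTQ_BRANCH=0.7.1   # illustrative pin, substitute the version you want
if git ls-remote --exit-code --tags \
    https://github.com/PanQiWei/AutoGPTQ.git "v${AUTOGPTQ_BRANCH}" >/dev/null; then
  git clone --branch="v${AUTOGPTQ_BRANCH}" --depth=1 \
      https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ
else
  git clone --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ
fi
```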
Ahh right, I like that design pattern of yours where you fall back to main. Merged this in https://github.com/dusty-nv/jetson-containers/commit/c0da1e22933cdf182763393dc509603626ede94f and pushed the wheel for auto_gptq-0.8.0.dev0 and the container image for dustynv/llama-factory:r36.4.0
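To pick up the fix, running the updated image should be enough (same command as the original report, pointed at the new tag):

```bash
# Pull and start the rebuilt image with the auto_gptq wheel baked in
jetson-containers run dustynv/llama-factory:r36.4.0
```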
👍