dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

llama-factory: Unable to load model #659

Open · JIANGTUNAN opened 4 hours ago

JIANGTUNAN commented 4 hours ago

Libraries:

  • CUDA: 12.2.140
  • cuDNN: 8.9.4.25
  • TensorRT: 8.6.2.3
  • VPI: 3.1.5
  • Vulkan: 1.3.204
  • OpenCV: 4.8.0 (with CUDA: NO)
  • JetPack: 6.0

Models that cannot be loaded:

  • Qwen/Qwen2-*-Int8
  • Qwen/Qwen2.5-*-Int8
  • Qwen/Qwen2-*-Int4
  • Qwen/Qwen2.5-*-Int4
  • ...

start command:

jetson-containers run dustynv/llama-factory:r36.3.0

Description: I tried going into the container and following the instructions to install auto_gptq, but it still doesn't work. I also tried jetson-containers build llama-factory, which didn't work either.

docker logs:

[INFO|configuration_utils.py:739] 2024-10-04 10:28:04,599 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 24,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "quantization_config": {
    "batch_size": 1,
    "bits": 8,
    "block_name_to_quantize": null,
    "cache_block_outputs": true,
    "damp_percent": 0.01,
    "dataset": null,
    "desc_act": false,
    "exllama_config": {
      "version": 2
    },
    "group_size": 128,
    "max_input_length": null,
    "model_seqlen": null,
    "module_name_preceding_first_block": null,
    "modules_in_block_to_quantize": null,
    "pad_token_id": null,
    "quant_method": "gptq",
    "sym": true,
    "tokenizer": null,
    "true_sequential": true,
    "use_cuda_fp16": false,
    "use_exllama": true
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "float16",
  "transformers_version": "4.45.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/versions.py", line 102, in require_version
    got_ver = importlib.metadata.version(pkg)
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 996, in version
    return distribution(distribution_name).version
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 969, in distribution
    return Distribution.from_name(distribution_name)
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 548, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for auto_gptq

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/opt/LLaMA-Factory/src/llamafactory/webui/chatter.py", line 104, in load_model
    super().__init__(args)
  File "/opt/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 52, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/opt/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 59, in __init__
    self.model = load_model(
  File "/opt/LLaMA-Factory/src/llamafactory/model/loader.py", line 131, in load_model
    patch_config(config, tokenizer, model_args, init_kwargs, is_trainable)
  File "/opt/LLaMA-Factory/src/llamafactory/model/patcher.py", line 96, in patch_config
    configure_quantization(config, tokenizer, model_args, init_kwargs)
  File "/opt/LLaMA-Factory/src/llamafactory/model/model_utils/quantization.py", line 121, in configure_quantization
    require_version("auto_gptq>=0.5.0", "To fix: pip install auto_gptq>=0.5.0")
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/versions.py", line 104, in require_version
    raise importlib.metadata.PackageNotFoundError(
importlib.metadata.PackageNotFoundError: No package metadata was found for The 'auto_gptq>=0.5.0' distribution was not found and is required by this application. 
To fix: pip install auto_gptq>=0.5.0
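
For reference, this is one way to verify and work around the missing package inside the container (a sketch only, not from the issue; on Jetson/aarch64 PyPI generally ships no prebuilt CUDA wheel for auto_gptq, so pip compiles the extensions locally):

# check whether auto_gptq package metadata is visible to Python
# (this is the same importlib.metadata lookup that require_version performs)
python3 -c "import importlib.metadata as m; print(m.version('auto_gptq'))"

# if that raises PackageNotFoundError, building from source is one option;
# --no-build-isolation lets the build see the already-installed torch
pip3 install "auto_gptq>=0.5.0" --no-build-isolation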
johnnynunez commented 4 hours ago

I'm checking

johnnynunez commented 3 hours ago

Tell me if it works for you now: https://github.com/johnnynunez/jetson-containers/tree/AUTOGPTQ

CUDA_VERSION=12.6 PYTHON_VERSION=3.10 PYTORCH_VERSION=2.4 jetson-containers build llama-factory

PS: I updated AutoGPTQ and added it as a dependency, because the released version is very old; the latest versions support Gemma2, etc.
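
Once that build finishes, the locally-built image can be started the usual way (a sketch; the autotag helper from jetson-containers resolves the tag that was just built):

jetson-containers run $(autotag llama-factory)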

dusty-nv commented 3 hours ago

https://github.com/dusty-nv/jetson-containers/pull/660 fails to build because there is no 0.8.0 branch of AutoGPTQ yet (it hasn't been released). I'm keeping it at 0.7.1, along with PyTorch 2.4, and will push the AutoGPTQ wheel and llama-factory container 👍
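
For anyone double-checking which releases exist upstream, the tags can be listed without cloning (plain git, nothing project-specific):

git ls-remote --tags https://github.com/PanQiWei/AutoGPTQ.git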

johnnynunez commented 3 hours ago

In my case it builds fine, because I avoid this:

git clone --branch=v${AUTOGPTQ_BRANCH} --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ || \
git clone --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /opt/AutoGPTQ 

This is why I'm using the || condition: if the version tag is not found, it falls back to the latest commit on main.

Why am I doing this? Because people don't tend to make releases all the time, and often start with tags and then stop making them.
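
A minimal standalone demonstration of the fallback (v9.9.9 is a deliberately nonexistent tag, used here only for illustration):

# the first clone fails because the tag does not exist, and git cleans up
# the target directory, so the || branch clones the default branch instead
git clone --branch=v9.9.9 --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /tmp/AutoGPTQ || \
git clone --depth=1 https://github.com/PanQiWei/AutoGPTQ.git /tmp/AutoGPTQ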


dusty-nv commented 2 hours ago

Ahh right, I like that design pattern of yours where you fall back to main. Merged this in https://github.com/dusty-nv/jetson-containers/commit/c0da1e22933cdf182763393dc509603626ede94f and pushed the wheel for auto_gptq-0.8.0.dev0 and the container image for dustynv/llama-factory:r36.4.0 👍
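
For completeness, the updated image can then be run the same way as the command at the top of this issue (tag taken from the comment above, once the push is live):

jetson-containers run dustynv/llama-factory:r36.4.0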