NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

RuntimeError: Unsupported model architecture: FalconForCausalLM #1116


shekhars-li commented 7 months ago

System Info

Who can help?

@kaiyux @byshiue

Information

Tasks

Reproduction

Convert HF weights:

python convert_checkpoint.py --model_dir falcon-7b-hf --dtype float16 --output_dir tensorrt-llm-falcon-7b-hf

0.7.1
You are using a model of type RefinedWebModel to instantiate a model of type falcon. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
/home/jobuser/.local/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Weights loaded. Total time: 00:00:02
Total time of converting checkpoints: 00:01:44
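For context, the converted checkpoint directory contains a config.json whose architecture field is what trtllm-build later looks up in its model registry. A quick way to see what the converter wrote (a minimal sketch; the exact key layout may differ between releases):

import json

# Path is the --output_dir passed to convert_checkpoint.py above.
with open("tensorrt-llm-falcon-7b-hf/config.json") as f:
    ckpt_config = json.load(f)

print(ckpt_config.get("architecture"))  # expected: FalconForCausalLM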

Compile engine:

trtllm-build --checkpoint_dir tensorrt-llm-falcon-7b-hf --use_gemm_plugin float16 --remove_input_padding \
  --use_gpt_attention_plugin float16 --output_dir tensorrt-llm-falcon-7b-hf/engine/

[02/20/2024-18:18:26] [TRT-LLM] [I] Remove Padding Enabled
Traceback (most recent call last):
  File "/home/jobuser/.local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/jobuser/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 217, in main
    build_and_save(source, build_config, args.output_dir, workers,
  File "/home/jobuser/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 154, in build_and_save
    build_and_save_shard(rank, rank % workers, ckpt_dir, build_config,
  File "/home/jobuser/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 130, in build_and_save_shard
    engine = build(build_config,
  File "/home/jobuser/.local/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 609, in build
    raise RuntimeError(
RuntimeError: Unsupported model architecture: FalconForCausalLM

Expected behavior

Engine compiles successfully

Actual behavior

trtllm-build returns

RuntimeError: Unsupported model architecture: FalconForCausalLM

Additional notes

I am following the simple, standard script from the repo. The weights are HF weights, and the build is simple too. I already have pods with all the dependencies installed, and I verified that tensorrt-llm can be imported and used in a Python REPL.
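For reference, that check was just along these lines (the version string is what the installed 0.7.1 wheel reports):

import tensorrt_llm

print(tensorrt_llm.__version__)  # 0.7.1 with the wheel installed here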

shekhars-li commented 7 months ago

Update: I see that the latest release, 0.7.1, does not support FalconForCausalLM in MODEL_MAP yet. I do not have the option to compile from source, since I can only push a precompiled Docker image and cannot run the compilation on the A100 cluster. Can you please cut a new release that includes the changes supporting the FalconForCausalLM architecture?
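For anyone hitting the same error, the missing registration can be confirmed against the installed package; a minimal check, assuming MODEL_MAP is importable from tensorrt_llm.models as in the source tree (the import path may vary between releases):

from tensorrt_llm.models import MODEL_MAP

# On the 0.7.1 wheel this prints False, which is why trtllm-build
# rejects the converted Falcon checkpoint.
print("FalconForCausalLM" in MODEL_MAP)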

shekhars-li commented 7 months ago

As a final attempt, I tried to install the unreleased version myself:

pip install tensorrt-llm==0.9.0.dev2024020600 --extra-index-url https://pypi.nvidia.com

That install also fails:

INFO: pip is looking at multiple versions of tensorrt-llm to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install tensorrt-llm because these package versions have conflicting dependencies.

The conflict is caused by:
    nvidia-ammo 0.7.3 depends on torchprofile>=0.0.4
    nvidia-ammo 0.7.2 depends on torchprofile>=0.0.4
    nvidia-ammo 0.7.1 depends on onnxruntime>=1.16.1
    nvidia-ammo 0.7.0 depends on onnxruntime>=1.16.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict