dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Error when building text-generation-inference container on Jetson Orin 8GB #357

Open · georgejeno8 opened this issue 8 months ago

georgejeno8 commented 8 months ago

Hello,

I followed the system setup instructions and tried to build the text-generation-inference container on my Jetson Orin 8GB running JetPack 5.1, but I seem to be running into the following error:

...
Sending build context to Docker daemon  13.31kB
Step 1/29 : ARG BASE_IMAGE
Step 2/29 : FROM ${BASE_IMAGE}
 ---> d8908d69616b
Step 3/29 : WORKDIR /opt
 ---> Using cache
 ---> 589bb075dd11
Step 4/29 : ARG PROTOC_URL=https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-linux-aarch_64.zip
 ---> Using cache
 ---> cfc38199410f
Step 5/29 : ARG PROTOC_ZIP=protoc-21.12-linux-aarch_64.zip
 ---> Using cache
 ---> ba89bb1f2c44
Step 6/29 : RUN wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate ${PROTOC_URL} -O ${PROTOC_ZIP} &&     unzip -o ${PROTOC_ZIP} -d /usr/local bin/protoc &&     unzip -o ${PROTOC_ZIP} -d /usr/local 'include/*' &&     rm ${PROTOC_ZIP}
 ---> Using cache
 ---> f079a93c4ba1
Step 7/29 : RUN which protoc && protoc --version
 ---> Using cache
 ---> 71cb62d4a89b
Step 8/29 : RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1 &&     update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
 ---> Using cache
 ---> 7ce89b15c438
Step 9/29 : RUN git clone --depth=1 https://github.com/huggingface/text-generation-inference
 ---> Using cache
 ---> 3c69aa67b335
Step 10/29 : WORKDIR /opt/text-generation-inference/server
 ---> Using cache
 ---> 446c0c56105a
Step 11/29 : RUN sed 's|^bitsandbytes==.*|bitsandbytes|g' -i requirements.txt
 ---> Running in b9ba2352162e
sed: can't read requirements.txt: No such file or directory
The command '/bin/sh -c sed 's|^bitsandbytes==.*|bitsandbytes|g' -i requirements.txt' returned a non-zero code: 2

When I instead run one of the prebuilt images, everything starts successfully, but the following error occurs when I try to launch the inference server with text-generation-launcher:

2024-01-03T17:27:49.104673Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "orin-nano-tieset", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data/models/huggingface"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-01-03T17:27:49.105036Z  INFO download: text_generation_launcher: Starting download process.
2024-01-03T17:28:05.832957Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-01-03T17:28:08.214202Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-01-03T17:28:08.215726Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-01-03T17:28:15.371095Z  WARN text_generation_launcher: We're not using custom kernels.

2024-01-03T17:28:15.526436Z  WARN text_generation_launcher: Could not import Flash Attention enabled models: No module named 'vllm'

2024-01-03T17:28:15.540418Z  WARN text_generation_launcher: Could not import Mistral model: No module named 'dropout_layer_norm'

2024-01-03T17:28:17.262192Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.8/dist-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/typer/core.py", line 778, in main
    return _main(
  File "/usr/local/lib/python3.8/dist-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/text-generation-inference/server/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/text-generation-inference/server/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
    handle._run()
  File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/text-generation-inference/server/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(
  File "/opt/text-generation-inference/server/text_generation_server/models/__init__.py", line 157, in get_model
    return BLOOMSharded(
  File "/opt/text-generation-inference/server/text_generation_server/models/bloom.py", line 48, in __init__
    self.process_group, rank, world_size = initialize_torch_distributed()
  File "/opt/text-generation-inference/server/text_generation_server/utils/dist.py", line 48, in initialize_torch_distributed
    from torch.distributed import ProcessGroupNCCL
ImportError: cannot import name 'ProcessGroupNCCL' from 'torch.distributed' (/usr/local/lib/python3.8/dist-packages/torch/distributed/__init__.py)
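
For what it's worth, a quick check inside the container (an illustrative one-liner, not part of TGI) shows whether this PyTorch wheel was built with distributed support at all:

python3 -c "import torch.distributed as dist; print(dist.is_available(), hasattr(dist, 'ProcessGroupNCCL'))"

Given the traceback above, I'd expect this to print False False here.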

Is there something incorrect with how my Jetson Orin is configured? Thank you very much for the help!

georgejeno8 commented 8 months ago

Looking at https://github.com/huggingface/text-generation-inference/tree/v1.1.1/server and https://github.com/huggingface/text-generation-inference/tree/v1.2.0/server, it seems they moved from a single requirements.txt file to requirements_common.txt and requirements_cuda.txt. Does that need to be reflected in the build script?
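
Something like the following in the package Dockerfile might cover both layouts (an untested sketch; the file names come from the TGI tags linked above):

RUN for req in requirements.txt requirements_common.txt requirements_cuda.txt; do \
        if [ -f "$req" ]; then sed 's|^bitsandbytes==.*|bitsandbytes|g' -i "$req"; fi; \
    done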

dusty-nv commented 8 months ago

@georgejeno8 probably, but I don't really maintain the text-generation-inference package, as it's heavyweight and I don't use it. Happy to accept PRs though!

xunkai55 commented 7 months ago

Most likely you need pytorch:2.0-distributed rather than pytorch:2.0. The plain PyTorch wheel is built without distributed support, which is why the ProcessGroupNCCL import fails.
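
For example, per the repo's build.sh usage, something along these lines should chain the distributed PyTorch build underneath TGI (illustrative; the --name tag is arbitrary and the resulting L4T suffix depends on your JetPack version):

./build.sh --name=tgi pytorch:2.0-distributed text-generation-inference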