dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License
2.18k stars 448 forks source link

No docker image for text-text-generation-inference on JetPack 6.0 DP #378

Open ihubanov opened 8 months ago

ihubanov commented 8 months ago
$ ./run.sh $(./autotag text-generation-inference)           
Namespace(packages=['text-generation-inference'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.2.0  JETPACK_VERSION=6.0  CUDA_VERSION=12.2.140
-- Finding compatible container image for ['text-generation-inference']

Couldn't find a compatible container for text-generation-inference, would you like to build it? [y/N] y
-- Package l4t-text-generation has missing dependencies, disabling...  ("couldn't find package:  bitsandbytes")
-- Package l4t-tensorflow:tf1 has missing dependencies, disabling...  ("couldn't find package:  tensorflow")
-- Package text-generation-inference has missing dependencies, disabling...  ("couldn't find package:  bitsandbytes")
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/tag.py", line 58, in <module>
    image = find_container(args.packages[0], prefer_sources=args.prefer, disable_sources=args.disable, user=args.user, quiet=args.quiet)
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/container.py", line 490, in find_container
    return build_container('', package) #, simulate=True)
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/container.py", line 64, in build_container
    packages = resolve_dependencies(packages)
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/packages.py", line 304, in resolve_dependencies
    packages, changed = add_depends(packages)
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/packages.py", line 280, in add_depends
    for dependency in find_package(package).get('depends', []):
  File "/home/sniffski/Documents/jetson-containers/jetson_containers/packages.py", line 166, in find_package
    raise KeyError(f"couldn't find package:  {package}")
KeyError: "couldn't find package:  text-generation-inference"
-- Error:  return code 1
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /home/sniffski/Documents/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Create and run a new container from an image

Is there a guide on how to build an image for my Jetson AGX Orin 64GB: L4T_VERSION=36.2.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2.140

dusty-nv commented 8 months ago

Hi @ihubanov, I haven't kept up with the patches for building text-generation-inference, as it wasn't really getting used and is complicated, and there are faster LLMs available for Jetson like MLC and llama.cpp. I may just want to remove it to avoid confusion, but have kept it around for reference on JetPack 5. You can attempt to build it on JetPack 6 with build.sh and if you get it working, I'd accept a PR with any updated patches needed to it's dockerfile.

ihubanov commented 8 months ago

Do you have a simple guide on how you did it for previous JetPack releases... Another question, do you happen to know if any of the other containers provide similar functionality like the one I'm interested in text-generation-inference... I need the generate API endpoint so I can build a chatbot frontend which could use the API endpoint for text generation...

dusty-nv commented 8 months ago

Do you have a simple guide on how you did it for previous JetPack releases...

Sure @ihubanov, first read these docs for getting setup to build containers:

Then you would try ./build.sh text-generation-inference (that is what had been working on JetPack 5)

Another question, do you happen to know if any of the other containers provide similar functionality like the one I'm interested in text-generation-inference... I need the generate API endpoint so I can build a chatbot frontend which could use the API endpoint for text generation...

Yes, both llama.cpp and text-generation-webui can expose OpenAI-compatible endpoints (and I have working containers on JetPack 5/6 for both of these), and you can use OpenAI's python client (or REST from Javascript/ect) to query it. The OpenAI API is understandingly prevalent by this point, so it seems like a pretty good option to go with that has good adoption. It's fine for text-based chat completion, however when it comes to streaming video/audio for multimodal models I personally prefer to have the LLM running directly in the same process to avoid latency and memory overhead associated with transferring that multimedia across processes.

ihubanov commented 7 months ago

Man when I try the ./build.sh text-generation-inference it crashes with 'package 'text-generation-inference' not found... Also ./build.sh --list-packages doesn't show it... Tried replacing the config.py with the one from webui, but it seems it is not the case... Not sure why this package is not found, but it is actually in the same place where webui is.

dusty-nv commented 7 months ago

@ihubanov text-generation-inference gets disabled on JP6 because it depends on bitsandbytes, which has a bunch of patches applied to it on JP5 which I don't feel like updating because bitsandbytes is slow and the quantization's been surpassed by AutoGPTQ, AWQ, ect. Also regarding TGI there are faster/easier options like MLC and llama.cpp