ihubanov opened this issue 9 months ago
Hi @ihubanov, I haven't kept up with the patches for building text-generation-inference, as it wasn't really getting used and is complicated, and there are faster LLM runtimes available for Jetson like MLC and llama.cpp. I may just remove it to avoid confusion, but have kept it around for reference on JetPack 5. You can attempt to build it on JetPack 6 with build.sh, and if you get it working, I'd accept a PR with any updated patches needed for its Dockerfile.
Do you have a simple guide on how you did it for previous JetPack releases? Another question: do you happen to know if any of the other containers provide functionality similar to text-generation-inference? I need the generate API endpoint so I can build a chatbot frontend that uses it for text generation.
> Do you have a simple guide on how you did it for previous JetPack releases?
Sure @ihubanov, first read these docs on getting set up to build containers:
Then you would try `./build.sh text-generation-inference` (that is what had been working on JetPack 5).
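A minimal sketch of that workflow, assuming the standard jetson-containers layout and repo URL:

```bash
# Clone the repo and run the build from its root
# (repo URL assumed; see the setup docs linked above first)
git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers

# This invocation is what had been working on JetPack 5
./build.sh text-generation-inference
```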
> Another question: do you happen to know if any of the other containers provide functionality similar to text-generation-inference? I need the generate API endpoint so I can build a chatbot frontend that uses it for text generation.
Yes, both llama.cpp and text-generation-webui can expose OpenAI-compatible endpoints (and I have working containers on JetPack 5/6 for both), and you can use OpenAI's Python client (or REST from JavaScript, etc.) to query them. The OpenAI API is understandably prevalent by this point, so it's a good option with broad adoption. It's fine for text-based chat completion; however, when it comes to streaming video/audio for multimodal models, I personally prefer to have the LLM running directly in the same process, to avoid the latency and memory overhead of transferring that multimedia across processes.
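As a rough sketch, querying an OpenAI-compatible chat endpoint from the shell could look like this (the host, port, and model name here are assumptions; llama.cpp's server listens on port 8080 by default):

```bash
# Hypothetical local endpoint; adjust host/port/model for your server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```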
Man, when I try ./build.sh text-generation-inference, it fails with "package 'text-generation-inference' not found". Also, ./build.sh --list-packages doesn't show it. I tried replacing the config.py with the one from webui, but that didn't fix it. Not sure why this package is not found, since it is actually in the same place as webui.
@ihubanov text-generation-inference gets disabled on JetPack 6 because it depends on bitsandbytes, which has a bunch of patches applied to it on JetPack 5 that I don't feel like updating, since bitsandbytes is slow and its quantization has been surpassed by AutoGPTQ, AWQ, etc. Also, regarding TGI, there are faster/easier options like MLC and llama.cpp.
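If you want to confirm that dependency yourself, something like this should show it (the package path is an assumption based on the usual repo layout):

```bash
# Assumed package location; adjust if the layout differs
grep -r "bitsandbytes" packages/llm/text-generation-inference/

# Disabled packages won't appear in the package listing
./build.sh --list-packages | grep -i text-generation
```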
Is there a guide on how to build an image for my Jetson AGX Orin 64GB with these versions:
L4T_VERSION=36.2.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2.140
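For reference, these values can be checked on-device like so (standard L4T locations; nvcc is only present if the CUDA toolkit is installed):

```bash
cat /etc/nv_tegra_release   # L4T release, e.g. "# R36 (release), REVISION: 2.0"
nvcc --version              # CUDA toolkit version, if installed
```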