feat: jetpack-l4t based llama.cpp container

User Story

As a user, I want to run accelerated models on Jetson AGX Orin devices, so that I can take advantage of the device's NPUs.

Additional context

Compile using NVIDIA's l4t-jetpack container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-jetpack

Example of a containerized build with all build tools: https://github.com/dusty-nv/jetson-containers/tree/dev

Containers don't seem to need --gpus=all, and the Jetson with JetPack comes with its own NVIDIA Container Toolkit ready to go. It looks like this would be a combination of the CPU UDS bundle with the inferencing containers built on the l4t-jetpack image.
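As a rough starting point, the build could look something like the sketch below, which renders a Dockerfile for compiling llama.cpp on top of the l4t-jetpack image. Everything specific here is an assumption rather than something settled in this issue: the base image tag, the `GGML_CUDA` CMake option (older llama.cpp trees used a different name), the CUDA architecture value `87` for Orin, the `/usr/local/cuda` path, and the hypothetical `render_dockerfile` helper itself.

```python
from textwrap import dedent

# Assumption: illustrative tag only; pick the one matching the JetPack release on the device.
DEFAULT_BASE_IMAGE = "nvcr.io/nvidia/l4t-jetpack:r36.2.0"


def render_dockerfile(base_image: str = DEFAULT_BASE_IMAGE, cuda_arch: str = "87") -> str:
    """Return Dockerfile text that builds llama.cpp with CUDA on an l4t-jetpack base."""
    return dedent(f"""\
        FROM {base_image}
        # Assumption: the CUDA toolkit in the base image lives under /usr/local/cuda.
        ENV PATH=/usr/local/cuda/bin:${{PATH}}
        RUN apt-get update \\
            && apt-get install -y --no-install-recommends ca-certificates git cmake build-essential \\
            && rm -rf /var/lib/apt/lists/*
        RUN git clone --depth 1 https://github.com/ggml-org/llama.cpp /opt/llama.cpp
        WORKDIR /opt/llama.cpp
        # Assumption: GGML_CUDA is the CUDA switch in recent llama.cpp checkouts.
        RUN cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES={cuda_arch} \\
            && cmake --build build --config Release -j
        """)


if __name__ == "__main__":
    # Print the Dockerfile so it can be redirected to a file and passed to `docker build`.
    print(render_dockerfile(), end="")
```

Redirecting the output to a file named `Dockerfile` and running `docker build` on the Jetson itself should be enough to try this out. Whether the result slots into the existing inferencing container layout is still an open question.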

defenseunicorns / leapfrogai

feat: jetpack-l4t based llama.cpp container #846
