
one-click-llms

These one-click templates allow you to quickly boot up an API for a given language model.

Advanced inferencing scripts (incl. for function calling) are available for purchase here.

Note: vLLM sometimes runs into issues if the pod template does not have the correct CUDA drivers, and unfortunately there is no way to know this when picking a GPU. An issue has been raised here. As an alternative, you can run TGI (and even query it in OpenAI style; guide here). TGI is faster than vLLM and is recommended in general.
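The linked guide covers querying TGI in OpenAI style. As a minimal sketch (the pod URL below is a placeholder, and the `/v1/chat/completions` route assumes a recent TGI version with the Messages API):

```python
import json
from urllib import request

# Placeholder endpoint: substitute your own pod's URL.
TGI_URL = "https://YOUR_POD_ID-8080.proxy.runpod.net"

def build_chat_payload(prompt: str, max_tokens: int = 200) -> dict:
    """Build an OpenAI-style chat payload for TGI's /v1/chat/completions."""
    return {
        "model": "tgi",  # TGI serves a single model; the name is a placeholder
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_tgi(prompt: str) -> str:
    """POST the payload to the pod and return the assistant's reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode()
    req = request.Request(
        TGI_URL + "/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Because the route mirrors OpenAI's, the official `openai` Python client also works if you point its `base_url` at the pod.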

Runpod One-Click Templates

To support the Trelis Research YouTube channel, you can sign up for an account with this link. Trelis also receives a 1-2% commission when you use these one-click templates.

Fine-tuning Notebook Setup

MoonDream Multi-modal API (openai-ish)

Text Generation Inference (fastest):

vLLM (requires an Ampere or newer GPU, e.g. A100, H100, or A6000):

Note: The vLLM image has compatibility issues with the CUDA drivers on certain pods. An A6000 Ada is typically an option that works.

llama.cpp One-click templates:

Post a new issue if you would like other templates.
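The MoonDream multi-modal template above exposes an OpenAI-ish API. As a hedged sketch, an image question can be encoded in the OpenAI vision message format; the exact schema the template expects may differ, so treat this as an assumption and check the template's docs:

```python
import base64

def build_vision_payload(image_path: str, question: str) -> dict:
    """Build an OpenAI-style vision chat payload: the question as text
    plus the image inlined as a base64 data URL."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 128,
    }
```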

Vast AI One-Click Templates

To support the Trelis Research YouTube channel, you can sign up for an account with this affiliate link. Trelis also receives a 1-2% commission when you use these one-click templates.

Fine-tuning Notebook Setup

Text Generation Inference (fastest):

vLLM (requires an Ampere or newer GPU, e.g. A100, H100, or A6000):

llama.cpp One-click templates:
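The llama.cpp templates run llama.cpp's built-in HTTP server, which exposes a `/completion` endpoint. A minimal query sketch (the URL below uses the server's default local port as a placeholder; substitute your pod's public URL):

```python
import json
from urllib import request

# Placeholder: the llama.cpp server listens on port 8080 by default.
LLAMA_URL = "http://localhost:8080"

def build_completion_payload(prompt: str, n_predict: int = 128) -> dict:
    """Payload for the llama.cpp server's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.7}

def query_llama_cpp(prompt: str) -> str:
    """POST a prompt and return the generated text from the response."""
    data = json.dumps(build_completion_payload(prompt)).encode()
    req = request.Request(
        LLAMA_URL + "/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```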

Function-calling One-Click Templates

One-click templates for function-calling are located on the HuggingFace model cards. Check out the collection here.
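The exact prompt format for each function-calling model is documented on its HuggingFace model card. As an illustration only (the function name and parameters below are hypothetical, not taken from the model cards), function metadata is commonly supplied as a list of JSON-schema definitions:

```python
# Hypothetical function metadata in the common JSON-schema style;
# consult the specific model card for the exact format it expects.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}]
```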

Changelog

Feb 16 2024:

Jan 21 2024:

Jan 9 2024:

Dec 30 2023:

Dec 29 2023: