UAA-Robo / CoffeeBot

Robot Arm Codebase

Research local LLMs that fit within 16GB VRAM and can respond within 5 seconds #34

Open AnthonyvW opened 1 year ago

AnthonyvW commented 1 year ago

Research local Large Language Models that fit within 16GB of VRAM, respond within 5 seconds (the faster the better), and are available for commercial use. You will need to research both a backend to run the LLM and the LLM itself.

The LLM will need to respond in JSON or YAML format, accurately name items from a list, and give yes/no answers.
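For illustration, a minimal sketch of what checking those three constraints could look like; the field names (`item`, `available`) and the menu are invented for the sketch, not taken from the issue:

```python
import json

# Hypothetical response contract: field names and menu entries
# are made up for this sketch.
MENU = {"espresso", "latte", "drip coffee"}

raw = '{"item": "latte", "available": true}'  # stand-in for an LLM reply

reply = json.loads(raw)                       # must parse as JSON
assert reply["item"] in MENU                  # must name a real item from the list
assert isinstance(reply["available"], bool)   # must amount to a yes/no answer
print(reply)
```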

AnthonyvW commented 1 year ago

Brief research turned up Llama.cpp and KoboldCPP as candidate backends. Other alternatives likely exist.
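As a sketch of how such a backend would be queried, assuming Llama.cpp's bundled HTTP server is running locally with its default `/completion` endpoint on port 8080 (endpoint names and defaults may differ between builds):

```python
import json
import time
import urllib.request

# Assumes a llama.cpp server (e.g. `./server -m model.gguf`) is running
# locally and exposing its /completion endpoint on the default port 8080.
payload = json.dumps({
    "prompt": "Answer yes or no: is an espresso a coffee drink?",
    "n_predict": 16,
    "temperature": 0.0,
}).encode("utf-8")

request = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=payload,
    headers={"Content-Type": "application/json"},
)

start = time.time()
with urllib.request.urlopen(request) as response:
    reply = json.load(response)
elapsed = time.time() - start

print(reply.get("content", "").strip())
print(f"latency: {elapsed:.2f}s (requirement: under 5s)")
```

Timing the round trip this way gives a direct check against the 5-second requirement.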

AnthonyvW commented 1 year ago

Got several models working with LocalGPT as a backend. LocalGPT allows querying local files for use in the LLM's responses.
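For context, a minimal sketch of the file-querying idea; this is not LocalGPT's actual implementation, and the naive keyword scoring, directory name, and prompt format are purely illustrative:

```python
from pathlib import Path

def retrieve(question: str, doc_dir: str = "docs", top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval over local .txt files."""
    terms = set(question.lower().split())
    scored = []
    for path in Path(doc_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        score = sum(1 for term in terms if term in text.lower())
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

question = "What drinks are on the menu?"
context = "\n---\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to whichever local LLM backend is in use.
```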

AnthonyvW commented 1 year ago

Backends that have been tried

Any 13B model should work, as it fits in about 12GB of VRAM using the backends below (a back-of-the-envelope estimate follows the list).

- ExLlama2 - Was too unstable; it would go off topic within the first prompt or two.
- ExLlama - Is stable as long as end-on-newline is enabled. Unfortunately, this results in short responses.
- Oobabooga - Is stable, but is slower than both ExLlama and ExLlama2.
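
As a rough sanity check on the ~12GB figure (the flat 3GB allowance for KV cache and activations is an assumption, not a measurement):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead_gb: float = 3.0) -> float:
    """Quantized weight size plus a flat allowance (assumed, not
    measured) for KV cache and activations."""
    weights_gb = params_billion * bits_per_weight / 8  # billions of params ~ GB
    return weights_gb + overhead_gb

# A 4-bit 13B model: ~6.5GB of weights + ~3GB allowance ≈ 9.5GB,
# consistent with the ~12GB observed and well under the 16GB budget.
print(f"{estimate_vram_gb(13, 4):.1f} GB")
```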