AnthonyvW opened 1 year ago
Brief research turned up llama.cpp and KoboldCPP as candidate backends; other alternatives may exist.
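As a quick sanity check, llama.cpp's bundled HTTP server can be queried directly. A minimal sketch is below; it assumes the server is running locally on its default port 8080, and the `/completion` endpoint and field names may differ across versions:

```python
# Minimal sketch of querying a local llama.cpp HTTP server.
# Assumes `llama-server` is running on localhost:8080 (its default);
# endpoint and field names may vary by version.
import json
import urllib.request

def query_llamacpp(prompt: str, max_tokens: int = 128) -> str:
    payload = json.dumps({"prompt": prompt, "n_predict": max_tokens}).encode()
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["content"]  # llama.cpp's server returns generated text under "content"

if __name__ == "__main__":
    print(query_llamacpp("Answer yes or no: can a 13B model fit in 16 GB of VRAM?"))
```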
Got several models working with LocalGPT as a backend. This backend allows the LLM to query local files and use them in its responses.
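LocalGPT's actual interface isn't reproduced here, but the retrieve-then-generate pattern it implements can be sketched. Everything below (the naive keyword-overlap retriever, the helper names) is an illustrative stand-in, not LocalGPT's real API:

```python
# Illustrative retrieve-then-generate sketch of the pattern LocalGPT uses:
# find the local documents most relevant to a question, then prepend them
# to the prompt. The keyword-overlap retriever and helper names here are
# hypothetical stand-ins, not LocalGPT's real API.
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    return f"Use only this context to answer.\n{context}\nQuestion: {question}\nAnswer:"
```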
Backends that have been tried

Any 13B model should work, since it fits in about 12 GB of VRAM (a rough estimate sketch follows the list). The following backends were tried:

- ExLlama2 - Too unstable; it would go off topic within the first prompt or two.
- ExLlama - Stable as long as "end on newline" is enabled. Unfortunately, this results in short responses.
- Oobabooga - Stable, but slower than both ExLlama and ExLlama2.
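The 12 GB figure can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch assuming 4-bit quantized weights plus a fixed allowance for KV cache and activations; real usage varies by backend, context length, and quantization scheme:

```python
# Back-of-the-envelope VRAM estimate for a quantized model.
# Assumes 4-bit (GPTQ/GGUF-style) weights; actual usage depends on the
# backend, context length, and KV-cache precision.
def estimate_vram_gb(params_billions: float, bits_per_weight: int = 4,
                     overhead_gb: float = 2.0) -> float:
    weights_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb  # overhead covers KV cache + activations

print(f"13B @ 4-bit: ~{estimate_vram_gb(13):.1f} GB")    # ~8.5 GB
print(f"13B @ 8-bit: ~{estimate_vram_gb(13, 8):.1f} GB")  # ~15.0 GB
```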
Research local Large Language Models that can fit within 16 GB of VRAM, respond within 5 seconds (the faster the better), and are available for commercial use. You will need to research both a backend to run the LLM and the LLM itself.
The LLM will need to respond in JSON or YAML format, accurately name items from a list, and give yes/no answers.
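One way to evaluate a candidate against these requirements is a small harness that times a call, parses the JSON reply, and checks the answer against the allowed list. The `generate()` function below is a hypothetical stand-in for whichever backend is being evaluated:

```python
# Hypothetical acceptance check for a candidate backend/model pair:
# the response must arrive within 5 s, parse as JSON, and pick a valid item.
import json
import time

def generate(prompt: str) -> str:
    """Stand-in for the backend call being evaluated (hypothetical)."""
    raise NotImplementedError

def check_candidate(items: list[str]) -> bool:
    prompt = (
        "Reply with JSON like {\"item\": \"...\", \"answer\": \"yes\" or \"no\"}. "
        f"Pick exactly one item from this list: {items}"
    )
    start = time.monotonic()
    raw = generate(prompt)
    elapsed = time.monotonic() - start
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return False  # requirement: valid JSON (or YAML) output
    return (
        elapsed <= 5.0                            # requirement: respond within 5 s
        and reply.get("item") in items            # requirement: name items from the list
        and reply.get("answer") in ("yes", "no")  # requirement: yes/no answers
    )
```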