TabbyAPI is a local LLM server that, unlike Ollama and LM Studio, uses the ExLlamaV2 inference backend rather than Llama.cpp. ExLlamaV2 offers a much faster TTFT (time to first token) and slightly higher TPS (tokens per second), but at the cost of supporting only GPU inference.
This makes TabbyAPI a good choice for people with decent GPUs who want to minimize response latency.
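For context on what "server" means here: TabbyAPI exposes an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it. Below is a minimal sketch of a chat completion request, assuming the server is running locally on its default port (5000) and that you substitute a real API key from your TabbyAPI configuration; both are assumptions to verify against your own setup.

```python
import requests

# Minimal sketch: query a locally running TabbyAPI instance through its
# OpenAI-compatible chat completions endpoint.
# Assumptions: default port 5000, and "YOUR_API_KEY" is a placeholder for
# a key from your TabbyAPI config -- check your own settings.
response = requests.post(
    "http://localhost:5000/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)

# Print the model's reply from the standard OpenAI-style response shape.
print(response.json()["choices"][0]["message"]["content"])
```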