An LLM semantic caching system that improves user experience by reducing response time through cached query-result pairs.
add llm input_embeddings layer to enable text-to-vector capabilities,… #6
Closed
peng3307165 closed 10 months ago
add llm input_embeddings layer to enable text-to-vector capabilities, and upload a script for extracting the GPT-NeoX embedding layer.
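The extraction script itself is not shown here, but the idea behind reusing an LLM's input_embeddings layer for text-to-vector can be sketched as follows. This is a minimal NumPy illustration, not the PR's actual code: the matrix `E` stands in for the extracted GPT-NeoX embedding weights, and `embed_query` / `cosine_similarity` are hypothetical helper names. A real pipeline would tokenize the query first and load the saved weights instead of random values.

```python
import numpy as np

def embed_query(token_ids, embedding_matrix):
    """Look up token embeddings and mean-pool them into one query vector."""
    vecs = embedding_matrix[token_ids]  # (seq_len, dim) row lookup
    return vecs.mean(axis=0)            # (dim,) pooled sentence vector

def cosine_similarity(a, b):
    """Similarity score used to match a new query against cached ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for the extracted input_embeddings weights (vocab_size x dim).
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))

q1 = embed_query([1, 5, 9], E)   # vector for an incoming query
q2 = embed_query([1, 5, 9], E)   # vector for a cached query with the same tokens
print(round(cosine_similarity(q1, q2), 3))  # identical token ids -> 1.0
```

In a semantic cache, vectors like `q1` are compared against stored query vectors; a hit above a similarity threshold returns the cached answer instead of calling the model.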