An LLM semantic caching system that improves user experience by reducing response time through cached query-result pairs.
add llm input_embeddings layer to enable text-to-vector capabilities,… #6
Closed
peng3307165 closed 10 months ago
add llm input_embeddings layer to enable text-to-vector capabilities, and upload a script for extracting the GPT-NeoX embedding layer.
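The extraction script itself is not shown here, but the idea behind reusing an LLM's input_embeddings layer for text-to-vector can be sketched as follows. This is a minimal NumPy illustration, not the PR's actual code: the matrix `E` stands in for the extracted GPT-NeoX embedding weights, and `embed_query` / `cosine_similarity` are hypothetical helper names. A real pipeline would tokenize the query first and load the saved weights instead of random values.

```python
import numpy as np

def embed_query(token_ids, embedding_matrix):
    """Look up token embeddings and mean-pool them into one query vector."""
    vecs = embedding_matrix[token_ids]  # (seq_len, dim) row lookup
    return vecs.mean(axis=0)            # (dim,) pooled sentence vector

def cosine_similarity(a, b):
    """Similarity score used to match a new query against cached ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for the extracted input_embeddings weights (vocab_size x dim).
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))

q1 = embed_query([1, 5, 9], E)   # vector for an incoming query
q2 = embed_query([1, 5, 9], E)   # vector for a cached query with the same tokens
print(round(cosine_similarity(q1, q2), 3))  # identical token ids -> 1.0
```

In a semantic cache, vectors like `q1` are compared against stored query vectors; a hit above a similarity threshold returns the cached answer instead of calling the model.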