We need to diagram the moving parts and the hardware costs required to support them.
In broad strokes, we have the following components:
Slack
Chat service - listens to Slack for inbound requests and constructs LLM requests to the LLM proxy service
LLM proxy service - accepts OpenAI API-compatible requests (translating them as needed) and routes them to the configured LLM endpoint. Also receives the LLM response and translates it as necessary into an OpenAI API-compatible response
Databricks-hosted LLMs - accept queries from the proxy and return results
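The proxy's translate-and-route step can be sketched as two pure functions: one mapping an OpenAI-compatible chat request into whatever schema the configured backend expects, and one mapping the backend's reply back into an OpenAI-compatible response. This is a minimal illustration only; the field names on the backend side (`endpoint`, `text`, `stop_reason`) are invented assumptions, not the actual Databricks serving schema.

```python
def to_backend_request(openai_req: dict, endpoint: str) -> dict:
    """Translate an OpenAI-compatible chat request into the
    (hypothetical) backend schema and attach the routing target."""
    return {
        "endpoint": endpoint,  # configured LLM endpoint the proxy routes to
        "messages": openai_req["messages"],
        "max_tokens": openai_req.get("max_tokens", 256),
    }


def to_openai_response(backend_resp: dict, model: str) -> dict:
    """Translate the (hypothetical) backend reply back into an
    OpenAI-compatible chat.completion response."""
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant",
                            "content": backend_resp["text"]},
                "finish_reason": backend_resp.get("stop_reason", "stop"),
            }
        ],
    }
```

Keeping the translation logic as plain functions like this makes each backend adapter easy to unit-test independently of the HTTP layer.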