Does the LLM interact only with compute_kernel, and is it hidden from other modules?
All components interact with the LLM through the interface provided by compute_kernel.
Hello,
I would like to clarify some aspects of our system:
Role and Agent Relationship: The relationship between a role and an agent is like that of an employee holding a position in a company. The company's workflow describes the function of the position, and different agents can hold the same position. Our design does not prevent one agent from holding multiple roles in a workflow, but it is best avoided. Chat sessions are a separate concern; fundamentally, the state of a workflow consists of its chat sessions. I may write an article about this, but for now you can understand how it works by reading the code, especially which messages enter which chat sessions.
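To make the employee/position analogy concrete, here is a minimal sketch; the class and field names (Role, Agent, ChatSession, assign) are hypothetical, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatSession:
    # Fundamentally, the state of a workflow consists of chat sessions.
    topic: str
    messages: List[str] = field(default_factory=list)

@dataclass
class Agent:
    # The "employee": a concrete actor that can fill a position.
    name: str

@dataclass
class Role:
    # The "position": the workflow describes its function, and
    # different agents can hold it over time.
    name: str
    agent: Optional[Agent] = None

    def assign(self, agent: Agent) -> None:
        # A role can change agents at any time.
        self.agent = agent

reviewer = Role("code_reviewer")
reviewer.assign(Agent("alice"))
```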
LLM Completion: Yes, an LLM completion is currently indivisible, and the compute_kernel decides when it executes. If no compute node in the compute_kernel supports LLM completion, the call fails. LLM completion is a simple abstraction that lets the system get up and running quickly, from an MVP perspective.
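A hypothetical sketch of that behavior, assuming illustrative names rather than the project's actual interface: every component calls llm_completion through the kernel, the call is a single indivisible operation, and it fails when no registered node supports LLM completion.

```python
class NoLLMNodeError(Exception):
    """Raised when no compute node supports LLM completion."""

class EchoLLMNode:
    # Stand-in for a real compute node that supports LLM completion.
    capabilities = {"llm_completion"}

    def run_llm_completion(self, prompt: str) -> str:
        return f"completion for: {prompt}"

class ComputeKernel:
    def __init__(self):
        self.nodes = []

    def llm_completion(self, prompt: str) -> str:
        # The kernel alone decides when and on which node the call runs;
        # callers see one indivisible operation.
        for node in self.nodes:
            if "llm_completion" in node.capabilities:
                return node.run_llm_completion(prompt)
        raise NoLLMNodeError("no compute node supports LLM completion")

kernel = ComputeKernel()
kernel.nodes.append(EchoLLMNode())
print(kernel.llm_completion("hello"))
```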
LLM Model and Max Token: Information such as the LLM model and max_token is specified by the agent and forms part of the agent's persona. A role can change agents at any time, but once an agent is established, its LLM configuration cannot be changed. The max_token_size is the max_token of the LLM.
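A small sketch of that idea, with assumed field names: the LLM configuration lives on the agent's persona and is frozen once the agent exists.

```python
from dataclasses import dataclass

# frozen=True enforces "once an agent is established, its LLM
# configuration cannot be changed".
@dataclass(frozen=True)
class AgentPersona:
    name: str
    llm_model: str        # which LLM this agent uses
    max_token_size: int   # the max_token of that LLM

persona = AgentPersona(name="translator", llm_model="gpt-4", max_token_size=8192)
# persona.llm_model = "gpt-3.5"  # would raise FrozenInstanceError
```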
Compute Kernel: The compute_kernel is a singleton and the most critical basic component of the system; it is initialized first. We can add or remove the compute nodes it sees at runtime to control its behavior. There will undoubtedly be many optimization opportunities in the future, but that is not the focus of this version. Done is better than perfect! (A sketch covering this point together with the next one follows below.)
Compute Node Capabilities: The definition of a compute node's capabilities is unsurprising. "Whether it supports LLM completion" is a capability of the node as a whole, not a refinement of some functions within the LLM.
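A minimal sketch combining the two points above, with illustrative names: a singleton compute_kernel whose nodes each declare coarse capabilities and can be added or removed at runtime.

```python
from typing import Set

class ComputeNode:
    def __init__(self, name: str, capabilities: Set[str]):
        self.name = name
        # "llm_completion" is a capability of the node as a whole,
        # not a refinement of functions inside the LLM.
        self.capabilities = capabilities

class ComputeKernel:
    _instance = None

    def __new__(cls):
        # Singleton: created once, before the rest of the system.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.nodes = []
        return cls._instance

    def add_node(self, node: ComputeNode) -> None:
        # Nodes can be added at runtime to change the kernel's behavior...
        self.nodes.append(node)

    def remove_node(self, node: ComputeNode) -> None:
        # ...and removed again just as easily.
        self.nodes.remove(node)

kernel = ComputeKernel()
kernel.add_node(ComputeNode("local_gpu", {"llm_completion", "embedding"}))
assert kernel is ComputeKernel()  # every caller sees the same instance
```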
Chat Session and Historical Records: The most recent chat records that fit within the LLM's token limit are loaded. In fact, our design leaves it to the workflow developer to define how prompts are derived from a chat session's history.
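As one illustration of that default policy, here is a sketch that walks the history backwards and keeps the newest records that fit the token budget; the whitespace "tokenizer" and the record shape are stand-ins, and a workflow developer could replace this policy entirely.

```python
from typing import List

def load_recent_history(records: List[str], max_tokens: int) -> List[str]:
    def count_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer.
        return len(text.split())

    selected: List[str] = []
    budget = max_tokens
    # Walk backwards from the newest record, stopping when the budget runs out.
    for record in reversed(records):
        cost = count_tokens(record)
        if cost > budget:
            break
        selected.append(record)
        budget -= cost
    selected.reverse()  # restore chronological order
    return selected
```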
Compute Kernel Scheduling: The compute_kernel's schedule method contains a clever algorithm for finding compute nodes. This implementation can be simple or brilliant, but it will never be overly complicated.
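A deliberately simple sketch of what such a schedule method could look like, first-fit over capability sets; the names are assumptions, not the actual implementation.

```python
from typing import List, Optional, Set

class Node:
    def __init__(self, name: str, capabilities: Set[str]):
        self.name = name
        self.capabilities = capabilities

def schedule(nodes: List[Node], required: str) -> Optional[Node]:
    # First-fit: return the first node advertising the required capability.
    for node in nodes:
        if required in node.capabilities:
            return node
    return None  # the caller decides how to fail (e.g. reject the LLM call)

nodes = [Node("cpu_box", {"embedding"}), Node("gpu_box", {"llm_completion"})]
assert schedule(nodes, "llm_completion").name == "gpu_box"
```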
I hope this clarifies your queries. Please feel free to reach out if you have any further questions.
There should be several levels of history:
After reading the Workflow code, I would like to share my understanding and questions to see whether it aligns with the overall design. As I am interested in LLM kernel packaging, my focus is primarily on this area.