geekan / MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
https://deepwisdom.ai/
MIT License

How to associate inputs and outputs of multiple ActionNodes #1445

Closed chenk-gd closed 1 month ago

chenk-gd commented 3 months ago

What is the recommended practice if one wishes to use the output of the previous ActionNode as part of the prompt of the next ActionNode's fill method? The current MetaGPT default (strgy='complex') establishes no relationship between individual ActionNodes; it just collects the output of each child ActionNode and puts it into the upper-level ActionNode's instruct_content variable.

        elif strgy == "complex":
            # implicitly assumes that children exist
            tmp = {}
            for _, i in self.children.items():
                if exclude and i.key in exclude:
                    continue
                child = await i.simple_fill(schema=schema, mode=mode, images=images, timeout=timeout, exclude=exclude)
                tmp.update(child.instruct_content.model_dump())
            cls = self._create_children_class()
            self.instruct_content = cls(**tmp)
            return self

Or, is this a recommended ActionNode implementation for this kind of requirement?

iorisa commented 3 months ago

If you want to use the output of the previous ActionNode as part of the fill method prompt for the next ActionNode, that means the previous action is done. So these should be two different actions. ActionNode is only responsible for modularizing different blocks of prompt within an action and does not support cross-action associations.
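As a minimal plain-Python sketch of the "two different actions" idea (the class names and the `run` signature below are illustrative, not MetaGPT's exact Action API): the first action completes, and its output is passed explicitly into the second action's prompt.

```python
import asyncio

# Illustrative sketch, not MetaGPT's actual Action API: the first action
# finishes, and its output is fed explicitly into the second action's prompt.
class ExtractPartA:
    async def run(self, document: str) -> str:
        # Stand-in for an LLM call that extracts the relevant part A.
        return f"part-A-of({document})"

class GenerateB:
    PROMPT = "Given the extracted section:\n{part_a}\nGenerate B."

    async def run(self, part_a: str) -> str:
        # The previous action's output becomes part of this action's prompt.
        prompt = self.PROMPT.format(part_a=part_a)
        return prompt  # stand-in for the LLM's reply

async def main() -> str:
    part_a = await ExtractPartA().run("spec.md")
    return await GenerateB().run(part_a)

print(asyncio.run(main()))
```

The key point is that the hand-off happens between actions, in role or orchestration code, rather than inside a single ActionNode tree.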

chenk-gd commented 3 months ago

If ActionNode is responsible for modularizing the prompt, then with strgy='complex' each ActionNode actually builds its own prompt and calls the LLM individually to get a result. That seems inconsistent with "ActionNode is responsible for modularizing the prompt". What scenarios does this apply to?

There is a task like this: the first step analyzes a certain document to extract the relevant part A; the next step generates B, C, and D (each with its own example) based on A; finally, based on another example, the final result is generated from B, C, and D. What is the recommended way to implement this task within the Action/ActionNode framework?

Another question: ActionNode supports output examples, but how do you represent examples that contain both inputs and outputs?

chenk-gd commented 3 months ago

Also, in the example of printing the Fibonacci series using ActionNode, the output of the previous ActionNode (SIMPLE_THINK_NODE) is taken as input to the next ActionNode (SIMPLE_CHECK_NODE):

        elif strgy == "complex":
            # implicitly assumes that children exist
            child_context = context  # the input context serves as the first child node's context
            for _, i in self.children.items():
                i.set_context(child_context)  # set the context for the child node
                child = await i.simple_fill(schema=schema, mode=mode)
                child_context = child.content  # use the returned content (child.content) as the next child node's context

iorisa commented 3 months ago

1. ActionNode

write_prd_an.py provides some usage examples.

NODES = [
    LANGUAGE,
    PROGRAMMING_LANGUAGE,
    ORIGINAL_REQUIREMENTS,
    PROJECT_NAME,
    PRODUCT_GOALS,
    USER_STORIES,
    COMPETITIVE_ANALYSIS,
    COMPETITIVE_QUADRANT_CHART,
    REQUIREMENT_ANALYSIS,
    REQUIREMENT_POOL,
    UI_DESIGN_DRAFT,
    ANYTHING_UNCLEAR,
]

REFINED_NODES = [
    LANGUAGE,
    PROGRAMMING_LANGUAGE,
    REFINED_REQUIREMENTS,
    PROJECT_NAME,
    REFINED_PRODUCT_GOALS,
    REFINED_USER_STORIES,
    COMPETITIVE_ANALYSIS,
    COMPETITIVE_QUADRANT_CHART,
    REFINED_REQUIREMENT_ANALYSIS,
    REFINED_REQUIREMENT_POOL,
    UI_DESIGN_DRAFT,
    ANYTHING_UNCLEAR,
]

WRITE_PRD_NODE = ActionNode.from_children("WritePRD", NODES)
REFINED_PRD_NODE = ActionNode.from_children("RefinedPRD", REFINED_NODES)

NODES and REFINED_NODES reuse some prompt modules, such as LANGUAGE, PROGRAMMING_LANGUAGE, and so on:

LANGUAGE = ActionNode(
    key="Language",
    expected_type=str,
    instruction="Provide the language used in the project, typically matching the user's requirement language.",
    example="en_us",
)

PROGRAMMING_LANGUAGE = ActionNode(
    key="Programming Language",
    expected_type=str,
    instruction="Python/JavaScript or other mainstream programming language.",
    example="Python",
)
  1. The simple strgy merges all child ActionNode inputs into a single prompt:

    async def simple_fill(
        self, schema, mode, images: Optional[Union[str, list[str]]] = None, timeout=USE_CONFIG_TIMEOUT, exclude=None
    ):
        prompt = self.compile(context=self.context, schema=schema, mode=mode, exclude=exclude)
        ......
        content, scontent = await self._aask_v1(
            prompt, class_name, mapping, images=images, schema=schema, timeout=timeout
        )
        ......

  2. The complex strgy runs each child ActionNode in isolation and merges all outputs into a single dict:

    tmp = {}
    for _, i in self.children.items():
        if exclude and i.key in exclude:
            continue
        child = await i.simple_fill(schema=schema, mode=mode, images=images, timeout=timeout, exclude=exclude)
        tmp.update(child.instruct_content.model_dump())

    For example:

    WRITE_PRD_NODE.fill(strgy="simple", ...)  # merge all children ActionNode object inputs into a single prompt.
    WRITE_PRD_NODE.fill(strgy="complex", ...)  # merge all children ActionNode object outputs into a single dict.
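The difference between the two strategies can be sketched in plain Python (this is a toy illustration, not MetaGPT's actual implementation; the child keys and stand-in values are hypothetical):

```python
# Toy sketch (not MetaGPT code) of the two strgy modes, using two
# hypothetical child nodes keyed like the PRD nodes above.
children = {
    "Language": "en_us",
    "Programming Language": "Python",
}

# strgy="simple": compile every child into ONE prompt -> a single LLM call.
simple_prompt = "\n".join(f"## {key}\n{value}" for key, value in children.items())

# strgy="complex": one LLM call PER child; each child's structured output
# (a stand-in for child.instruct_content.model_dump()) is merged into one dict.
merged = {}
for key, value in children.items():
    child_output = {key: value}  # pretend this came from an isolated simple_fill call
    merged.update(child_output)

print(simple_prompt)
print(merged)
```

So "simple" trades one big prompt for one call, while "complex" trades many small isolated calls for one merged result dict.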

2. DAG Flow

You can refer to qa_engineer.py to build the flow you need.

    async def _act(self) -> Message:
        ......
        code_filters = any_to_str_set({PrepareDocuments, SummarizeCode})
        test_filters = any_to_str_set({WriteTest, DebugError})
        run_filters = any_to_str_set({RunCode})
        for msg in self.rc.news:
            # Decide what to do based on observed msg type, currently defined by human,
            # might potentially be moved to _think, that is, let the agent decides for itself
            if msg.cause_by in code_filters:
                # engineer wrote a code, time to write a test for it
                await self._write_test(msg) # publish_message(AIMessage(cause_by=WriteTest, send_to=self))
            elif msg.cause_by in test_filters:
                # I wrote or debugged my test code, time to run it
                await self._run_code(msg) # publish_message(AIMessage(cause_by=RunCode, send_to=self))
            elif msg.cause_by in run_filters:
                # I ran my test code, time to fix bugs, if any
                await self._debug_error(msg) # publish_message(AIMessage(cause_by=DebugError, send_to=self))
            elif msg.cause_by == any_to_str(UserRequirement):
                return await self._parse_user_requirement(msg)  # publish_message(AIMessage(cause_by=PrepareDocuments, send_to=self))
        ......

Where:

  1. for msg in self.rc.news processes each message sent to itself one by one;
  2. Executing self._write_test(msg) will send a new WriteTest message to itself, and this message will be added to self.rc.news;
  3. Executing self._run_code(msg) will send a new RunCode message to itself, and this message will be added to self.rc.news;
  4. Executing self._debug_error(msg) will send a new DebugError message to itself, and this message will be added to self.rc.news.

You can refer to QaEngineer's message passing approach to implement your DAG flow. You can use memory or external storage to wait until the results of B, C, and D are all collected, and then publish a new message to trigger the subsequent workflow.
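The "wait until B, C, and D are all collected, then publish" idea can be sketched in plain Python (the class and method names below are hypothetical, not MetaGPT APIs; in practice the buffer would live in the role's memory or external storage):

```python
# Hypothetical sketch (not MetaGPT's actual classes) of the
# "collect in memory, then trigger" pattern: buffer the outputs of B, C,
# and D, and only emit the combined payload once all three have arrived.
class Collector:
    REQUIRED = {"B", "C", "D"}

    def __init__(self):
        self.results = {}  # stand-in for the role's memory / external storage

    def observe(self, key: str, value: str):
        """Record one result; return the combined payload once complete."""
        self.results[key] = value
        if self.REQUIRED <= self.results.keys():
            # All inputs collected -> time to publish the message that
            # triggers the subsequent workflow.
            return {k: self.results[k] for k in sorted(self.REQUIRED)}
        return None  # still waiting

collector = Collector()
print(collector.observe("B", "b-result"))  # still waiting for C and D
print(collector.observe("C", "c-result"))  # still waiting for D
print(collector.observe("D", "d-result"))  # complete: downstream step can start
```

In MetaGPT terms, the non-None return would correspond to publishing a new message that the downstream role observes.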

More Details: Agent Communication

https://docs.deepwisdom.ai/main/en/guide/in_depth_guides/agent_communication.html

chenk-gd commented 2 months ago

Thank you for answering. When constructing a DAG flow, suppose there are 3 actions A, B and C. If A and B are independent but C depends both on A and B, how is this case implemented?

iorisa commented 2 months ago

  1. Whoever consumes the data is responsible for determining whether the conditions are met.
  2. At the end of its execution, a role simply emits its results, regardless of who consumes the data downstream.
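Applied to the A/B/C question, this means C (the consumer) buffers what it observes and checks its own precondition. A hypothetical plain-Python sketch (not MetaGPT's API; action names are illustrative):

```python
# Hypothetical sketch (not MetaGPT's API): role C consumes messages caused
# by A and B, and decides for itself when its precondition is met.
class RoleC:
    NEEDS = {"ActionA", "ActionB"}

    def __init__(self):
        self.inbox = {}  # buffered upstream results, keyed by cause_by

    def on_message(self, cause_by: str, content: str):
        """Buffer one upstream result; run C only once both A and B arrived."""
        if cause_by in self.NEEDS:
            self.inbox[cause_by] = content
        if self.NEEDS <= self.inbox.keys():
            # Condition met: C runs using both inputs.
            return f"C({self.inbox['ActionA']}, {self.inbox['ActionB']})"
        return None  # keep waiting

c = RoleC()
print(c.on_message("ActionA", "a-out"))  # still waiting for B
print(c.on_message("ActionB", "b-out"))  # both present: C executes
```

A and B stay independent and just publish their results; all the dependency logic lives in C.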

better629 commented 1 month ago

Since there have been no updates or replies from the user for a long time, we will close this issue. Please reopen it if necessary.