Finish action customization.

pluswave commented 2 months ago

If I want to an agent to output multiline code, it is hard to use current framework .

that is because the syntax:

    Finish [ {"response": "content" } ]

the code should be included in the content, for gpt-3.5 , multiline content inside a json formatted string is error-prone.

If ,the customization code can easily :

re-parse the llm-output if a action parser fails
specify other format other than current FinishAction

it will be easier.

JimSalesforce commented 2 months ago

Thanks for pointing this out. Actually, the only constraint for Finish action is the name should be 'Finish', the output parameters can still be defined in your customized way. To achieve this, there may be two ways to solve it

First, you could specify the parameter doc for finish action to be a coding block, e.g. you define a new action as

class FinishAction(BaseAction):
def __init__(self) -> None:
    action_name = "Finish"
    action_desc = "Complete the task with a code block."
    params_doc = {
        'code': "this is the finish action parameter. Input a block of code to solve the task."
    }
    super().__init__(
        action_name=action_name,
        action_desc=action_desc,
        params_doc=params_doc,
    )

def __call__(self, code):
    return code

Then, you can either substitute the finish action as this new action or insert it to the agent.actions property. See this benchmark example for redefine the agent.actions.

Another way is to define a new action, CodeGen as follows:

class CodeGen(BaseAction):
def __init__(self) -> None:
    action_name = "code_gen"
    action_desc = "generating a code block"
    params_doc = {
        'code': "this is the action parameter. Input a block of code to solve the task."
    }
    super().__init__(
        action_name=action_name,
        action_desc=action_desc,
        params_doc=params_doc,
    )

def __call__(self, code):
  # save code if you want
  return code

I think the second way would be easier to implement. And you can still test the performance afterwards.

pluswave commented 2 months ago

I don't think you get my point. the key problem is the parse method for ALL actions are the same. that is , the prompt and the code, together define an action schema like the following format:

ActionName [ {"params1": "content", "params2":"content"} ]

which, force the llm to put the meaningful contents INSIDE a JSON format, which is error prone, because it is easy for llm to output like the format below instead of the format defined above when content has multi-line code:

ActionName

```python
# multi-line code here

```

Another issue for redefine FinishAction, it has some restriction, which is caused in the forward method of BaseAgent, it hardcode FinishAct instead of custom Action in the call chain:

        if agent_act.name == FinishAct.action_name:
            act_found_flag = True
            observation = "Task Completed."
            task.completion = "completed"
            task.answer = FinishAct(**agent_act.params) # this line hardcode FinishAct

JimSalesforce commented 2 months ago

I see. Yea, current action is based on the json output. If you want to define another type of actions, you may have to change the action type and define your own parser.

Yea, we also find this Finish issue. We will create a new PR to fix it.

SalesforceAIResearch / AgentLite

Finish action customization. #17