Closed jamesbraza closed 6 days ago
Thanks @whitead for the LGTM, appreciated.
@mskarlin and I were discussing today, and we figured out we can get rid of the "cannot answer"
string literal checking in the code base by moving the complete
tool to have a bool
argument is_sure
that basically plays the role of AgentStatus.UNSURE
.
In other words, we're planning to move the "unsure" definition from the environment to the agent.
I decided to resolve the future comments in another PR. Going to merge this one as it's a somewhat atomic change
Motivation
Part 1
Based on our
Environment.step
'sdone
condition: https://github.com/Future-House/paper-qa/blob/v5.4.0/paperqa/agents/env.py#L199-L205We currently (as of v5.4) incorrectly conclude a rollout is
done
ongen_answer
'sanswer
s such as:Part 2
We realized that https://github.com/Future-House/paper-qa/pull/671 is:
tool_choice="auto"
(OpenAI default). An example agent here is LangChain'sOpenAIFunctionsAgent
tool_choice="required"
, since they cannot specify empty tool calls. This includes ourldp
0.14Agent
s oraviary
0.10ToolSelector
Part 3
In general, empty tool calls signifying
done
is probably not a generalized assumption.Implementation
To be generally applicable, here we introduce another tool, the
complete
tool. When invoked, this tool signifies the rollout isdone
. This enables:PaperQAEnvironment.step
such that it doesn't special casedone
logic for empty tool calls, which in turn enables simplification of thePaperQAEnvironment.reset
observationscomplete
are now clear truncations or failuresNotably this change also simplifies
GradablePaperQAEnvironment.step
, now we can directly parse thestate.session.answer
whendone
, as opposed to parsing themessages
.The tradeoffs here are due to a fifth tool being added, which: