Created `complete` tool to allow unsure answers

jamesbraza commented 1 week ago

Motivation

Part 1

Based on our Environment.step's done condition: https://github.com/Future-House/paper-qa/blob/v5.4.0/paperqa/agents/env.py#L199-L205

We currently (as of v5.4) incorrectly conclude that a rollout is done when gen_answer returns unsure answers such as:

"Based on the sources provided, it appears no one has done x."
"The sources provide some evidence about gene X, but gene Y is not discussed."
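To make the failure mode concrete, here is a minimal sketch of the pre-PR done rule. It assumes the rule reduces to "done on an empty tool call, or once gen_answer returns anything that doesn't contain a 'cannot answer' marker". ToolCall, CANNOT_ANSWER_MARKER and legacy_is_done are hypothetical names, not paper-qa's API; the linked env.py is the authority on the real logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical stand-in for one tool call emitted by the agent."""

    name: str


# Hypothetical failure marker, echoing the "cannot answer" string-literal check;
# the exact string paper-qa v5.4 matches on may differ.
CANNOT_ANSWER_MARKER = "cannot answer"


def legacy_is_done(tool_calls: list[ToolCall], gen_answer_output: str | None) -> bool:
    """Approximate pre-PR done rule (hypothetical reconstruction, not paper-qa code)."""
    if not tool_calls:
        # An empty tool call is read as "the agent is finished".
        return True
    called_gen_answer = any(tc.name == "gen_answer" for tc in tool_calls)
    # Any gen_answer output lacking the failure marker ends the rollout,
    # even if the answer itself hedges or admits gaps in the sources.
    return (
        called_gen_answer
        and gen_answer_output is not None
        and CANNOT_ANSWER_MARKER not in gen_answer_output.lower()
    )
```

Under this rule, both example answers above end the rollout, because neither contains the failure marker, even though each one admits the sources are incomplete.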

Part 2

We realized that https://github.com/Future-House/paper-qa/pull/671 is:

Useful for agents backed by an LLM with tool_choice="auto" (the OpenAI default). An example agent here is LangChain's OpenAIFunctionsAgent.
Not useful for agents backed by an LLM with tool_choice="required", since they cannot emit empty tool calls. This includes our ldp 0.14 Agents and aviary 0.10 ToolSelector.

Part 3

In general, treating an empty tool call as the done signal is probably not a generalizable assumption.
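A minimal sketch of why Parts 2 and 3 hold, assuming OpenAI-style tool_choice semantics ("auto": zero or more calls, "required": at least one call, "none": no calls). The helper names are hypothetical; the point is only that "zero calls means done" is unreachable under "required", whereas a dedicated done tool works under both "auto" and "required".

```python
def response_allowed(tool_choice: str, num_tool_calls: int) -> bool:
    """Whether a response with `num_tool_calls` tool calls satisfies an
    OpenAI-style `tool_choice` setting (hypothetical helper)."""
    if tool_choice == "auto":
        return True  # model may call zero or more tools
    if tool_choice == "required":
        return num_tool_calls >= 1  # model must call at least one tool
    if tool_choice == "none":
        return num_tool_calls == 0  # model may not call any tool
    raise ValueError(f"Unknown tool_choice: {tool_choice!r}")


def can_signal_done_by_empty_call(tool_choice: str) -> bool:
    """An 'empty tool call means done' protocol needs zero calls to be legal."""
    return response_allowed(tool_choice, 0)


def can_signal_done_by_complete_tool(tool_choice: str) -> bool:
    """A dedicated done tool only needs a single call to be legal."""
    return response_allowed(tool_choice, 1)
```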

Implementation

To be generally applicable, here we introduce another tool, the complete tool, whose invocation signifies that the rollout is done. This enables:

The unsure answers (see Motivation Part 1 above) to be non-terminal, intermediate outputs
A simplified PaperQAEnvironment.step that no longer special-cases done logic for empty tool calls, which in turn simplifies the PaperQAEnvironment.reset observations
Trajectories not ending in complete to be clearly identifiable as truncations or failures
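The new done rule can be sketched as follows. This is a minimal illustration, not paper-qa's PaperQAEnvironment.step: only the tool names complete and gen_answer come from this PR, while ToolCall, is_done, classify_trajectory and the other tool names in the usage are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical stand-in for one tool call emitted by the agent."""

    name: str


COMPLETE_TOOL = "complete"  # the tool introduced by this PR


def is_done(tool_calls: list[ToolCall]) -> bool:
    """Done only when the agent explicitly calls `complete`.

    There is no special case for empty tool calls and no inspection of
    gen_answer's text, so an unsure answer is just an intermediate output."""
    return any(tc.name == COMPLETE_TOOL for tc in tool_calls)


def classify_trajectory(steps: list[list[ToolCall]]) -> str:
    """Label a finished rollout: 'completed' if its last step called
    `complete`, otherwise 'truncated_or_failed'."""
    if steps and is_done(steps[-1]):
        return "completed"
    return "truncated_or_failed"
```

For example, a rollout whose steps call paper_search, gather_evidence, gen_answer and then complete is classified as "completed", while the same rollout cut off after gen_answer is "truncated_or_failed".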

Notably, this change also simplifies GradablePaperQAEnvironment.step: when done, we can now directly parse state.session.answer instead of parsing the messages.

The tradeoffs come from adding a fifth tool, which:

Increases API prompt token usage
Increases the number of moving parts in the system

jamesbraza commented 6 days ago

Thanks @whitead for the LGTM, appreciated.

@mskarlin and I were discussing today, and we realized we can remove the "cannot answer" string-literal checking from the codebase by giving the complete tool a bool argument, is_sure, that effectively plays the role of AgentStatus.UNSURE.

In other words, we're planning to move the "unsure" definition from the environment to the agent.
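A minimal sketch of the planned direction, contrasting environment-side string matching with an agent-declared is_sure flag. The AgentStatus members, their values, and both function signatures are hypothetical illustrations of the idea, not paper-qa's actual enum or tool.

```python
from enum import Enum


class AgentStatus(str, Enum):
    # Hypothetical subset mirroring the AgentStatus.UNSURE mentioned above;
    # paper-qa's real enum may have other members and values.
    SUCCESS = "success"
    UNSURE = "unsure"


def legacy_status(answer: str) -> AgentStatus:
    """Environment-side inference the plan removes: string-matching the answer."""
    return AgentStatus.UNSURE if "cannot answer" in answer.lower() else AgentStatus.SUCCESS


def complete(is_sure: bool) -> AgentStatus:
    """Hypothetical `complete` tool with the planned bool argument: the agent,
    not the environment, declares whether it is sure of its answer."""
    return AgentStatus.SUCCESS if is_sure else AgentStatus.UNSURE
```

The legacy check marks a hedged answer such as "Based on the sources provided, it appears no one has done x." as SUCCESS, since it never says "cannot answer"; with the flag, the agent can report UNSURE explicitly via complete(is_sure=False).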

jamesbraza commented 6 days ago

I decided to address the follow-up comments in another PR. I'm going to merge this one, as it's a somewhat atomic change.
