QuangBK / localLLM_guidance

Local LLM ReAct Agent with Guidance

Use cycles with stop for agents #4

Closed joaopiopedreira closed 1 year ago

joaopiopedreira commented 1 year ago

Hi @QuangBK, congrats on your work here, it's really nice. I was playing around with your agent, and it didn't work for what I was after because some of my agents need the program to search the internet up to 5 times. So I tried extending your concept with an {{#each}} cycle. It works, but I was hoping to find a better way to exit the cycle once the final answer is reached. Any thoughts? Thank you!

Here's your code, modified for my use case:


prompt_template = """
(...)
### Input:
{{question}}

### Response:
Question: {{question}}
{{#each iterations}}
Thought: {{gen 'thought' stop='\\n'}}
{{#if (== thought " I now know the final answer.")}}
Final Answer: {{gen 'final' stop='\\n'}}
{{await 'instruction'}}
{{/if}}
Action: {{select 'tool_name' options=valid_tools}}
Action Input: {{gen 'actInput' stop='\\n'}}
Observation:{{search actInput}}
{{/each}}"""

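import guidance

# searchGoogle, valid_answers and valid_tools are defined elsewhere (not shown here).
prompt = guidance(prompt_template)
result = prompt(
    question="How old is the Portuguese president's wife?",
    search=searchGoogle,
    valid_answers=valid_answers,
    valid_tools=valid_tools,
    iterations=[1, 2, 3, 4, 5],  # one entry per allowed search round (at most 5)
)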

QuangBK commented 1 year ago

Hi, thank you for your comment. The main cause of this problem is hallucination. Sometimes the LLM stays in the "Thought" loop instead of producing " I now know the final answer. Final Answer: ..." even when the observations give it enough information to answer. Since neither Vicuna nor WizardML is trained to work with the ReAct framework, the model tends to decide more or less randomly whether to continue searching or give the final answer. That's why I forced the LLM to give an answer after several loops. To solve this problem directly, we need to teach the model to make the correct decision given the previous observations. I suggest trying a bigger model (33B or 65B), or fine-tuning a model to work directly with the ReAct framework.
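
For anyone who wants a clean exit from the cycle, one option is to drive the loop from plain Python instead of from inside the template, so that leaving early is just a `break` and the forced answer after several loops is a fallback. The sketch below is only an illustration under stated assumptions: `run_react`, `generate` and `tools` are hypothetical names, not part of guidance or of this repo. `generate(transcript, field)` stands in for whatever model call produces one field of the prompt.

```python
from typing import Callable, Dict

# Phrase the agent is expected to emit when it is ready to answer.
FINAL_MARKER = "I now know the final answer"


def run_react(
    question: str,
    generate: Callable[[str, str], str],   # hypothetical model call: (transcript, field) -> text
    tools: Dict[str, Callable[[str], str]],  # hypothetical tool registry, e.g. {"search": ...}
    max_steps: int = 5,
) -> str:
    """Run a ReAct-style loop with an early exit and a forced answer after max_steps."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought = generate(transcript, "thought").strip()
        transcript += f"Thought: {thought}\n"
        if FINAL_MARKER in thought:
            break  # early exit: the model says it is done

        tool_name = generate(transcript, "tool").strip()
        if tool_name not in tools:
            # Record the bad choice and let the model try again on the next step.
            transcript += f"Observation: unknown tool {tool_name!r}\n"
            continue

        action_input = generate(transcript, "input").strip()
        observation = tools[tool_name](action_input)
        transcript += (
            f"Action: {tool_name}\n"
            f"Action Input: {action_input}\n"
            f"Observation: {observation}\n"
        )
    else:
        # The loop used up all steps without the marker: force the final-answer thought.
        transcript += f"Thought: {FINAL_MARKER}.\n"

    return generate(transcript, "final").strip()
```

The `for ... else` branch only runs when the loop finishes without hitting `break`, so the forced thought is added only when the model never decided to answer on its own. This does not fix the underlying hallucination problem; it only bounds the number of searches and makes the exit explicit.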