SalesforceAIResearch / AgentLite

Apache License 2.0
537 stars 60 forks source link

Running evaluate_hotpot_qa.py throws an error #29

Open gabriead opened 4 months ago

gabriead commented 4 months ago

Hey guys, I appreciated reading your paper. However I just wanted to see your eval results for myself and running the script results in the follwoing error when I execute python evaluate_hotpot_qa.py --llm gpt-4o --agent_arch act. I have implemented an adapter for AzureOpenAI in order to use gpt-4o which I have tested with your framework beforehand and worked well.

Connected to pydev debugger (build 232.9559.58)
Processing:   0%|          | 0/100 [00:00<?, ?it/s]Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: 03244001-d72a-4b5e-b5a1-bf22bde7636d
    Instruction: Which magazine was started first Arthur's Magazine or First for Women?
]
====wiki_search_agent starts execution on TaskPackage 03244001-d72a-4b5e-b5a1-bf22bde7636d====
Agent wiki_search_agent takes 0-step Action:
{
    name: Think
    params: {'response': "I need to search Arthur's Magazine and First for Women, find their start dates, and then compare them to determine which was started first."}
}
Observation: OK
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': "Arthur's Magazine"}
}
Observation: Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century. Edited by Timothy Shay Arthur, it featured work by Edgar A. Poe, J.H. Ingraham, Sarah Josepha Hale, Thomas G. Spear, and others. In May 1846 it was merged into Godey's Lady's Book.
A few years later Arthur would launch a new publication entitled Arthur's Home Magazine.
Agent wiki_search_agent takes 2-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'First for Women'}
}
Observation: First for Women is a woman's magazine published by A360media in the US. The magazine was started in 1989 by Bauer Media Group. In 2011 the circulation of the magazine was 1,310,696 copies.
Agent wiki_search_agent takes 3-step Action:
{
    name: Finish
    params: {'response': "Arthur's Magazine"}
}
Observation: Arthur's Magazine
=========wiki_search_agent finish execution. TaskPackage[ID:03244001-d72a-4b5e-b5a1-bf22bde7636d] status:
[
    completion: completed
    answer: Arthur's Magazine
]
==========
Processing:   1%|          | 1/100 [00:55<1:31:58, 55.74s/it]Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: 03244001-d72a-4b5e-b5a1-bf22bde7636d
    Instruction: The Oberoi family is part of a hotel company that has a head office in what city?
]
====wiki_search_agent starts execution on TaskPackage 03244001-d72a-4b5e-b5a1-bf22bde7636d====
Agent wiki_search_agent takes 0-step Action:
{
    name: Think
    params: {'response': 'I need to search for the Oberoi family and their associated hotel company, and then find the location of its head office.'}
}
Observation: OK
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'Oberoi family hotel company'}
}
Observation: The Oberoi Group is a luxury hotel group with its head office in New Delhi, India. Founded in 1934, the company owns and operates 32 luxury hotels and two river cruise ships in 7 countries, primarily under its Oberoi Hotels & Resorts and Trident brands. The group also operates The Oberoi Centre for Learning and Development, which is regarded as one of Asia's top institutions for hospitality education.

Agent wiki_search_agent takes 2-step Action:
{
    name: Finish
    params: {'response': 'New Delhi, India'}
}
Observation: New Delhi, India
=========wiki_search_agent finish execution. TaskPackage[ID:03244001-d72a-4b5e-b5a1-bf22bde7636d] status:
[
    completion: completed
    answer: New Delhi, India
]
==========
Processing:   2%|▏         | 2/100 [01:10<51:35, 31.58s/it]  Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: 03244001-d72a-4b5e-b5a1-bf22bde7636d
    Instruction:  What nationality was James Henry Miller's wife?
]
====wiki_search_agent starts execution on TaskPackage 03244001-d72a-4b5e-b5a1-bf22bde7636d====
Agent wiki_search_agent takes 0-step Action:
{
    name: Think
    params: {'response': 'I need to search for James Henry Miller and find information about his wife to determine her nationality.'}
}
Observation: OK
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller'}
}
Observation: Henry Valentine Miller (December 26, 1891 – June 7, 1980) was an American novelist, short story writer and essayist. He broke with existing literary forms and developed a new type of semi-autobiographical novel that blended character study, social criticism, philosophical reflection, stream of consciousness, explicit language, sex, surrealist free association, and mysticism. His most characteristic works of this kind are Tropic of Cancer, Black Spring, Tropic of Capricorn, and the trilogy The Rosy Crucifixion, which are based on his experiences in New York City and Paris (all of which were banned in the United States until 1961). He also wrote travel memoirs and literary criticism, and painted watercolors.
Agent wiki_search_agent takes 2-step Action:
{
    name: Action:Wikipedia_Search[{"query": "James Henry Miller wife nationality"}]
    params: {}
}
Observation: "This is the wrong action to call. Please check your available action list.
Agent wiki_search_agent takes 3-step Action:
{
    name: Action:Wikipedia_Search[{"query": "Ewan MacColl"}]
    params: {}
}
Observation: "This is the wrong action to call. Please check your available action list.
Agent wiki_search_agent takes 4-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'Ewan MacColl'}
}
Processing:   2%|▏         | 2/100 [01:20<1:05:29, 40.09s/it]
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2023.2.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2023.2.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\Users\adria\AgentLite\benchmark\hotpotqa\evaluate_hotpot_qa.py", line 170, in <module>
    f1, acc = run_hotpot_qa_agent(level=args.level, llm_name=args.llm, agent_arch=args.agent_arch, PROMPT_DEBUG_FLAG=args.debug)
  File "C:\Users\adria\AgentLite\benchmark\hotpotqa\evaluate_hotpot_qa.py", line 121, in run_hotpot_qa_agent
    response = agent(test_task_pack)
  File "C:\Users\adria\AgentLite\agentlite\agents\BaseAgent.py", line 115, in __call__
    self.execute(task)
  File "C:\Users\adria\AgentLite\agentlite\agents\BaseAgent.py", line 150, in execute
    observation = self.forward(task, action)
  File "C:\Users\adria\AgentLite\agentlite\agents\BaseAgent.py", line 238, in forward
    observation = action(**agent_act.params)
  File "C:\Users\adria\AgentLite\benchmark\hotpotqa\SearchActions.py", line 19, in __call__
    article = wikipedia.page(search_results[0])
  File "C:\Users\adria\.conda\envs\AgentLite\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "C:\Users\adria\.conda\envs\AgentLite\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "C:\Users\adria\.conda\envs\AgentLite\lib\site-packages\wikipedia\wikipedia.py", line 345, in __load
    raise PageError(self.title)
wikipedia.exceptions.PageError: Page id "ewan malcolm" does not match any pages. Try another id!
python-BaseException

The adapter in agents_llm.py lokks like this:

def get_llm_backend(llm_config: LLMConfig):
    llm_name = llm_config.llm_name
    llm_provider = llm_config.provider
    if llm_name in OPENAI_CHAT_MODELS:
        return LangchainChatModel(llm_config)
    if llm_name in "gpt-4o":
        llm_name = "gpt-4o"
        llm_config_dict = {
            "llm_name": llm_name,
            "temperature": 0.01,
            "context_len": 4096,
        }
        llm_config = LLMConfig(llm_config_dict)
        return AzureOpenAIChatLLM(llm_config=llm_config)

    elif llm_name in OPENAI_LLM_MODELS:
        return LangchainLLM(llm_config)
    else:
        return LangchainLLM(llm_config)
    # TODO: add more llm providers and inference APIs but for now we are using langchainLLM as the default
    # Using other LLM providers will require additional setup and configuration
    # We suggest subclass BaseLLM and implement the run method for the specific provider in your own best practices
gabriead commented 4 months ago

And another error but that is different then the one above running it on Windows:

Processing:   0%|          | 0/100 [00:00<?, ?it/s]Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: d8276742-09bf-43dc-b9c6-56837048a849
    Instruction: Which magazine was started first Arthur's Magazine or First for Women?
]
====wiki_search_agent starts execution on TaskPackage d8276742-09bf-43dc-b9c6-56837048a849====
Agent wiki_search_agent takes 0-step Action:
{
    name: Think
    params: {'response': "I need to search for the start dates of Arthur's Magazine and First for Women to determine which was started first."}
}
Observation: OK
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': "Arthur's Magazine"}
}
Observation: Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century. Edited by Timothy Shay Arthur, it featured work by Edgar A. Poe, J.H. Ingraham, Sarah Josepha Hale, Thomas G. Spear, and others. In May 1846 it was merged into Godey's Lady's Book.
A few years later Arthur would launch a new publication entitled Arthur's Home Magazine.
Agent wiki_search_agent takes 2-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'First for Women'}
}
Observation: First for Women is a woman's magazine published by A360media in the US. The magazine was started in 1989 by Bauer Media Group. In 2011 the circulation of the magazine was 1,310,696 copies.
Agent wiki_search_agent takes 3-step Action:
{
    name: Finish
    params: {'response': "Arthur's Magazine"}
}
Observation: Arthur's Magazine
=========wiki_search_agent finish execution. TaskPackage[ID:d8276742-09bf-43dc-b9c6-56837048a849] status:
[
    completion: completed
    answer: Arthur's Magazine
]
==========
Processing:   1%|          | 1/100 [00:08<14:42,  8.91s/it]Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: d8276742-09bf-43dc-b9c6-56837048a849
    Instruction: The Oberoi family is part of a hotel company that has a head office in what city?
]
====wiki_search_agent starts execution on TaskPackage d8276742-09bf-43dc-b9c6-56837048a849====
Agent wiki_search_agent takes 0-step Action:
{
    name: Think
    params: {'response': 'I need to search for the Oberoi family and their hotel company to find the city where the head office is located.'}
}
Observation: OK
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'Oberoi family hotel company head office'}
}
Observation: The Oberoi Group is a luxury hotel group with its head office in New Delhi, India. Founded in 1934, the company owns and operates 32 luxury hotels and two river cruise ships in 7 countries, primarily under its Oberoi Hotels & Resorts and Trident brands. The group also operates The Oberoi Centre for Learning and Development, which is regarded as one of Asia's top institutions for hospitality education.

Agent wiki_search_agent takes 2-step Action:
{
    name: Finish
    params: {'response': 'New Delhi, India'}
}
Observation: New Delhi, India
=========wiki_search_agent finish execution. TaskPackage[ID:d8276742-09bf-43dc-b9c6-56837048a849] status:
[
    completion: completed
    answer: New Delhi, India
]
==========
Processing:   2%|▏         | 2/100 [00:17<14:28,  8.86s/it]Agent wiki_search_agent receives the following TaskPackage:
[
    Task ID: d8276742-09bf-43dc-b9c6-56837048a849
    Instruction:  What nationality was James Henry Miller's wife?
]
====wiki_search_agent starts execution on TaskPackage d8276742-09bf-43dc-b9c6-56837048a849====
Agent wiki_search_agent takes 0-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller wife'}
}
Observation: Henry Valentine Miller (December 26, 1891 – June 7, 1980) was an American novelist, short story writer and essayist. He broke with existing literary forms and developed a new type of semi-autobiographical novel that blended character study, social criticism, philosophical reflection, stream of consciousness, explicit language, sex, surrealist free association, and mysticism. His most characteristic works of this kind are Tropic of Cancer, Black Spring, Tropic of Capricorn, and the trilogy The Rosy Crucifixion, which are based on his experiences in New York City and Paris (all of which were banned in the United States until 1961). He also wrote travel memoirs and literary criticism, and painted watercolors.
Agent wiki_search_agent takes 1-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller'}
}
Observation: Henry Valentine Miller (December 26, 1891 – June 7, 1980) was an American novelist, short story writer and essayist. He broke with existing literary forms and developed a new type of semi-autobiographical novel that blended character study, social criticism, philosophical reflection, stream of consciousness, explicit language, sex, surrealist free association, and mysticism. His most characteristic works of this kind are Tropic of Cancer, Black Spring, Tropic of Capricorn, and the trilogy The Rosy Crucifixion, which are based on his experiences in New York City and Paris (all of which were banned in the United States until 1961). He also wrote travel memoirs and literary criticism, and painted watercolors.
Agent wiki_search_agent takes 2-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller wife nationality'}
}
Observation: Guyanese nationality law is regulated by the 1980 Constitution of Guyana, as amended; the Citizenship Act of 1967, and its revisions; and various British Nationality laws. These laws determine who is, or is eligible to be, a national of Guyana. Guyanese nationality is typically obtained either on the principle of jus soli, i.e. by birth in Guyana; or under the rules of jus sanguinis, i.e. by birth abroad to parents with Guyanese nationality. It can also be granted to persons with an affiliation to the country, or to a permanent resident who has lived in the country for a given period of time through naturalisation. There is not currently a program in Guyana for persons to acquire nationality through investment in the country. Nationality establishes one's international identity as a member of a sovereign nation. Though it is not synonymous with citizenship, for rights granted under domestic law for domestic purposes, the United Kingdom, and thus the Commonwealth, have traditionally used the words interchangeably.
Agent wiki_search_agent takes 3-step Action:
{
    name: Think
    params: {'response': "I did not get specific information about James Henry Miller's wife from the searches. I need to refine my search to find details about his wife's nationality."}
}
Observation: OK
Agent wiki_search_agent takes 4-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller spouse'}
}
Observation: Henry Valentine Miller (December 26, 1891 – June 7, 1980) was an American novelist, short story writer and essayist. He broke with existing literary forms and developed a new type of semi-autobiographical novel that blended character study, social criticism, philosophical reflection, stream of consciousness, explicit language, sex, surrealist free association, and mysticism. His most characteristic works of this kind are Tropic of Cancer, Black Spring, Tropic of Capricorn, and the trilogy The Rosy Crucifixion, which are based on his experiences in New York City and Paris (all of which were banned in the United States until 1961). He also wrote travel memoirs and literary criticism, and painted watercolors.
Agent wiki_search_agent takes 5-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller'}
}
Observation: Henry Valentine Miller (December 26, 1891 – June 7, 1980) was an American novelist, short story writer and essayist. He broke with existing literary forms and developed a new type of semi-autobiographical novel that blended character study, social criticism, philosophical reflection, stream of consciousness, explicit language, sex, surrealist free association, and mysticism. His most characteristic works of this kind are Tropic of Cancer, Black Spring, Tropic of Capricorn, and the trilogy The Rosy Crucifixion, which are based on his experiences in New York City and Paris (all of which were banned in the United States until 1961). He also wrote travel memoirs and literary criticism, and painted watercolors.
Agent wiki_search_agent takes 6-step Action:
{
    name: Think
    params: {'response': "I didn't find specific information about James Henry Miller's wife. I need to refine my search further to identify her nationality more accurately."}
}
Observation: OK
Agent wiki_search_agent takes 7-step Action:
{
    name: Wikipedia_Search
    params: {'query': 'James Henry Miller wife nationalities'}
}
Observation: Henry Box Brown (c. 1815 – June 15, 1897) was an enslaved man from Virginia who escaped to freedom at the age of 33 by arranging to have himself mailed in a wooden crate in 1849 to abolitionists in Philadelphia, Pennsylvania.
For a short time, Brown became a noted abolitionist speaker in the northeast United States. As a public figure and fugitive slave, Brown felt extremely endangered by the passage of the Fugitive Slave Law of 1850, which increased the pressure to capture escaped slaves. He moved to England and lived there for 25 years, touring with an anti-slavery panorama, and becoming a magician and showman.
Brown married and started a family with an English woman, Jane Floyd. Brown's first wife, Nancy, remained in slavery. Brown returned to the United States with his English family in 1875, where he continued to earn a living as an entertainer. He toured and performed as a magician, speaker, and mesmerist until at least 1889. The last decade of his life (1886–97) was spent in Toronto, where he died in 1897.
Processing:   2%|▏         | 2/100 [00:34<28:06, 17.21s/it]
Traceback (most recent call last):
  File "C:\Users\adria\AgentLite\agentlite\logging\terminal_logger.py", line 41, in __save_log__
    f.write(str_color_remove(log_str) + "\n")
  File "C:\Users\adria\.conda\envs\AgentLite\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2009' in position 32: character maps to <undefined>
python-BaseException