Delegate Research - Githubissues

nerfZael commented 5 months ago

Closes #244

A new DelegateResearchTokensAgent is implemented. It has one tool, `research`, which triggers a sub-chat between a new user proxy and the existing research agent. The two then work through the research tasks inside that sub-chat.

This improves performance on research tasks and reduces the confusion that other agents in the group chat can experience from seeing research-related messages.
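The delegation described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code in this PR: `run_research_subchat`, `research_tool`, the `TERMINATE` stop marker, and the message dictionaries are all hypothetical stand-ins, and `research_agent` is a plain callable in place of the LLM-backed agent.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_research_subchat(
    task: str,
    research_agent: Callable[[List[Message]], str],  # hypothetical stand-in for the research agent
    max_turns: int = 5,
) -> List[Message]:
    """Run a private chat between a fresh user proxy and the research agent."""
    transcript: List[Message] = [{"sender": "user_proxy", "content": task}]
    for _ in range(max_turns):
        reply = research_agent(transcript)
        transcript.append({"sender": "research", "content": reply})
        if "TERMINATE" in reply:  # assumed stop marker, not taken from the PR
            break
        transcript.append({"sender": "user_proxy", "content": "Continue."})
    return transcript


def research_tool(
    task: str,
    research_agent: Callable[[List[Message]], str],
    group_chat: List[Message],
) -> None:
    """The delegate's only tool: the group chat receives one result message."""
    transcript = run_research_subchat(task, research_agent)
    # Take the last research reply; intermediate messages stay in the sub-chat.
    last = next(m for m in reversed(transcript) if m["sender"] == "research")
    result = last["content"].replace("TERMINATE", "").strip()
    group_chat.append({"sender": "delegate_research", "content": result})
```

The point of the design is visible in `research_tool`: however many turns the sub-chat takes, the other agents only ever see a single message from the delegate.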

However, execution is significantly slower, for several reasons:

- There's an additional delegate call
- There's sometimes an additional summary response from the main user_proxy
- The DelegateResearchTokensAgent can be too liberal and sometimes use more than one tool call
- There's an additional summarization call

NOTE: In the current implementation I have replaced the summarization LLM call with a message aggregation step to improve performance. While I believe the LLM call could produce a better (and more succinct) result than the aggregation, I have not noticed any negative effects with our current suite of tests.
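To make the aggregation step mentioned in the note concrete, here is one way it could look. This is only a sketch under assumptions: the function name `aggregate_messages`, the `sender`/`content` message shape, and the choice to skip exact repeats are all hypothetical and not taken from the PR.

```python
from typing import Dict, List


def aggregate_messages(transcript: List[Dict[str, str]], sender: str = "research") -> str:
    """Join one sender's non-empty messages in order, skipping exact repeats."""
    seen = set()
    parts = []
    for message in transcript:
        text = message["content"].strip()
        if message["sender"] != sender or not text or text in seen:
            continue
        seen.add(text)
        parts.append(text)
    return "\n\n".join(parts)
```

Unlike an LLM summary, this needs no extra model call, at the price of a longer and less polished result.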

Execution speed could be improved further by running multiple tool calls (when they are made) in parallel, but I have not seen an easy way to do that.
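In isolation, running several independent research calls concurrently might look like the sketch below. It deliberately ignores the hard part, which is wiring this into the agent framework's tool-call handling; `run_tool_calls_in_parallel` and the `research` callable are hypothetical, and the sketch assumes the calls are I/O-bound and do not depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_tool_calls_in_parallel(
    research: Callable[[str], str],  # hypothetical: one blocking research call per task
    tasks: List[str],
    max_workers: int = 4,
) -> List[str]:
    """Run each task in a worker thread and return results in task order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(tasks))) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(research, tasks))
```

Threads are used here because the calls would mostly be waiting on network responses; results come back in the same order as `tasks`, so they can be matched to the original tool calls.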

nerfZael commented 5 months ago

/workflows/benchmarks agents/token

github-actions[bot] commented 5 months ago

Finished benchmarks

Test Run Summary

Run from: ./autotx/tests/agents/token
Base path: autotx/tests/agents/token/
Iterations: 5
Total Cost: $14.61
Total Success Rate (%): ${\color{red} \LARGE \texttt {84.17} \large \texttt { (88/-8)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `research/test_advanced.py::test_research_and_swap_many_tokens_subjective_complex` | ${\color{lightgreen} \large \texttt {20} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {1}}$ | ${\color{lightgreen} \large \texttt {4}}$ | 5.03m | $0.59 |
| `research/test_advanced.py::test_research_and_swap_many_tokens_subjective_simple` | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 2.89m | $0.48 |
| `research/test_research.py:1:` | ${\color{yellow} \large \texttt {0} \normalsize \texttt {} }$ | ${\color{yellow} \large \texttt {0}}$ | ${\color{yellow} \large \texttt {5}}$ | 2s | $0.00 |
| `research/test_research_and_swap.py::test_research_and_buy_multiple` | ${\color{red} \large \texttt {0} \normalsize \texttt {(-100)} }$ | ${\color{red} \large \texttt {0}}$ | ${\color{red} \large \texttt {5}}$ | 1.69m | $0.30 |
| `research/test_research_and_swap.py::test_research_and_buy_one` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.13m | $0.21 |
| `research/test_research_swap_and_send.py::test_research_buy_multiple_send_multiple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 2.00m | $0.25 |
| `research/test_research_swap_and_send.py::test_research_buy_one_send_multiple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.71m | $0.24 |
| `research/test_research_swap_and_send.py::test_research_buy_one_send_one` | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.63m | $0.27 |
| `send/test_send.py::test_send_erc20` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 36s | $0.02 |
| `send/test_send.py::test_send_erc20_parallel` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 39s | $0.03 |
| `send/test_send.py::test_send_eth_multiple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.05m | $0.04 |
| `send/test_send.py::test_send_native` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 25s | $0.02 |
| `send/test_send.py::test_send_native_sequential` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 37s | $0.04 |
| `test_swap.py::test_swap_complex_1` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.04 |
| `test_swap.py::test_swap_complex_2` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 51s | $0.05 |
| `test_swap.py::test_swap_multiple_1` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 42s | $0.03 |
| `test_swap.py::test_swap_multiple_2` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.05 |
| `test_swap.py::test_swap_native` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 30s | $0.02 |
| `test_swap.py::test_swap_triple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 49s | $0.03 |
| `test_swap.py::test_swap_with_non_default_token` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 29s | $0.02 |
| `test_swap_and_send.py::test_send_and_swap_complex` | ${\color{red} \large \texttt {20} \normalsize \texttt {(-80)} }$ | ${\color{red} \large \texttt {1}}$ | ${\color{red} \large \texttt {4}}$ | 49s | $0.05 |
| `test_swap_and_send.py::test_send_and_swap_simple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 43s | $0.05 |
| `test_swap_and_send.py::test_swap_and_send_complex` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 48s | $0.05 |
| `test_swap_and_send.py::test_swap_and_send_simple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.18m | $0.03 |

Total run time: 139.84 minutes
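The 84.17 total in the summary above is consistent with total passes divided by total runs (which, since every test ran 5 iterations, equals the mean of the per-test rates). The following is a small check, not the benchmark workflow's own code: `total_success_rate` is a hypothetical helper, and the (passes, fails) pairs are copied from the table in row order.

```python
from typing import List, Tuple


def total_success_rate(results: List[Tuple[int, int]]) -> float:
    """Percentage of passing runs across all (passes, fails) pairs."""
    passes = sum(p for p, _ in results)
    runs = sum(p + f for p, f in results)
    return round(100 * passes / runs, 2)


# (passes, fails) per test, in the order the rows appear in the table.
first_run = (
    [(1, 4), (5, 0), (0, 5), (0, 5), (5, 0), (5, 0), (5, 0), (4, 1)]  # research/
    + [(5, 0)] * 5                                                    # send/
    + [(5, 0)] * 7                                                    # test_swap.py
    + [(1, 4), (5, 0), (5, 0), (5, 0)]                                # test_swap_and_send.py
)

print(total_success_rate(first_run))  # 84.17 (101 passes out of 120 runs)
```

Note that the `research/test_research.py:1:` row counts as 5 failed runs in this total even though it finished in 2s at zero cost.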

nerfZael commented 5 months ago

/workflows/benchmarks agents/token/research/test_research.py

github-actions[bot] commented 5 months ago

Running benchmarks...

nerfZael commented 5 months ago

/workflows/benchmarks agents/token/test_swap_and_send.py::test_send_and_swap_complex 10

github-actions[bot] commented 5 months ago

Finished benchmarks Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/test_swap_and_send.py::test_send_and_swap_complex
Iterations: 10
Total Cost: $0.74
Total Success Rate (%): ${\color{red} \LARGE \texttt {90.00} \large \texttt { (90/-10)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `test_send_and_swap_complex` | ${\color{red} \large \texttt {90} \normalsize \texttt {(-10)} }$ | ${\color{red} \large \texttt {9}}$ | ${\color{red} \large \texttt {1}}$ | 1.08m | $0.07 |

Total run time: 10.81 minutes

nerfZael commented 5 months ago

/workflows/benchmarks agents/token/research/test_research.py

github-actions[bot] commented 5 months ago

Finished benchmarks Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research/test_research.py
Base path: autotx/tests/agents/token/research/test_research.py::
Iterations: 5
Total Cost: $4.15
Total Success Rate (%): ${\color{red} \LARGE \texttt {96.00} \large \texttt { (96/-2)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `test_get_top_5_memecoins` | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 1.16m | $0.20 |
| `test_get_top_5_memecoins_in_optimism` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.28m | $0.20 |
| `test_get_top_5_most_traded_tokens_from_l1` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.04m | $0.19 |
| `test_get_top_5_tokens_from_base` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.04m | $0.19 |
| `test_price_change_information` | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 30s | $0.05 |

Total run time: 25.10 minutes

nerfZael commented 4 months ago

/workflows/benchmarks agents/token/research/test_research_and_swap.py,agents/token/research/test_advanced.py

github-actions[bot] commented 4 months ago

Finished benchmarks Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research/test_research_and_swap.py,./autotx/tests/agents/token/research/test_advanced.py
Base path: autotx/tests/agents/token/research/test_
Iterations: 5
Total Cost: $7.04
Total Success Rate (%): ${\color{lightgreen} \LARGE \texttt {80.00} \large \texttt { (80/+5)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `advanced.py::test_research_and_swap_many_tokens_subjective_complex` | ${\color{lightgreen} \large \texttt {40} \normalsize \texttt {(+30)} }$ | ${\color{lightgreen} \large \texttt {2}}$ | ${\color{lightgreen} \large \texttt {3}}$ | 3.53m | $0.61 |
| `advanced.py::test_research_and_swap_many_tokens_subjective_simple` | ${\color{red} \large \texttt {80} \normalsize \texttt {(-10)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 2.76m | $0.40 |
| `research_and_swap.py::test_research_and_buy_multiple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 57s | $0.19 |
| `research_and_swap.py::test_research_and_buy_one` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 53s | $0.20 |
Total run time: 40.53 minutes

nerfZael commented 4 months ago

/workflows/benchmarks agents/token/research/test_research_swap_and_send.py

github-actions[bot] commented 4 months ago

Finished benchmarks Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research/test_research_swap_and_send.py
Base path: autotx/tests/agents/token/research/test_research_swap_and_send.py::
Iterations: 5
Total Cost: $3.76
Total Success Rate (%): ${\color{red} \LARGE \texttt {86.67} \large \texttt { (87/-13)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `test_research_buy_multiple_send_multiple` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.67m | $0.30 |
| `test_research_buy_one_send_multiple` | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.32m | $0.24 |
| `test_research_buy_one_send_one` | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.21m | $0.22 |

Total run time: 20.99 minutes

nerfZael commented 4 months ago

/workflows/benchmarks agents/token/research/test_research.py

github-actions[bot] commented 4 months ago

Finished benchmarks Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research/test_research.py
Base path: autotx/tests/agents/token/research/test_research.py::
Iterations: 5
Total Cost: $3.41
Total Success Rate (%): ${\color{lightgreen} \LARGE \texttt {100.00} \large \texttt { (100/+2)} }$

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| `test_get_top_5_memecoins` | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 55s | $0.15 |
| `test_get_top_5_memecoins_in_optimism` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 49s | $0.14 |
| `test_get_top_5_most_traded_tokens_from_l1` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.15 |
| `test_get_top_5_tokens_from_base` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 59s | $0.16 |
| `test_price_change_information` | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 30s | $0.08 |

Total run time: 20.15 minutes

agentcoinorg / AutoTx

Delegate Research #248
