nerfZael closed this 4 months ago
/workflows/benchmarks agents/token
./autotx/tests/agents/token
autotx/tests/agents/token/
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
research/test_advanced.py::test_research_and_swap_many_tokens_subjective_complex | ${\color{lightgreen} \large \texttt {20} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {1}}$ | ${\color{lightgreen} \large \texttt {4}}$ | 5.03m | $0.59 |
research/test_advanced.py::test_research_and_swap_many_tokens_subjective_simple | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 2.89m | $0.48 |
research/test_research.py:1: | ${\color{yellow} \large \texttt {0} \normalsize \texttt {} }$ | ${\color{yellow} \large \texttt {0}}$ | ${\color{yellow} \large \texttt {5}}$ | 2s | $0.00 |
research/test_research_and_swap.py::test_research_and_buy_multiple | ${\color{red} \large \texttt {0} \normalsize \texttt {(-100)} }$ | ${\color{red} \large \texttt {0}}$ | ${\color{red} \large \texttt {5}}$ | 1.69m | $0.30 |
research/test_research_and_swap.py::test_research_and_buy_one | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.13m | $0.21 |
research/test_research_swap_and_send.py::test_research_buy_multiple_send_multiple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 2.00m | $0.25 |
research/test_research_swap_and_send.py::test_research_buy_one_send_multiple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.71m | $0.24 |
research/test_research_swap_and_send.py::test_research_buy_one_send_one | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.63m | $0.27 |
send/test_send.py::test_send_erc20 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 36s | $0.02 |
send/test_send.py::test_send_erc20_parallel | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 39s | $0.03 |
send/test_send.py::test_send_eth_multiple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.05m | $0.04 |
send/test_send.py::test_send_native | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 25s | $0.02 |
send/test_send.py::test_send_native_sequential | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 37s | $0.04 |
test_swap.py::test_swap_complex_1 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.04 |
test_swap.py::test_swap_complex_2 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 51s | $0.05 |
test_swap.py::test_swap_multiple_1 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 42s | $0.03 |
test_swap.py::test_swap_multiple_2 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.05 |
test_swap.py::test_swap_native | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 30s | $0.02 |
test_swap.py::test_swap_triple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 49s | $0.03 |
test_swap.py::test_swap_with_non_default_token | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 29s | $0.02 |
test_swap_and_send.py::test_send_and_swap_complex | ${\color{red} \large \texttt {20} \normalsize \texttt {(-80)} }$ | ${\color{red} \large \texttt {1}}$ | ${\color{red} \large \texttt {4}}$ | 49s | $0.05 |
test_swap_and_send.py::test_send_and_swap_simple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 43s | $0.05 |
test_swap_and_send.py::test_swap_and_send_complex | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 48s | $0.05 |
test_swap_and_send.py::test_swap_and_send_simple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.18m | $0.03 |
Total run time: 139.84 minutes
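The Success Rate column is consistent with passes divided by total runs (for example, 1 pass and 4 fails gives 20%), with the bracketed number read as the change against a previous baseline run. A minimal sketch of that arithmetic, assuming this interpretation; `success_rate` and `format_delta` are hypothetical helpers, not part of the benchmark workflow:

```python
from typing import Optional


def success_rate(passes: int, fails: int) -> int:
    """Whole-number success rate (%) over all recorded runs."""
    total = passes + fails
    if total == 0:
        raise ValueError("no runs recorded")
    return round(100 * passes / total)


def format_delta(current: int, baseline: Optional[int]) -> str:
    """Signed change against a baseline rate, e.g. '(+10)'.

    Assumption: an empty string stands for "no baseline" or "unchanged",
    which would match the rows that show no bracketed number.
    """
    if baseline is None or current == baseline:
        return ""
    return f"({current - baseline:+d})"


# Pass/fail counts taken from the benchmark tables in this thread.
print(success_rate(1, 4), format_delta(20, 10))   # prints: 20 (+10)
print(success_rate(9, 1), format_delta(90, 100))  # prints: 90 (-10)
```

Rounding to a whole number matches the integer percentages shown in the tables.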
/workflows/benchmarks agents/token/research/test_research.py
/workflows/benchmarks agents/token/test_swap_and_send.py::test_send_and_swap_complex 10
Finished benchmarks Download artifacts
./autotx/tests/agents/token/test_swap_and_send.py::test_send_and_swap_complex
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
test_send_and_swap_complex | ${\color{red} \large \texttt {90} \normalsize \texttt {(-10)} }$ | ${\color{red} \large \texttt {9}}$ | ${\color{red} \large \texttt {1}}$ | 1.08m | $0.07 |
Total run time: 10.81 minutes
/workflows/benchmarks agents/token/research/test_research.py
Finished benchmarks Download artifacts
./autotx/tests/agents/token/research/test_research.py
autotx/tests/agents/token/research/test_research.py::
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
test_get_top_5_memecoins | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 1.16m | $0.20 |
test_get_top_5_memecoins_in_optimism | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.28m | $0.20 |
test_get_top_5_most_traded_tokens_from_l1 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.04m | $0.19 |
test_get_top_5_tokens_from_base | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.04m | $0.19 |
test_price_change_information | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 30s | $0.05 |
Total run time: 25.10 minutes
/workflows/benchmarks agents/token/research/test_research_and_swap.py,agents/token/research/test_advanced.py
Finished benchmarks Download artifacts
./autotx/tests/agents/token/research/test_research_and_swap.py,./autotx/tests/agents/token/research/test_advanced.py
autotx/tests/agents/token/research/test_
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
advanced.py::test_research_and_swap_many_tokens_subjective_complex | ${\color{lightgreen} \large \texttt {40} \normalsize \texttt {(+30)} }$ | ${\color{lightgreen} \large \texttt {2}}$ | ${\color{lightgreen} \large \texttt {3}}$ | 3.53m | $0.61 |
advanced.py::test_research_and_swap_many_tokens_subjective_simple | ${\color{red} \large \texttt {80} \normalsize \texttt {(-10)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 2.76m | $0.40 |
research_and_swap.py::test_research_and_buy_multiple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 57s | $0.19 |
research_and_swap.py::test_research_and_buy_one | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 53s | $0.20 |
Total run time: 40.53 minutes
/workflows/benchmarks agents/token/research/test_research_swap_and_send.py
Finished benchmarks Download artifacts
./autotx/tests/agents/token/research/test_research_swap_and_send.py
autotx/tests/agents/token/research/test_research_swap_and_send.py::
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
test_research_buy_multiple_send_multiple | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 1.67m | $0.30 |
test_research_buy_one_send_multiple | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.32m | $0.24 |
test_research_buy_one_send_one | ${\color{red} \large \texttt {80} \normalsize \texttt {(-20)} }$ | ${\color{red} \large \texttt {4}}$ | ${\color{red} \large \texttt {1}}$ | 1.21m | $0.22 |
Total run time: 20.99 minutes
/workflows/benchmarks agents/token/research/test_research.py
Finished benchmarks Download artifacts
./autotx/tests/agents/token/research/test_research.py
autotx/tests/agents/token/research/test_research.py::
Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
---|---|---|---|---|---|
test_get_top_5_memecoins | ${\color{lightgreen} \large \texttt {100} \normalsize \texttt {(+10)} }$ | ${\color{lightgreen} \large \texttt {5}}$ | ${\color{lightgreen} \large \texttt {0}}$ | 55s | $0.15 |
test_get_top_5_memecoins_in_optimism | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 49s | $0.14 |
test_get_top_5_most_traded_tokens_from_l1 | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 50s | $0.15 |
test_get_top_5_tokens_from_base | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 59s | $0.16 |
test_price_change_information | ${\color{none} \large \texttt {100} \normalsize \texttt {} }$ | ${\color{none} \large \texttt {5}}$ | ${\color{none} \large \texttt {0}}$ | 30s | $0.08 |
Total run time: 20.15 minutes
Closes #244
A new `DelegateResearchTokensAgent` is implemented. It has one tool: `research`.

With this tool, it can start a sub-chat between a new user proxy and the existing research agent, which then work together to solve the research task. This improves how well research tasks are handled, and it reduces the confusion that other agents in the group chat can experience from seeing research-related messages.
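The delegation pattern can be sketched in plain Python. This is not the AutoTx implementation or an agent-framework API: `SubChat`, `DelegateResearchSketch`, the `TERMINATE` stop convention and `scripted_researcher` are hypothetical stand-ins. The sketch shows the key property: each `research` call runs in a fresh, private sub-chat, and only the final answer is posted back to the group chat.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]                 # (sender, content)
ReplyFn = Callable[[List[Message]], str]  # hypothetical stand-in for the research agent

TERMINATE = "TERMINATE"  # assumed end-of-task marker, not taken from AutoTx


@dataclass
class SubChat:
    """Private conversation between a fresh user proxy and the research agent."""
    research_reply: ReplyFn
    max_turns: int = 5
    history: List[Message] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(("user_proxy", task))
        reply = ""
        for _ in range(self.max_turns):
            reply = self.research_reply(self.history)
            self.history.append(("research_agent", reply))
            if reply.endswith(TERMINATE):
                return reply[: -len(TERMINATE)].strip()
            # The proxy nudges the researcher until it signals completion.
            self.history.append(("user_proxy", "Continue."))
        return reply  # turn limit reached; return the last reply as-is


class DelegateResearchSketch:
    """Hypothetical delegator that exposes a single `research` tool."""

    def __init__(self, research_reply: ReplyFn) -> None:
        self.research_reply = research_reply
        self.last_sub_chat: Optional[SubChat] = None

    def research(self, task: str) -> str:
        # A new SubChat per call keeps research messages out of the group chat.
        sub_chat = SubChat(self.research_reply)
        self.last_sub_chat = sub_chat
        return sub_chat.run(task)


def scripted_researcher(history: List[Message]) -> str:
    # Replies once with an intermediate step, then with a final answer.
    turns = sum(1 for sender, _ in history if sender == "research_agent")
    if turns == 0:
        return "Searching token data..."
    return f"Summary for: {history[0][1]} {TERMINATE}"


group_chat: List[Message] = [("user", "Research the top tokens on Base")]
agent = DelegateResearchSketch(scripted_researcher)
group_chat.append(("delegate", agent.research(group_chat[0][1])))
print(group_chat[-1])
```

In this sketch the group chat gains a single summary message while the four-message research exchange stays in the sub-chat's own history.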
However, execution is significantly slower, for a couple of reasons:

- `DelegateResearchTokensAgent` can be too liberal and sometimes makes more than one tool call.

Execution speed could be improved further by running multiple tool calls (when they are made) in parallel, but I have not found an easy way to do that.
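One common way to run several blocking, I/O-bound calls in parallel in Python is a thread pool. The sketch below assumes the tool calls are independent of each other; the `ToolCall` shape, `run_tool_calls_in_parallel` and `slow_research` are hypothetical and not part of AutoTx or its agent framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

ToolCall = Dict[str, str]  # hypothetical shape: {"id": ..., "task": ...}


def run_tool_calls_in_parallel(
    tool_calls: List[ToolCall],
    tool: Callable[[str], str],
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run independent tool calls concurrently and return results in call order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if calls finish out of order.
        results = list(pool.map(lambda call: tool(call["task"]), tool_calls))
    return [{"id": call["id"], "result": result} for call, result in zip(tool_calls, results)]


def slow_research(task: str) -> str:
    # Hypothetical stand-in for an I/O-bound research sub-chat.
    time.sleep(0.3)
    return f"researched: {task}"


calls = [
    {"id": "call_1", "task": "memecoins"},
    {"id": "call_2", "task": "tokens on Base"},
    {"id": "call_3", "task": "price changes"},
]
start = time.perf_counter()
results = run_tool_calls_in_parallel(calls, slow_research)
elapsed = time.perf_counter() - start
print([r["id"] for r in results], f"{elapsed:.1f}s")
```

Threads suit calls that mostly wait on the network, such as LLM or HTTP requests. If the tools were `async` functions, `asyncio.gather` would be the equivalent choice. Either way, results are returned in the original call order so they can be matched back to their tool-call IDs.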