Closed: pplonski closed this issue 1 year ago
Also found out about this behavior and commented on it here: https://discord.com/channels/1092243196446249134/1097875855214137374/1162518524724523048
Basically:
The 2013 revenue varies slightly by source. Macrotrends states it's 2,013 million (https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue), but Statista states it's 2,014 million (https://www.statista.com/statistics/272120/revenue-of-tesla/). This difference is enough to make the benchmark fail (the expected value is exactly 2,014 million).
I think these values have some room for approximation error, just like any internet stat, and I think the evaluation should account for this.
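As a rough illustration of what "accounting for approximation error" could look like, here is a minimal sketch of a tolerance-based check. It is not how the benchmark actually evaluates answers; the function names, the 0.1% tolerance, and the choice to only consider comma-grouped numbers (so that the year 2013 is not mistaken for a revenue figure) are all assumptions made for this example.

```python
import re

# Hypothetical helper: matches comma-grouped numbers such as "2,013" or "1,234,567.8".
# Plain digit runs like the year "2013" are deliberately ignored.
_GROUPED_NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+(?:\.\d+)?\b")


def extract_grouped_numbers(text: str) -> list[float]:
    """Return every comma-grouped number found in the text as a float."""
    return [float(m.replace(",", "")) for m in _GROUPED_NUMBER.findall(text)]


def revenue_matches(text: str, expected: float, rel_tol: float = 0.001) -> bool:
    """True if any extracted number is within rel_tol (relative) of expected."""
    return any(
        abs(value - expected) <= rel_tol * expected
        for value in extract_grouped_numbers(text)
    )
```

With an expected value of 2014 and the assumed 0.1% tolerance, both the Macrotrends figure (2,013) and the Statista figure (2,014) would be accepted. An answer written without a thousands separator (for example "2013 million") would not be picked up by this sketch.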
This has been fixed! Thanks for surfacing.
Hi @SilenNaihin,
Thanks for the quick action. I've checked the test and it will still fail. Right now it checks for 2,013 and 2,014. No chance to pass ... It would be nice to check for 2,01*, but I doubt it is possible.
I don't know how the answer is used, but it has not been fixed either:
https://github.com/Significant-Gravitas/AutoGPT/blob/e9b64adae9fce180a392c726457e150177e746fb/benchmark/agbenchmark/challenges/verticals/scrape/4_revenue_retrieval_2/data.json#L11
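For illustration only, the 2,01* idea from this comment could be expressed as a regular expression, and the "2,013 and 2,014" problem could be avoided by accepting either value rather than requiring both. The sketch below is an assumption about what such a check might look like, not the benchmark's actual evaluation code; the names `ACCEPTED_PATTERN`, `matches_pattern`, and `contains_any` are hypothetical.

```python
import re

# Hypothetical pattern for "2,01*": 2,010 through 2,019, not embedded in a
# longer number such as 12,013 or 2,013,000.
ACCEPTED_PATTERN = re.compile(r"(?<![\d,])2,01\d(?!\d|,\d)")


def matches_pattern(text: str) -> bool:
    """True if the text contains a standalone number of the form 2,01x."""
    return ACCEPTED_PATTERN.search(text) is not None


def contains_any(text: str, accepted: list[str]) -> bool:
    """True if at least one of the accepted strings appears in the text."""
    return any(value in text for value in accepted)
```

A check such as `contains_any(output, ["2,013", "2,014"])` passes when either source's figure appears, whereas requiring both strings at once fails for any answer that quotes a single source.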
Yup, realized my mistake a couple hours later and rectified. Thanks for surfacing!
⚠️ Search for existing issues first ⚠️
Which Operating System are you using?
Linux
Which version of AutoGPT are you using?
Latest Release
Do you use OpenAI GPT-3 or GPT-4?
GPT-3.5
Which area covers your issue best?
Challenges
Describe your issue.
There is a typo in the test data for RevenueRetrival2. The tests check for 2,014, while they should check for 2,013:
https://github.com/Significant-Gravitas/AutoGPT/blob/1eadc64dc0a693c7c9de77ddaef857f3a36f7950/benchmark/agbenchmark/challenges/verticals/scrape/4_revenue_retrieval_2/data.json#L24
Here are sources that report 2,013 as revenue:
Upload Activity Log Content
No response
Upload Error Log Content
No response