Typo in RevenuRetrival2

Significant-Gravitas / AutoGPT


Typo in RevenuRetrival2 #5782

Closed pplonski closed 1 year ago

pplonski commented 1 year ago

⚠️ Search for existing issues first ⚠️

[X] I have searched the existing issues, and there is no existing issue for my problem

Which Operating System are you using?

Linux

Which version of AutoGPT are you using?

Latest Release

Do you use OpenAI GPT-3 or GPT-4?

GPT-3.5

Which area covers your issue best?

Challenges

Describe your issue.

There is a typo in the test data for RevenueRetrival2. The tests check for 2,014, but they should check for 2,013.

https://github.com/Significant-Gravitas/AutoGPT/blob/1eadc64dc0a693c7c9de77ddaef857f3a36f7950/benchmark/agbenchmark/challenges/verticals/scrape/4_revenue_retrieval_2/data.json#L24

Here are sources that report 2,013 as revenue:

Upload Activity Log Content

No response

Upload Error Log Content

No response

namesty commented 1 year ago

Also found out about this behavior and commented on it here: https://discord.com/channels/1092243196446249134/1097875855214137374/1162518524724523048

Basically:

The 2013 revenue varies slightly by source. Macrotrends states it's 2,013 million (https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue), but Statista states it's 2,014 million (https://www.statista.com/statistics/272120/revenue-of-tesla/). This variation is enough to make the benchmark fail (the expected value is exactly 2,014 million).

I think these values have some room for approximation error, just like any internet stat, and I think the evaluation should account for this.
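As a rough illustration of what "accounting for approximation error" could look like, here is a minimal sketch of a tolerance-based check. The function names and the 0.1% default tolerance are assumptions made up for this example; this is not how the benchmark actually evaluates answers.

```python
import re


def extract_numbers(text: str) -> list[float]:
    """Return every number found in text, ignoring thousands separators."""
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    return [float(m.replace(",", "")) for m in matches]


def within_tolerance(text: str, expected: float, rel_tol: float = 0.001) -> bool:
    """True if any number in text is within rel_tol (relative) of expected.

    Hypothetical helper: the name and the default tolerance are illustrative.
    """
    return any(
        abs(value - expected) <= rel_tol * expected
        for value in extract_numbers(text)
    )


# Both source values pass against an expected value of 2,014 (million).
print(within_tolerance("Revenue was 2,013 million", 2014))
print(within_tolerance("Revenue was 2,014 million", 2014))
# A clearly different figure does not.
print(within_tolerance("Revenue was 1,900 million", 2014))
```

One caveat with this sketch: a bare year such as 2013 in the answer would also fall within the tolerance, so a real check would probably need to be stricter about where the number comes from.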

SilenNaihin commented 1 year ago

This has been fixed! Thanks for surfacing.

pplonski commented 1 year ago

Hi @SilenNaihin,

Thanks for the quick action. I've checked the test and it will still fail. Right now it checks for both 2,013 and 2,014, so there is no chance to pass. It would be nice to check for 2,01*, but I doubt that is possible.

https://github.com/Significant-Gravitas/AutoGPT/blob/e9b64adae9fce180a392c726457e150177e746fb/benchmark/agbenchmark/challenges/verticals/scrape/4_revenue_retrieval_2/data.json#L24-L25

I don't know how the answer is used, but it has not been fixed either: https://github.com/Significant-Gravitas/AutoGPT/blob/e9b64adae9fce180a392c726457e150177e746fb/benchmark/agbenchmark/challenges/verticals/scrape/4_revenue_retrieval_2/data.json#L11
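To make the problem concrete, here is a small sketch of the difference between requiring every listed value and accepting any one of them, plus a regex version of the 2,01* idea. It assumes, as described above, that every listed string must appear in the answer. All names here are hypothetical and are not taken from the benchmark code.

```python
import re

EXPECTED = ["2,013", "2,014"]  # the two values currently listed in the test data


def contains_all(text: str, expected: list[str]) -> bool:
    """Assumed current behavior: every listed string must appear."""
    return all(item in text for item in expected)


def contains_any(text: str, expected: list[str]) -> bool:
    """Alternative: one matching string is enough."""
    return any(item in text for item in expected)


# Regex form of "2,01*": 2,01 followed by exactly one more digit.
REVENUE_PATTERN = re.compile(r"\b2,01\d\b")


def matches_pattern(text: str) -> bool:
    return REVENUE_PATTERN.search(text) is not None


answer = "Tesla's revenue was 2,013 million"
print(contains_all(answer, EXPECTED))  # an answer with a single figure cannot satisfy this
print(contains_any(answer, EXPECTED))
print(matches_pattern(answer))
```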

SilenNaihin commented 1 year ago

Yup, realized my mistake a couple of hours later and rectified it. Thanks for surfacing!