Understanding Test Internal Functionality

IroncladDev / llm-arena

Compare LLMs side-by-side

https://llmarena.ai

MIT License

Understanding Test Internal Functionality #85

Open MagicPupu opened 2 months ago

MagicPupu commented 2 months ago

Hello IroncladDev Team,

I would like to use llmarena to test some LLMs and generate performance reports.

However, I am unsure how the results are calculated. Is the success percentage for each test dataset a quantitative or a qualitative measure? Specifically, is it the percentage of answers in the dataset that are correct, or a score reflecting how precise each answer is?
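To make the two readings concrete, here is a minimal sketch in Python. It is purely illustrative: the function names, the sample answers, and the per-answer scores are hypothetical and are not taken from llmarena or from any benchmark it lists.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Quantitative reading: percentage of answers that match the reference exactly.

    Hypothetical helper; matching ignores case and surrounding whitespace.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one item is required")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


def mean_graded_score(scores: Sequence[float]) -> float:
    """Qualitative reading: average of per-answer quality scores in [0, 1], as a percentage.

    Hypothetical helper; the scores would come from a grader or rubric.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return 100.0 * sum(scores) / len(scores)


# Made-up data: four questions, one answered incorrectly.
predictions = ["Paris", "4", "blue", "Mars"]
references = ["Paris", "5", "Blue", "Mars"]
accuracy = exact_match_accuracy(predictions, references)

# Made-up grades: the wrong answer receives partial credit.
graded = mean_graded_score([1.0, 0.5, 1.0, 1.0])

print(accuracy, graded)
```

With this made-up data, the first reading gives 75.0 and the second gives 87.5, so the same set of answers can produce different percentages depending on which definition a benchmark uses.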

Thanks in advance, Antoine

IroncladDev commented 2 months ago

Take a look at the Contributor Page. Data has to be entered manually and backed up by a source.

MagicPupu commented 2 months ago

Thank you for your response.

I wanted to know how the result data is generated, and whether you manipulate it in any way or take it directly from the benchmarks. Now I understand that I need to look directly at the benchmark sources.

Thank you again, and I will continue my research.

Best regards, Antoine