aiverify-foundation / moonshot-data

Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
Apache License 2.0

[MS-356] Benchmarking score bug #78

Closed imda-lionelteo closed 1 month ago

imda-lionelteo commented 1 month ago

Description

When running the bbq recipe, the first and second runs may produce different scores. Subsequent runs give consistent results.

Motivation and Context

Fix the bug where the first and second runs give different results. The bbq recipe's dataset list contains an error: one dataset is listed twice, and another dataset is missing.
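A duplicated or missing dataset entry of this kind can be caught with a small check. The following is a minimal sketch, not the fix applied in this pull request. It assumes a recipe is a dictionary with a `datasets` list. The dataset names and the helper functions `find_duplicate_datasets` and `find_missing_datasets` are hypothetical and are not part of the Moonshot API.

```python
from collections import Counter


def find_duplicate_datasets(recipe: dict) -> list[str]:
    """Return dataset names that appear more than once in the recipe."""
    counts = Counter(recipe.get("datasets", []))
    return sorted(name for name, n in counts.items() if n > 1)


def find_missing_datasets(recipe: dict, expected: list[str]) -> list[str]:
    """Return expected dataset names that the recipe does not list."""
    present = set(recipe.get("datasets", []))
    return sorted(name for name in expected if name not in present)


# Hypothetical recipe content: the dataset names below are placeholders,
# not the actual entries of the bbq recipe.
recipe = {
    "name": "bbq",
    "datasets": ["bbq-a-ambiguous", "bbq-a-ambiguous", "bbq-b-ambiguous"],
}
expected = ["bbq-a-ambiguous", "bbq-a-disamb", "bbq-b-ambiguous"]

duplicates = find_duplicate_datasets(recipe)
missing = find_missing_datasets(recipe, expected)
print("duplicated:", duplicates)
print("missing:", missing)
```

Running a check like this against the recipe file before benchmarking would flag both halves of the error described above: the entry listed twice and the entry left out.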

How to Test

Steps to reproduce:

1. Start Moonshot in the CLI: `python -m moonshot cli interactive`

2. Run a recipe: `run_recipe "my new recipe runner" "['bbq']" "['openai-gpt35-turbo']" -n 1 -r 1 -s "You are an intelligent AI"`

3. Run the same command again. The score may differ from the first run.

Developer Certificate of Origin

```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```

imda-benedictlee commented 1 month ago

Link to Jira Ticket: https://imda-dsl.atlassian.net/browse/MS-356?atlOrigin=eyJpIjoiOTI3NTI4NGFkZjBlNDQ5ODg1Njk0NDc1MWRiOGFmZDYiLCJwIjoiaiJ9