Create a manually-curated 10-sample test split to evaluate already-trained EVMind LLMs #3002

Closed: brunneis closed this issue 1 month ago

brunneis commented 1 month ago

The current EVMind models were trained on the full initial Solidity dataset. To run evaluations without retraining, the dataset must be extended with a held-out test split. This is also a good opportunity to manually select a small number of smart contracts that reflect what we expect from EVMind for code generation.
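A minimal sketch of how a hand-picked test split could be carved out with the `datasets` library. The dataset id and the index list are placeholders, not the actual EVMind training set:

```python
from datasets import load_dataset, DatasetDict

# Hypothetical dataset id; substitute the actual Solidity training set.
ds = load_dataset("braindao/solidity-dataset", split="train")

# Indices of the 10 hand-picked contracts (placeholder values).
curated = [3, 17, 42, 101, 256, 311, 489, 502, 730, 911]
held_out = set(curated)

# Move the curated samples into a test split; keep the rest as train.
test = ds.select(curated)
train = ds.select(i for i in range(len(ds)) if i not in held_out)

# Hypothetical target repo; versioning both splits together lets
# already-trained models be evaluated against split="test" later.
DatasetDict({"train": train, "test": test}).push_to_hub("braindao/solidity-dataset")
```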

brunneis commented 1 month ago

New collection of datasets for evaluating Solidity LLMs: https://huggingface.co/collections/braindao/iq-code-solbench-66b3c20b6ebb7b77e6643c41
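Assuming the collection's datasets follow the usual Hub layout, pulling one down for a local evaluation run would look like this (the dataset id below is illustrative; use the actual ids listed on the collection page):

```python
from datasets import load_dataset

# Hypothetical dataset id from the SolBench collection.
eval_ds = load_dataset("braindao/solbench-naive-judge", split="test")

# Inspect the fields a benchmark sample provides before wiring up the judge.
print(eval_ds[0])
```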

For the NaïveJudge evaluation (LLM Judge):
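For reference, a minimal sketch of the general LLM-as-a-judge pattern, assuming an OpenAI-compatible client. The prompt, model name, and 0-100 rubric here are illustrative assumptions, not NaïveJudge's actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt; NaiveJudge's real rubric lives with the datasets above.
JUDGE_PROMPT = """You are reviewing Solidity code.

Reference implementation:
{reference}

Candidate implementation:
{candidate}

Score the candidate from 0 to 100 for functional equivalence to the
reference. Reply with the number only."""

def judge(reference: str, candidate: str) -> int:
    """Ask the judge model to score a generated contract against a reference."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            reference=reference, candidate=candidate)}],
    )
    # A sketch only: assumes the model complies and returns a bare integer.
    return int(response.choices[0].message.content.strip())
```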