-
## Evaluation short description
- Why is this evaluation interesting?
RewardBench is perhaps the _only_ evaluation suite to provide broad coverage of the strengths and weaknesses of reward models …
-
### Description
The goal is to develop a Tibetan text-to-speech (TTS) model that can convert Tibetan text into Tibetan speech. This project involves training a TTS model using filtered good audio qual…
-
### Developing Jupyter Notebooks / ESMValTool Recipes for ENSO Evaluation
The Model Evaluation and Diagnostics team at ACCESS-NRI (@flicj191, @rbeucher) is currently working on developing Jupyter n…
-
Are there any plans to support evaluation of the latest Qwen2-VL model?
-
Hello, could you provide the evaluation code for the fivek dataset, along with the trained models?
-
Nice work! Could you also release the pre-trained models/weights for evaluation purposes?
Thanks!
-
When I evaluate InternLM2-Math-Plus-7b on miniF2F using this code, it fails. The model only generates one line, "Here is the predicted next tactic:", without any tactics. If I let the model continue g…
-
Description:
Right now, the project focuses on specific machine learning models. We would like to add support for more models, such as Random Forest, K-Nearest Neighbors, or SVM, for donor segmenta…
-
### Question
I want to pretrain the model, but I see that the evaluation_strategy in pretrain.sh is set to "no." How can I determine if the model is trained well?
-
Hi, I get the following error when I evaluate on AGIEval with num_fewshot=3. However, everything works normally with 0-shot.
```
2024-09-19:13:03:47,189 DEBUG [cache.py:33] requests-agieval_je…