-
Please review the paper again to understand what they evaluate and why, e.g. metrics like PSNR/SSIM, etc.
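For context, PSNR is defined as 10·log10(MAX²/MSE) between a reference and a reconstruction. A minimal sketch in plain Python (the function name and the tiny 4-pixel example are illustrative, not from any paper):

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical signals: infinite PSNR by convention
    return 10.0 * math.log10(max_val ** 2 / mse)

# Example: two tiny 4-pixel "images" (MSE = 2.25)
print(round(psnr([10, 20, 30, 40], [12, 18, 30, 41]), 2))  # → 44.61
```

SSIM is structural rather than pixel-wise and is usually taken from a library (e.g. scikit-image's `structural_similarity`) rather than hand-rolled.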
-
Hi, I am trying to evaluate the model RLHFlow/LLaMA3-iterative-DPO-final with MT-Bench. I use the inference environment described in the README and follow the scripts from https://github.com/lm-sys/FastChat/tree/ma…
-
When I used the pre-trained model 'raphaelsty/neural-cherche-sparse-embed' to evaluate a dataset, specifically the arguana dataset, with a retrieval k value of 100, the results were very poor:
{'map'…
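For reference, the 'map' score in results like these is mean average precision over all queries at a cutoff k. A minimal sketch in plain Python (function and variable names are illustrative; one common convention normalizes each query's AP by min(#relevant, k)):

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=100):
    """AP@k for one query: mean of precision at each rank where a relevant doc appears."""
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant_ids), k) if relevant_ids else 0.0

def map_at_k(run, qrels, k=100):
    """Mean AP@k: `run` maps query -> ranked doc ids, `qrels` -> set of relevant ids."""
    return sum(average_precision_at_k(run[q], qrels[q], k) for q in qrels) / len(qrels)

# Example: one query where ranks 1 and 3 are relevant -> AP = (1/1 + 2/3) / 2
print(map_at_k({"q1": ["d1", "d2", "d3"]}, {"q1": {"d1", "d3"}}, k=100))
```

Library implementations such as pytrec_eval follow the same definition, so a very low 'map' usually means relevant documents are ranked deep in (or outside) the top-k list.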
-
Hello,
First of all, thank you for publishing this code. I'm having difficulty evaluating the trained model. Adapting eval.py from SCAN does not seem straightforward, and I'm not sure whether I've do…
-
Hi, I have noticed that there is existing support for some NeMo models, but there does not seem to be support for the MegatronT5 model. Does anyone have ideas on how to evaluate this model?
-
GluPredKit refactor for enhanced and strict model evaluation:
Refactor:
- [x] All models should be multiple output
- [x] Create multioutput models
- [x] Scikit models
- [x] …
-
When I evaluate InternLM2-Math-Plus-7b on miniF2F through this code, it fails: the model generates only one line, "Here is the predicted next tactic:", without any tactics. If I let the model continue g…
-
When I follow the example on this page:
https://docs.confident-ai.com/docs/metrics-introduction
and try to use Mistral-7B as the evaluation model, I always get this error when running the exact code …
-
Hello, I see you added newly supported models. Can you provide an evaluation of them on SWE-bench so that they can be compared with the evaluations already done?
Thank you