-
I have remotely hosted vLLM models. How can I evaluate them?
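One possible approach, sketched under the assumption that the models are served through vLLM's OpenAI-compatible HTTP server and reachable at its `/v1/completions` endpoint. The base URL, model name, and the helper names `vllm_complete` and `exact_match_accuracy` are hypothetical placeholders, and exact match is just one simple metric chosen for illustration:

```python
import json
import urllib.request
from typing import Callable, Iterable, Tuple


def vllm_complete(prompt: str,
                  base_url: str = "http://localhost:8000",  # assumed host/port
                  model: str = "my-model",                  # assumed served model name
                  max_tokens: int = 16) -> str:
    """Send one prompt to an OpenAI-compatible /v1/completions endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for reproducible scoring
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


def exact_match_accuracy(examples: Iterable[Tuple[str, str]],
                         complete: Callable[[str], str]) -> float:
    """Fraction of prompts whose completion equals the reference after stripping whitespace."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(complete(prompt).strip() == reference.strip()
               for prompt, reference in examples)
    return hits / len(examples)
```

Passing the completion function as an argument keeps the scoring logic independent of the server, so the same loop can be run against `vllm_complete` for the remote model or against a stub for offline checks.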
-
* **What is the current behavior?**
The current problem format does not support simulators and surrogates.
* **Describe the solution you'd like**
The problem format should be updated to support s…
-
We need to figure out what metrics we can/should use to evaluate our models, and what data is needed to evaluate them.
Here we probably will make some distinction between evaluations during prototypi…
-
- [ ] [[2310.06770] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)
# [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://ar…
-
### Publisher
ACM TIST (ACM Transactions on Intelligent Systems and Technology)
### Link to The Paper
https://dl.acm.org/doi/pdf/10.1145/3641289
### Name of The Authors
Yupeng Chang, Xu Wang, Jin…
-
### Describe the feature
Dear OpenCompass Team,
I've encountered a challenge with OpenCompass when trying to evaluate a custom model that I developed. Currently, it seems that any action I want to…
-
Hi, thanks for the great work. I would like to know how to evaluate the generation performance of models. Specifically, I am interested in how to calculate FID and other metrics such as IS, and whethe…
-
Research and evaluate different language models (e.g., BERT, RoBERTa, XLNet) for their suitability in the bioinformatics domain.
-> Research and document the strengths and weaknesses of each model. Crea…
-
Optimal cross-validation
Optimisation of hyperparameters
Optimising for the confusion matrix, not just accuracy
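To illustrate why the confusion matrix matters beyond accuracy, here is a minimal sketch for the binary case. The function names `confusion_counts` and `summarize` are hypothetical, and the labels (1 as the positive class) and the imbalanced toy data are assumptions made for the example:

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Counter:
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1
        elif truth != positive and pred == positive:
            counts["fp"] += 1
        elif truth == positive and pred != positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced data: 95 negatives, 5 positives, and a model that
# always predicts the negative class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
report = summarize(y_true, y_pred)
```

On this toy data the always-negative model reaches 95% accuracy while its recall and F1 for the positive class are zero, which is the kind of failure that tuning on accuracy alone would not reveal.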
-
Could you please share the evaluation scripts and prompts that were used to generate the reported results in the paper?
Various parameters are involved in generating outputs, and it is crucial to …