-
Hi,
Great work!
Did you use the same prompt for all models evaluated on DREAM-1k?
If not, what prompts did you use for different models?
-
Hello, I used the COCO dataset for model fine-tuning. I set metric=['bbox', 'segm'] in val_evaluator following the MMEngine user manual, and the bbox metric was obtained during eva…
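For context, a minimal sketch of that kind of val_evaluator change, assuming MMDetection 3.x's CocoMetric; the annotation file path is a placeholder, not taken from the report:
```python
# Sketch of the described config change, assuming MMDetection 3.x's CocoMetric.
val_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_val2017.json',  # placeholder path
    metric=['bbox', 'segm'],  # request both box and mask evaluation
)
```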
-
```
Traceback (most recent call last):
  File "evaluation.py", line 121, in <module>
    val_scores, val_samples = evaluate_metrics(model, dataloader_val, text_field)
  File "evaluation.py", line 50, in evalua…
```
-
I noticed that NanoBeir accepts an empty dataset and raises no error during instantiation, but once a model is passed in, the resulting error is confusing (a minimal repro is sketched below).
So -
1. is it fine to allow the evaluato…
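A minimal repro sketch, assuming the evaluator in question is sentence-transformers' NanoBEIREvaluator and that the empty input is an empty dataset_names list; the model name is just an example:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Instantiation with an empty dataset list reportedly raises no error...
evaluator = NanoBEIREvaluator(dataset_names=[])

# ...and the confusing error only surfaces once a model is evaluated.
model = SentenceTransformer("all-MiniLM-L6-v2")
results = evaluator(model)
```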
-
### Describe the bug
In PyKEEN, when `create_inverse_triples` is set to `False` for ConvE, the evaluation becomes extremely slow.
![image](https://github.com/user-attachments/assets/bc6b96f1-f7bb-42…
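If it helps triage, a minimal repro sketch using PyKEEN's pipeline API; the dataset choice and epoch count are assumptions for illustration, not taken from the report:
```python
from pykeen.pipeline import pipeline

# Repro sketch: dataset and num_epochs are assumptions, not from the report.
result = pipeline(
    model="ConvE",
    dataset="FB15k-237",
    dataset_kwargs=dict(create_inverse_triples=False),  # the reported slow setting
    training_kwargs=dict(num_epochs=1),
)
```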
-
Below are the vLLM settings used for the Llama 3.2 evaluation:
```
lm_eval --model vllm \
--model_args pretrained=/home/jovyan/data-vol-1/models/meta-llama__Llama3.2-1B-Instruct,dtype=auto,gpu_m…
```
-
I want to be able to perform post-evaluation query filtering after evaluating a model on a retrieval benchmark. In other words, after the evaluation has run, I want to be able to select a subset of the test…
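A purely hypothetical sketch of what such filtering could look like, assuming per-query scores are retained after evaluation; every name here (filter_queries, per_query_scores, keep_ids) is made up for illustration:
```python
from statistics import mean

def filter_queries(per_query_scores, keep_ids):
    """Recompute an aggregate metric over a chosen subset of test queries.

    per_query_scores: dict mapping query_id -> metric value (e.g. nDCG@10).
    keep_ids: the subset of query ids selected after evaluation.
    """
    subset = {qid: s for qid, s in per_query_scores.items() if qid in keep_ids}
    return mean(subset.values()) if subset else float("nan")

# Example: keep only two of the evaluated queries.
scores = {"q1": 0.62, "q2": 0.48, "q3": 0.91}
print(filter_queries(scores, {"q1", "q3"}))  # mean over q1 and q3 only
```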
-
### Model ID
google/gemma-2-27b
### Model type
Decoder model (e.g., GPT)
### Model languages
- [x] Danish
- [x] Swedish
- [x] Norwegian (Bokmål or Nynorsk)
- [x] Icelandic
- [x] Faroese
- [x] Ger…
-
OS version: Ubuntu 22.04.3 LTS
The same error occurs when testing on Colab and locally.
torch: 2.5.0+cu121
onnxruntime-gpu 1.20.0
Running python3 evaluate.py --landlord perfectdou --landlord_up douzero --landlord_down douzero
raises Impor…
-
Wonderful work!
I noticed that the SWE-bench evaluation requires files including:
```
eval.sh: The evaluation script
patch.diff: The model's generated prediction
report.json: Summary of evaluatio…
```