-
Hi,
I evaluated your 7B model, but the results on the newly introduced benchmarks look abnormal.
-
As a site user with any evaluation role, I would like to see consistent branding and styling on every page of the site, so that I can build trust in using the challenge.gov site.
Acceptance criteria:
- [ …
-
Thank you for this wonderful work.
I would like to ask whether you set top_p and temperature during the evaluation. Under what settings did you get the MGSM results in your paper? Did you average mu…
-
I think the README.md has some issues regarding the evals.
I just noticed it with PIQA: the numbers are too low compared to the actual paper.
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
llamafactory-version 0.8.3.dev0
python 3.11.9
AWS EC2 instance
### Reproduction
```
llamafactory…
-
It seems that UniAD and VAD used different eval metric protocols, as noted by the UAD and PARA-Drive papers:
https://openaccess.thecvf.com/content/CVPR2024/papers/Weng_PARA-Drive_Parallelized_Architecture_…
-
When evaluating, I get an error: `Error while finding module specification for 'eval.eval_h3d_offline' (ModuleNotFoundError: No module named 'eval')`. Where is the eval file, or how can I fix this?
-
Hi!
I have a basic question: why does the whole fine-tune need the eval dataset?
-
Description TBD -- Adding this to keep track