-
## Original Task
Quoting the original course task:
> Training a strong Hebrew Sentence Encoder from a pretrained Decoder. While recent years
> have brought many additions to the open-source set …
-
Ways to measure how well the model is performing:
- Perplexity
- Human Eval (optional)

This is on the feat/evaluation branch:
https://github.com/MichiganDataScienceTeam/F24-mini-copilot/tree/feat/evaluation
…
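For context, perplexity is the exponentiated mean negative log-likelihood per token. A minimal sketch of computing it with HuggingFace `transformers` (the checkpoint name is a placeholder, not the repo's model):

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the project's fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "def add(a, b):\n    return a + b\n"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # cross-entropy loss over the shifted next-token predictions.
    out = model(**inputs, labels=inputs["input_ids"])

# Perplexity = exp(mean negative log-likelihood per token); lower is better.
print(f"perplexity: {math.exp(out.loss.item()):.2f}")
```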
-
Hi, where can I find the evaluation datasets? I couldn't find them on GitHub.
-
Hi, I noticed an issue with the `must_include` evaluation logic. While running the WebArena tasks through this repo, I found an inconsistency in task 231's evaluation.
For task 231, the attempte…
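For reference, a minimal sketch of what a `must_include` check typically does (case-insensitive substring matching; the actual WebArena string-match evaluator may normalize answers differently):

```python
def must_include_match(answer: str, must_include: list[str]) -> float:
    """Return 1.0 only if every required phrase occurs in the answer.

    Sketch of a typical must_include check; WebArena's real evaluator
    may apply different answer cleaning and normalization.
    """
    answer_norm = answer.strip().lower()
    return float(all(p.strip().lower() in answer_norm for p in must_include))
```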
-
Thank you for the great library!
I noticed that the "evaluation metric" section of the README is expected to arrive soon.
1. Do you have a timeline for when this will be available?
2. Could you share what evaluation…
-
### User Story
As a challenge manager, in order to inform evaluators of what to evaluate the submissions on, I would like to set up evaluation criteria.
**Acceptance criteria:**
- [x] A challenge m…
-
Hi! Thanks for this amazing project! Is it possible to open-source the evaluation code? I understand the code depends on [ltu](https://github.com/YuanGongND/ltu/tree/main/src/ltu/eval)
I gen…
-
Hi... I want to evaluate your model on nuScenes on a single GPU, but I failed... I get a KeyError when fetching data.
![image](https://github.com/user-attachments/assets/03b22e7a-a595-42ea-b0e6-…
-
I have updated my transformers to 4.46.0, but the inference speed is extremely slow. Is there any solution?
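One way to narrow this down is to time the same short generation under both `transformers` versions with the KV cache explicitly enabled; a minimal benchmark sketch (the checkpoint is a placeholder, not the affected model):

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; use the affected checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tokenizer("Hello", return_tensors="pt").to(device)
start = time.perf_counter()
# use_cache=True keeps the KV cache on; a disabled cache is a common
# cause of very slow autoregressive generation.
model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(f"{time.perf_counter() - start:.2f}s for 64 new tokens")
```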
-
What should we use for HVAC evaluations? @ozanbarism
Like in `scripted_compare_models.py`, what should we be asking the LLM, and how should we rank the results?
https://github.com/bbartling/HvacGPT/blob/d…
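One common pattern for this is LLM-as-judge scoring followed by a sort; a minimal sketch (the rubric, the 1-5 scale, and the stand-in scorer are assumptions for illustration, not taken from `scripted_compare_models.py`):

```python
RUBRIC = (
    "Rate the answer 1-5 for technical accuracy on HVAC operations, "
    "using the question as context. Reply with only the number."
)

def judge_score(question: str, answer: str) -> float:
    # Stand-in heuristic so the sketch runs end to end; replace with a
    # real LLM call that sends RUBRIC plus the question and answer.
    return min(5.0, 1.0 + len(answer.split()) / 10)

def rank_models(question: str, answers: dict[str, str]) -> list[tuple[str, float]]:
    # Score each model's answer with the judge, then rank best-first.
    scored = [(name, judge_score(question, ans)) for name, ans in answers.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

print(rank_models(
    "When should an AHU economizer lock out?",
    {"model_a": "Lock out when outdoor air exceeds the high-limit setpoint.",
     "model_b": "Never."},
))
```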