-
But even when I set `-dl nl`, the model still seems to be fine-tuning. Is this a bug, @saattrupdan?
_Originally posted by @usarth in https://github.com/ScandEval/ScandEval/issues/35#issuecomment-2336742…
-
### Model ID
google/gemma-2-27b-it
### Model type
Decoder model (e.g., GPT)
### Model languages
- [x] Danish
- [x] Swedish
- [x] Norwegian (Bokmål or Nynorsk)
- [x] Icelandic
- [x] Faroese
- [x] …
-
After reading your paper, I understand that you used a two-stage training process: the first stage involved training with Pile-NER, and the second stage involved fine-tuning with AnatEM and 17 other d…
-
Hello,
First of all, thank you for publishing this code. I'm having difficulty evaluating the trained model. Adapting eval.py from SCAN does not seem straightforward, and I'm not sure whether I've do…
-
I'm not sure whether this feature belongs in this library or whether it would require a completely separate one. I am proposing the creation of a library in which LLM benchmarks can be run. For example, evaluating a mo…
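A library like the one proposed could expose a small benchmark-runner interface. The sketch below is purely illustrative: the names `Benchmark` and `run_benchmarks`, the example structure, and the scoring convention are all hypothetical, not part of any existing library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Benchmark:
    """Hypothetical benchmark: a named set of examples plus a scoring function."""
    name: str
    examples: List[dict]                 # each: {"prompt": ..., "expected": ...}
    score: Callable[[str, str], float]   # (prediction, expected) -> score in [0, 1]

def run_benchmarks(model: Callable[[str], str],
                   benchmarks: List[Benchmark]) -> Dict[str, float]:
    """Run each benchmark against `model` and return the mean score per benchmark."""
    results: Dict[str, float] = {}
    for bench in benchmarks:
        scores = [bench.score(model(ex["prompt"]), ex["expected"])
                  for ex in bench.examples]
        results[bench.name] = sum(scores) / len(scores)
    return results

# Usage with a trivial echo "model" and an exact-match metric:
exact = Benchmark(
    name="exact_match",
    examples=[{"prompt": "2+2=", "expected": "2+2="}],
    score=lambda pred, gold: float(pred == gold),
)
print(run_benchmarks(lambda p: p, [exact]))  # {'exact_match': 1.0}
```

In a real library, `model` would wrap an LLM client instead of a plain function, but the runner loop would look the same.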
-
Hi, I have some doubts about the evaluation process described in the article. Does it first use the GPT generative model to generate t-SMILES sequences and then reconstruct molecules based o…
-
`lm_eval --model hf --model_args pretrained=speakleash/Bielik-11B-v2.3-Instruct,dtype=bfloat16,max_length=2048,truncation=True,normalize_log_probs=False,trust_remote_code=True,truncate_strategy=leav…
-
-
![image](https://github.com/user-attachments/assets/e2ea1157-51af-4e48-81de-0e0fb5d0d57b)
-
When I follow the example on this page:
https://docs.confident-ai.com/docs/metrics-introduction
and try to use Mistral-7B as the evaluation model, I always get this error when running the exact code …