-
Hello author, I am not able to find the code to confirm the evaluation metric you stated in the paper (FID: 15.73) with your checkpoint. I have tried the FID evaluation code available online but the r…
-
I propose adding a Model Evaluation and Benchmarking System to ML Nexus to help users assess their model performance on standardized datasets and compare it against benchmarked scores. This feature wo…
-
### Model ID
CohereForAI/aya-expanse-32b
### Model type
Decoder model (e.g., GPT)
### Model languages
- [X] Danish
- [X] Swedish
- [X] Norwegian (Bokmål or Nynorsk)
- [X] Icelandic
- [X] Faroese
…
-
Hi, thanks for your great work! I'm now trying to evaluate the model on the VG dataset but have run into some problems.
1. Only one file named `vg_stage1_predcls.zip` in the provided link in [Evaluation](https://gith…
-
Thanks for your amazing work! I also noticed that you haven't provided the **pretrained model and evaluation code**. Is there any chance you could upload them?
Thanks again!
-
Hi,
While trying to run the evaluation on the pretrained model: https://github.com/fgnt/tssep_data/blob/master/egs/libri_css/README.md#steps-to-evaluate-a-pretrained-model
I got this error on t…
-
I tried running some CoT zeroshot evaluations, but they both failed. Am I doing something wrong?
### Command for mmlu_flan_cot_zeroshot
```
accelerate launch \
--multi_gpu \
--num_p…
-
Dear authors,
Thank you for the great work on long-context multi-model evaluation. In the code base, I only saw the code for Azure, OpenAI, Gemini, and Anthropic. Could you also provide the evalu…
-
**Describe the bug**
While running the **Knowledge retention** metric, I get an error. This does not happen for all of the LLMTestcases: I am getting the correct knowledge retention score for many inputs. …
-
I am trying to evaluate llm4decompile-6.7b-v1.5 using the methods you provided. The model weights were downloaded from the Hugging Face repository of the same name. However, I keep encountering an err…