-
https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#14-appendix-benchmark-specific-rules
Here, it is stated that feature caching is not allowed. What is the definition of…
-
### Your current environment
v0.5.2. vLLM env is not an issue so I will just skip the collection process
### 🐛 Describe the bug
I am running benchmark tests and have noticed one potential problem. …
-
Hello,
First of all, thank you for releasing the code for your great paper. I am currently looking into benchmarking SFBC on the datasets from the paper "Learning to simulate complex physics with g…
-
### Dataset name
hogskoleprovet
### Dataset link
https://www.hogskoleprovet.nu/gamla-hogskoleprov/
### Dataset languages
- [ ] Danish
- [X] Swedish
- [ ] Norwegian (Bokmål or Nynorsk)…
-
Hello, I would like to know some details about the performance comparison on the Visual CoT benchmark (Table 3) mentioned in the paper. For the datasets that are not highlighted in grey (such as Infog…
-
Hello, I would like to know whether the weights of this implementation have been released, or whether it has been implemented in any system. The paper is interesting.
Are there any benchmarks with other exist…
-
Hello guys, thank you for the amazing benchmarking work. I was curious as to when the train split of the dataset will be made public.
-
We need to gather a wide array of count datasets on which to run experiments that demonstrate the effectiveness of our proposed count regression methods. These should include:
- [ ] Multiple tabular datase…
-
Hi! Thanks for the great dataset!
I tried to run the evaluation for MathVista with Multimath-7B according to the instructions, and I noticed that '-use_multimath_prompt' is not used in the default Mul…
-
Hey folks!
I saw that someone asked the same question yesterday on the mailing list, but nobody has answered, so I thought I'd bring it here since I'm running into the same issue.
When I try to run …