-
The following test results in an "unexpected keyword" error:
```
from datasets import load_dataset
from instruction_following_eval import get_examples, evaluate_instruction_following
dataset =…
-
## Describe the bug
Prefacing this by saying I'm pretty new to Nix, so apologies if I've missed something obvious or if this isn't the correct place for this issue.
I'm using nix-darwin. After…
-
Thanks for your brilliant work on Diffree!!!
I am interested in the dataset collection process and the implementation of the proposed evaluation metrics.
Can you share your email so that I …
-
Hi, thanks for your excellent work and the code release. According to the README file, the scenario .json file and the route .xml file need to be updated accordingly. However, there is no folder of lea…
-
I set HOME to an empty string:
```rust
// build.rs:14 — home_dir hard-coded to an empty string
let home_dir = "";
```
Then I run `cargo test`:
```powershell
PS C:\Users\kiwi\rust\tesseract-rs> cargo test
Compiling tesseract-rs v0.1.18 (C:\Users\kiw…
-
Currently WildBench seems to take a long time for evaluation; could we add a variable to set num_workers for the OpenAI calls?
Also, some caching of results is needed, so that I can run different evaluators…
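Not WildBench's actual API, just a minimal sketch of what the request could look like, assuming the official `openai` Python client: judge calls fanned out over a thread pool bounded by `num_workers`, with results cached on disk so that re-running with a different evaluator skips prompts that are already judged. `judge_one`, `judge_all`, and the cache layout are hypothetical names for illustration.

```python
# Hypothetical sketch (not WildBench code): parallel judge calls + disk cache.
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from openai import OpenAI

client = OpenAI()
CACHE_DIR = Path("eval_cache")
CACHE_DIR.mkdir(exist_ok=True)

def judge_one(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Return the judge's response for one prompt, reusing a cached result if present."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())["judgment"]
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    judgment = resp.choices[0].message.content
    cache_file.write_text(json.dumps({"judgment": judgment}))
    return judgment

def judge_all(prompts: list[str], num_workers: int = 8) -> list[str]:
    """Fan the judge calls out over a thread pool; num_workers bounds concurrency."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(judge_one, prompts))
```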
-
Hi, this is an interesting work! Could you share more details and the code for your evaluation metrics, especially FDS and DS? Thanks.
-
Evaluate Mailchimp compared to other email management systems
-
### Describe the issue
Issue:
Errors occur during the MME evaluation.
Command:
```
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
```
Log:
```
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED …
-
Hi,
In 2.Pretrain_regenerator.py, there is only training on the pretraining dataset for a fixed number of epochs (40 by default).
However, without an evaluation step during training, the…
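A minimal sketch of what a per-epoch evaluation pass could look like, assuming a standard PyTorch training setup; `evaluate`, `train_with_eval`, and the checkpoint filename are hypothetical and not taken from this repository.

```python
# Hypothetical sketch (not the repo's code): run a validation pass after every epoch
# and keep the checkpoint with the lowest validation loss instead of only the last one.
import torch

def evaluate(model, val_loader, criterion, device="cuda"):
    """Average validation loss over a held-out set."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            total += criterion(model(inputs), targets).item() * len(targets)
            count += len(targets)
    return total / max(count, 1)

def train_with_eval(model, train_loader, val_loader, optimizer, criterion,
                    epochs=40, device="cuda"):
    """Train for a fixed number of epochs (40 by default, matching the script),
    evaluating after each epoch."""
    best_val = float("inf")
    for _ in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        val_loss = evaluate(model, val_loader, criterion, device)
        if val_loss < best_val:
            best_val = val_loss
            torch.save(model.state_dict(), "best_regenerator.pt")
```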