Closed abzb1 closed 6 months ago
I think the error is raised here (or somewhere similar), at line 9: doc_to_target: "{{multiple_choice_targets.index(targets[0])}}". I think the index method raises the ValueError when targets[0] is not in multiple_choice_targets.
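A minimal sketch of that failure mode in plain Python, assuming the template expression ends up calling Python's list.index on the choices. The row below is hypothetical: the field names follow the template, the target string is taken from the error message, and the empty choice list is an assumption, not something copied from the dataset.

```python
# Hypothetical row: field names follow the template, values are for illustration only.
doc = {
    "targets": ["The teacher carries taro here."],
    "multiple_choice_targets": [],  # assumed: a free-form subtask with no listed choices
}


def target_index(doc):
    """Mimic multiple_choice_targets.index(targets[0]) from the template."""
    return doc["multiple_choice_targets"].index(doc["targets"][0])


try:
    target_index(doc)
    error_message = None
except ValueError as exc:
    # list.index raises ValueError when the value is absent from the list.
    error_message = str(exc)

print(error_message)
```

If this is what happens, any row whose first target is missing from its choice list would abort the whole run, which would match the behaviour reported in the issue.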
I'm looking at https://huggingface.co/datasets/hails/bigbench maybe this repo can shed light? lol
Oh, I found that some subtasks do not fit the multiple-choice format. I did not do a complete survey, but some of the subtasks from the repo above, along with the ones I had problems with, are as follows.
strategyqa_zero_shot
ascii_word_recognition_zero_shot
auto_categorization_zero_shot
auto_debugging_zero_shot
bridging_anaphora_resolution_barqa_zero_shot
chess_state_tracking_zero_shot
chinese_remainder_theorem_zero_shot
codenames_zero_shot
conlang_translation_zero_shot
cryptonite_zero_shot
disfl_qa_zero_shot
few_shot_nlg_zero_shot
gem_zero_shot
hindi_question_answering_zero_shot
...
Anyone who wants to evaluate the bigbench multiple choice tasks should carefully check whether each subtask's data actually supports that format 😃
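One way to do that check is sketched below. It assumes each row exposes the targets and multiple_choice_targets fields used by the template; the helper names and the sample rows are made up for illustration and are not part of lm-eval or the dataset.

```python
def supports_multiple_choice(doc):
    """Return True if the first target appears among the listed choices."""
    targets = doc.get("targets") or []
    choices = doc.get("multiple_choice_targets") or []
    return bool(targets) and targets[0] in choices


def find_unsupported(docs):
    """Return the indices of rows that would break the index() lookup."""
    return [i for i, doc in enumerate(docs) if not supports_multiple_choice(doc)]


# Invented rows for illustration. Real rows could be loaded with something like
# datasets.load_dataset("hails/bigbench", "<subtask>_zero_shot"), where the
# config naming is an assumption based on the subtask names listed above.
sample_docs = [
    {"targets": ["yes"], "multiple_choice_targets": ["yes", "no"]},
    {"targets": ["free-form answer"], "multiple_choice_targets": []},
    {"targets": ["no"], "multiple_choice_targets": ["yes", "no"]},
]

bad_rows = find_unsupported(sample_docs)
print(bad_rows)
```

A subtask with any unsupported rows could then be left out of the --tasks list rather than run through the whole bigbench_multiple_choice group.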
Hello,
I'm trying to evaluate some hf🤗 models on lm-eval. When I use the "bigbench_multiple_choice" task, I encounter a ValueError in certain subtasks. I'd appreciate help with resolving this.
Below is my script:
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-7B,trust_remote_code=true \
    --tasks bigbench_multiple_choice \
    --device cuda:0 \
    --batch_size auto \
    --log_samples \
    --output_path logit_result
I also tried some other models (Llama 3, Mistral), but they still fail with an error like "Task: bigbench_conlang_translation_multiple_choice] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. .... File "", line 1, in top-level template code ValueError: 'The teacher carries taro here.' is not in list"
However, when I select only one specific task, for example, "bigbench_implicit_relations_multiple_choice," it runs without any issues. I suspect the error might be occurring during the task configuration stage. Do you have any ideas?
My environment is as follows, with an RTX A6000 GPU on Ubuntu 22.04:
absl-py 2.1.0
accelerate 0.29.3
aiohttp 3.9.5
aiosignal 1.3.1
attrs 23.2.0
certifi 2024.2.2
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
DataProperty 1.0.1
datasets 2.18.0
dill 0.3.8
evaluate 0.4.1
filelock 3.13.4
frozenlist 1.4.1
fsspec 2024.2.0
huggingface-hub 0.22.2
idna 3.7
Jinja2 3.1.3
joblib 1.4.0
jsonlines 4.0.0
lm_eval 0.4.2 /home/ohs/eval/lm-evaluation-harness
lxml 5.2.1
MarkupSafe 2.1.5
mbstrdecoder 1.1.3
more-itertools 10.2.0
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
nltk 3.8.1
numexpr 2.10.0
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
packaging 24.0
pandas 2.2.2
pathvalidate 3.2.0
peft 0.10.0
pip 24.0
portalocker 2.8.2
psutil 5.9.8
pyarrow 15.0.2
pyarrow-hotfix 0.6
pybind11 2.12.0
pytablewriter 1.2.0
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2024.4.16
requests 2.31.0
responses 0.18.0
rouge-score 0.1.2
sacrebleu 2.4.2
safetensors 0.4.3
scikit-learn 1.4.2
scipy 1.13.0
setuptools 65.5.0
six 1.16.0
sqlitedict 2.1.0
sympy 1.12
tabledata 1.3.3
tabulate 0.9.0
tcolorpy 0.1.4
threadpoolctl 3.4.0
tokenizers 0.19.1
torch 2.2.2
tqdm 4.66.2
tqdm-multiprocess 0.0.11
transformers 4.41.0.dev0 /home/ohs/eval/transformers
triton 2.2.0
typepy 1.3.2
typing_extensions 4.11.0
tzdata 2024.1
urllib3 2.2.1
word2number 1.1
xxhash 3.4.1
yarl 1.9.4
zstandard 0.22.0