embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Question/Bug? Double check functionality #602

Closed austinmw closed 5 months ago

austinmw commented 7 months ago

Hi, I tried to run the following:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

evaluation = MTEB()
task_names = [t.metadata_dict["name"] for t in MTEB(task_langs=['en']).tasks]

model_name = "BAAI/bge-small-en"
model = SentenceTransformer(model_name)

for task in task_names:
    evaluation = MTEB(tasks=[task], task_langs=['en'], eval_splits=["test" if task not in ['MSMARCO'] else 'dev'])
    evaluation.run(model, output_folder=f"en_results/{model_name.split('/')[-1]}")

And got the following error:

BitextMining
    - BibleNLPBitextMining, s2s, crosslingual 1 / 1657 pairs

Error while evaluating BibleNLPBitextMining: BuilderConfig 'eng-eng' not found. Available: ['eng-aai', 'eng-aak', 'eng-aau', 'eng-aaz', 'eng-abt', 'eng-abx', 'eng-aby', 'eng-acf', 'eng-acr', 'eng-acu', 'eng-adz', 'eng-aer', 'eng-aey', 'eng-agd', 'eng-agg', 'eng-agm', 'eng-agn', 'eng-agr', 'eng-agt', 'eng-agu', 'eng-aia', 'eng-aii', 'eng-aka', 'eng-ake', 'eng-alp', 'eng-alq', 'eng-als', 'eng-aly', 'eng-ame', 'eng-amf', 'eng-amk', 'eng-amm', 'eng-amn', 'eng-amo', 'eng-amp', 'eng-amr', 'eng-amu', 'eng-amx', 'eng-anh', 'eng-anv', 'eng-aoi', 'eng-aoj', 'eng-aom', 'eng-aon', 'eng-apb', 'eng-ape', 'eng-apn', 'eng-apr', 'eng-apu', 'eng-apw', 'eng-apz', 'eng-arb', 'eng-are', 'eng-arl', 'eng-arn', 'eng-arp', 'eng-asm', 'eng-aso', 'eng-ata', 'eng-atb', 'eng-atd', 'eng-atg', 'eng-att', 'eng-auc', 'eng-aui', 'eng-auy', 'eng-avt', 'eng-awb', 'eng-awk', 'eng-awx', 'eng-azb', 'eng-azg', 'eng-azz', 'eng-bao', 'eng-bba', 'eng-bbb', 'eng-bbr', 'eng-bch', 'eng-bco', 'eng-bdd', 'eng-bea', 'eng-bef', 'eng-bel', 'eng-ben', 'eng-beo', 'eng-beu', 'eng-bgs', 'eng-bgt', 'eng-bhg', 'eng-bhl', 'eng-big', 'eng-bjk', 'eng-bjp', 'eng-bjr', 'eng-bjv', 'eng-bjz', 'eng-bkd', 'eng-bki', 'eng-bkq', 'eng-bkx', 'eng-blw', 'eng-blz', 'eng-bmh', 'eng-bmk', 'eng-bmr', 'eng-bmu', 'eng-bnp', 'eng-boa', 'eng-boj', 'eng-bon', 'eng-box', 'eng-bpr', 'eng-bps', 'eng-bqc', 'eng-bqp', 'eng-bre', 'eng-bsj', 'eng-bsn', 'eng-bsp', 'eng-bss', 'eng-buk', 'eng-bus', 'eng-bvd', 'eng-bvr', 'eng-bxh', 'eng-byr', 'eng-byx', 'eng-bzd', 'eng-bzh', 'eng-bzj', 'eng-caa', 'eng-cab', 'eng-cac', 'eng-caf', 'eng-cak', 'eng-cao', 'eng-cap', 'eng-car', 'eng-cav', 'eng-cax', 'eng-cbc', 'eng-cbi', 'eng-cbk', 'eng-cbr', 'eng-cbs', 'eng-cbt', 'eng-cbu', 'eng-cbv', 'eng-cco', 'eng-ceb', 'eng-cek', 'eng-ces', 'eng-cgc', 'eng-cha', 'eng-chd', 'eng-chf', 'eng-chk', 'eng-chq', 'eng-chz', 'eng-cjo', 'eng-cjv', 'eng-ckb', 'eng-cle', 'eng-clu', 'eng-cme', 'eng-cmn', 'eng-cni', 'eng-cnl', 'eng-cnt', 'eng-cof', 'eng-con', 'eng-cop', 'eng-cot', 
'eng-cpa', 'eng-cpb', 'eng-cpc', 'eng-cpu', 'eng-cpy', 'eng-crn', 'eng-crx', 'eng-cso', 'eng-csy', 'eng-cta', 'eng-cth', 'eng-ctp', 'eng-ctu', 'eng-cub', 'eng-cuc', 'eng-cui', 'eng-cuk', 'eng-cut', 'eng-cux', 'eng-cwe', 'eng-cya', 'eng-daa', 'eng-dad', 'eng-dah', 'eng-dan', 'eng-ded', 'eng-deu', 'eng-dgc', 'eng-dgr', 'eng-dgz', 'eng-dhg', 'eng-dif', 'eng-dik', 'eng-dji', 'eng-djk', 'eng-djr', 'eng-dob', 'eng-dop', 'eng-dov', 'eng-dwr', 'eng-dww', 'eng-dwy', 'eng-ebk', 'eng-eko', 'eng-emi', 'eng-emp', 'eng-enq', 'eng-epo', 'eng-eri', 'eng-ese', 'eng-esk', 'eng-etr', 'eng-ewe', 'eng-faa', 'eng-fai', 'eng-far', 'eng-ffm', 'eng-for', 'eng-fra', 'eng-fue', 'eng-fuf', 'eng-fuh', 'eng-gah', 'eng-gai', 'eng-gam', 'eng-gaw', 'eng-gdn', 'eng-gdr', 'eng-geb', 'eng-gfk', 'eng-ghs', 'eng-glk', 'eng-gmv', 'eng-gng', 'eng-gnn', 'eng-gnw', 'eng-gof', 'eng-grc', 'eng-gub', 'eng-guh', 'eng-gui', 'eng-guj', 'eng-gul', 'eng-gum', 'eng-gun', 'eng-guo', 'eng-gup', 'eng-gux', 'eng-gvc', 'eng-gvf', 'eng-gvn', 'eng-gvs', 'eng-gwi', 'eng-gym', 'eng-gyr', 'eng-hat', 'eng-hau', 'eng-haw', 'eng-hbo', 'eng-hch', 'eng-heb', 'eng-heg', 'eng-hin', 'eng-hix', 'eng-hla', 'eng-hlt', 'eng-hmo', 'eng-hns', 'eng-hop', 'eng-hot', 'eng-hrv', 'eng-hto', 'eng-hub', 'eng-hui', 'eng-hun', 'eng-hus', 'eng-huu', 'eng-huv', 'eng-hvn', 'eng-ian', 'eng-ign', 'eng-ikk', 'eng-ikw', 'eng-ilo', 'eng-imo', 'eng-inb', 'eng-ind', 'eng-ino', 'eng-iou', 'eng-ipi', 'eng-isn', 'eng-ita', 'eng-iws', 'eng-ixl', 'eng-jac', 'eng-jae', 'eng-jao', 'eng-jic', 'eng-jid', 'eng-jiv', 'eng-jni', 'eng-jpn', 'eng-jvn', 'eng-kan', 'eng-kaq', 'eng-kbc', 'eng-kbh', 'eng-kbm', 'eng-kbq', 'eng-kdc', 'eng-kde', 'eng-kdl', 'eng-kek', 'eng-ken', 'eng-kew', 'eng-kgf', 'eng-kgk', 'eng-kgp', 'eng-khs', 'eng-khz', 'eng-kik', 'eng-kiw', 'eng-kiz', 'eng-kje', 'eng-kjs', 'eng-kkc', 'eng-kkl', 'eng-klt', 'eng-klv', 'eng-kmg', 'eng-kmh', 'eng-kmk', 'eng-kmo', 'eng-kms', 'eng-kmu', 'eng-kne', 'eng-knf', 'eng-knj', 'eng-knv', 'eng-kos', 'eng-kpf', 
'eng-kpg', 'eng-kpj', 'eng-kpr', 'eng-kpw', 'eng-kpx', 'eng-kqa', 'eng-kqc', 'eng-kqf', 'eng-kql', 'eng-kqw', 'eng-ksd', 'eng-ksj', 'eng-ksr', 'eng-ktm', 'eng-kto', 'eng-kud', 'eng-kue', 'eng-kup', 'eng-kvg', 'eng-kvn', 'eng-kwd', 'eng-kwf', 'eng-kwi', 'eng-kwj', 'eng-kyc', 'eng-kyf', 'eng-kyg', 'eng-kyq', 'eng-kyz', 'eng-kze', 'eng-lac', 'eng-lat', 'eng-lbb', 'eng-lbk', 'eng-lcm', 'eng-leu', 'eng-lex', 'eng-lgl', 'eng-lid', 'eng-lif', 'eng-lin', 'eng-lit', 'eng-llg', 'eng-lug', 'eng-luo', 'eng-lww', 'eng-maa', 'eng-maj', 'eng-mal', 'eng-mam', 'eng-maq', 'eng-mar', 'eng-mau', 'eng-mav', 'eng-maz', 'eng-mbb', 'eng-mbc', 'eng-mbh', 'eng-mbj', 'eng-mbl', 'eng-mbs', 'eng-mbt', 'eng-mca', 'eng-mcb', 'eng-mcd', 'eng-mcf', 'eng-mco', 'eng-mcp', 'eng-mcq', 'eng-mcr', 'eng-mdy', 'eng-med', 'eng-mee', 'eng-mek', 'eng-meq', 'eng-met', 'eng-meu', 'eng-mgc', 'eng-mgh', 'eng-mgw', 'eng-mhl', 'eng-mib', 'eng-mic', 'eng-mie', 'eng-mig', 'eng-mih', 'eng-mil', 'eng-mio', 'eng-mir', 'eng-mit', 'eng-miz', 'eng-mjc', 'eng-mkj', 'eng-mkl', 'eng-mkn', 'eng-mks', 'eng-mle', 'eng-mlh', 'eng-mlp', 'eng-mmo', 'eng-mmx', 'eng-mna', 'eng-mop', 'eng-mox', 'eng-mph', 'eng-mpj', 'eng-mpm', 'eng-mpp', 'eng-mps', 'eng-mpt', 'eng-mpx', 'eng-mqb', 'eng-mqj', 'eng-msb', 'eng-msc', 'eng-msk', 'eng-msm', 'eng-msy', 'eng-mti', 'eng-mto', 'eng-mux', 'eng-muy', 'eng-mva', 'eng-mvn', 'eng-mwc', 'eng-mwe', 'eng-mwf', 'eng-mwp', 'eng-mxb', 'eng-mxp', 'eng-mxq', 'eng-mxt', 'eng-mya', 'eng-myk', 'eng-myu', 'eng-myw', 'eng-myy', 'eng-mzz', 'eng-nab', 'eng-naf', 'eng-nak', 'eng-nas', 'eng-nbq', 'eng-nca', 'eng-nch', 'eng-ncj', 'eng-ncl', 'eng-ncu', 'eng-ndg', 'eng-ndj', 'eng-nfa', 'eng-ngp', 'eng-ngu', 'eng-nhe', 'eng-nhg', 'eng-nhi', 'eng-nho', 'eng-nhr', 'eng-nhu', 'eng-nhw', 'eng-nhy', 'eng-nif', 'eng-nii', 'eng-nin', 'eng-nko', 'eng-nld', 'eng-nlg', 'eng-nna', 'eng-nnq', 'eng-noa', 'eng-nop', 'eng-not', 'eng-nou', 'eng-npi', 'eng-npl', 'eng-nsn', 'eng-nss', 'eng-ntj', 'eng-ntp', 'eng-ntu', 'eng-nuy', 
'eng-nvm', 'eng-nwi', 'eng-nya', 'eng-nys', 'eng-nyu', 'eng-obo', 'eng-okv', 'eng-omw', 'eng-ong', 'eng-ons', 'eng-ood', 'eng-opm', 'eng-ory', 'eng-ote', 'eng-otm', 'eng-otn', 'eng-otq', 'eng-ots', 'eng-pab', 'eng-pad', 'eng-pah', 'eng-pan', 'eng-pao', 'eng-pes', 'eng-pib', 'eng-pio', 'eng-pir', 'eng-piu', 'eng-pjt', 'eng-pls', 'eng-plu', 'eng-pma', 'eng-poe', 'eng-poh', 'eng-poi', 'eng-pol', 'eng-pon', 'eng-por', 'eng-poy', 'eng-ppo', 'eng-prf', 'eng-pri', 'eng-ptp', 'eng-ptu', 'eng-pwg', 'eng-qub', 'eng-quc', 'eng-quf', 'eng-quh', 'eng-qul', 'eng-qup', 'eng-qvc', 'eng-qve', 'eng-qvh', 'eng-qvm', 'eng-qvn', 'eng-qvs', 'eng-qvw', 'eng-qvz', 'eng-qwh', 'eng-qxh', 'eng-qxn', 'eng-qxo', 'eng-rai', 'eng-reg', 'eng-rgu', 'eng-rkb', 'eng-rmc', 'eng-rmy', 'eng-ron', 'eng-roo', 'eng-rop', 'eng-row', 'eng-rro', 'eng-ruf', 'eng-rug', 'eng-rus', 'eng-rwo', 'eng-sab', 'eng-san', 'eng-sbe', 'eng-sbk', 'eng-sbs', 'eng-seh', 'eng-sey', 'eng-sgb', 'eng-sgz', 'eng-shj', 'eng-shp', 'eng-sim', 'eng-sja', 'eng-sll', 'eng-smk', 'eng-snc', 'eng-snn', 'eng-snp', 'eng-snx', 'eng-sny', 'eng-som', 'eng-soq', 'eng-soy', 'eng-spa', 'eng-spl', 'eng-spm', 'eng-spp', 'eng-sps', 'eng-spy', 'eng-sri', 'eng-srm', 'eng-srn', 'eng-srp', 'eng-srq', 'eng-ssd', 'eng-ssg', 'eng-ssx', 'eng-stp', 'eng-sua', 'eng-sue', 'eng-sus', 'eng-suz', 'eng-swe', 'eng-swh', 'eng-swp', 'eng-sxb', 'eng-tac', 'eng-taj', 'eng-tam', 'eng-tav', 'eng-taw', 'eng-tbc', 'eng-tbf', 'eng-tbg', 'eng-tbo', 'eng-tbz', 'eng-tca', 'eng-tcs', 'eng-tcz', 'eng-tdt', 'eng-tee', 'eng-tel', 'eng-ter', 'eng-tet', 'eng-tew', 'eng-tfr', 'eng-tgk', 'eng-tgl', 'eng-tgo', 'eng-tgp', 'eng-tha', 'eng-tif', 'eng-tim', 'eng-tiw', 'eng-tiy', 'eng-tke', 'eng-tku', 'eng-tlf', 'eng-tmd', 'eng-tna', 'eng-tnc', 'eng-tnk', 'eng-tnn', 'eng-tnp', 'eng-toc', 'eng-tod', 'eng-tof', 'eng-toj', 'eng-ton', 'eng-too', 'eng-top', 'eng-tos', 'eng-tpa', 'eng-tpi', 'eng-tpt', 'eng-tpz', 'eng-trc', 'eng-tsw', 'eng-ttc', 'eng-tte', 'eng-tuc', 'eng-tue', 'eng-tuf', 
'eng-tuo', 'eng-tur', 'eng-tvk', 'eng-twi', 'eng-txq', 'eng-txu', 'eng-tzj', 'eng-tzo', 'eng-ubr', 'eng-ubu', 'eng-udu', 'eng-uig', 'eng-ukr', 'eng-uli', 'eng-ulk', 'eng-upv', 'eng-ura', 'eng-urb', 'eng-urd', 'eng-uri', 'eng-urt', 'eng-urw', 'eng-usa', 'eng-usp', 'eng-uvh', 'eng-uvl', 'eng-vid', 'eng-vie', 'eng-viv', 'eng-vmy', 'eng-waj', 'eng-wal', 'eng-wap', 'eng-wat', 'eng-wbi', 'eng-wbp', 'eng-wed', 'eng-wer', 'eng-wim', 'eng-wiu', 'eng-wiv', 'eng-wmt', 'eng-wmw', 'eng-wnc', 'eng-wnu', 'eng-wol', 'eng-wos', 'eng-wrk', 'eng-wro', 'eng-wrs', 'eng-wsk', 'eng-wuv', 'eng-xav', 'eng-xbi', 'eng-xed', 'eng-xla', 'eng-xnn', 'eng-xon', 'eng-xsi', 'eng-xtd', 'eng-xtm', 'eng-yaa', 'eng-yad', 'eng-yal', 'eng-yap', 'eng-yaq', 'eng-yby', 'eng-ycn', 'eng-yka', 'eng-yle', 'eng-yml', 'eng-yon', 'eng-yor', 'eng-yrb', 'eng-yre', 'eng-yss', 'eng-yuj', 'eng-yut', 'eng-yuw', 'eng-yva', 'eng-zaa', 'eng-zab', 'eng-zac', 'eng-zad', 'eng-zai', 'eng-zaj', 'eng-zam', 'eng-zao', 'eng-zap', 'eng-zar', 'eng-zas', 'eng-zat', 'eng-zav', 'eng-zaw', 'eng-zca', 'eng-zga', 'eng-zia', 'eng-ziw', 'eng-zlm', 'eng-zos', 'eng-zpc', 'eng-zpl', 'eng-zpm', 'eng-zpo', 'eng-zpq', 'eng-zpu', 'eng-zpv', 'eng-zpz', 'eng-zsr', 'eng-ztq', 'eng-zty', 'eng-zyp']
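
Reading the message above, every available config pairs `eng` with a different language, so a same-language pair such as `eng-eng` has nothing to load. The sketch below shows one way to check requested pairs against the available config names before loading anything. The helper `split_pairs` and the `requested_pairs` list are hypothetical and not part of mteb or datasets; the config names are a small excerpt from the error message. This is an assumption about how such a check could look, not how the library handles it.

```python
def split_pairs(requested, available):
    """Partition requested language-pair config names into loadable and missing ones."""
    available = set(available)
    loadable = [pair for pair in requested if pair in available]
    missing = [pair for pair in requested if pair not in available]
    return loadable, missing


# Small excerpt of the config names listed in the error message.
available_configs = ["eng-aai", "eng-deu", "eng-fra", "eng-spa"]

# Hypothetical request: an English-only language filter could yield the self-pair "eng-eng".
requested_pairs = ["eng-eng", "eng-deu"]

loadable, missing = split_pairs(requested_pairs, available_configs)
print(loadable)  # ['eng-deu']
print(missing)   # ['eng-eng']
```

A check like this would surface the unsupported pair as a short message instead of a full traceback.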
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 12
     10 for task in task_names:
     11     evaluation = MTEB(tasks=[task], task_langs=['en'], eval_splits=["test" if task not in ['MSMARCO'] else 'dev'])
---> 12     evaluation.run(model, output_folder=f"en_results/{model_name.split('/')[-1]}")

File /opt/conda/lib/python3.10/site-packages/mteb/evaluation/MTEB.py:328, in MTEB.run(self, model, verbosity, output_folder, eval_splits, overwrite_results, raise_error, **kwargs)
    324 logger.error(
    325     f"Error while evaluating {task.metadata_dict['name']}: {e}"
    326 )
    327 if raise_error:
--> 328     raise e
    329 logger.error(
    330     f"Please check all the error logs at: {self.err_logs_path}"
    331 )
    332 with open(self.err_logs_path, "a") as f_out:

File /opt/conda/lib/python3.10/site-packages/mteb/evaluation/MTEB.py:292, in MTEB.run(self, model, verbosity, output_folder, eval_splits, overwrite_results, raise_error, **kwargs)
    290 logger.info(f"Loading dataset for {task.metadata_dict['name']}")
    291 task.check_if_dataset_is_superseeded()
--> 292 task.load_data(eval_splits=task_eval_splits, **kwargs)
    294 # run evaluation
    295 task_results = {
    296     "mteb_version": version("mteb"),  # noqa: F405
    297     "dataset_revision": task.metadata_dict["dataset"].get(
   (...)
    300     "mteb_dataset_name": task.metadata_dict["name"],
    301 }

File /opt/conda/lib/python3.10/site-packages/mteb/tasks/BitextMining/multilingual/BibleNLPBitextMining.py:913, in BibleNLPBitextMining.load_data(self, **kwargs)
    911     self.dataset[lang] = self.dataset[self._swap_substrings(lang)]
    912 else:
--> 913     dataset = datasets.load_dataset(
    914         name=self._transform_lang_name_hf(lang),
    915         **self.metadata_dict["dataset"],
    916     )
    917     self.dataset[lang] = datasets.DatasetDict({"train": dataset})
    918     seen_pairs.append(hf_lang_name)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:2556, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)
   2551 verification_mode = VerificationMode(
   2552     (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
   2553 )
   2555 # Create a dataset builder
-> 2556 builder_instance = load_dataset_builder(
   2557     path=path,
   2558     name=name,
   2559     data_dir=data_dir,
   2560     data_files=data_files,
   2561     cache_dir=cache_dir,
   2562     features=features,
   2563     download_config=download_config,
   2564     download_mode=download_mode,
   2565     revision=revision,
   2566     token=token,
   2567     storage_options=storage_options,
   2568     trust_remote_code=trust_remote_code,
   2569     _require_default_config_name=name is None,
   2570     **config_kwargs,
   2571 )
   2573 # Return iterable dataset in case of streaming
   2574 if streaming:

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:2265, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, use_auth_token, storage_options, trust_remote_code, _require_default_config_name, **config_kwargs)
   2263 builder_cls = get_dataset_builder_class(dataset_module, dataset_name=dataset_name)
   2264 # Instantiate the dataset builder
-> 2265 builder_instance: DatasetBuilder = builder_cls(
   2266     cache_dir=cache_dir,
   2267     dataset_name=dataset_name,
   2268     config_name=config_name,
   2269     data_dir=data_dir,
   2270     data_files=data_files,
   2271     hash=dataset_module.hash,
   2272     info=info,
   2273     features=features,
   2274     token=token,
   2275     storage_options=storage_options,
   2276     **builder_kwargs,
   2277     **config_kwargs,
   2278 )
   2279 builder_instance._use_legacy_cache_dir_if_possible(dataset_module)
   2281 return builder_instance

File /opt/conda/lib/python3.10/site-packages/datasets/builder.py:371, in DatasetBuilder.__init__(self, cache_dir, dataset_name, config_name, hash, base_path, info, features, token, use_auth_token, repo_id, data_files, data_dir, storage_options, writer_batch_size, name, **config_kwargs)
    369 if data_dir is not None:
    370     config_kwargs["data_dir"] = data_dir
--> 371 self.config, self.config_id = self._create_builder_config(
    372     config_name=config_name,
    373     custom_features=features,
    374     **config_kwargs,
    375 )
    377 # prepare info: DatasetInfo are a standardized dataclass across all datasets
    378 # Prefill datasetinfo
    379 if info is None:
    380     # TODO FOR PACKAGED MODULES IT IMPORTS DATA FROM src/packaged_modules which doesn't make sense

File /opt/conda/lib/python3.10/site-packages/datasets/builder.py:592, in DatasetBuilder._create_builder_config(self, config_name, custom_features, **config_kwargs)
    590     builder_config = self.builder_configs.get(config_name)
    591     if builder_config is None and self.BUILDER_CONFIGS:
--> 592         raise ValueError(
    593             f"BuilderConfig '{config_name}' not found. Available: {list(self.builder_configs.keys())}"
    594         )
    596 # if not using an existing config, then create a new config on the fly
    597 if not builder_config:

ValueError: BuilderConfig 'eng-eng' not found. Available: ['eng-aai', 'eng-aak', 'eng-aau', 'eng-aaz', 'eng-abt', 'eng-abx', 'eng-aby', 'eng-acf', 'eng-acr', 'eng-acu', 'eng-adz', 'eng-aer', 'eng-aey', 'eng-agd', 'eng-agg', 'eng-agm', 'eng-agn', 'eng-agr', 'eng-agt', 'eng-agu', 'eng-aia', 'eng-aii', 'eng-aka', 'eng-ake', 'eng-alp', 'eng-alq', 'eng-als', 'eng-aly', 'eng-ame', 'eng-amf', 'eng-amk', 'eng-amm', 'eng-amn', 'eng-amo', 'eng-amp', 'eng-amr', 'eng-amu', 'eng-amx', 'eng-anh', 'eng-anv', 'eng-aoi', 'eng-aoj', 'eng-aom', 'eng-aon', 'eng-apb', 'eng-ape', 'eng-apn', 'eng-apr', 'eng-apu', 'eng-apw', 'eng-apz', 'eng-arb', 'eng-are', 'eng-arl', 'eng-arn', 'eng-arp', 'eng-asm', 'eng-aso', 'eng-ata', 'eng-atb', 'eng-atd', 'eng-atg', 'eng-att', 'eng-auc', 'eng-aui', 'eng-auy', 'eng-avt', 'eng-awb', 'eng-awk', 'eng-awx', 'eng-azb', 'eng-azg', 'eng-azz', 'eng-bao', 'eng-bba', 'eng-bbb', 'eng-bbr', 'eng-bch', 'eng-bco', 'eng-bdd', 'eng-bea', 'eng-bef', 'eng-bel', 'eng-ben', 'eng-beo', 'eng-beu', 'eng-bgs', 'eng-bgt', 'eng-bhg', 'eng-bhl', 'eng-big', 'eng-bjk', 'eng-bjp', 'eng-bjr', 'eng-bjv', 'eng-bjz', 'eng-bkd', 'eng-bki', 'eng-bkq', 'eng-bkx', 'eng-blw', 'eng-blz', 'eng-bmh', 'eng-bmk', 'eng-bmr', 'eng-bmu', 'eng-bnp', 'eng-boa', 'eng-boj', 'eng-bon', 'eng-box', 'eng-bpr', 'eng-bps', 'eng-bqc', 'eng-bqp', 'eng-bre', 'eng-bsj', 'eng-bsn', 'eng-bsp', 'eng-bss', 'eng-buk', 'eng-bus', 'eng-bvd', 'eng-bvr', 'eng-bxh', 'eng-byr', 'eng-byx', 'eng-bzd', 'eng-bzh', 'eng-bzj', 'eng-caa', 'eng-cab', 'eng-cac', 'eng-caf', 'eng-cak', 'eng-cao', 'eng-cap', 'eng-car', 'eng-cav', 'eng-cax', 'eng-cbc', 'eng-cbi', 'eng-cbk', 'eng-cbr', 'eng-cbs', 'eng-cbt', 'eng-cbu', 'eng-cbv', 'eng-cco', 'eng-ceb', 'eng-cek', 'eng-ces', 'eng-cgc', 'eng-cha', 'eng-chd', 'eng-chf', 'eng-chk', 'eng-chq', 'eng-chz', 'eng-cjo', 'eng-cjv', 'eng-ckb', 'eng-cle', 'eng-clu', 'eng-cme', 'eng-cmn', 'eng-cni', 'eng-cnl', 'eng-cnt', 'eng-cof', 'eng-con', 'eng-cop', 'eng-cot', 'eng-cpa', 'eng-cpb', 'eng-cpc', 
'eng-cpu', 'eng-cpy', 'eng-crn', 'eng-crx', 'eng-cso', 'eng-csy', 'eng-cta', 'eng-cth', 'eng-ctp', 'eng-ctu', 'eng-cub', 'eng-cuc', 'eng-cui', 'eng-cuk', 'eng-cut', 'eng-cux', 'eng-cwe', 'eng-cya', 'eng-daa', 'eng-dad', 'eng-dah', 'eng-dan', 'eng-ded', 'eng-deu', 'eng-dgc', 'eng-dgr', 'eng-dgz', 'eng-dhg', 'eng-dif', 'eng-dik', 'eng-dji', 'eng-djk', 'eng-djr', 'eng-dob', 'eng-dop', 'eng-dov', 'eng-dwr', 'eng-dww', 'eng-dwy', 'eng-ebk', 'eng-eko', 'eng-emi', 'eng-emp', 'eng-enq', 'eng-epo', 'eng-eri', 'eng-ese', 'eng-esk', 'eng-etr', 'eng-ewe', 'eng-faa', 'eng-fai', 'eng-far', 'eng-ffm', 'eng-for', 'eng-fra', 'eng-fue', 'eng-fuf', 'eng-fuh', 'eng-gah', 'eng-gai', 'eng-gam', 'eng-gaw', 'eng-gdn', 'eng-gdr', 'eng-geb', 'eng-gfk', 'eng-ghs', 'eng-glk', 'eng-gmv', 'eng-gng', 'eng-gnn', 'eng-gnw', 'eng-gof', 'eng-grc', 'eng-gub', 'eng-guh', 'eng-gui', 'eng-guj', 'eng-gul', 'eng-gum', 'eng-gun', 'eng-guo', 'eng-gup', 'eng-gux', 'eng-gvc', 'eng-gvf', 'eng-gvn', 'eng-gvs', 'eng-gwi', 'eng-gym', 'eng-gyr', 'eng-hat', 'eng-hau', 'eng-haw', 'eng-hbo', 'eng-hch', 'eng-heb', 'eng-heg', 'eng-hin', 'eng-hix', 'eng-hla', 'eng-hlt', 'eng-hmo', 'eng-hns', 'eng-hop', 'eng-hot', 'eng-hrv', 'eng-hto', 'eng-hub', 'eng-hui', 'eng-hun', 'eng-hus', 'eng-huu', 'eng-huv', 'eng-hvn', 'eng-ian', 'eng-ign', 'eng-ikk', 'eng-ikw', 'eng-ilo', 'eng-imo', 'eng-inb', 'eng-ind', 'eng-ino', 'eng-iou', 'eng-ipi', 'eng-isn', 'eng-ita', 'eng-iws', 'eng-ixl', 'eng-jac', 'eng-jae', 'eng-jao', 'eng-jic', 'eng-jid', 'eng-jiv', 'eng-jni', 'eng-jpn', 'eng-jvn', 'eng-kan', 'eng-kaq', 'eng-kbc', 'eng-kbh', 'eng-kbm', 'eng-kbq', 'eng-kdc', 'eng-kde', 'eng-kdl', 'eng-kek', 'eng-ken', 'eng-kew', 'eng-kgf', 'eng-kgk', 'eng-kgp', 'eng-khs', 'eng-khz', 'eng-kik', 'eng-kiw', 'eng-kiz', 'eng-kje', 'eng-kjs', 'eng-kkc', 'eng-kkl', 'eng-klt', 'eng-klv', 'eng-kmg', 'eng-kmh', 'eng-kmk', 'eng-kmo', 'eng-kms', 'eng-kmu', 'eng-kne', 'eng-knf', 'eng-knj', 'eng-knv', 'eng-kos', 'eng-kpf', 'eng-kpg', 'eng-kpj', 'eng-kpr', 
'eng-kpw', 'eng-kpx', 'eng-kqa', 'eng-kqc', 'eng-kqf', 'eng-kql', 'eng-kqw', 'eng-ksd', 'eng-ksj', 'eng-ksr', 'eng-ktm', 'eng-kto', 'eng-kud', 'eng-kue', 'eng-kup', 'eng-kvg', 'eng-kvn', 'eng-kwd', 'eng-kwf', 'eng-kwi', 'eng-kwj', 'eng-kyc', 'eng-kyf', 'eng-kyg', 'eng-kyq', 'eng-kyz', 'eng-kze', 'eng-lac', 'eng-lat', 'eng-lbb', 'eng-lbk', 'eng-lcm', 'eng-leu', 'eng-lex', 'eng-lgl', 'eng-lid', 'eng-lif', 'eng-lin', 'eng-lit', 'eng-llg', 'eng-lug', 'eng-luo', 'eng-lww', 'eng-maa', 'eng-maj', 'eng-mal', 'eng-mam', 'eng-maq', 'eng-mar', 'eng-mau', 'eng-mav', 'eng-maz', 'eng-mbb', 'eng-mbc', 'eng-mbh', 'eng-mbj', 'eng-mbl', 'eng-mbs', 'eng-mbt', 'eng-mca', 'eng-mcb', 'eng-mcd', 'eng-mcf', 'eng-mco', 'eng-mcp', 'eng-mcq', 'eng-mcr', 'eng-mdy', 'eng-med', 'eng-mee', 'eng-mek', 'eng-meq', 'eng-met', 'eng-meu', 'eng-mgc', 'eng-mgh', 'eng-mgw', 'eng-mhl', 'eng-mib', 'eng-mic', 'eng-mie', 'eng-mig', 'eng-mih', 'eng-mil', 'eng-mio', 'eng-mir', 'eng-mit', 'eng-miz', 'eng-mjc', 'eng-mkj', 'eng-mkl', 'eng-mkn', 'eng-mks', 'eng-mle', 'eng-mlh', 'eng-mlp', 'eng-mmo', 'eng-mmx', 'eng-mna', 'eng-mop', 'eng-mox', 'eng-mph', 'eng-mpj', 'eng-mpm', 'eng-mpp', 'eng-mps', 'eng-mpt', 'eng-mpx', 'eng-mqb', 'eng-mqj', 'eng-msb', 'eng-msc', 'eng-msk', 'eng-msm', 'eng-msy', 'eng-mti', 'eng-mto', 'eng-mux', 'eng-muy', 'eng-mva', 'eng-mvn', 'eng-mwc', 'eng-mwe', 'eng-mwf', 'eng-mwp', 'eng-mxb', 'eng-mxp', 'eng-mxq', 'eng-mxt', 'eng-mya', 'eng-myk', 'eng-myu', 'eng-myw', 'eng-myy', 'eng-mzz', 'eng-nab', 'eng-naf', 'eng-nak', 'eng-nas', 'eng-nbq', 'eng-nca', 'eng-nch', 'eng-ncj', 'eng-ncl', 'eng-ncu', 'eng-ndg', 'eng-ndj', 'eng-nfa', 'eng-ngp', 'eng-ngu', 'eng-nhe', 'eng-nhg', 'eng-nhi', 'eng-nho', 'eng-nhr', 'eng-nhu', 'eng-nhw', 'eng-nhy', 'eng-nif', 'eng-nii', 'eng-nin', 'eng-nko', 'eng-nld', 'eng-nlg', 'eng-nna', 'eng-nnq', 'eng-noa', 'eng-nop', 'eng-not', 'eng-nou', 'eng-npi', 'eng-npl', 'eng-nsn', 'eng-nss', 'eng-ntj', 'eng-ntp', 'eng-ntu', 'eng-nuy', 'eng-nvm', 'eng-nwi', 'eng-nya', 
'eng-nys', 'eng-nyu', 'eng-obo', 'eng-okv', 'eng-omw', 'eng-ong', 'eng-ons', 'eng-ood', 'eng-opm', 'eng-ory', 'eng-ote', 'eng-otm', 'eng-otn', 'eng-otq', 'eng-ots', 'eng-pab', 'eng-pad', 'eng-pah', 'eng-pan', 'eng-pao', 'eng-pes', 'eng-pib', 'eng-pio', 'eng-pir', 'eng-piu', 'eng-pjt', 'eng-pls', 'eng-plu', 'eng-pma', 'eng-poe', 'eng-poh', 'eng-poi', 'eng-pol', 'eng-pon', 'eng-por', 'eng-poy', 'eng-ppo', 'eng-prf', 'eng-pri', 'eng-ptp', 'eng-ptu', 'eng-pwg', 'eng-qub', 'eng-quc', 'eng-quf', 'eng-quh', 'eng-qul', 'eng-qup', 'eng-qvc', 'eng-qve', 'eng-qvh', 'eng-qvm', 'eng-qvn', 'eng-qvs', 'eng-qvw', 'eng-qvz', 'eng-qwh', 'eng-qxh', 'eng-qxn', 'eng-qxo', 'eng-rai', 'eng-reg', 'eng-rgu', 'eng-rkb', 'eng-rmc', 'eng-rmy', 'eng-ron', 'eng-roo', 'eng-rop', 'eng-row', 'eng-rro', 'eng-ruf', 'eng-rug', 'eng-rus', 'eng-rwo', 'eng-sab', 'eng-san', 'eng-sbe', 'eng-sbk', 'eng-sbs', 'eng-seh', 'eng-sey', 'eng-sgb', 'eng-sgz', 'eng-shj', 'eng-shp', 'eng-sim', 'eng-sja', 'eng-sll', 'eng-smk', 'eng-snc', 'eng-snn', 'eng-snp', 'eng-snx', 'eng-sny', 'eng-som', 'eng-soq', 'eng-soy', 'eng-spa', 'eng-spl', 'eng-spm', 'eng-spp', 'eng-sps', 'eng-spy', 'eng-sri', 'eng-srm', 'eng-srn', 'eng-srp', 'eng-srq', 'eng-ssd', 'eng-ssg', 'eng-ssx', 'eng-stp', 'eng-sua', 'eng-sue', 'eng-sus', 'eng-suz', 'eng-swe', 'eng-swh', 'eng-swp', 'eng-sxb', 'eng-tac', 'eng-taj', 'eng-tam', 'eng-tav', 'eng-taw', 'eng-tbc', 'eng-tbf', 'eng-tbg', 'eng-tbo', 'eng-tbz', 'eng-tca', 'eng-tcs', 'eng-tcz', 'eng-tdt', 'eng-tee', 'eng-tel', 'eng-ter', 'eng-tet', 'eng-tew', 'eng-tfr', 'eng-tgk', 'eng-tgl', 'eng-tgo', 'eng-tgp', 'eng-tha', 'eng-tif', 'eng-tim', 'eng-tiw', 'eng-tiy', 'eng-tke', 'eng-tku', 'eng-tlf', 'eng-tmd', 'eng-tna', 'eng-tnc', 'eng-tnk', 'eng-tnn', 'eng-tnp', 'eng-toc', 'eng-tod', 'eng-tof', 'eng-toj', 'eng-ton', 'eng-too', 'eng-top', 'eng-tos', 'eng-tpa', 'eng-tpi', 'eng-tpt', 'eng-tpz', 'eng-trc', 'eng-tsw', 'eng-ttc', 'eng-tte', 'eng-tuc', 'eng-tue', 'eng-tuf', 'eng-tuo', 'eng-tur', 'eng-tvk', 
'eng-twi', 'eng-txq', 'eng-txu', 'eng-tzj', 'eng-tzo', 'eng-ubr', 'eng-ubu', 'eng-udu', 'eng-uig', 'eng-ukr', 'eng-uli', 'eng-ulk', 'eng-upv', 'eng-ura', 'eng-urb', 'eng-urd', 'eng-uri', 'eng-urt', 'eng-urw', 'eng-usa', 'eng-usp', 'eng-uvh', 'eng-uvl', 'eng-vid', 'eng-vie', 'eng-viv', 'eng-vmy', 'eng-waj', 'eng-wal', 'eng-wap', 'eng-wat', 'eng-wbi', 'eng-wbp', 'eng-wed', 'eng-wer', 'eng-wim', 'eng-wiu', 'eng-wiv', 'eng-wmt', 'eng-wmw', 'eng-wnc', 'eng-wnu', 'eng-wol', 'eng-wos', 'eng-wrk', 'eng-wro', 'eng-wrs', 'eng-wsk', 'eng-wuv', 'eng-xav', 'eng-xbi', 'eng-xed', 'eng-xla', 'eng-xnn', 'eng-xon', 'eng-xsi', 'eng-xtd', 'eng-xtm', 'eng-yaa', 'eng-yad', 'eng-yal', 'eng-yap', 'eng-yaq', 'eng-yby', 'eng-ycn', 'eng-yka', 'eng-yle', 'eng-yml', 'eng-yon', 'eng-yor', 'eng-yrb', 'eng-yre', 'eng-yss', 'eng-yuj', 'eng-yut', 'eng-yuw', 'eng-yva', 'eng-zaa', 'eng-zab', 'eng-zac', 'eng-zad', 'eng-zai', 'eng-zaj', 'eng-zam', 'eng-zao', 'eng-zap', 'eng-zar', 'eng-zas', 'eng-zat', 'eng-zav', 'eng-zaw', 'eng-zca', 'eng-zga', 'eng-zia', 'eng-ziw', 'eng-zlm', 'eng-zos', 'eng-zpc', 'eng-zpl', 'eng-zpm', 'eng-zpo', 'eng-zpq', 'eng-zpu', 'eng-zpv', 'eng-zpz', 'eng-zsr', 'eng-ztq', 'eng-zty', 'eng-zyp']

Is this error expected for the configuration preferences I set? If so, would you mind please telling me how to essentially say "run all tasks for english"? Thanks!

KennethEnevoldsen commented 7 months ago

Hi @austinmw, thanks for always reporting these bugs. Unexpectedly, task_langs=["en"] does not in fact filter the benchmark down to English tasks. Instead, it filters to all the "en" subsets of MTEB.

I agree that this is counter-intuitive and I am slowly working toward fixing this in the following utility function:

import mteb
from mteb import MTEB
tasks = mteb.get_tasks(languages=["eng"])  # select every task that contains English
evaluation = MTEB(tasks=tasks)

This will get you all tasks that contain English (but also other languages in case of multilingual datasets).
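To make the task-level behaviour concrete, here is a minimal sketch in plain Python. It does not use mteb at all: the task names, the `TASKS` mapping, and the `tasks_containing` helper are all hypothetical and only illustrate the idea of keeping every task in which a language appears, including multilingual and language-pair tasks.

```python
# Hypothetical, simplified task metadata. These names and this structure
# are illustrative only and are not the real mteb data structures.
TASKS = {
    "EnglishOnlyTask": ["eng"],
    "BitextTask": ["eng-aai", "eng-deu"],  # language pairs, no "eng-eng" subset
    "MultilingualTask": ["eng", "deu", "fra"],
    "GermanOnlyTask": ["deu"],
}


def tasks_containing(language: str) -> list[str]:
    """Task-level filter: keep every task in which the language appears at all."""
    selected = []
    for name, subsets in TASKS.items():
        # A pair such as "eng-deu" counts as containing both "eng" and "deu".
        languages = {part for subset in subsets for part in subset.split("-")}
        if language in languages:
            selected.append(name)
    return selected


english_tasks = tasks_containing("eng")
print(english_tasks)
```

In this sketch the multilingual and bitext tasks are kept whole, which is why non-English languages can still show up in the results.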

I address this (along with the bug you discovered) in #604, so that once it is merged the code above should be what you are looking for.

austinmw commented 7 months ago

Sounds good, thanks again!

KennethEnevoldsen commented 5 months ago

@austinmw, just letting you know that this is now the default mode of interaction with MTEB. The previous approach will raise a deprecation warning.

austinmw commented 5 months ago

Thanks!