embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0
SIB200Classification HF error #1019

Closed Muennighoff closed 3 months ago

Muennighoff commented 3 months ago

INFO:mteb.cli:Running with parameters: Namespace(model='sentence-transformers/all-MiniLM-L12-v2', task_types=None, categories=None, tasks=['SIB200Classification'], languages=None, device=None, output_folder='/data/niklas/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, overwrite=False, func=<function run at 0x7f71eac93010>)
INFO:mteb.models:Model not found in model registry, assuming it is a sentence-transformers model.
INFO:mteb.models:Attempting to extract metadata by loading the model (sentence-transformers/all-MiniLM-L12-v2) using sentence-transformers.
/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
INFO:mteb.evaluation.MTEB:

Evaluating 1 tasks:

─────────────────────────────── Selected tasks ────────────────────────────────
Classification

INFO:mteb.evaluation.MTEB:

** Evaluating SIB200Classification **
INFO:mteb.evaluation.MTEB:Loading dataset for SIB200Classification
ERROR:mteb.evaluation.MTEB:Error while evaluating SIB200Classification: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/mteb/sib200/paths-info/a74d7350ea12af010cfb1c21e34f1f81fd2e615b
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/mteb/sib200/paths-info/a74d7350ea12af010cfb1c21e34f1f81fd2e615b

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 370, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 118, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 388, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 328, in run
    task.load_data(eval_splits=task_eval_splits, **kwargs)
  File "/data/niklas/mteb/mteb/abstasks/MultiSubsetLoader.py", line 16, in load_data
    self.slow_load()
  File "/data/niklas/mteb/mteb/abstasks/MultiSubsetLoader.py", line 45, in slow_load
    self.dataset[lang] = datasets.load_dataset(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/load.py", line 2592, in load_dataset
    builder_instance = load_dataset_builder(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/load.py", line 2301, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/builder.py", line 374, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/builder.py", line 627, in _create_builder_config
    builder_config._resolve_data_files(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/builder.py", line 213, in _resolve_data_files
    self.data_files = self.data_files.resolve(base_path, download_config)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/data_files.py", line 814, in resolve
    out[key] = data_files_patterns_list.resolve(base_path, download_config)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/data_files.py", line 767, in resolve
    resolve_pattern(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/data_files.py", line 384, in resolve_pattern
    for filepath, info in fs.glob(pattern, detail=True, **glob_kwargs).items()
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/hf_file_system.py", line 407, in glob
    return super().glob(path, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/fsspec/spec.py", line 580, in glob
    return {path: self.info(path)}
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/hf_file_system.py", line 527, in info
    paths_info = self._api.get_paths_info(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3045, in get_paths_info
    hf_raise_for_status(response)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/mteb/sib200/paths-info/a74d7350ea12af010cfb1c21e34f1f81fd2e615b

KennethEnevoldsen commented 3 months ago

For ref: #1014

Muennighoff commented 3 months ago

After retrying, this worked 🤔
https://github.com/embeddings-benchmark/results/blob/0bbf426f826396f1210ea71ad6035494156c3ab3/sentence-transformers__all-MiniLM-L6-v2/8b3219a92973c328a8e22fadcfa821b5dc75636a/SIB200Classification.json

KennethEnevoldsen commented 3 months ago

Hmm, must be on the HF side.
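
Since a plain retry succeeded, the 429 looks like transient rate limiting. One way to ride that out is to retry with exponential backoff. The following is only a rough sketch, not anything from mteb or datasets: `retry_on_rate_limit` and its parameters are hypothetical names, and detecting the error by looking for "429" in the message is an assumption based on the text of the traceback in this issue.

```python
import random
import time


def retry_on_rate_limit(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it fails with what looks like HTTP 429, wait and retry.

    Hypothetical helper for illustration; not part of mteb or datasets.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as err:
            # Assumption: the message carries the status code, as in
            # "429 Client Error: Too Many Requests" from the traceback.
            rate_limited = "429" in str(err)
            if not rate_limited or attempt == max_attempts:
                raise
            # Exponential backoff (base_delay, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


# Possible usage (needs `datasets` installed and network access; `lang` stands
# for one subset name, as in the slow_load frame of the traceback):
#
#   import datasets
#   ds = retry_on_rate_limit(lambda: datasets.load_dataset("mteb/sib200", lang))
```

The `sleep` argument is injectable only so the waiting can be stubbed out when trying the helper without real delays.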