embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

[MIEB] `royokong/e5-v` not runnable #1334

Open Muennighoff opened 4 weeks ago

Muennighoff commented 4 weeks ago
INFO:mteb.cli:Running with parameters: Namespace(model='royokong/e5-v', task_types=None, categories=None, tasks=['BLINKIT2IRetrieval'], languages=None, device=None, output_folder='/data/niklas/mieb/results-mieb-final', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=4, overwrite=False, save_predictions=False, func=<function run at 0x7f1f791d16c0>)

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]
Downloading shards:  25%|██▌       | 1/4 [01:58<05:55, 118.58s/it]
Downloading shards:  50%|█████     | 2/4 [03:55<03:55, 117.77s/it]
Downloading shards:  75%|███████▌  | 3/4 [05:55<01:58, 118.88s/it]
Downloading shards: 100%|██████████| 4/4 [06:40<00:00, 89.48s/it] 
Downloading shards: 100%|██████████| 4/4 [06:40<00:00, 100.10s/it]
Traceback (most recent call last):
  File "/env/lib/conda/gritkto4/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mieb/mteb/mteb/cli.py", line 346, in main
    args.func(args)
  File "/data/niklas/mieb/mteb/mteb/cli.py", line 115, in run
    model = mteb.get_model(args.model, args.model_revision, device=device)
  File "/data/niklas/mieb/mteb/mteb/models/__init__.py", line 57, in get_model
    model = meta.load_model(**kwargs)
  File "/data/niklas/mieb/mteb/mteb/model_meta.py", line 95, in load_model
    model: Encoder | EncoderWithQueryCorpusEncode = loader(**kwargs)  # type: ignore
  File "/data/niklas/mieb/mteb/mteb/models/e5_v.py", line 24, in __init__
    self.model = LlavaNextForConditionalGeneration.from_pretrained(
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4096, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: LlavaNextForConditionalGeneration.__init__() got an unexpected keyword argument 'device'

gowitheflow-1998 commented 4 weeks ago

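Thanks for raising this. It comes from the CLI path, which I hadn't accounted for: the CLI passes `device` to the model loader, and it then gets forwarded through `**kwargs` into `LlavaNextForConditionalGeneration.from_pretrained`, which does not accept it. For now, could you pop `device` from `kwargs` in `mteb/models/e5_v.py` before passing `**kwargs` on?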

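# Take `device` out of kwargs so it is not forwarded to the model constructor.
# The None default avoids a KeyError when the caller does not pass `device`.
self.device = kwargs.pop("device", None)
self.model = LlavaNextForConditionalGeneration.from_pretrained(
    model_name, **kwargs
)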

I'll open a PR covering all of these issues soon.
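
For anyone who wants to see the failure and the workaround in isolation, here is a minimal sketch that does not need `transformers` or the model weights. `FakeBackbone`, `Wrapper`, and `load_without_pop` are hypothetical names made up for this illustration; `FakeBackbone` only mimics the behaviour visible in the traceback, where leftover keyword arguments end up in `__init__`. It is not the actual `e5_v.py` code.

```python
class FakeBackbone:
    """Hypothetical stand-in for a model class whose __init__ has no `device` parameter."""

    def __init__(self, name, torch_dtype=None):
        self.name = name
        self.torch_dtype = torch_dtype

    @classmethod
    def from_pretrained(cls, name, **kwargs):
        # Mimics the traceback: leftover kwargs are handed to __init__.
        return cls(name, **kwargs)


class Wrapper:
    """Hypothetical wrapper that applies the suggested workaround."""

    def __init__(self, model_name, **kwargs):
        # Keep `device` for the wrapper and remove it from kwargs
        # so it is not forwarded to the backbone constructor.
        self.device = kwargs.pop("device", None)
        self.model = FakeBackbone.from_pretrained(model_name, **kwargs)


def load_without_pop(model_name, **kwargs):
    # Forwards everything unchanged, including `device`.
    return FakeBackbone.from_pretrained(model_name, **kwargs)


if __name__ == "__main__":
    try:
        load_without_pop("royokong/e5-v", device="cpu")
    except TypeError as err:
        print("without pop:", err)

    wrapped = Wrapper("royokong/e5-v", device="cpu", torch_dtype="float16")
    print("with pop:", wrapped.device, wrapped.model.torch_dtype)
```

Using `kwargs.pop("device", None)` rather than `kwargs.pop("device")` keeps the wrapper usable when no `device` is passed at all, which may or may not matter depending on how the loader is called.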