UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Issue generating model cards with sentence-transformers[train] v3.0.1 #2739

Status: Open. smerrill opened this issue 1 month ago

smerrill commented 1 month ago

Hey all, I'm back with this error. I haven't had much time to dig into it yet, but I suspect it's related to local datasets: as before, my project uses CachedGISTEmbedLoss and a local Dataset loaded from a Parquet file on disk.

Error while generating model card:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1112, in _create_model_card
    model_card = generate_model_card(self)
  File "/usr/local/lib/python3.10/site-packages/sentence_transformers/model_card.py", line 977, in generate_model_card
    model_card = ModelCard.from_template(card_data=model.model_card_data, template_path=template_path, hf_emoji="🤗")
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/repocard.py", line 414, in from_template
    return super().from_template(card_data, template_path, template_str, **template_kwargs)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/repocard.py", line 324, in from_template
    kwargs = card_data.to_dict().copy()
  File "/usr/local/lib/python3.10/site-packages/sentence_transformers/model_card.py", line 904, in to_dict
    self.set_widget_examples(dataset)
  File "/usr/local/lib/python3.10/site-packages/sentence_transformers/model_card.py", line 423, in set_widget_examples
    for idx, sample in enumerate(
  File "/usr/local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2450, in __iter__
    yield self._getitem(
  File "/usr/local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2845, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices)
  File "/usr/local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 592, in query_table
    pa_subtable = _query_table_with_indices_mapping(table, key, indices=indices)
  File "/usr/local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 62, in _query_table_with_indices_mapping
    return _query_table(table, key)
  File "/usr/local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 86, in _query_table
    return table.fast_slice(key % table.num_rows, 1)
ZeroDivisionError: integer division or modulo by zero
Consider opening an issue on https://github.com/UKPLab/sentence-transformers/issues with this traceback.
Skipping model card creation.
tomaarsen commented 1 month ago

Hello!

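Hmm, this is an odd one. It looks like the model card generation (set_widget_examples) is trying to index into a Dataset to pick some example samples, but that dataset appears to be empty. More precisely, the table being indexed reports 0 rows, so the key % table.num_rows step at the bottom of the traceback divides by zero. Do you have an evaluation dataset in your training script? And is it non-empty?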
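To illustrate the failing step: the last frame of the traceback computes key % table.num_rows, and Python's modulo operator raises ZeroDivisionError whenever the right-hand side is 0. The sketch below is a hypothetical, pure-Python stand-in for that single line (it does not call the datasets library). The modulo there presumably exists so that negative indices wrap around, which the example also shows.

```python
def wrap_index(key: int, num_rows: int) -> int:
    """Hypothetical stand-in for the `key % table.num_rows` step in the traceback."""
    # Python's % maps negative keys into [0, num_rows), e.g. -1 -> num_rows - 1,
    # but raises ZeroDivisionError when num_rows == 0 (an empty table).
    return key % num_rows


def safe_wrap_index(key: int, num_rows: int) -> int:
    """Hypothetical guarded variant that reports an empty table explicitly."""
    if num_rows == 0:
        raise IndexError(f"cannot take row {key} from a table with 0 rows")
    return key % num_rows


if __name__ == "__main__":
    print(wrap_index(5, 3))   # prints 2
    print(wrap_index(-1, 3))  # prints 2
    try:
        wrap_index(0, 0)
    except ZeroDivisionError as exc:
        print("ZeroDivisionError:", exc)
```

A guard like safe_wrap_index would only turn the crash into a clearer error message; it does not explain why the table being indexed has 0 rows in the first place.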

smerrill commented 1 month ago

Yes: this uses local train and eval datasets along with a populated IR evaluator. I'll try to put together a minimal reproduction this week.
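One quick way to answer the "is it non-empty?" question before training is to count the rows in every split. The helper below is hypothetical (it is not part of sentence-transformers or datasets). It only calls len(), which both plain lists and datasets.Dataset objects support, so the real train and eval datasets could be passed in place of the toy lists used here.

```python
from collections.abc import Mapping, Sized


def find_empty_splits(splits: Mapping[str, Sized]) -> list[str]:
    """Return the names of splits that contain no rows (hypothetical helper)."""
    # Only len() is used, so any sized container works, e.g. a list or a Dataset.
    return [name for name, data in splits.items() if len(data) == 0]


if __name__ == "__main__":
    # Toy stand-ins for the real train/eval splits (hypothetical data).
    splits = {
        "train": [{"text": "example"}],
        "eval": [],
    }
    empty = find_empty_splits(splits)
    if empty:
        print("Empty splits:", empty)  # prints Empty splits: ['eval']
```

Running a check like this right before training would show whether any split is unexpectedly empty at the point the trainer receives it.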