UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Bug: cannot import name '__version__' from 'datasets' #2817

Open projectNoob opened 3 months ago

projectNoob commented 3 months ago

I installed sentence-transformers with pip. Then I imported it and called SentenceTransformer(something), and this error occurred. Did I do something wrong?

tomaarsen commented 3 months ago

Hello!

No, you didn't do anything wrong :) I think this line triggered: https://github.com/UKPLab/sentence-transformers/blob/b188ce15eeec10a81c1cae93370e78dea75c5a97/sentence_transformers/model_card.py#L220-L221

Normally, this imports the datasets Python module (https://pypi.org/project/datasets/), but in your case I think you might have a local file or folder called datasets, which it tried to import instead. I think the easiest solution is to install the datasets module that Sentence Transformers uses:

pip install datasets

Let me know if that doesn't help!
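To check which datasets would actually be picked up, one option is to ask Python where the name resolves without importing it. The following is a minimal sketch using only the standard library; the helper name describe_module_location is hypothetical and not part of Sentence Transformers or datasets.

```python
import importlib.util


def describe_module_location(name):
    """Return where `import name` would load from, without importing it.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing importable under this name on the current sys.path.
        return None
    if spec.origin is not None:
        # Regular module or package: path to the file that would be loaded.
        return spec.origin
    # Namespace package (a folder without __init__.py): list its directories.
    return list(spec.submodule_search_locations or [])


# Run this from the same directory as the failing script.
print(describe_module_location("datasets"))
```

If the printed path points into your project directory rather than site-packages, a local file or folder is likely the one being imported.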

Sultanax commented 3 months ago

Hi! I'm having a similar issue.

My issue was ImportError: cannot import name 'Dataset' from 'datasets', which referred to

https://github.com/UKPLab/sentence-transformers/blob/b56a987d644ae455ebdc3d2eccc9955348d2778e/sentence_transformers/model_card.py#L32-L33

Like @projectNoob, I also had a folder called "datasets". I have datasets 2.20.0 installed. As a quick fix, I tried renaming my folder to data_sets, but am now getting this error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I ran this exact code about a week ago without problems, and my folder has always been called 'datasets', so the issue only showed up within the last few days. Could you point me to any commits that could have caused this change? I'll try working through this as well, but please let me know if you are able to pinpoint the exact problem. Thanks so much for all your hard work :)

(For reference, I will include my full error)

Traceback (most recent call last):
  File "/Users/sultana/Downloads/AVS/main.py", line 17, in <module>
    from dataset import get_training_set, get_validation_set, get_test_set
  File "/Users/sultana/Downloads/AVS/dataset.py", line 1, in <module>
    from data_sets.saliency_db import saliency_db
  File "/Users/sultana/Downloads/AVS/data_sets/saliency_db.py", line 12, in <module>
    from sentence_transformers import SentenceTransformer
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/sentence_transformers/__init__.py", line 7, in <module>
    from sentence_transformers.cross_encoder.CrossEncoder import CrossEncoder
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/sentence_transformers/cross_encoder/__init__.py", line 1, in <module>
    from .CrossEncoder import CrossEncoder
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 18, in <module>
    from sentence_transformers.SentenceTransformer import SentenceTransformer
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 27, in <module>
    from sentence_transformers.model_card import SentenceTransformerModelCardData, generate_model_card
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/sentence_transformers/model_card.py", line 33, in <module>
    from datasets import Dataset, DatasetDict, Value
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/datasets/__init__.py", line 17, in <module>
    from .arrow_dataset import Dataset
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 59, in <module>
    import pandas as pd
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/Users/sultana/anaconda3/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

tomaarsen commented 3 months ago

Hello!

It looks like this new issue originates from

import pandas as pd

which might indicate that there is an issue with your pandas or numpy install. If you try python -c "import pandas; print(pandas.__version__)", does that print your pandas version or does it give a similar error? If the latter, you might want to consider reinstalling pandas or numpy. If the former, please let me know and we'll try and debug it further.
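This kind of "numpy.dtype size changed" error often appears when a compiled package such as pandas was built against a different numpy version than the one currently installed. As a rough sketch, the installed versions can be read from package metadata without importing the packages, which helps when the import itself crashes. The helper name installed_versions and the default package list are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "pandas", "datasets")):
    """Map each package name to its installed version, or None if missing.

    Reads package metadata only, so it works even if importing fails.
    """
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


print(installed_versions())
```

Comparing these versions against what each package declares as compatible is usually a reasonable first step before reinstalling anything.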

Sultanax commented 3 months ago

Thanks so much! I was able to solve my issue. It was just a compatibility issue between the numpy version I was using and some other dependencies I needed!

However, is there a workaround for the "datasets" issue? I initially renamed my folder, but would it be possible for a local folder and an installed Python module to share the same name?
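For background on why the names clash: Python searches the entries of sys.path in order, and the directory of the script being run is normally first, so a local package can be found before an installed one with the same name. The sketch below illustrates that ordering with a throwaway package in a temporary directory. The function name local_copy_wins and the package name shadow_demo_pkg are made up for the demonstration, and this is not an official workaround from the maintainers.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def local_copy_wins(package_name):
    """Create a throwaway package and report whether it is the one that
    resolves when its parent directory is first on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")

        # Mimic running a script from a folder that contains the package.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            spec = importlib.util.find_spec(package_name)
            if spec is None or spec.origin is None:
                return False
            return Path(spec.origin).resolve().parent == pkg_dir.resolve()
        finally:
            # Restore sys.path so the demo leaves no trace.
            sys.path.remove(tmp)
            importlib.invalidate_caches()


print(local_copy_wins("shadow_demo_pkg"))
```

Under this ordering, giving the local folder a distinct name, as was done with data_sets, avoids the clash without touching sys.path.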