MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

random openai issue with plain bertopic use #1973

Open manas007 opened 4 months ago

manas007 commented 4 months ago

Hello, note that this was running as of April 22nd.

!pip install -U bertopic

import bertopic
bertopic.__version__

Error log:

AttributeError: module 'openai' has no attribute 'OpenAI'

AttributeError                            Traceback (most recent call last)
File , line 1
----> 1 import bertopic
      2 bertopic.__version__

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-390f0499-9705-4ba6-ba4a-010185dffaa9/lib/python3.10/site-packages/bertopic/__init__.py:1
----> 1 from bertopic._bertopic import BERTopic
      3 __version__ = "0.16.1"
      5 __all__ = [
      6     "BERTopic",
      7 ]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-390f0499-9705-4ba6-ba4a-010185dffaa9/lib/python3.10/site-packages/bertopic/_bertopic.py:48
     46 from bertopic import plotting
     47 from bertopic.cluster import BaseCluster
---> 48 from bertopic.backend import BaseEmbedder
     49 from bertopic.representation._mmr import mmr
     50 from bertopic.backend._utils import select_backend

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-390f0499-9705-4ba6-ba4a-010185dffaa9/lib/python3.10/site-packages/bertopic/backend/__init__.py:8
      6 # OpenAI Embeddings
      7 try:
----> 8     from bertopic.backend._openai import OpenAIBackend
      9 except ModuleNotFoundError:
     10     msg = "`pip install openai` \n\n"

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-390f0499-9705-4ba6-ba4a-010185dffaa9/lib/python3.10/site-packages/bertopic/backend/_openai.py:9
      5 from typing import List, Mapping, Any
      6 from bertopic.backend import BaseEmbedder
----> 9 class OpenAIBackend(BaseEmbedder):
     10     """ OpenAI Embedding Model
     11 
     12     Arguments:
   (...)
     32     ```
     33     """
     34     def __init__(self,
     35                  client: openai.OpenAI,
     36                  embedding_model: str = "text-embedding-ada-002",
     37                  delay_in_seconds: float = None,
     38                  batch_size: int = None,
     39                  generator_kwargs: Mapping[str, Any] = {}):

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-390f0499-9705-4ba6-ba4a-010185dffaa9/lib/python3.10/site-packages/bertopic/backend/_openai.py:35, in OpenAIBackend()
      9 class OpenAIBackend(BaseEmbedder):
     10     """ OpenAI Embedding Model
     11 
     12     Arguments:
   (...)
     32     ```
     33     """
     34     def __init__(self,
---> 35                  client: openai.OpenAI,
     36                  embedding_model: str = "text-embedding-ada-002",
     37                  delay_in_seconds: float = None,
     38                  batch_size: int = None,
     39                  generator_kwargs: Mapping[str, Any] = {}):
     40         super().__init__()
     41         self.client = client

AttributeError: module 'openai' has no attribute 'OpenAI'

MaartenGr commented 4 months ago

Have you tried installing the latest version of openai?

manas007 commented 4 months ago

@MaartenGr is it required?

manas007 commented 4 months ago

@MaartenGr Can you explain why installing openai is required for BERTopic? The documentation does not mention that you need to install it as part of BERTopic.

As per the docs (https://maartengr.github.io/BERTopic/index.html): "Installation, with sentence-transformers, can be done using pypi":

pip install bertopic

MaartenGr commented 4 months ago

@manas007 Sure! Installing BERTopic installs the packages needed for the base functionality. Since BERTopic is a highly modular package, there are many extensions you can use that require additional packages. Installing them all at once would clutter the dependencies and likely result in a bunch of dependency conflicts.

This means that whenever you use certain extensions, like the OpenAI offering, the documentation will state that you additionally need to install that specific package.

This also relates to production settings, where installing dozens of packages is not helpful, so a relatively minimal installation is generally preferred. Adding packages with pip is easy; removing them cannot be done nearly as easily with pip.
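The optional-extras trade-off described above is commonly handled by guarding the import of each extra. Below is a minimal sketch of that general pattern; `MissingExtra` and `load_optional` are hypothetical names, not BERTopic's API. The only detail taken from this thread is that BERTopic's `backend/__init__.py` wraps the OpenAI backend import in `try`/`except ModuleNotFoundError`, as the tracebacks show.

```python
import importlib


class MissingExtra:
    """Hypothetical placeholder for an optional component that is not installed.

    Importing the parent package still succeeds; the error only surfaces
    when someone actually tries to use the missing extra.
    """

    def __init__(self, name: str, pip_name: str):
        self.name = name
        self.pip_name = pip_name

    def __call__(self, *args, **kwargs):
        raise ImportError(
            f"{self.name} needs an optional dependency: `pip install {self.pip_name}`"
        )


def load_optional(module_name: str, attr: str, pip_name: str):
    """Return module_name.attr if the module can be imported, else a MissingExtra."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        return MissingExtra(attr, pip_name)
    # Only a *missing* module is handled above: if the module imports but lacks
    # `attr` (e.g. an outdated version), getattr raises AttributeError here.
    return getattr(module, attr)


# json ships with Python, so the real function comes back.
dumps = load_optional("json", "dumps", "json")
print(dumps({"ok": True}))  # {"ok": true}

# A module that is not installed yields a placeholder instead of an import error.
Backend = load_optional("hypothetical_extra_pkg", "Backend", "hypothetical-extra-pkg")
try:
    Backend()
except ImportError as err:
    print(err)
```

Because only `ModuleNotFoundError` is handled, a module that is installed but outdated (and lacks the expected attribute) is not turned into a placeholder; the resulting `AttributeError` propagates, which is the same kind of error reported in this issue.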

manas007 commented 4 months ago

@MaartenGr Do you mean that I am "required" to install openai along with BERTopic, even if I have no intention of using the language model?

Please note that even with pip install openai, the error does not go away, so this is independent of the openai install.

Can you please advise?

MaartenGr commented 4 months ago

Do you mean that I am "required" to install openai along with BERTopic, even if I have no intention of using the language model?

No, that's definitely not the case. You can use BERTopic without needing to install openai. Looking at your code and error, it must be a problem with your environment. I just installed BERTopic a couple of times with pip install bertopic in fresh environments and I do not get this issue. Could you try installing BERTopic from a completely new and empty environment?

manas007 commented 4 months ago

Thanks, will try that.

csq-dr commented 3 months ago

Hi! I'm trying to load BERTopic (version 0.16.2, installed from PyPI) with the following line to run some code in a GPU environment:

from bertopic import BERTopic

But I got the error message below:

AttributeError: module 'openai' has no attribute 'OpenAI'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File <command-3593022780386937>, line 1
----> 1 from bertopic import BERTopic

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/bertopic/__init__.py:1
----> 1 from bertopic._bertopic import BERTopic
      3 __version__ = "0.16.2"
      5 __all__ = [
      6     "BERTopic",
      7 ]

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/bertopic/_bertopic.py:48
     46 from bertopic import plotting
     47 from bertopic.cluster import BaseCluster
---> 48 from bertopic.backend import BaseEmbedder
     49 from bertopic.representation._mmr import mmr
     50 from bertopic.backend._utils import select_backend

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/bertopic/backend/__init__.py:8
      6 # OpenAI Embeddings
      7 try:
----> 8     from bertopic.backend._openai import OpenAIBackend
      9 except ModuleNotFoundError:
     10     msg = "`pip install openai` \n\n"

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/bertopic/backend/_openai.py:9
      5 from typing import List, Mapping, Any
      6 from bertopic.backend import BaseEmbedder
----> 9 class OpenAIBackend(BaseEmbedder):
     10     """ OpenAI Embedding Model
     11 
     12     Arguments:
   (...)
     32     ```
     33     """
     34     def __init__(self,
     35                  client: openai.OpenAI,
     36                  embedding_model: str = "text-embedding-ada-002",
     37                  delay_in_seconds: float = None,
     38                  batch_size: int = None,
     39                  generator_kwargs: Mapping[str, Any] = {}):

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/bertopic/backend/_openai.py:35, in OpenAIBackend()
      9 class OpenAIBackend(BaseEmbedder):
     10     """ OpenAI Embedding Model
     11 
     12     Arguments:
   (...)
     32     ```
     33     """
     34     def __init__(self,
---> 35                  client: openai.OpenAI,
     36                  embedding_model: str = "text-embedding-ada-002",
     37                  delay_in_seconds: float = None,
     38                  batch_size: int = None,
     39                  generator_kwargs: Mapping[str, Any] = {}):
     40         super().__init__()
     41         self.client = client

AttributeError: module 'openai' has no attribute 'OpenAI'

Is the openai module required for loading BERTopic in a GPU environment? I also have a CPU environment in which I can load BERTopic without any issues, and I don't have the openai module installed in either environment. Thank you!

manas007 commented 3 months ago

@MaartenGr It looks like others are facing the same issue.

MaartenGr commented 3 months ago

@csq-dr You mentioned you installed BERTopic using pip, did you also install openai using pip install openai? You need to have that installed before you can use OpenAI's offering.

@manas007 Thank you. Did you try installing openai using pip install openai and then restarting the notebook?

manas007 commented 3 months ago

@MaartenGr Yes, I tried that. No luck yet; will keep you posted.

csq-dr commented 3 months ago

@MaartenGr Thanks for the reply. I don't plan to use openai due to project restrictions, and I plan to use embedding model(s) from Hugging Face (HF). I'm wondering if there is a way I can use BERTopic without installing the openai module?

MaartenGr commented 3 months ago

@manas007 @csq-dr

I just created a new environment as follows:

conda create -n bertopic_env python=3.10
conda activate bertopic_env

Then installed BERTopic as follows:

pip install bertopic

This installed BERTopic v0.16.2. I then ran the code you both mentioned, which gave me no problems:

from bertopic import BERTopic

In other words, have you both tried creating a completely new environment with a fresh install of BERTopic? BERTopic does not require openai, and it should be possible to run it without it. However, you might have dependency conflicts or pre-existing installations of openai (or even an openai.py file somewhere) that cause problems. What typically works best is to simply hit the refresh button and start with a new environment.

csq-dr commented 3 months ago

@MaartenGr Thank you! I tried the code in a brand new environment with 0.16.2, which gave the error message. I've just tried version 0.16.0, which installed and loaded smoothly without any error messages. By the way, I also found that version 0.16.2 doesn't work even with the openai module installed. Could it be an issue caused by openai.OpenAI() being a class method, not a class attribute?

MaartenGr commented 3 months ago

@csq-dr Could you walk me through, step by step, how you created the new environment and how you installed BERTopic? As you can see in my response above, I can't seem to reproduce the issue following those steps.

Could it be an issue caused by openai.OpenAI() being a class method not a class attribute?

I don't think so since you can use classes as type hints. What is happening here is that openai is imported first with import openai. After that, it attempts to access openai.OpenAI.
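The failure mode described above can be reproduced without BERTopic. This is a minimal sketch under one stated assumption: a stand-in `openai` module with no `OpenAI` attribute (created here to simulate an old or shadowing install; it is not the real package). The `import` succeeds, so no `ModuleNotFoundError` is raised, but the type hint in `__init__` is evaluated while the class body runs, producing an `AttributeError` that a `try`/`except ModuleNotFoundError` guard like the one in the tracebacks does not catch. This holds on Python versions that evaluate annotations eagerly, such as the 3.10 runtimes in this thread; newer versions that defer annotation evaluation (PEP 649) would take the `else` branch instead.

```python
import sys
import types

# Stand-in for a stale "openai" install: importable, but with no OpenAI class.
# This fake module is created here for illustration; it is not the real package.
previous = sys.modules.get("openai")
sys.modules["openai"] = types.ModuleType("openai")

try:
    import openai  # succeeds, so no ModuleNotFoundError is raised

    class Backend:
        # On Python versions that evaluate annotations eagerly (e.g. 3.10/3.11),
        # `openai.OpenAI` is looked up when this `def` runs, i.e. at import time.
        def __init__(self, client: openai.OpenAI):
            self.client = client
except ModuleNotFoundError:
    outcome = "caught by the ModuleNotFoundError guard"
except AttributeError as err:
    outcome = f"escaped the guard: {err}"
else:
    outcome = "no error: the annotation was not evaluated at definition time"
finally:
    # Restore whatever was registered under "openai" before (or nothing).
    if previous is None:
        del sys.modules["openai"]
    else:
        sys.modules["openai"] = previous

print(outcome)
```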

Since you started from a fresh environment, how can it be that it does not give an error when it runs import openai? I think that either an old or different version of openai is installed, or that you have a file called openai.py somewhere near your working directory.
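Both possibilities can be checked from the affected notebook using only the standard library. A minimal sketch, where `describe_module` is a hypothetical helper: it reports which file `import openai` would load (via `importlib.util.find_spec`) and which installed `openai` version, if any, `importlib.metadata` finds. A version below 1.0, or a path outside the environment's `site-packages`, would point to one of the two causes above.

```python
import importlib.metadata
import importlib.util


def describe_module(name: str) -> str:
    """Report where `import name` would resolve from and the installed version."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable"
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution with this name was found,
        # e.g. a stdlib module or a stray local file such as openai.py.
        version = "no installed distribution"
    return f"{name}: version {version}, loaded from {spec.origin}"


print(describe_module("openai"))
```

Comparing the printed path with the environment's `site-packages` directory shows whether a local file is shadowing the installed package.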

amitca71 commented 3 months ago

I got the same thing with Databricks version 14.3 LTS (Python 3.10.12). I changed to Databricks 15.1 (Python 3.11.0), and that fixed it.

csq-dr commented 3 months ago

@MaartenGr I'm using Databricks, same as @amitca71, but with version 13.3 LTS (Python 3.10) for both clusters. My GPU cluster had openai preinstalled, but my CPU cluster didn't. I'll try loading BERTopic with Python 3.11 to see if that solves the issue.

Edit: updating the cluster to 15.1 solved the loading issue. Thank you!

MaartenGr commented 3 months ago

Glad to hear that a different environment solved the issue. Since openai was preinstalled, it might have been a pre-1.0 version, which is not compatible with BERTopic because that version was deprecated by OpenAI. Aside from changing environments, uninstalling openai might also solve the issue, or upgrading openai for those who wish to use those LLMs.
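For anyone who wants to keep a preinstalled environment, the upgrade-or-uninstall decision above can be sketched as a small check. `openai_status` and `classify_openai_version` are hypothetical helpers; the facts relied on are that `openai.OpenAI` is the client class introduced with the 1.x releases of the openai package, and that BERTopic's OpenAI backend annotates against it, as the tracebacks show.

```python
import importlib.metadata


def classify_openai_version(version: str) -> str:
    # Only the major version matters here: the client class openai.OpenAI
    # exists in 1.x releases, not in the older 0.x API.
    major = int(version.split(".")[0])
    if major < 1:
        return f"{version} is pre-1.0: upgrade (pip install -U openai) or uninstall it"
    return f"{version}: has the 1.x client API"


def openai_status() -> str:
    """Classify the installed openai package, if any."""
    try:
        version = importlib.metadata.version("openai")
    except importlib.metadata.PackageNotFoundError:
        return "not installed: fine if you do not use the OpenAI extensions"
    return classify_openai_version(version)


print(openai_status())
```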