MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Issue with BERTopic Installation - ModuleNotFoundError for importlib-metadata #1669

Open sdave-connexion opened 9 months ago

sdave-connexion commented 9 months ago

Hello Support Team,

I am encountering an issue with the BERTopic package in Python, specifically a ModuleNotFoundError related to the importlib-metadata module. This happens even though BERTopic's documentation states compatibility with Python 3.7, which is the version my environment uses.

I have looked into this but couldn't find a solution. Here is what I found:

Importing BERTopic requires umap: https://github.com/MaartenGr/BERTopic/blob/master/bertopic/_bertopic.py#L38

umap in turn imports importlib.metadata: https://github.com/lmcinnes/umap/blob/master/umap/__init__.py#L36

The error clearly says the package is missing, but it is being installed automatically:

    importlib-metadata==6.7.0
    importlib-resources==5.12.0

We tried different package versions, but this didn't help.
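For context, `importlib.metadata` (with a dot) is a standard-library module that was added in Python 3.8, while the `importlib-metadata` package on PyPI installs a separate top-level module named `importlib_metadata` (with an underscore). Installing the backport therefore does not make `from importlib.metadata import ...` work on Python 3.7. A minimal sketch of the usual fallback pattern is below; the names `METADATA_SOURCE` and `installed_version` are made up for illustration and are not part of umap or BERTopic.

```python
try:
    # Python 3.8+: importlib.metadata ships with the standard library.
    from importlib import metadata as importlib_metadata
    METADATA_SOURCE = "stdlib"
except ImportError:
    # Python 3.7: only the PyPI backport exists, under a different module name.
    import importlib_metadata
    METADATA_SOURCE = "backport"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper, shown only to illustrate the fallback import.
    """
    try:
        return importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("metadata module source:", METADATA_SOURCE)
    print("umap-learn version:", installed_version("umap-learn"))
```

This only helps in code you control. The failing import here sits inside umap's own `__init__.py`, so the fallback would not fix this environment by itself.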

My current requirements file looks like this:

    bertopic[spacy]==0.15.0
    plotly
    seaborn
    matplotlib
    wordcloud
    urllib3<2.0,>=1.21.1
    nbformat>=4.2.0
    enchant
    ipywidgets
    protobuf
    openai==0.28.1
    spacy==3.7.2
    pydantic==1.10.13

The code environment builds successfully, but when we try to load the libraries we get the error:

```python
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# import dependencies
import pandas as pd
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer, util
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer
from bertopic.vectorizers import ClassTfidfTransformer
import numpy as np
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance, OpenAI, PartOfSpeech
```

The error is the following:



```
ModuleNotFoundError                       Traceback (most recent call last)
in
      5 #import dependencies
      6 import pandas as pd
----> 7 from bertopic import BERTopic
      8 from sentence_transformers import SentenceTransformer, util
      9 from umap import UMAP

/data/dataiku/dss_data/code-envs/python/topicmodel_x1/lib/python3.7/site-packages/bertopic/__init__.py in
----> 1 from bertopic._bertopic import BERTopic
      2
      3 __version__ = "0.15.0"
      4
      5 __all__ = [

/data/dataiku/dss_data/code-envs/python/topicmodel_x1/lib/python3.7/site-packages/bertopic/_bertopic.py in
     36 # Models
     37 import hdbscan
---> 38 from umap import UMAP
     39 from sklearn.preprocessing import normalize
     40 from sklearn import __version__ as sklearn_version

/data/dataiku/dss_data/code-envs/python/topicmodel_x1/lib/python3.7/site-packages/umap/__init__.py in
     34 import numba
     35
---> 36 from importlib.metadata import version, PackageNotFoundError
     37
     38 try:

ModuleNotFoundError: No module named 'importlib.metadata'
```

I would appreciate any guidance or assistance you can provide in resolving this issue. If any additional information is required, please let me know.

Thank you for your time and support.

Best regards,
Shantanu Dave

MaartenGr commented 9 months ago

Ah, it seems that the documentation can be updated in that respect. Although Python 3.7 was initially supported, many dependencies have since dropped it, because I believe 3.7 is officially no longer getting security updates. Perhaps using 3.8 or higher would fix this.
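
To confirm which interpreter the code environment actually runs before importing BERTopic, a small guard along these lines may help. This is only a sketch: the `MIN_PYTHON` value of 3.8 is taken from the suggestion above rather than from BERTopic's packaging metadata, and `python_is_supported` is a hypothetical helper name.

```python
import sys

# Assumption: 3.8 comes from the suggestion in this thread, not from
# BERTopic's declared requirements.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python %d.%d should have importlib.metadata" % sys.version_info[:2])
    else:
        print("Python %d.%d predates importlib.metadata; "
              "rebuild the environment on 3.8 or newer" % sys.version_info[:2])
```

Running this in the same notebook or code environment that raised the error shows whether it is still on 3.7, as the `python3.7` segment in the traceback paths suggests.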