Closed AlfanDindaR closed 1 year ago
Could you share the full error message? I believe there are some frames missing that might be relevant. Also, BERTopic by itself does not use TensorFlow but PyTorch, so if you are not using a TensorFlow-based embedding model, then it would be no problem to set protobuf to 4.2.1.
RAPIDS pip packages are not compatible with the TensorFlow pip package due to TensorFlow's protobuf constraint, as noted on the RAPIDS pip page. To use cuML with TensorFlow, I recommend using conda environments and installing TensorFlow from the conda-forge channel (which relaxes the constraint), or using a Docker container that ships a TensorFlow package without the tight constraint (such as the one linked on the RAPIDS website).
This is the error I get when importing bertopic after installing cuML, @MaartenGr:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-08334298937f> in <module>
----> 1 import bertopic
12 frames
/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, full_name, index, number, type, cpp_type, label, default_value, message_type, enum_type, containing_type, is_extension, extension_scope, options, serialized_options, has_default_value, containing_oneof, json_name, file, create_key)
558 has_default_value=True, containing_oneof=None, json_name=None,
559 file=None, create_key=None): # pylint: disable=redefined-builtin
--> 560 _message.Message._CheckCalledFromGeneratedFile()
561 if is_extension:
562 return _message.default_pool.FindExtensionByName(full_name)
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
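Workaround 2 from the error message above can be tried directly in a notebook cell. This is only a minimal sketch: it assumes the variable is set before anything imports protobuf (so the runtime may need a restart first), and, as the message itself warns, pure-Python parsing will be much slower. The bertopic import is left commented out so the snippet runs on its own.

```python
import os

# Must be set before protobuf (or any package that imports it) is loaded,
# otherwise the setting has no effect for the current process.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Import only after the variable is set:
# import bertopic
```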
But after downgrading cuML to version 22.12, the import works well:
!pip install --quiet cudf-cu11==22.12 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
!pip install --quiet cuml-cu11==22.12 --extra-index-url=https://pypi.nvidia.com
But I ran into an error again while training the topic model:
Batches: 100%
32/32 [00:08<00:00, 8.27it/s]
2023-02-27 03:13:31,999 - BERTopic - Transformed documents to Embeddings
2023-02-27 03:13:32,121 - BERTopic - The dimensionality reduction algorithm did not contain the `y` parameter and therefore the `y` parameter was not used
This is my HDBSCAN and UMAP code:
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP

cluster_model = HDBSCAN(
    min_cluster_size=10,
    metric='euclidean',
    cluster_selection_method='eom',
    prediction_data=True,
)  # Clustering model

umap_model = UMAP(
    n_neighbors=15,
    n_components=5,
    min_dist=0.0,
)
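For context, models like these are normally handed to BERTopic through its `umap_model` and `hdbscan_model` constructor arguments. The sketch below shows that wiring. The helper name `build_gpu_topic_model` and the `docs` variable are hypothetical, and the import is done lazily so the snippet can be defined even where bertopic is not installed.

```python
def build_gpu_topic_model(umap_model, cluster_model):
    """Wire user-supplied UMAP and HDBSCAN models into BERTopic.

    Hypothetical helper for illustration; BERTopic is imported lazily
    so defining this function does not require bertopic to be installed.
    """
    from bertopic import BERTopic

    return BERTopic(umap_model=umap_model, hdbscan_model=cluster_model)


# Usage, assuming `docs` is a list of strings and the cuML models exist:
# topic_model = build_gpu_topic_model(umap_model, cluster_model)
# topics, probs = topic_model.fit_transform(docs)
```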
Sorry to ask again, @MaartenGr: can we use the cuML library to create a topic model with version 0.14?
Yes, you should be able to use cuml with BERTopic 0.14.
Due to inactivity, I'll be closing this issue. Let me know if you want me to re-open the issue!
Hi @MaartenGr, I get an error when using cuML. The error is about a protobuf conflict: BERTopic needs protobuf >= 3.9 but cuML needs protobuf==4.21. How can I solve this issue?
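When reporting a conflict like this, it helps to list which versions are actually installed. The following is a small sketch using only the standard library. The helper name `installed_versions` is made up for illustration, and the distribution names (in particular `cuml-cu11`, taken from the pip command in this thread) are assumptions that may differ in your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to your environment.
report = installed_versions(["protobuf", "tensorflow", "bertopic", "cuml-cu11"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the full traceback makes it easier to see which package pins which protobuf version.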