MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

pip install bertopic fails on Fabric notebooks #1606

Open joshedOpenAi opened 12 months ago

joshedOpenAi commented 12 months ago

I'm seeing this error related to hdbscan when trying to install BERTopic in a Microsoft Fabric notebook:

-normalizer<4,>=2 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic) (2023.7.22)
Requirement already satisfied: mpmath>=0.19 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from sympy->torch>=1.6.0->sentence-transformers>=0.4.1->bertopic) (1.3.0)
Downloading bertopic-0.15.0-py2.py3-none-any.whl (143 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.4/143.4 kB 47.6 MB/s eta 0:00:00
Using cached Cython-0.29.36-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Downloading tbb-2021.10.0-py2.py3-none-manylinux1_x86_64.whl (4.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 190.8 MB/s eta 0:00:00
Downloading torchvision-0.16.0-cp310-cp310-manylinux1_x86_64.whl (6.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 163.2 MB/s eta 0:00:00
Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 9.5 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 8.5 MB/s eta 0:00:00
Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 73.1 MB/s eta 0:00:00
Downloading nvidia_nvjitlink_cu12-12.3.52-py3-none-manylinux1_x86_64.whl (20.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.5/20.5 MB 130.6 MB/s eta 0:00:00
Building wheels for collected packages: hdbscan, umap-learn, pynndescent
Building wheel for hdbscan (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for hdbscan (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/validity.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/robust_single_linkage_.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/prediction.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/plots.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/hdbscan_.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/flat.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      copying hdbscan/__init__.py -> build/lib.linux-x86_64-cpython-310/hdbscan
      creating build/lib.linux-x86_64-cpython-310/hdbscan/tests
      copying hdbscan/tests/test_rsl.py -> build/lib.linux-x86_64-cpython-310/hdbscan/tests
      copying hdbscan/tests/test_prediction_utils.py -> build/lib.linux-x86_64-cpython-310/hdbscan/tests
      copying hdbscan/tests/test_hdbscan.py -> build/lib.linux-x86_64-cpython-310/hdbscan/tests
      copying hdbscan/tests/test_flat.py -> build/lib.linux-x86_64-cpython-310/hdbscan/tests
      copying hdbscan/tests/__init__.py -> build/lib.linux-x86_64-cpython-310/hdbscan/tests
      running build_ext
      cythoning hdbscan/_hdbscan_tree.pyx to hdbscan/_hdbscan_tree.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/_hdbscan_tree.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      cythoning hdbscan/_hdbscan_linkage.pyx to hdbscan/_hdbscan_linkage.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/_hdbscan_linkage.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      cythoning hdbscan/_hdbscan_boruvka.pyx to hdbscan/_hdbscan_boruvka.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/_hdbscan_boruvka.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      cythoning hdbscan/_hdbscan_reachability.pyx to hdbscan/_hdbscan_reachability.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/_hdbscan_reachability.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      cythoning hdbscan/_prediction_utils.pyx to hdbscan/_prediction_utils.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/_prediction_utils.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      cythoning hdbscan/dist_metrics.pyx to hdbscan/dist_metrics.c
      /tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yxj_i8le/hdbscan_a17f63d69fbc4ac292c74e77f79ac9e4/hdbscan/dist_metrics.pxd
        tree = Parsing.p_module(s, pxd, full_module_name)
      building 'hdbscan._hdbscan_tree' extension
      creating build/temp.linux-x86_64-cpython-310
      creating build/temp.linux-x86_64-cpython-310/hdbscan
      gcc -pthread -B /home/trusted-service-user/cluster-env/clonedenv/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/trusted-service-user/cluster-env/clonedenv/include -fPIC -O2 -isystem /home/trusted-service-user/cluster-env/clonedenv/include -fPIC -I/home/trusted-service-user/cluster-env/clonedenv/include/python3.10 -I/tmp/pip-build-env-jo48n8cu/overlay/lib/python3.10/site-packages/numpy/core/include -c hdbscan/_hdbscan_tree.c -o build/temp.linux-x86_64-cpython-310/hdbscan/_hdbscan_tree.o
      In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include-fixed/syslimits.h:7,
                       from /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include-fixed/limits.h:34,
                       from /home/trusted-service-user/cluster-env/clonedenv/include/python3.10/Python.h:11,
                       from hdbscan/_hdbscan_tree.c:6:
      /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include-fixed/limits.h:203:15: fatal error: limits.h: No such file or directory
        203 | #include_next <limits.h> /* recurse down to the real one */
            |               ^~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for hdbscan
Building wheel for umap-learn (setup.py) ... done
Created wheel for umap-learn: filename=umap_learn-0.5.4-py3-none-any.whl size=86770 sha256=9548a1821bb7897f28dc9551d73e4a5ac23392a29df6e92cccfa471ad4413317
Stored in directory: /home/trusted-service-user/.cache/pip/wheels/fb/66/29/199acf5784d0f7b8add6d466175ab45506c96e386ed5dd0633
Building wheel for pynndescent (setup.py) ... done
Created wheel for pynndescent: filename=pynndescent-0.5.10-py3-none-any.whl size=55615 sha256=70662fb36a104c1d51c129cfb77f0df17cd9807ee46f02501811274347baaabd
Stored in directory: /home/trusted-service-user/.cache/pip/wheels/4a/38/5d/f60a40a66a9512b7e5e83517ebc2d1b42d857be97d135f1096
Successfully built umap-learn pynndescent
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects
Note: you may need to restart the kernel to use updated packages.
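
The fatal error in the log is gcc failing to find a system `limits.h`. That usually points to missing C library development headers in the build environment rather than to a bug in hdbscan or BERTopic. A minimal sketch for checking this from a notebook cell is below. It assumes the headers would normally sit under `/usr/include` or `/usr/local/include`, which is typical on Linux but not confirmed for Fabric, and the helper name `find_system_header` is purely illustrative.

```python
from pathlib import Path

# Assumed candidate directories: typical on Linux, not confirmed for Fabric.
CANDIDATE_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_system_header(name, include_dirs=CANDIDATE_INCLUDE_DIRS):
    """Return the first existing path to `name` under include_dirs, or None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


header = find_system_header("limits.h")
if header is None:
    print("limits.h not found in the assumed include directories")
else:
    print(f"limits.h found at {header}")
```

If the header is reported as missing, compiling hdbscan from source is unlikely to succeed in that environment, and a prebuilt package is probably the easier route.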

MaartenGr commented 12 months ago

Have you checked any of the open or closed issues? I believe there are a number of issues that go into this with some helpful suggestions. For example, please check #293 for more information. There are a bunch of helpful tips there!

farukc commented 12 months ago

For Microsoft Fabric notebooks, a package can also be installed using conda. Installing the package with "%conda install bertopic" fixed the issue for @joshedOpenAi.
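
After the conda install (and a kernel restart, as the pip output suggests), it may be worth confirming that the packages resolve before importing BERTopic. The snippet below is a rough sketch using only the standard library. The helper name `missing_packages` is made up for illustration, and the list of import names (`hdbscan`, `umap` for umap-learn, `bertopic`) is an assumption based on the packages named in the log.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from the log: umap-learn is imported as `umap`.
missing = missing_packages(["hdbscan", "umap", "bertopic"])
print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only looks the module up without importing it, so the check stays fast even though the underlying packages are heavy.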