-
**DBSCAN**
[example file](https://github.com/elifzeng/Computory-Background/blob/master/Cluster.ipynb)
[official documentation](https://scikit-learn.org/stable/modules/clustering.html)
**OPTICS**
[exam…
-
bertopic == 0.11.0-py2.py3-none-any.whl
`hp = {'algorithm': 'generic',
'epsilon': 0.09,
'min_samples': 1,
'min_cluster_size': 5}`
`clustering_model = hdbscan.HDBSCAN(algorit…
-
Hi, I have a number of questions and I hope it is ok that I ask them together in one post!
So the context is that I have successfully trained a model on my corpus and produced a series of visualisa…
-
Hi!
First, thank you for the library; I'm really enjoying working with it!
I am working with documents that are multiple sentences. I split them up and work with each sentence. Afterward I (plan…
-
**Description**
The existing solution in utbot-summary requires importing an ML library with several megabytes of dependencies, but in reality the library is used in only one place, to cluster the test cases based on path exe…
-
[HDBSCAN](https://dl.acm.org/doi/10.1145/2733381) is a hierarchical version of DBSCAN which is also faster than OPTICS. A [Python implementation for scikit-learn](https://github.com/scikit-learn-contr…
-
From what I can see, both from experience and in the code, `reduce_topics()` frequently reassigns to `-1`. Is this the expected behavior? If I'm understanding the overall picture, topic clusters are sel…
-
Hi Maarten,
Thanks again for your AMAZING work!!!
I have a question regarding this bit of code:
import numpy as np
probability_threshold = 0.01
new_topics = [np.argmax(prob) if max(prob) >=…
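The snippet above is cut off, so here is only a rough sketch of how a threshold-based assignment of this kind could work. It assumes `probs` is a 2D array with one row of topic probabilities per document, and it assumes that documents whose best probability falls below the threshold are labelled `-1` (outlier). The helper name `assign_topics` and the example values are hypothetical and not part of BERTopic.

```python
import numpy as np


def assign_topics(probs, probability_threshold=0.01):
    """Pick the most probable topic per document.

    Assumption: a document is labelled -1 when none of its topic
    probabilities reaches the threshold.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)                              # index of the top topic per row
    confident = probs.max(axis=1) >= probability_threshold   # rows that clear the threshold
    return np.where(confident, best, -1).tolist()


# Made-up probabilities for three documents and three topics.
example_probs = [
    [0.70, 0.20, 0.10],
    [0.004, 0.003, 0.002],
    [0.05, 0.90, 0.05],
]
new_topics = assign_topics(example_probs)
print(new_topics)  # [0, -1, 1]
```

Raising `probability_threshold` sends more documents to `-1`; lowering it keeps more of them in their most likely topic.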
-
Hi Maarten,
Is there a simple way to integrate NER (such as customers, products, companies) and lemmatization into BERTopic, either up front or as a post-processing step?
I have not much data (~10'000 doc…
-
Hi, I am running into an error when running rosella recover with several metagenomes. The error is not consistent between samples, i.e. I cannot predict when the error will occur; however, it does happ…