-
### Context
1. What's the issue that needs to be solved?
- Currently, OCR quality varies depending on whether I take the time to identify and annotate paragraph separations.
- I also may need to q…
-
Hey @joewandy, I have a fork of this repo for a project I'm working on. Would you be open to me:
1. Adding Python 3 support?
2. Optimizing some functions to use more efficient sparse arrays …
jknix updated 4 years ago
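As a sketch of what the sparse-array optimization buys (the matrix here is made up for illustration; the actual functions in the fork are not shown in this thread):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-zero document-term count matrix (hypothetical data).
dense = np.zeros((1000, 5000))
dense[0, 10] = 3
dense[42, 999] = 1

sparse = csr_matrix(dense)

# CSR stores only the nonzero entries, so memory drops from
# rows * cols floats to a few small arrays proportional to nnz.
print(sparse.nnz)  # → 2
print(sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes)
```

Operations such as matrix products and row slicing stay efficient on the CSR form, which is why scikit-learn's own vectorizers return sparse matrices by default.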
-
Hi Brian,
I went through your course 'Introduction to Natural Language' on Udemy. It was very helpful, and your explanation is very interesting. I have started learning NLP and I have a few doubts on…
-
-
Hello!
I have been using the merged models to avoid RAM limitations.
After merging my models into a new model, I found that there are no representative documents in model.get_topic_info() and also …
-
[`check_estimator`](https://github.com/scikit-learn/scikit-learn/blob/f6b0c67290d3cbc8b099eb7559b4f42b84af2584/sklearn/utils/estimator_checks.py#L273) is a really powerful tool for deciding scikit-com…
ysig updated 23 hours ago
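For anyone unfamiliar with it, a minimal illustration of what `check_estimator` does, using a stock scikit-learn estimator (nothing here is specific to the estimator under discussion):

```python
from sklearn.linear_model import LinearRegression
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs the scikit-learn API compliance suite against an
# estimator instance: fit/predict contracts, input validation, cloning,
# get_params/set_params round-trips, and so on. It raises on the first
# violated check and returns None if everything passes.
check_estimator(LinearRegression())
print("LinearRegression passes the scikit-learn estimator checks")
```

Running it against a third-party estimator is a quick way to decide whether it is scikit-learn compatible before wiring it into pipelines or grid search.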
-
Hello, I am running BERTopic on a MacBook Pro M1 with the following parameters, using precomputed embeddings with a sentence transformer:
```
vectorizer_model = CountVectorizer(stop_words="engli…
```
-
Hi Maarten,
I found that the `probabilities_` outcome is not always the same as the outcome of `get_document_info`.
`topic_model.probabilities_[0]` gave me the following:
`array([0.31045925, 0.…
-
Hello!
Curious whether it would be possible to expose a regex token-pattern parameter like the one in [CountVectorizer](https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef/sklearn/feature_extraction/t…
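A minimal sketch of what `token_pattern` controls in `CountVectorizer` itself (the regex here is an illustrative choice, not a recommendation):

```python
from sklearn.feature_extraction.text import CountVectorizer

# token_pattern is a regex defining what counts as a token when no
# custom tokenizer is given. This pattern keeps alphabetic tokens of
# length >= 2, dropping single letters and pure digit strings.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b[a-zA-Z]{2,}\b")
X = vectorizer.fit_transform(["a bb ccc 123 dd"])

print(sorted(vectorizer.vocabulary_))  # → ['bb', 'ccc', 'dd']
```

Since BERTopic accepts a preconfigured vectorizer via its `vectorizer_model` argument, the pattern is already reachable indirectly; the request here is presumably about exposing it as a first-class parameter.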
-
Why do I get the following error when executing the bag-of-words creation step in token.py? None of the solutions on Stack Overflow work.
Traceback (most recent call last):
File "feature_extract.py", line 51, in
tokens = token.get_tokens()
File "/home/xfbai/Entity-Rela…