MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Verbose setting gets lost when embeddings and embedding model provided #2079

Closed ianrandman closed 4 months ago

ianrandman commented 4 months ago

Have you searched existing issues? 🔎

Describe the bug

If both embeddings and an embedding model are provided during BERTopic construction, the logger's verbosity is set to WARNING during .fit, regardless of the verbose parameter.

https://github.com/MaartenGr/BERTopic/blob/bf1fedd5bb14bbf282d5051e849d2b06abaa8b51/bertopic/_bertopic.py#L427-L440

On the last line, self.verbose is not passed, unlike the branch taken when embeddings are not provided. Patching the function to include this parameter works as expected. I can open a PR with this fix, unless this was a deliberate design decision.

To be clear, my proposed solution is to change line 440 from

self.embedding_model = select_backend(self.embedding_model, language=self.language) 

to

self.embedding_model = select_backend(self.embedding_model, language=self.language, verbose=self.verbose) 

I think the strategy of modifying the level of the top-level logger based on a BERTopic instance's verbose setting may be error-prone and subject to race conditions if multiple threads use BERTopic instances with different verbose settings. A better long-term solution for logging would be to construct a new logger per BERTopic instantiation, but that is out of scope for this issue.
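To illustrate the per-instance-logger idea, here is a minimal sketch (not BERTopic's actual logging code; the class and logger names are hypothetical). Each instance gets a uniquely named logger via the standard `logging` module, so one instance's verbose setting cannot clobber another's:

```python
import itertools
import logging

# Monotonic counter used to give each instance a unique logger name
# (hypothetical scheme; BERTopic itself uses a single shared logger).
_instance_counter = itertools.count()

class TopicModelSketch:
    """Hypothetical model that owns its own logger instead of
    mutating a shared top-level one."""

    def __init__(self, verbose: bool = False):
        # A unique logger name per instance avoids cross-instance
        # interference when verbose settings differ.
        name = f"topic_model.{next(_instance_counter)}"
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.DEBUG if verbose else logging.WARNING)

verbose_model = TopicModelSketch(verbose=True)
quiet_model = TopicModelSketch(verbose=False)
# Each instance keeps its own level; constructing the quiet model
# does not reset the verbose one.
assert verbose_model.logger.level == logging.DEBUG
assert quiet_model.logger.level == logging.WARNING
```

With the current shared-logger design, the second construction would have downgraded the first instance's logging level; with per-instance loggers, both settings coexist.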

BERTopic Version

0.16.2

MaartenGr commented 4 months ago

Thanks for showcasing the issue! This should indeed be updated to make sure self.verbose is also passed. A PR would be welcome.