Open 1jamesthompson1 opened 1 month ago
Hmmm, this is a bit tricky from a maintainer/user experience perspective because I want to keep the scope of the parameters as small as possible in order to create an easy experience. This does mean that I would like to avoid adding `embedded_zeroshot_topic_list`, as that would further increase the parameter space. The difficulty for me here is that it is a rather small and niche use case that does not affect most users.
Don't get me wrong, having the functionality would definitely be nice... Although not ideal, you could create a custom backend yourself and use that instead.
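To make the custom-backend idea concrete, here is a minimal sketch of a lookup-style embedder that returns vectors computed ahead of time (for example by a proprietary API). It is an illustration only: `PrecomputedEmbedder` and the toy vectors are hypothetical names and data, and the class is plain Python so it runs without BERTopic installed. In a real setup it would presumably subclass `bertopic.backend.BaseEmbedder` and return a NumPy array from `embed`; check the BERTopic docs for the exact interface.

```python
from typing import Dict, List, Sequence


class PrecomputedEmbedder:
    """Hypothetical backend that looks up embeddings computed ahead of time.

    It mirrors the embed(documents, verbose=False) shape of a custom backend,
    but deliberately has no BERTopic dependency so the sketch stays runnable.
    """

    def __init__(self, lookup: Dict[str, List[float]]):
        # Maps each exact text (document or zero-shot topic label) to its vector.
        self.lookup = dict(lookup)

    def embed(self, documents: Sequence[str], verbose: bool = False) -> List[List[float]]:
        missing = [text for text in documents if text not in self.lookup]
        if missing:
            raise KeyError(
                f"no precomputed embedding for {len(missing)} text(s), e.g. {missing[0]!r}"
            )
        # Preserve input order so row i corresponds to documents[i].
        return [self.lookup[text] for text in documents]


# Toy 2-d vectors standing in for embeddings fetched from an external model.
lookup = {
    "The striker scored twice": [0.9, 0.1],
    "Parliament passed the bill": [0.1, 0.9],
    "sports": [1.0, 0.0],    # zero-shot topic label
    "politics": [0.0, 1.0],  # zero-shot topic label
}

embedder = PrecomputedEmbedder(lookup)
doc_vectors = embedder.embed(["The striker scored twice", "Parliament passed the bill"])
topic_vectors = embedder.embed(["sports", "politics"])
```

Because the same object serves both the documents and the zero-shot topic labels, no extra parameter is needed; the cost is that every text has to be embedded and stored in the lookup beforehand.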
Preface: I have tried to read through the current issues, and I don't think any of them raises what I am after. Issues like https://github.com/MaartenGr/BERTopic/issues/2011 sound promising but are talking about something different. I apologise if this has already been discussed!
I would like to try out BERTopic's zero-shot modelling while using a proprietary embedding model (voyageai). Therefore I need to give BERTopic the embeddings for both the documents and the zero-shot topics.
An example would be something like this:
Am I missing something about how BERTopic and zero-shot models are supposed to work? If not, I am happy to make a PR with what seem to be the small changes that need to be made.
Potential solution
I have had a look through `_bertopic.py` and it seems to be a relatively straightforward process. It seems that here it could just be passed the given zero-shot topic embeddings. These embeddings would come from another `init` argument. Beyond that, a few other changes would be needed, such as to the `_is_zeroshot()` method.
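As a rough illustration of what passing in zero-shot topic embeddings would enable, the sketch below assigns each document to its most similar zero-shot topic by cosine similarity and leaves it unassigned when the best match falls under a threshold. This is not BERTopic's actual implementation: `assign_zeroshot` and the toy vectors are hypothetical, and `min_similarity` only plays a role similar to BERTopic's `zeroshot_min_similarity` setting.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_zeroshot(
    doc_embeddings: Sequence[Sequence[float]],
    topic_embeddings: Sequence[Sequence[float]],
    min_similarity: float = 0.7,
) -> List[Optional[int]]:
    """Return the index of the best zero-shot topic per document, or None.

    None marks documents that are not close enough to any predefined topic
    and would be left for ordinary clustering.
    """
    assignments: List[Optional[int]] = []
    for doc in doc_embeddings:
        sims = [cosine(doc, topic) for topic in topic_embeddings]
        best = max(range(len(sims)), key=sims.__getitem__)
        assignments.append(best if sims[best] >= min_similarity else None)
    return assignments


# Precomputed toy embeddings: three documents and two zero-shot topics.
docs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
topics = [[1.0, 0.0], [0.0, 1.0]]
labels = assign_zeroshot(docs, topics, min_similarity=0.8)
```

The point of the sketch is that nothing in this step needs the embedding model itself, only the two sets of vectors, which is why supplying the topic embeddings directly looks feasible.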