adhadseKavida closed this issue 4 months ago
Thanks for sharing this! I would prefer to avoid the use of `**kwargs` as much as possible for two reasons. First, it is less explicit and clear to users what it exactly means. Second, it would be a `**kwargs` for the sole purpose of a specific backend, which seems a rather big change for a small feature.
Instead, I think opening up `SentenceTransformerBackend` might fit a bit better here, since your suggestion relates to sentence-transformers only. There, we could simply create another variable, `encode_kwargs`, that takes in your suggested changes.
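As a rough illustration of the `encode_kwargs` idea, the backend could store the extra encode settings at construction time and forward them on every call. This is a minimal sketch, not KeyBERT's actual implementation; `DummyModel` is a stand-in for `SentenceTransformer` so the example is self-contained.

```python
class DummyModel:
    """Stand-in for SentenceTransformer, purely for illustration."""

    def encode(self, documents, batch_size=32, verbose=False):
        # Pretend each document becomes a 4-dimensional embedding.
        return [[0.0, 0.0, 0.0, 0.0] for _ in documents]


class SentenceTransformerBackend:
    def __init__(self, model, encode_kwargs=None):
        self.model = model
        # Extra settings (e.g. batch_size) to pass through to .encode().
        self.encode_kwargs = encode_kwargs or {}

    def embed(self, documents, verbose=False):
        # Forward the stored kwargs so users can tune batch_size
        # without widening the public extraction API.
        return self.model.encode(documents, verbose=verbose, **self.encode_kwargs)


backend = SentenceTransformerBackend(DummyModel(), encode_kwargs={"batch_size": 128})
embeddings = backend.embed(["doc one", "doc two"])
print(len(embeddings), len(embeddings[0]))  # 2 4
```

The design keeps the `**kwargs` surface confined to the one backend that needs it, which is the point of the suggestion above.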
That seems a much better suggestion!
Thanks! Unfortunately, I do not have much time these days to work on this, so it might take a couple of weeks at least. If you, or someone else, wants to work on this then that would be appreciated!
I'll look into it and submit a PR as soon as possible.
Opened up a PR for review!
I'm using `SentenceTransformer` with KeyBERT. The `SentenceTransformer.encode()` method (called via `self.model.encode()`) allows changing the `batch_size` parameter, but this parameter is not modifiable through `KeyBERT.extract_sentences()`. I would recommend passing `**kwargs` on to the function to support this. Changing the batch size hugely decreases inference time by making maximum use of GPU memory. I know we can compute the doc and word embeddings ourselves, but that doesn't seem intuitive.

I'm open to discussion regarding possible alternatives.
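To make the batch-size argument concrete: encoders embed documents in chunks, so a larger `batch_size` means fewer forward passes, and each pass can saturate the GPU. This toy sketch (the `encode` function here is hypothetical, not the real `SentenceTransformer.encode`) just counts how many batches a given setting produces.

```python
def encode(documents, batch_size=32):
    # Split the documents into chunks of at most batch_size items;
    # each chunk would correspond to one forward pass on the GPU.
    batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
    return len(batches)


docs = ["some document"] * 1000
print(encode(docs, batch_size=32))   # 32 forward passes
print(encode(docs, batch_size=500))  # 2 forward passes
```

With ample GPU memory, raising `batch_size` from 32 to 500 here cuts the number of passes from 32 to 2, which is the speedup the request is after.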