Open 1jamesthompson1 opened 1 month ago
It's interesting, but in my experience users rarely try this on a single document while also passing an embedding. Embedding a single document is incredibly cheap, so there is little need to pass the embedding as well. I think that's why this issue hasn't been caught until now, if I remember most issues of the last couple of years well enough.
Either way, a fix is indeed needed here to transform the embedding into the right shape when there is only one document.
Issue
Currently the `transform` method of `BERTopic` has to receive a list of strings as the documents, and if embeddings are passed they need to be a 2D array of shape `(len(documents), embedding_dimension)`. This requires an extra reshape call when calling `transform` on a single document. For example, it would be nicer to pass the string and its 1-D embedding directly, as in `topic_model.transform(document, document_embedding)`.
Example of what you currently have to do
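```python
import numpy as np

# topic_model is an already fitted BERTopic model
document = 'This is a really interesting document'
document_embedding = np.random.rand(768)
# Wrap the document in a list and reshape the embedding to (1, 768)
topic_model.transform([document], document_embedding.reshape(1, -1))
```

Currently, if you pass the single document and its 1-D embedding directly instead, you get an error.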
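To make the shape mismatch concrete, here is a small NumPy-only sketch (no BERTopic involved) of what the reshape in the example above does. The dimension 768 is just the value used in the example.

```python
import numpy as np

# A single document's embedding is naturally a 1-D vector.
embedding = np.random.rand(768)
print(embedding.shape)  # (768,)

# reshape(1, -1) turns it into a batch containing one row.
batched = embedding.reshape(1, -1)
print(batched.shape)  # (1, 768)

# np.atleast_2d gives the same result for a 1-D input.
also_batched = np.atleast_2d(embedding)
print(also_batched.shape)  # (1, 768)
```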
Why I think it is happening
This is because the embeddings shape check inside `transform` happens before the document argument is converted into a list if it is a `str`.
Thoughts on making it better
Am I missing something here about np arrays or the transform function? It seems to me that wanting to transform a single document is common enough that the function could have that little bit of extra functionality.
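As a rough illustration of the kind of normalization I have in mind, here is a minimal standalone sketch. The helper name `normalize_single_input` is hypothetical and this is not BERTopic's actual code. It only shows the ordering: convert a lone string to a list first, then promote a 1-D embedding to 2-D, and only then check the shapes.

```python
import numpy as np


def normalize_single_input(documents, embeddings=None):
    """Hypothetical helper: accept one document (str) with a 1-D embedding."""
    # Step 1: wrap a lone string in a list before any shape checks.
    if isinstance(documents, str):
        documents = [documents]

    if embeddings is not None:
        embeddings = np.asarray(embeddings)
        # Step 2: promote a 1-D vector to a single-row 2-D array.
        if embeddings.ndim == 1:
            embeddings = embeddings.reshape(1, -1)
        # Step 3: only now compare the number of rows to the number of documents.
        if embeddings.shape[0] != len(documents):
            raise ValueError(
                f"Got {len(documents)} documents but "
                f"{embeddings.shape[0]} embedding rows."
            )
    return documents, embeddings


docs, emb = normalize_single_input(
    "This is a really interesting document", np.random.rand(768)
)
print(len(docs), emb.shape)  # 1 (1, 768)
```

With something like this in place, existing calls that already pass a list and a 2D array would go through unchanged.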
I have already made the change on a fork for my own use. Do you agree with the idea, and would you like a PR?