MarkWClements closed this issue 1 year ago
@MaartenGr I will try posting to the flair repo. I did run across the post you shared with a similar issue but the "fix" discussed in that post does not work for me.
Due to inactivity, I'll be closing this issue. Let me know if you want me to re-open the issue!
I am trying to use the FinBERT model with BERTopic, and I've read the docs about how to create document embeddings using flair. However, I am running into an issue that I can't figure out. Here is my sample code to reproduce the error:
When I run this code, the last line, which creates the embeddings,
model_object.embed(sentence)
produces this error:
RuntimeError: The expanded size of the tensor (520) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 520]. Tensor sizes: [1, 512]
But when I look at the tokens in the sentence that flair creates, I see that there are only 465 of them. Running
sentence.tokens
produces this
I've tried googling the cause of this error, but there isn't much out there and the flair documentation doesn't cover it, so I was hoping you could help.
Also, is there a way to use a different tokenizer here? This tokenizer splits punctuation into its own token, which I assume isn't ideal for the FinBERT document embeddings?
Thank You!
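A possible explanation for the 465 vs. 520 mismatch, which I have not verified against flair's source: sentence.tokens counts word-level tokens, while a BERT-style model re-splits each word into WordPiece subwords and adds [CLS] and [SEP], and BERT models typically accept at most 512 positions. The sketch below is a toy, pure-Python illustration of that effect. TOY_VOCAB, wordpiece, and encode are made-up names for this example and are not part of flair, transformers, or FinBERT.

```python
# Toy greedy longest-match-first WordPiece splitter (illustrative only).
# TOY_VOCAB is a hypothetical vocabulary, NOT FinBERT's real one.
TOY_VOCAB = {"[UNK]", "re", "##fin", "##anc", "##ing", "cost", "##s", "rose", "."}


def wordpiece(word, vocab=TOY_VOCAB):
    """Split one word into subword pieces, longest match first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(words, max_len=None):
    """Flatten words into subwords and wrap them in [CLS] ... [SEP].

    If max_len is given, truncate so the total stays within max_len.
    """
    pieces = [p for w in words for p in wordpiece(w)]
    if max_len is not None:
        pieces = pieces[: max_len - 2]  # leave room for [CLS] and [SEP]
    return ["[CLS]"] + pieces + ["[SEP]"]


words = ["refinancing", "costs", "rose", "."]
subwords = encode(words)
print(len(words), "word tokens ->", len(subwords), "model positions")
print(subwords)
```

Here four word-level tokens turn into ten model positions, so a sentence with 465 word tokens could plausibly expand past 512. Passing max_len to encode shows the simplest workaround, truncating to the model limit; splitting a long document into several shorter pieces before embedding is another option if losing the tail of the text is not acceptable.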