davidberenstein1957 / fast-sentence-transformers

Simply, faster, sentence-transformers

MIT License

140 stars, 10 forks

Tokenizer settings #11

Closed michaelfeil closed 1 year ago

michaelfeil commented 1 year ago

Awesome work! Just some short questions regarding:

https://github.com/Pandora-Intelligence/fast-sentence-transformers/blob/3f277ee7b49d78e0a61fe72d82e02064d4a33e71/fast_sentence_transformers/FastSentenceTransformer.py#L330-L331

Why are the tokens returned as pt (CUDA) tensors but converted to NumPy arrays on the next line? Am I missing something? Would it make sense to change the return type to np?

davidberenstein1957 commented 1 year ago

Hi @michaelfeil, you are correct! I think it is some legacy code.

davidberenstein1957 commented 1 year ago

Hi @michaelfeil, just getting back to you. It is because pooling_model.forward() expects a tensor, so we do actually need a conversion, either from np to pt or the other way around.
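
For illustration, here is a minimal sketch of the np-to-pt direction described above. It is not the repository's code: `mean_pool` is a hypothetical stand-in for a pooling module that expects torch tensors, and the arrays are made-up placeholders for what a tokenizer called with `return_tensors="np"` and a NumPy-based inference step might produce.

```python
import numpy as np
import torch


def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a pooling module that expects torch tensors."""
    # Expand the mask to (batch, seq_len, 1) so padded positions contribute zero.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Placeholder for tokenizer output requested as NumPy arrays (assumed shape: batch x seq_len).
attention_mask_np = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.int64)

# Placeholder for token embeddings coming back from a NumPy-based inference step
# (assumed shape: batch x seq_len x hidden).
token_embeddings_np = np.ones((2, 4, 3), dtype=np.float32)

# The np -> pt conversion. On CPU, torch.from_numpy shares memory with the array,
# so no copy is made.
token_embeddings = torch.from_numpy(token_embeddings_np)
attention_mask = torch.from_numpy(attention_mask_np)

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

With this ordering, the tokenizer output stays in NumPy until the pooling step, so only one conversion happens per batch. Whether that is preferable to returning pt tensors and converting to NumPy earlier depends on where the tensors need to live (for example on a GPU), which this sketch does not cover.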