Currently, get_pretrained_embeddings and get_bert_embeddings work on the raw form of the document. As a result, preprocessing settings do not apply to the text that goes into the transformer-based vectorizers.
Add an ignore_preprocess option to the vectorizer so that raw text can still be used.
Build the input str sequence from the filtered Token objects before passing it to the SentenceTransformer.encode method.
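
A minimal sketch of the intended behaviour, not the actual implementation in this change. The Token and Document classes, the is_filtered flag, and the helper names build_encoder_inputs and get_embeddings are hypothetical stand-ins for illustration; only the ignore_preprocess option and the idea of joining filtered tokens come from the description above. The encoder is passed in as a callable so the sketch runs without any model installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Token:
    # Hypothetical stand-in for the project's Token class.
    text: str
    is_filtered: bool = False  # assumed flag: True if preprocessing drops this token


@dataclass
class Document:
    # Hypothetical container holding both the raw text and its tokens.
    raw: str
    tokens: List[Token]


def build_encoder_inputs(
    docs: Sequence[Document], ignore_preprocess: bool = False
) -> List[str]:
    """Return one input string per document for a sentence encoder."""
    if ignore_preprocess:
        # Old behaviour: feed the raw text, bypassing preprocessing settings.
        return [doc.raw for doc in docs]
    # New behaviour: rebuild each string from the tokens that survive filtering.
    return [
        " ".join(tok.text for tok in doc.tokens if not tok.is_filtered)
        for doc in docs
    ]


def get_embeddings(
    docs: Sequence[Document],
    encode: Callable[[List[str]], Any],
    ignore_preprocess: bool = False,
) -> Any:
    """Build the input strings, then hand them to the supplied encode callable."""
    return encode(build_encoder_inputs(docs, ignore_preprocess))
```

With the sentence-transformers package, the encode argument could be the bound SentenceTransformer.encode method of a loaded model, since that method accepts a list of strings.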