Since you already found that scientific language is less expressive, I wonder whether it would be useful to have word vectors trained on scientific manuscripts (e.g. https://github.com/olivettigroup/materials-word-embeddings). Because training word vectors doesn't need labels, we could just feed the model a large number of manuscripts.
This is what I was thinking: training the vectors on papers. At the moment, though, I think the main issue is how to reduce the word vectors of a particular sentence to a single sentence vector.
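One common baseline for collapsing a sentence's word vectors into one vector is to average them, optionally weighting each word so that frequent function words count less. Below is a minimal plain-Python sketch. It assumes the word vectors are already available as a dict from word to a list of floats. The function names `sentence_vector` and `sif_weights` and the toy vectors are hypothetical. The weighting reproduces only the reweighting step of smooth inverse frequency (SIF) and skips its principal-component removal.

```python
from typing import Dict, List, Optional, Sequence


def sentence_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[List[float]]:
    """Hypothetical helper: (weighted) average of in-vocabulary word vectors.

    Words missing from `embeddings` are skipped. A word missing from
    `weights` gets weight 1.0. Returns None if no usable word remains.
    """
    total: Optional[List[float]] = None
    weight_sum = 0.0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # out-of-vocabulary word: ignore it
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        if total is None:
            total = [0.0] * len(vec)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if total is None or weight_sum == 0.0:
        return None
    return [x / weight_sum for x in total]


def sif_weights(word_counts: Dict[str, float], a: float = 1e-3) -> Dict[str, float]:
    """Hypothetical helper: SIF-style weights a / (a + p(w)).

    p(w) is the word's relative frequency in the corpus, so frequent
    words get small weights. The principal-component removal step of
    the full SIF method is NOT included here.
    """
    total = sum(word_counts.values())
    return {w: a / (a + c / total) for w, c in word_counts.items()}


# Toy 2-d vectors, purely illustrative.
emb = {"the": [1.0, 1.0], "band": [1.0, 0.0], "gap": [0.0, 1.0]}
plain = sentence_vector(["the", "band", "gap"], emb)
weighted = sentence_vector(["the", "band", "gap"], emb, {"the": 0.0})
print(plain, weighted)
```

With the toy vectors, the plain average is about `[0.667, 0.667]`. Giving "the" a weight of zero yields `[0.5, 0.5]`, the average of only the content words. With real embeddings the weights would come from word counts over the same manuscript corpus used to train the vectors.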