explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Add pretrained word vectors #6

Closed: buhrmann closed this issue 5 years ago

buhrmann commented 5 years ago

I think all StanfordNLP models come with pretrained word vectors, and, if I'm reading their code correctly, they're available either via the pos processor:

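import stanfordnlp
snlp = stanfordnlp.Pipeline(lang="en")  # assumes the English models have been downloaded
unit_id = snlp.processors['pos'].pretrain.vocab._unit2id['spacy']
unit_vec = snlp.processors['pos'].pretrain.emb[unit_id]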

or via the depparse processor:

unit_vec = snlp.processors['depparse'].pretrain.emb[unit_id]

Would it be possible to add those vectors as token attributes?
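The lookup described above (word → row index in the vocab → row of the embedding matrix), followed by attaching the result to each token, could be sketched like this. It is a minimal, self-contained illustration, not the StanfordNLP or spaCy API. ToyPretrain, ToyToken, lookup_vector and attach_vectors are hypothetical names, plain Python lists stand in for the embedding matrix, and lowercasing words before lookup is an assumption about how the vocab stores its keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyPretrain:
    """Hypothetical stand-in for a pretrained-embedding container:
    a word -> row-index vocab plus an embedding matrix (one row per word)."""
    words: List[str]
    emb: List[List[float]]
    unit2id: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Row i of emb holds the vector for words[i].
        self.unit2id = {w: i for i, w in enumerate(self.words)}


@dataclass
class ToyToken:
    """Hypothetical token carrying an optional vector attribute."""
    text: str
    vector: Optional[List[float]] = None


def lookup_vector(pretrain: ToyPretrain, word: str, lower: bool = True) -> Optional[List[float]]:
    # Assumption: the vocab stores lowercased keys, so normalize first.
    key = word.lower() if lower else word
    # dict.get avoids the KeyError a direct ['...'] lookup raises for unknown words.
    unit_id = pretrain.unit2id.get(key)
    return None if unit_id is None else pretrain.emb[unit_id]


def attach_vectors(tokens: List[str], pretrain: ToyPretrain) -> List[ToyToken]:
    # One token per word; vector is None when the word is out of vocabulary.
    return [ToyToken(t, lookup_vector(pretrain, t)) for t in tokens]


pretrain = ToyPretrain(words=["spacy", "is", "fun"],
                       emb=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
doc = attach_vectors(["SpaCy", "is", "great"], pretrain)
for tok in doc:
    print(tok.text, tok.vector)
```

Returning None for out-of-vocabulary words keeps "no vector" distinguishable from a real one. Mapping unknown words to a shared unknown-word row is another common choice.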

If you'd like, I could try to implement it in a PR.

buhrmann commented 5 years ago

Here's the simplest version I could think of: https://github.com/explosion/spacy-stanfordnlp/pull/7