aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com
Other
2.29k stars 337 forks

Token to id #207

Open Eghbalii opened 4 years ago

Eghbalii commented 4 years ago

I just found out how useful polyglot is: it works very fast and accurately. My problem is that I can't find any command to get the id of a word in the tokenizer corpus, so that in the future I can take an id from my model and recover the word.

Something like this:

sent_ids = tokenizer.convert_tokens_to_ids(padded_tokens)
print(sent_ids)

Out: [101, 1045, 2428, 5632, 2023, 3185, 1037, 2843, 1012, 102, 0, 0]
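To make the request concrete, here is a minimal sketch of the two-way lookup being asked for, written in plain Python so it does not depend on any particular polyglot call. The `Vocab` class, the `<PAD>`/`<UNK>` symbols, and the method names (borrowed from the snippet above) are all hypothetical and are not polyglot's API. The token lists are hard-coded here; in practice they would come from whatever tokenizer is in use.

```python
from typing import Dict, Iterable, List

# Hypothetical special symbols; ids 0 and 1 are reserved for them.
PAD, UNK = "<PAD>", "<UNK>"


class Vocab:
    """Illustrative token <-> id mapping built from tokenized sentences."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.token_to_id: Dict[str, int] = {PAD: 0, UNK: 1}
        for sentence in sentences:
            for token in sentence:
                # Assign the next free id the first time a token is seen.
                self.token_to_id.setdefault(token, len(self.token_to_id))
        # Reverse table, used to turn model output ids back into words.
        self.id_to_token: Dict[int, str] = {
            i: t for t, i in self.token_to_id.items()
        }

    def convert_tokens_to_ids(self, tokens: Iterable[str]) -> List[int]:
        # Tokens never seen while building the vocabulary map to UNK.
        unk_id = self.token_to_id[UNK]
        return [self.token_to_id.get(t, unk_id) for t in tokens]

    def convert_ids_to_tokens(self, ids: Iterable[int]) -> List[str]:
        # Ids outside the table map back to the UNK symbol.
        return [self.id_to_token.get(i, UNK) for i in ids]


# Token lists stand in for the output of a real tokenizer.
sentences = [
    ["i", "really", "enjoyed", "this", "movie"],
    ["this", "movie", "was", "long"],
]
vocab = Vocab(sentences)

padded_tokens = sentences[0] + [PAD, PAD]
sent_ids = vocab.convert_tokens_to_ids(padded_tokens)
print(sent_ids)
print(vocab.convert_ids_to_tokens(sent_ids))
```

Keeping both dictionaries makes the lookup cheap in either direction, and saving `token_to_id` alongside the trained model ensures the same ids can be decoded later.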