Open Eghbalii opened 4 years ago
I just found out how useful polyglot is: it works very fast and gives correct results. My problem is that I can't find any command to get the id of a word in the tokenizer's corpus, so that later I can take an id produced by my model and recover the word.
Something like this:
sent_ids = tokenizer.convert_tokens_to_ids(padded_tokens)
print(sent_ids)
Out: [101, 1045, 2428, 5632, 2023, 3185, 1037, 2843, 1012, 102, 0, 0]
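To make the request concrete, here is a minimal sketch in plain Python of the two-way lookup I have in mind. It does not use polyglot's API: the `Vocab` class, its method names, and the `[PAD]`/`[UNK]` special tokens are all hypothetical and only illustrate the behaviour.

```python
from collections import Counter

PAD, UNK = "[PAD]", "[UNK]"  # hypothetical special tokens, not from polyglot


class Vocab:
    """Hypothetical two-way word/id lookup built from a list of tokens."""

    def __init__(self, tokens):
        counts = Counter(t for t in tokens if t not in (PAD, UNK))
        # Special tokens get ids 0 and 1; the remaining words are ordered by
        # descending frequency, with ties broken alphabetically.
        words = sorted(counts, key=lambda w: (-counts[w], w))
        self.id_to_token = [PAD, UNK] + words
        self.token_to_id = {w: i for i, w in enumerate(self.id_to_token)}

    def convert_tokens_to_ids(self, tokens):
        # Unknown words fall back to the id of UNK.
        unk_id = self.token_to_id[UNK]
        return [self.token_to_id.get(t, unk_id) for t in tokens]

    def convert_ids_to_tokens(self, ids):
        # Reverse lookup: recover the word for each id.
        return [self.id_to_token[i] for i in ids]


vocab = Vocab("i really enjoyed this movie a lot .".split())
padded_tokens = ["i", "really", "enjoyed", "this", "movie", PAD, PAD]
sent_ids = vocab.convert_tokens_to_ids(padded_tokens)
print(sent_ids)
print(vocab.convert_ids_to_tokens(sent_ids))
```

What I am looking for is the polyglot equivalent of `convert_tokens_to_ids` and `convert_ids_to_tokens` above, so that an id coming out of my model can be mapped back to its word.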