How to decode token embeddings into token ids?

I'm trying to build a machine translation model that uses the IndicBERT model as its embedding layer. I'm able to obtain token embeddings from tokenized sentences as follows:

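from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')
model = AutoModel.from_pretrained('ai4bharat/indic-bert')

# Embedding layer that maps token ids to vectors
vocab_to_embedding_convertor = model.get_input_embeddings()
# padding=True so that inputs of different lengths fit into one tensor
tokens = tokenizer(["హలో","పేరు"], padding=True, return_tensors="pt")['input_ids']

embeddings = vocab_to_embedding_convertor(tokens)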

However, I'm unable to find a way to obtain token ids from these embeddings. How would I go about doing this?

Thanks! Vimal
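
One possible approach, offered as a sketch rather than a confirmed answer from the maintainers: because these vectors come straight from the input embedding layer (not from the contextual outputs of the model), each one should be an exact row of that layer's weight matrix, so a nearest-neighbour search over the rows can recover the ids. The helper name `embeddings_to_ids` is hypothetical, and the small random `torch.nn.Embedding` below is only a stand-in for `model.get_input_embeddings()` so the snippet runs without downloading anything.

```python
import torch


def embeddings_to_ids(embeddings: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Return, for every embedding vector, the index of the closest row in `weight`.

    embeddings: tensor of shape (..., hidden_size)
    weight:     embedding matrix of shape (vocab_size, hidden_size)
    (Hypothetical helper, not part of transformers or IndicBERT.)
    """
    # Flatten all leading dimensions so we have one vector per row.
    flat = embeddings.reshape(-1, weight.shape[1])
    # Pairwise Euclidean distances, shape (num_vectors, vocab_size).
    distances = torch.cdist(flat, weight)
    # The closest row index is the token id; restore the original leading shape.
    return distances.argmin(dim=1).reshape(embeddings.shape[:-1])


# Stand-in for model.get_input_embeddings(): a small random embedding table.
torch.manual_seed(0)
embedding_layer = torch.nn.Embedding(num_embeddings=50, embedding_dim=8)

# Stand-in for the tokenizer output (arbitrary ids, assumed for illustration).
token_ids = torch.tensor([[2, 17, 3], [2, 41, 3]])

with torch.no_grad():
    vectors = embedding_layer(token_ids)
    recovered = embeddings_to_ids(vectors, embedding_layer.weight)
```

With the real model, the same call would presumably be `embeddings_to_ids(embeddings, vocab_to_embedding_convertor.weight)`, and the resulting ids can be turned back into tokens with `tokenizer.convert_ids_to_tokens`. Two caveats: if the vocabulary contains duplicate rows the lookup is ambiguous, and vectors produced by a decoder (rather than copied from the table) will only map to the nearest token, not necessarily the intended one.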
