After I instantiated the model, I created the embeddings for my new corpus and then extracted only the vectors of the [CLS] tokens:
with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)
features = last_hidden_states[0][:, 0, :].numpy()
(I got my input_ids in this way:
input = sent.apply(lambda x: tokenizer.encode(x, add_special_tokens=True))
where sent is the Series containing my corpus)
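For completeness, here is a minimal, self-contained sketch of what I am doing. The bert-base-uncased checkpoint, the toy corpus, and the zero-padding step are assumptions added just to make it runnable; the rest mirrors the snippets above.

import pandas as pd
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint for illustration; substitute the model you actually load.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# sent is a pandas Series holding the corpus, as in the question.
sent = pd.Series(["first example sentence", "another example sentence"])
tokenized = sent.apply(lambda x: tokenizer.encode(x, add_special_tokens=True))

# Pad every sequence to the same length and build the attention mask.
max_len = max(len(ids) for ids in tokenized)
input_ids = torch.tensor([ids + [0] * (max_len - len(ids)) for ids in tokenized])
attention_mask = (input_ids != 0).long()

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)

# Keep only the vector of the first ([CLS]) token of each sentence.
features = last_hidden_states[0][:, 0, :].numpy()
print(features.shape)  # expected: (num_sentences, hidden_size)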
What I don't understand is why the vectors obtained for each [CLS] token have a dimensionality of 50256 (that is, the vocabulary size).
Don't BERT-like models have a fixed hidden size that is much smaller than the vocabulary size?