KevinGoodman opened 2 years ago
Hi, I have a similar question here. I noticed that in `BERTEmbedder`, the embedding for each text token is trainable (`requires_grad=True`). Is there a particular reason to make the text embeddings trainable? Why not leverage powerful pre-trained word embeddings? Thank you!
I am also not sure why the implementation only uses the tokenizer from Hugging Face but not the pre-trained encoder. Why does the BERT-like transformer need to be retrained? Are the text embeddings from the original BERT model not good enough? And why train from scratch instead of fine-tuning?
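To make the question concrete, here is a minimal PyTorch sketch of the two options being contrasted (this is illustrative, not the repo's actual `BERTEmbedder` code): a token embedding table left trainable versus one frozen as if it held pre-trained weights.

```python
import torch

# Illustrative sizes, not the repo's actual configuration.
vocab_size, embed_dim = 100, 16

# Option 1: trainable embedding, learned from scratch during diffusion training.
# nn.Embedding weights have requires_grad=True by default.
trainable = torch.nn.Embedding(vocab_size, embed_dim)

# Option 2: a frozen embedding, as one would do after loading pre-trained
# weights (e.g. from a pre-trained BERT encoder) and opting not to fine-tune.
frozen = torch.nn.Embedding(vocab_size, embed_dim)
frozen.weight.requires_grad_(False)

tokens = torch.tensor([[1, 2, 3]])
out_trainable = trainable(tokens)  # gradients flow into the embedding table
out_frozen = frozen(tokens)        # embedding table stays fixed
```

Fine-tuning would be a middle ground: start from the pre-trained weights but leave `requires_grad=True` so they can adapt to the diffusion objective.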