Closed masoudh175 closed 4 years ago
Does it mean that the Training your own model notebook uses the weights from pre-trained models and then fine-tunes them on my dataset?
Yes. Our paper, repo, and example notebooks all begin from the DistilRoBERTa or RoBERTa pre-trained checkpoints (for DeCLUTR-small and DeCLUTR-base, respectively) and extend their training with our contrastive objective. "Fine-tunes" probably isn't the right term here, though (at least I think), as it usually denotes supervised training on a small, task-specific dataset.
If the answer to the first question is yes, then I assume the embeddings for words that do not exist in the pre-trained models' vocabulary are trained from scratch. Correct?
No. Both DistilRoBERTa and RoBERTa (and therefore DeCLUTR-small and DeCLUTR-base) use byte-pair encoding (BPE) tokenization. Please see this blog for more information. Basically, words that are not in the vocabulary are broken down into smaller subunits that are in the vocabulary (down to individual characters, if needed) and then embedded, so no embeddings need to be trained from scratch.
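To make the fallback behaviour concrete, here is a toy sketch of greedy longest-match subword splitting over a tiny made-up vocabulary. This is not the actual RoBERTa tokenizer (real BPE learns merge rules from corpus statistics and operates on bytes), but it illustrates the same idea: an out-of-vocabulary word is split into known subunits, down to single characters if necessary.

```python
# Toy subword vocabulary (hypothetical, for illustration only).
VOCAB = {
    "token", "ization", "un", "break", "able",
    "t", "o", "k", "e", "n", "i", "z", "a", "b", "l", "r", "u",
}

def subword_split(word, vocab):
    """Greedily split `word` into the longest prefixes found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest prefix first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No match at all: fall back to the raw character. (Real BPE
            # tokenizers work at the byte level, so they can always
            # tokenize any input.)
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("tokenization", VOCAB))  # ['token', 'ization']
print(subword_split("unbreakable", VOCAB))   # ['un', 'break', 'able']
```

Each resulting piece has its own row in the embedding matrix, so even a word never seen during pre-training still maps to pre-trained embeddings.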
Please let me know if I have answered all your questions!
Yes, I got my answer. Thanks!
Hi,
Thanks for the well written paper! I have two questions:
In the paper it is mentioned that
To make the computational requirements feasible, we do not train from scratch, but rather we continue training a model that has been pre-trained with the MLM objective. Specifically, we use both RoBERTa-base [16] and DistilRoBERTa [49] (a distilled version of RoBERTa-base) in our experiments.
1. Does it mean that the Training your own model notebook uses the weights from pre-trained models and then fine-tunes them on my dataset?
2. If the answer to the first question is yes, then I assume the embeddings for words that do not exist in the pre-trained models' vocabulary are trained from scratch. Correct?
Thanks!