Hi, I am trying to adapt K-BERT to RoBERTa, using the pre-trained RoBERTa model from Hugging Face. However, the model never converges and gives very poor scores. Could you please advise on how to adapt K-BERT to another BERT-based model?
Thank you!