My current task is to classify the association between CVEs and CWEs. However, I've noticed that using BertModel.from_pretrained('bert-base-uncased') in the fine-tuning stage results in lower accuracy than when I first continue pretraining on additional CVE-related descriptions and then fine-tune from the resulting model.pt. I don't understand why this happens, as I have ruled out compatibility issues with the model. It's worth mentioning that from the pretraining phase I carry over only the model weights into fine-tuning, and the tokenizer is always BertTokenizer.from_pretrained('bert-base-uncased'). I did not retrain or expand the tokenizer during pretraining because doing so is very time-consuming.
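For reference, this is roughly how I set up the fine-tuning stage for the two variants. It is a simplified sketch, not my exact code: the classification head, the number of CWE classes, the example input text, and the assumption that model.pt holds the encoder's state_dict are all placeholders.

```python
import torch
from transformers import BertModel, BertTokenizer

# The tokenizer is always the stock one; it is never retrained or extended.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Variant A: fine-tune directly from the original checkpoint.
encoder = BertModel.from_pretrained('bert-base-uncased')

# Variant B: same architecture, but overwrite the weights with the ones
# produced by the extra pretraining on CVE descriptions.
# (Assumes model.pt holds the encoder's state_dict; strict=False only to
# tolerate extra keys such as the MLM head.)
encoder = BertModel.from_pretrained('bert-base-uncased')
encoder.load_state_dict(torch.load('model.pt'), strict=False)

# Simple classification head on top of the pooled [CLS] representation.
# num_cwe_classes is a placeholder for the number of CWE labels in my data.
num_cwe_classes = 30
classifier = torch.nn.Linear(encoder.config.hidden_size, num_cwe_classes)

# Example forward pass on a single (placeholder) CVE description.
inputs = tokenizer("example CVE description text",
                   return_tensors='pt', truncation=True, max_length=128)
pooled = encoder(**inputs).pooler_output
logits = classifier(pooled)
```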
Here are the hyperparameters I am using:
Additionally, the settings for masked language modeling (MLM) are:
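In case it helps, below is a simplified version of the extra pretraining step. The corpus file name and all numeric values (epochs, batch size, learning rate, mlm_probability) are illustrative placeholders rather than my actual settings listed above.

```python
import torch
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # stock tokenizer, unchanged
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# cve_descriptions.txt is a placeholder file: one CVE description per line.
raw = load_dataset('text', data_files={'train': 'cve_descriptions.txt'})

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True, max_length=128)

train_set = raw['train'].map(tokenize, batched=True, remove_columns=['text'])

# Standard dynamic masking; 0.15 is the usual default masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir='cve-mlm',
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=5e-5)

trainer = Trainer(model=model, args=args,
                  data_collator=collator, train_dataset=train_set)
trainer.train()

# Keep only the encoder weights; this is the model.pt loaded later for fine-tuning.
torch.save(model.bert.state_dict(), 'model.pt')
```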
I hope someone can answer my question. If more detailed code is needed, I can provide it. Thank you.