Great work! I really appreciate that the code is public (and that the paper is so clearly written).

I had a question about the MNTP config. Can you please confirm what learning rate was used when running `run_mntp.py` for llama2/llama3? Looking at the configs under `train_configs`, it seems that the default learning rate in `TrainingArguments` of 5e-5 was used? Sorry if this is already in the paper and I missed it (I read Appendix D.1.1, but I am not sure where to find the details behind "same training parameters as RoBERTa MNTP training").

-- Thanks.

Hi @spookyQubit, thanks for your interest in our work.

We modified the Hugging Face MLM training script. As in that script, we used the default learning rate in `TrainingArguments`, which is 5e-5.
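For anyone who wants to verify this: a minimal sketch, assuming the Hugging Face `transformers` library, showing that `TrainingArguments` falls back to 5e-5 when no learning rate is passed (the `output_dir` value here is arbitrary, chosen only for illustration):

```python
from transformers import TrainingArguments

# When learning_rate is not set in the training config,
# TrainingArguments uses its documented default of 5e-5,
# so omitting the field is equivalent to passing it explicitly.
args = TrainingArguments(output_dir="mntp-out")  # hypothetical output dir
print(args.learning_rate)  # 5e-05
```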
Thanks @vaibhavad.