Closed by agb94 1 month ago
Yes, training at the hunk level for the jdt project on an RTX 3090 is more computationally expensive than the other configurations, since the dataset contains 150,630 triplets. I'm sorry, but the checkpoint files were not saved due to space constraints. A viable solution for you may be to start from the pre-trained commit-level model and fine-tune it at the hunk level, which can reduce the hunk-level training overhead.
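For reference, here is a minimal sketch of what warm-starting hunk-level fine-tuning from the commit-level checkpoint could look like. It assumes the checkpoint is loadable in Hugging Face `transformers` format and that the hunk-level triplets are already prepared; the path, pooling, and triplet loss below are placeholders rather than the exact training scripts of this repository.

```python
# Hypothetical sketch: warm-start hunk-level fine-tuning from the
# commit-level checkpoint instead of the base BERTOverflow weights.
# Paths, dataset format, and loss are placeholders, not the repo's scripts.
import torch
from torch.nn.functional import cosine_similarity
from transformers import AutoTokenizer, AutoModel

COMMIT_CKPT = "./data/jdt/model_SemanticCodebert_jdt_RN_bertoverflow_QARC_q256_d256_dim128_cosine_commits"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(COMMIT_CKPT)
model = AutoModel.from_pretrained(COMMIT_CKPT).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def encode(texts):
    # Mean-pool the last hidden state into one embedding per text.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt").to(device)
    hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

def triplet_step(anchors, positives, negatives, margin=0.2):
    # One optimization step of a standard cosine-similarity triplet loss.
    a, p, n = encode(anchors), encode(positives), encode(negatives)
    loss = torch.relu(margin
                      - cosine_similarity(a, p)
                      + cosine_similarity(a, n)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step on one (hypothetical) hunk-level triplet batch:
# loss = triplet_step(["bug report text"], ["relevant hunk diff"], ["irrelevant hunk diff"])
```

Because the encoder is already adapted at the commit level, fewer epochs (or a lower learning rate) at the hunk level may be sufficient, which is where the time savings would come from.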
Hi! :-)
While replicating your experiment, I noticed that fine-tuning takes a significant amount of time. I am currently running the fine-tuning command for the `jdt` project on a machine with an RTX 3090, and found that one epoch takes more than two days, which is very computationally expensive. (I would also like to know whether this is normal.) May I ask if it is possible for you to share the fine-tuned model weights used in your experiment for each project? Specifically, I am looking for the checkpoint files
`./data/<dataset_name>/model_SemanticCodebert_<dataset_name>_RN_bertoverflow_QARC_q256_d256_dim128_cosine_commits`.
If the data is already available online, could you please share the link with me?
Your help is much appreciated. Thank you!