duyali2000 / SemanticFlowGraph

This repository provides the code and guidance for reproducing the results in our ESEC/FSE 2023 submission "Pre-training Code Representations with Semantic Flow Graph for Effective Bug Localization".

Fine-tuned model weights #5

Closed agb94 closed 1 month ago

agb94 commented 7 months ago

Hi! :-)

While replicating your experiment, I noticed that fine-tuning takes a significant amount of time. I am currently running the fine-tuning command for the jdt project on a machine with an RTX 3090, and found that a single epoch takes more than two days, which is very computationally expensive. (I would also like to know whether this is normal.)

May I ask if it's possible for you to share the fine-tuned model weights used in your experiment for each project? Specifically, I am looking for the checkpoint files `./data/<dataset_name>/model_SemanticCodebert_<dataset_name>_RN_bertoverflow_QARC_q256_d256_dim128_cosine_commits`.

If the data is already available online, could you please share the link with me?

Your help is much appreciated. Thank you!

duyali2000 commented 5 months ago

Yes, hunk-level training for the jdt project on an RTX 3090 is more computationally expensive than the other configurations, as the dataset contains 150,630 triplets. I'm sorry, but the checkpoint files were not saved due to space constraints. A viable solution for you may be to fine-tune at the hunk level starting from the pre-trained commit-level model, which can reduce the time overhead at the hunk level.
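
For anyone taking this route, below is a minimal sketch of what the warm start might look like in PyTorch, assuming the checkpoint is a plain state_dict and the backbone is BERTOverflow; the Hub identifier, the jdt checkpoint path, and the training-loop placeholder are illustrative assumptions, not the repository's actual training interface.

```python
# Minimal sketch (not the repository's training script): warm-starting hunk-level
# fine-tuning from a commit-level checkpoint. The backbone identifier and the
# checkpoint path below are illustrative assumptions.
import torch
from transformers import AutoModel

# Backbone used for pre-training (BERTOverflow); the Hub id is an assumption.
model = AutoModel.from_pretrained("jeniya/BERTOverflow")

# Load the commit-level fine-tuned weights, assuming the file is a plain state_dict.
commit_ckpt = "./data/jdt/model_SemanticCodebert_jdt_RN_bertoverflow_QARC_q256_d256_dim128_cosine_commits"
state = torch.load(commit_ckpt, map_location="cpu")
missing, unexpected = model.load_state_dict(state, strict=False)  # tolerate head mismatches

# Continue fine-tuning on the hunk-level triplets from this starting point;
# fewer epochs are typically needed than when training from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# ... hunk-level triplet training loop goes here ...
```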