duyali2000 / SemanticFlowGraph

This repository provides the code and guidance for reproducing the results in our ESEC/FSE 2023 submission "Pre-training Code Representations with Semantic Flow Graph for Effective Bug Localization".
MIT License

Hope a more detailed data description #4

Closed. fcmgdata closed this issue 1 week ago

fcmgdata commented 8 months ago

I ran into a problem while trying to understand the /data/ folder. The number of commits in the /data/project/commits folder is inconsistent with the number reported in the paper. I also don't understand what the .tsv and .txt files in the /data/project/ folder mean. Could you explain in more detail? Thank you!

duyali2000 commented 3 months ago

The files with the suffix .tsv are mapping files at different granularities (commits, files, hunks); each line maps the index of a commit file to its commit SHA. open_ts.txt and fix_ts.txt contain the timestamps of the commits. They are used to sort the fixing times and to get the newest timestamp, which ensures that all bugs are closed by that time and therefore that all bug-introducing commits must already have appeared.
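For illustration only, here is a minimal sketch of how such files could be read. It is not code from the repository, and it rests on several assumptions: that each .tsv line is `index<TAB>sha` in that column order, and that fix_ts.txt holds one integer timestamp per line. The helper names `load_mapping` and `load_timestamps` and the sample contents are hypothetical.

```python
import csv
import io


def load_mapping(tsv_text):
    """Parse 'index<TAB>sha' lines into a dict (column order is an assumption)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {int(row[0]): row[1] for row in reader if row}


def load_timestamps(txt_text):
    """Parse one integer timestamp per non-empty line (format is an assumption)."""
    return [int(line) for line in txt_text.splitlines() if line.strip()]


# Hypothetical sample contents, not taken from the repository.
sample_tsv = "0\tabc123\n1\tdef456\n"
sample_fix_ts = "1500000300\n1500000100\n1500000200\n"

mapping = load_mapping(sample_tsv)
fix_ts = load_timestamps(sample_fix_ts)

sorted_fix_ts = sorted(fix_ts)  # fixing times in chronological order
newest_ts = max(fix_ts)         # by this time every listed bug is closed
```

If the real files use a different column order or timestamp format, only the two parsing helpers would need to change.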