Open rookieLiu2018 opened 2 years ago
Yeah, I've been trying to find the code for the initial processing too and couldn't. It would be really helpful if you could provide some more information on that.
Thank you.
I'm wondering about this too. Any updates?
We take difftoken.json and diffmark.json directly from the benchmark processed for the paper "Commit Message Generation for Source Code Changes"; the preprocessing script can be found in their repository: https://github.com/SoftWiser-group/CoDiSum/blob/master/data_process_tools.py. difftoken.json holds the code tokens, which can be obtained by parsing the code. diffmark.json holds the edit type of each token, and all tokens from the same line have the same mark ("-"/"+"/" ").
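To make the token/mark relationship concrete, here is a minimal sketch of how a unified diff could be turned into two parallel lists. This is not the CoDiSum script: the function name `diff_to_tokens_and_marks` and the regex tokenizer are assumptions made for illustration, and the real preprocessing may tokenize and filter differently.

```python
import re

# Hypothetical tokenizer (an assumption, not CoDiSum's): identifiers,
# digit runs, and any other single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Prefixes of diff metadata lines that carry no code tokens.
HEADER_PREFIXES = ("diff ", "index ", "--- ", "+++ ", "@@")


def diff_to_tokens_and_marks(diff_text):
    """Return (tokens, marks) where marks[i] is the edit type of tokens[i]."""
    tokens, marks = [], []
    for line in diff_text.splitlines():
        # Simplification: a removed line whose content starts with "-- "
        # would be mistaken for a file header and skipped here.
        if not line or line.startswith(HEADER_PREFIXES):
            continue
        # The first character of a diff line is its edit type.
        if line[0] in "+- ":
            mark, body = line[0], line[1:]
        else:
            mark, body = " ", line
        line_tokens = TOKEN_RE.findall(body)
        tokens.extend(line_tokens)
        # Every token on the line inherits the line's mark.
        marks.extend([mark] * len(line_tokens))
    return tokens, marks


example = "@@ -1,2 +1,2 @@\n int x = 0;\n-x = x + 1;\n+x += 1;\n"
tokens, marks = diff_to_tokens_and_marks(example)
```

In this sketch the two lists always have the same length, so each mark can be looked up by the index of its token.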
Do you use difftext.json from CoDiSum?
I noticed that the data in difftoken.json and diffmark.json appears to have already been processed. How was this done?