How does the original data change into the processed data?

DJjjjhao / FIRA-ICSE

This repository is the replication package of the ICSE22 paper "FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation"


How does the original data change into the processed data? #2

Open rookieLiu2018 opened 2 years ago

rookieLiu2018 commented 2 years ago

I noticed that the data in difftoken.json and diffmark.json appear to have already been processed. How was this done?

abhi-11nav commented 1 year ago

Yeah, I've been trying to find the code for the initial processing too and couldn't. It would be really helpful if you could provide some more information on that.

Thank you.

Yuuoniy commented 1 year ago

I wonder too, any updates?

DJjjjhao commented 1 year ago

We adopted difftoken.json and diffmark.json directly from the benchmark released with the paper "Commit Message Generation for Source Code Changes". The preprocessing script can be found in their repository: https://github.com/SoftWiser-group/CoDiSum/blob/master/data_process_tools.py. difftoken.json contains the tokens of the code, which can be obtained by parsing the code. diffmark.json contains the edit type of each token; tokens from the same line share the same mark ("-"/"+"/" ").
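To make the relationship between the two files concrete, here is a minimal sketch of how a unified diff could be turned into a parallel token list and mark list. This is not the CoDiSum script; the function name `diff_to_tokens_and_marks`, the regex tokenizer, and the header-skipping rule are all assumptions made for illustration, and the actual files may encode tokens and marks differently.

```python
import re

# Hypothetical tokenizer (an assumption, not CoDiSum's parser):
# identifiers, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Diff metadata lines that carry no code. Caveat of this sketch: a real
# code line that happens to start with one of these prefixes is skipped too.
HEADER_PREFIXES = ("diff --git", "index ", "--- ", "+++ ", "@@")


def diff_to_tokens_and_marks(diff_text):
    """Return (tokens, marks), where marks[i] is the edit type of tokens[i]."""
    tokens, marks = [], []
    for line in diff_text.splitlines():
        if not line or line.startswith(HEADER_PREFIXES):
            continue
        if line[0] in "+-":
            mark, code = line[0], line[1:]   # added or deleted line
        elif line[0] == " ":
            mark, code = " ", line[1:]       # unchanged context line
        else:
            mark, code = " ", line           # no prefix: treat as context
        line_tokens = TOKEN_RE.findall(code)
        tokens.extend(line_tokens)
        # Every token on a line inherits that line's mark.
        marks.extend([mark] * len(line_tokens))
    return tokens, marks


if __name__ == "__main__":
    sample = (
        "--- a/Foo.java\n"
        "+++ b/Foo.java\n"
        "@@ -1,2 +1,2 @@\n"
        " int x = 0;\n"
        "-x = x + 1;\n"
        "+x += 2;\n"
    )
    toks, mks = diff_to_tokens_and_marks(sample)
    for tok, mk in zip(toks, mks):
        print(repr(mk), tok)
```

The two lists stay index-aligned, which matches the description above: one mark per token, with all tokens from a given diff line sharing that line's mark.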

vanessailana commented 1 year ago

Do you also use difftext.json from CoDiSum?