Thank you for your interest. Extracting identifiers from different statement types is not easy to do directly in Python, so our implementation used Spoon, a Java source-code analysis tool. Because Spoon is implemented in Java, we first extracted the identifiers into files and then carried out the attack. Unfortunately, too much time has passed and I can no longer find the related code. Please refer to the official Spoon website for implementation details: https://spoon.gforge.inria.fr.
Hi there,
I would like to know how you obtained the identifier names for the different programming statement categories mentioned in your paper. Did you run any scripts on the BigCloneBench dataset to extract them? For example, when I run get_substitutes.py, I can see that the identifiers stored in the "data.csv" file are used to create the substitutions for the different statement categories. Please have a look at the following code snippet.
Here, I don't see the identifiers returned by the get_identifiers function being stored in variable_names.
Could you please share your thoughts on this issue? If I want to test the beam attack approach on another clone detection dataset (e.g., SemanticCloneBench), how can I preprocess it into a format similar to your BigCloneBench dataset?
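
For anyone who needs a starting point for the preprocessing step, here is a minimal sketch in Python. It is not the authors' Spoon-based pipeline, which the reply says is no longer available; it swaps in a crude line-based regex heuristic instead of a real Java parser. The function names, the category labels, and the `category,identifier` CSV layout are all assumptions made for illustration, and the real layout of "data.csv" may differ. Note that the heuristic also picks up type names and field names, and treats any line containing `=` as an assignment.

```python
import csv
import io
import re
from collections import defaultdict

# Java keywords and literals that should never be treated as renameable identifiers.
JAVA_RESERVED = {
    "abstract", "boolean", "break", "byte", "case", "catch", "char", "class",
    "continue", "default", "do", "double", "else", "final", "finally", "float",
    "for", "if", "import", "int", "long", "new", "null", "private", "protected",
    "public", "return", "short", "static", "this", "throw", "throws", "try",
    "void", "while", "true", "false",
}

# Hypothetical category labels, chosen by the keyword that opens the line.
KEYWORD_CATEGORIES = {"for", "while", "if", "return", "catch", "throw"}

# A whole-word identifier that is not followed by "(" (which would be a call).
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
FIRST_WORD = re.compile(r"[A-Za-z_]\w*")


def extract_identifiers_by_category(java_source):
    """Group identifiers by the kind of line they appear on (crude heuristic)."""
    groups = defaultdict(list)
    for raw_line in java_source.splitlines():
        # Blank out string literals first, then drop any trailing // comment.
        line = STRING_LITERAL.sub('""', raw_line).split("//")[0].strip()
        first = FIRST_WORD.match(line)
        keyword = first.group(0) if first else ""
        if keyword in KEYWORD_CATEGORIES:
            category = keyword
        elif "=" in line:
            category = "assignment"
        else:
            category = "expression"
        for name in IDENTIFIER.findall(line):
            if name not in JAVA_RESERVED and name not in groups[category]:
                groups[category].append(name)
    return dict(groups)


def write_identifier_csv(groups, handle):
    """Write one (category, identifier) row per identifier; assumed layout."""
    writer = csv.writer(handle)
    writer.writerow(["category", "identifier"])
    for category, names in groups.items():
        for name in names:
            writer.writerow([category, name])


SAMPLE_JAVA = """
int total = 0;
for (int i = 0; i < items.length; i++) {
    total = total + items[i];
}
if (total > limit) {
    return total;
}
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_identifier_csv(extract_identifiers_by_category(SAMPLE_JAVA), buffer)
    print(buffer.getvalue())
```

For anything beyond a quick experiment, a real parser such as Spoon (as the authors used) will classify statements far more reliably than this line-based approach, which cannot handle statements that span several lines.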