NTDXYG / ComFormer

Code and data for the paper "ComFormer: Code Comment Generation via Transformer and Fusion Method-based Hybrid Code Representation", accepted at DSA 2021.

AST issue #5

Closed. lavellanedaaubay closed this issue 2 years ago.

lavellanedaaubay commented 2 years ago

Hello again. We are now having trouble testing the model because we get an error about AST assignments. [screenshot of the error message] We get this error on many functions, but not on all of them. Any idea? Have a good day!

NTDXYG commented 2 years ago

Hi, I read your error message. The error occurs because the code you entered is not syntactically valid. For example, the method name should be "noderesponseSend" rather than "noderesponse send". Parsing the AST requires the original source code, not the preprocessed code.
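Before testing, you can locate the offending samples by trying to parse every method and recording which ones fail. The sketch below is a minimal version of that check. It assumes the third-party javalang parser, because the thread does not say which parser ComFormer uses. `find_unparsable` and `javalang_parse` are hypothetical helper names. The parser is passed in as a parameter so the loop can also run with a different parser.

```python
from typing import Callable, Iterable, List, Optional


def javalang_parse(code: str) -> object:
    """Parse a single Java method with javalang (assumption: pip install javalang)."""
    import javalang  # imported lazily so find_unparsable also works with another parser

    tokens = javalang.tokenizer.tokenize(code)
    return javalang.parser.Parser(tokens).parse_member_declaration()


def find_unparsable(
    methods: Iterable[str],
    parse: Optional[Callable[[str], object]] = None,
) -> List[int]:
    """Return the indices of the methods that the parser rejects (hypothetical helper)."""
    parse = parse or javalang_parse
    failed = []
    for index, code in enumerate(methods):
        try:
            parse(code)
        except Exception:
            # javalang reports bad input with JavaSyntaxError or LexerError and
            # may raise other exception types on truncated input, so catch broadly.
            failed.append(index)
    return failed
```

With javalang installed, `find_unparsable(methods)` should flag a method such as the "noderesponse send" example. You could then trace those samples back to their original source or leave them out of the test set.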

lavellanedaaubay commented 2 years ago

Hi, thanks for your answer! We are using the DeepCom dataset from their Google Drive, and it is already preprocessed. We checked their Git repository too, but the data there is the same. We saw in other issues that you also use this dataset. Could you please send me a link to the version without preprocessing? At the moment we get an error on this line: "public void init$Children ( ) { children = new ASTNode [ NUM_ ] ; }"

NTDXYG commented 2 years ago

We use this corpus: https://github.com/xing-hu/EMSE-DeepCom
It can be downloaded from: https://drive.google.com/drive/folders/130liaynevaYo2AhNoFtadtc7uBS12_aW

Yuhongzhen123 commented 2 years ago

Is there a .csv version of the corpus? I can't get it from this website. Peace!

NTDXYG commented 2 years ago

I don't have these dataset files archived on my computer, so you will need to download them yourself. To access the files on Google Drive from China, you need to use a VPN.

Yuhongzhen123 commented 2 years ago

Thanks, the above problems have been solved. Your published paper reports BLEU and METEOR scores, but the code here seems to compute only ROUGE. Is there code for the other two metrics somewhere else?

lavellanedaaubay commented 2 years ago

It seems that only ROUGE is computed during training, and nlg-eval is there to compute all the metrics between the generated comments and the references. We had trouble setting up nlg-eval, so we wrote a script that computes each metric. [screenshot of the script] For METEOR we used this command: java -jar meteor-1.5.jar ../data/hyp.txt ../data/ref1.txt
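The per-metric script mentioned above was shared only as an image. As a rough illustration of what one part of such a script could look like, here is a minimal corpus-level BLEU-4 sketch in pure Python. It assumes one whitespace-tokenised comment per line in hyp.txt and ref1.txt, and a single reference per comment. It uses the standard clipped n-gram precision and brevity penalty without smoothing, so a corpus with no matching 4-grams scores 0. Its numbers may also differ slightly from nlg-eval's BLEU implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU with one reference per hypothesis and uniform weights."""
    if len(hyps) != len(refs):
        raise ValueError("hypothesis and reference counts differ")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


def read_lines(path: str) -> List[List[str]]:
    """Read one whitespace-tokenised sentence per line."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]
```

For the files from the METEOR command, the call would be `corpus_bleu(read_lines("../data/hyp.txt"), read_lines("../data/ref1.txt"))`.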

NTDXYG commented 2 years ago

from nlgeval import compute_metrics

metrics_dict = compute_metrics(hypothesis='examples/hyp.txt', references=['examples/ref1.txt'])
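nlg-eval pairs line i of the hypothesis file with line i of each reference file. A stray newline inside a generated comment would therefore shift every later pair out of alignment. The helper below is hypothetical and is not part of ComFormer or nlg-eval. It writes the two files so that each comment occupies exactly one line. The file names only mirror the example paths in the call above.

```python
from pathlib import Path
from typing import List, Tuple


def write_aligned_files(
    hypotheses: List[str], references: List[str], out_dir: str
) -> Tuple[str, str]:
    """Write one comment per line so line i of hyp.txt pairs with line i of ref1.txt."""
    if len(hypotheses) != len(references):
        raise ValueError("every generated comment needs exactly one reference")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, texts in (("hyp.txt", hypotheses), ("ref1.txt", references)):
        path = out / name
        # " ".join(t.split()) collapses embedded newlines and tabs into single spaces.
        path.write_text("".join(" ".join(t.split()) + "\n" for t in texts), encoding="utf-8")
        paths.append(str(path))
    return paths[0], paths[1]
```

You can then pass the returned paths to `compute_metrics(hypothesis=..., references=[...])` in the same way as the call above.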

NTDXYG commented 2 years ago

I select the best checkpoint by computing the ROUGE metric. During testing, please use the nlg-eval library's API directly.
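The thread does not say which ROUGE variant is used for checkpoint selection. The sketch below assumes sentence-level ROUGE-L, the F-measure based on the longest common subsequence, averaged over a validation set. `beta` weights recall relative to precision. The value 1.2 is common in caption-evaluation code but is not confirmed for ComFormer. The checkpoint names and the `best_checkpoint` helper are hypothetical.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, computed with one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(hyp: Sequence[str], ref: Sequence[str], beta: float = 1.2) -> float:
    """Sentence-level ROUGE-L F-measure."""
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0  # also covers an empty hypothesis or reference
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


def best_checkpoint(outputs: Dict[str, List[List[str]]], refs: List[List[str]]) -> str:
    """Return the checkpoint whose tokenised outputs have the highest mean ROUGE-L."""
    def mean_rouge(hyps: List[List[str]]) -> float:
        return sum(rouge_l(h, r) for h, r in zip(hyps, refs)) / len(refs)

    return max(outputs, key=lambda name: mean_rouge(outputs[name]))
```

Only this selection step relies on ROUGE. Once a checkpoint has been chosen, its test-set output would be scored with nlg-eval as described above.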

Yuhongzhen123 commented 2 years ago

OK, thank you. We can now evaluate the results on four metrics.

NTDXYG commented 2 years ago

You're welcome, I'm glad to help. If you use our approach as a baseline, please cite our paper:

@inproceedings{yang2021comformer,
  title={ComFormer: Code Comment Generation via Transformer and Fusion Method-based Hybrid Code Representation},
  author={Yang, Guang and Chen, Xiang and Cao, Jinxin and Xu, Shuyuan and Cui, Zhanqi and Yu, Chi and Liu, Ke},
  booktitle={2021 8th International Conference on Dependable Systems and Their Applications (DSA)},
  pages={30--41},
  year={2021},
  organization={IEEE}
}