CGCL-codes / naturalcc

NaturalCC: An Open-Source Toolkit for Code Intelligence
http://xcodemind.github.io
MIT License

Is the Implementations part model reimplemented by yourselves #10

Open skye95git opened 2 years ago

skye95git commented 2 years ago

Thanks for your great work! I have a few questions:

  1. Are the models in the Implementations section re-implemented by you, or did you collect the official open-source implementations?
  2. The DeepCS link is broken.
  3. In the Code Retrieval (Search) section, is there a pre-training implementation for CodeBERT or GraphCodeBERT?
  4. Does the dataset preprocessing produce the data-flow graph and control-flow graph corresponding to the code?

wanyao1992 commented 2 years ago

Hi skye95git, thanks for your interest in our work. I will ask our team members to answer your questions. For the CFG and DFG part, we currently recommend SVF, a tool developed by our team members (https://github.com/SVF-tools/SVF).

whatsmyname commented 2 years ago

Answers:

  1. Some of the models are open-source but were implemented on different platforms (such as Torch7 or TF). We either ported them to NaturalCC or re-implemented them from the papers or GitHub repos.
  2. We will check it out.
  3. The authors of CodeBERT and GraphCodeBERT have not released their pre-training scripts, and we do not have sufficient resources to re-implement them. We (MSRA) don't plan to release the pre-training code in the near future. :(
  4. Not yet. You can refer to Data-flow and control-flow graphs for Java.

isHuangXin commented 2 years ago

Hi @skye95git, I noticed that you also asked a question in the DeepCS repo, in the issue "Evaluation Benchmark on the trained model #16". Have you tried to re-train DeepCS on the CodeSearchNet dataset? Maybe we can discuss it.

skye95git commented 2 years ago

Unfortunately, I only retrained the model on the dataset mentioned in the paper.