agemagician / CodeTrans

Pretrained Language Models for Source code
MIT License
247 stars, 32 forks

Differences between the first three downstream tasks (other than the dataset)? #5

Closed: aestheticisma closed this issue 3 years ago

aestheticisma commented 3 years ago

Hi, I have a question about the difference between these three tasks: code documentation generation, code summarization, and code comment generation. My understanding is that all three generate a natural language description for a code snippet.

agemagician commented 3 years ago

Hi,

As you mentioned, the output of all three tasks is the same, but the input differs.

Input code level:
- Code Documentation Generation: a complete function.
- Code Summarization: a short code snippet.
- Code Comment Generation: a complete function.

Input languages:
- Code Documentation Generation: Python, Java, Go, PHP, Ruby, and JavaScript.
- Code Summarization: Python, C#, and SQL.
- Code Comment Generation: Java.

Input source:
- Code Documentation Generation: GitHub.
- Code Summarization: StackOverflow.
- Code Comment Generation: GitHub.
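
For quick reference, the three input dimensions above can be written down as a small lookup table. This is only an illustrative sketch: the names `TaskInput`, `TASKS`, and `tasks_for_language` are hypothetical and are not part of the CodeTrans repository; the values are copied from the comparison above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TaskInput:
    """Input characteristics of one downstream task (hypothetical helper type)."""
    code_level: str
    languages: Tuple[str, ...]
    source: str


# Values taken from the comparison above; the dictionary itself is illustrative.
TASKS = {
    "code documentation generation": TaskInput(
        code_level="complete function",
        languages=("Python", "Java", "Go", "PHP", "Ruby", "JavaScript"),
        source="GitHub",
    ),
    "code summarization": TaskInput(
        code_level="short code snippet",
        languages=("Python", "C#", "SQL"),
        source="StackOverflow",
    ),
    "code comment generation": TaskInput(
        code_level="complete function",
        languages=("Java",),
        source="GitHub",
    ),
}


def tasks_for_language(language: str) -> List[str]:
    """Return the names of the tasks whose input languages include `language`."""
    wanted = language.lower()
    return sorted(
        name
        for name, task in TASKS.items()
        if wanted in (lang.lower() for lang in task.languages)
    )
```

For example, `tasks_for_language("Java")` returns the documentation and comment generation tasks, which share the same language, code level, and source here and so differ mainly in their datasets.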

Please refer to the paper for more information: https://arxiv.org/abs/2104.02443