ICSE-DOME / DOME

Developer-Intent Driven Code Comment Generation

Which corpus is used as the retrieval corpus? #2

Open ShangwenWang opened 1 year ago

ShangwenWang commented 1 year ago

Hi authors,

Thanks for the interesting study!

After reading the paper, I wonder which corpus is used as the retrieval corpus in your experiments. I could not find this detail in the paper, so could you please provide a bit more information?

Thanks in advance.

ICSE-DOME commented 1 year ago

Hi,

We use the training set of the benchmark as the retrieval corpus. During training, for a code snippet x in the training set, we first calculate its similarity to every code snippet in the training set and rank the corresponding code-comment pairs in descending order of code similarity. Finally, we select as the exemplar the most similar comment other than x's own (i.e., the second most similar overall, since x matches itself) that has the same intent as x.
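
A minimal sketch of the training-time selection described above, not taken from the DOME repository. It assumes token-set Jaccard similarity as a stand-in, since the reply does not name the similarity measure, and the field names `code`, `comment`, and `intent` are hypothetical:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the real measure)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_training_exemplar(idx, train):
    """Return the exemplar comment for train[idx], or None if none qualifies.

    `train` is a list of dicts with hypothetical keys "code", "comment", "intent".
    """
    query = train[idx]
    # Rank every training entry, the query included, by descending similarity.
    ranked = sorted(
        range(len(train)),
        key=lambda j: jaccard(query["code"], train[j]["code"]),
        reverse=True,
    )
    for j in ranked:
        # Skip the query itself and any entry whose intent differs.
        if j != idx and train[j]["intent"] == query["intent"]:
            return train[j]["comment"]
    return None
```

Excluding the query by index rather than by rank keeps the selection correct even if another snippet happens to tie with x at similarity 1.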

During testing, we calculate the similarity of the input code to every code snippet in the training set and retrieve the most similar comment for each intent category. When the user requests a certain intent, e.g., "what", we take the previously retrieved comment belonging to the "what" category and feed it to the model as the exemplar.
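
The test-time lookup can be sketched roughly as follows. This is an illustration only, not the repository's code: the token-set Jaccard similarity and the field names `code`, `comment`, and `intent` are assumptions.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the real measure)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_exemplars_by_intent(query_code, train):
    """Map each intent to the comment of the most similar training snippet."""
    best = {}  # intent -> (similarity, comment)
    for entry in train:
        score = jaccard(query_code, entry["code"])
        intent = entry["intent"]
        # Keep only the highest-scoring entry seen so far for this intent.
        if intent not in best or score > best[intent][0]:
            best[intent] = (score, entry["comment"])
    return {intent: comment for intent, (_, comment) in best.items()}
```

With the result stored in a dictionary, the exemplar for a requested intent is a single lookup, for example `retrieve_exemplars_by_intent(code, train).get("what")`.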

Thank you

ShangwenWang commented 1 year ago

Thanks for the clarification!