skye95git opened this issue 3 years ago.
For question 3, I found this description in the paper: "We initialize CoCLR with microsoft/codebert-base repretrained on CodeSearchNet Python Corpus." How did you re-pretrain CodeBERT? I only found the fine-tuning method: "train.txt and valid.txt to train your model."

Thanks for your reply! After your explanation, my understanding is that there are three models in your method:
The first one: your CoCLR model trained on CodeSearchNet. I have a few questions:

I compared train.txt and valid.txt in CodeXGLUE/Text-Code/NL-code-search-WebQuery/data/ with those in CodeBERT/GraphCodeBERT/codesearch/dataset/python, and they are the same. So if I want to train the models myself, can I use the CodeSearchNet Python corpus directly, after processing it into the same format as the code search training data in CoSQA?
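For the preprocessing step asked about above, a minimal Python sketch might look like the following. It assumes the CodeSearchNet Python jsonl records carry `code`, `code_tokens`, and `docstring_tokens` fields, and that the CoSQA training file is a JSON list of objects with `idx`, `doc`, `code`, `code_tokens`, `docstring_tokens`, and `label` keys. Both the target schema and the helper names (`csn_to_cosqa_style`, `convert_file`) are assumptions to check against the actual repository files; this is not the authors' pipeline.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator


def csn_to_cosqa_style(records: Iterable[dict], prefix: str = "csn-train") -> Iterator[dict]:
    """Map CodeSearchNet records to CoSQA-style query-code pairs.

    ASSUMPTION: the output keys mirror cosqa-train.json; verify them
    against your copy of the CoSQA data before training.
    """
    for i, rec in enumerate(records):
        # Functions without a docstring have no natural-language side; skip them.
        if not rec.get("docstring_tokens"):
            continue
        doc = " ".join(rec["docstring_tokens"])
        yield {
            "idx": f"{prefix}-{i}",
            "doc": doc,  # the docstring stands in for the user query
            "code": rec["code"],
            "code_tokens": " ".join(rec["code_tokens"]),
            "docstring_tokens": doc,
            # Every CodeSearchNet docstring/code pair matches, so label it positive.
            "label": 1,
        }


def convert_file(src: Path, dst: Path) -> int:
    """Convert one CodeSearchNet .jsonl or .jsonl.gz file; return the pair count."""
    opener = gzip.open if src.suffix == ".gz" else open
    with opener(src, "rt", encoding="utf-8") as f:
        records = (json.loads(line) for line in f if line.strip())
        pairs = list(csn_to_cosqa_style(records))
    dst.write_text(json.dumps(pairs, indent=2), encoding="utf-8")
    return len(pairs)
```

For example, `convert_file(Path("python_train_0.jsonl.gz"), Path("csn-train.json"))` would write one positive pair per documented function; both file names are hypothetical. Note that this only produces positive pairs, whereas CoSQA also contains pairs labeled 0.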
After replacing the training and evaluation data, is the checkpoint in the Step 1 training command the same as the one in Step 2?

Do the checkpoint trained on CodeSearchNet and the checkpoint with the best code search results differ only in their training data? Is the training procedure otherwise the same?
The second one: the vanilla model. I have a question:

"And the model without CoCLR means training with original data to some extent." The vanilla model is the model without CoCLR. What does "original data" refer to? Does it refer to CoSQA without QRA and IBA augmentation?

The third one: the model with QRA and IBA augmentation. I have a question:

Is my understanding of the above three models correct?
Hi, I have a few questions about the CoSQA dataset and how to use it:

1. Table 4 in the paper shows 20,604 queries and 6,276 codes. Why do the numbers of queries and codes differ? Is it because one code can answer multiple queries?

2. The paper says: "We fix a code database with 6,267 different codes in CoSQA." How should I understand this? Do you just mean that all 6,267 codes are distinct?

3. The CoCLR on Code Search section says: "Step 1: download the checkpoint trained on CodeSearchNet." Does this checkpoint belong to CodeBERT or to CoCLR?

4. The Model Checkpoint section says: "You can also use the data in CodeXGLUE code search (WebQueryTest) to train the models by your self." What does "the models" refer to? The directory CodeXGLUE/Text-Code/NL-code-search-WebQuery/data/ contains only test_webquery.json, which is used as the test set. How can it be used to train a model?
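Questions 1 and 2 both come down to whether several queries can share one code snippet. A minimal sketch of building a fixed, deduplicated code database from (query, code) pairs is below, assuming the pairs are plain strings; the function name `build_code_database` and the toy pairs are hypothetical and are not taken from CoSQA.

```python
from typing import Dict, List, Tuple


def build_code_database(pairs: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, int]]:
    """Deduplicate the code side of (query, code) pairs.

    Returns the distinct codes (the fixed retrieval database) and a map
    from each query to the index of its matching code in that database.
    """
    codes: List[str] = []
    code_index: Dict[str, int] = {}
    query_to_code: Dict[str, int] = {}
    for query, code in pairs:
        if code not in code_index:  # first time this snippet appears
            code_index[code] = len(codes)
            codes.append(code)
        query_to_code[query] = code_index[code]
    return codes, query_to_code


# Hypothetical toy pairs: two different queries share the same snippet.
pairs = [
    ("python read file lines", "def read_lines(path):\n    return open(path).readlines()"),
    ("python get all lines of a file", "def read_lines(path):\n    return open(path).readlines()"),
    ("python reverse a list", "def reverse(xs):\n    return xs[::-1]"),
]
codes, query_to_code = build_code_database(pairs)
print(len(pairs), "queries,", len(codes), "distinct codes")  # prints "3 queries, 2 distinct codes"
```

If one code can answer several queries, the database ends up smaller than the number of queries, which is the situation question 1 asks about.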