-
Hi!
I am working on a task of determining whether two code fragments are semantically similar (perhaps related to the code search task), and I have made some simple attempts.
I first downloaded and transformed htt…
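To make the question concrete, a minimal sketch of the kind of comparison I have in mind, assuming microsoft/codebert-base, mean-pooled token embeddings, and cosine similarity (this is only an illustration, not my actual pipeline):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: microsoft/codebert-base as the encoder; the pooling and the
# cosine-similarity measure are my own guesses, not an official recipe.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def embed(code: str) -> torch.Tensor:
    # Encode one code fragment and mean-pool its last hidden states.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

a = embed("def add(x, y): return x + y")
b = embed("def sum_two(a, b): return a + b")
print(torch.cosine_similarity(a, b, dim=0).item())  # closer to 1.0 = more similar
```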
-
Hi, I watched a video where Duyu Tang introduced CodeBERT. The 'replaced token detection' objective appears to have been inspired by a 2020 paper by Google and Stanford. Duyu did not mention which paper it was…
-
I would like to add the following publication:
- DBLP key: journals/corr/abs-2103-11626
- DBLP link: https://dblp.org/rec/journals/corr/abs-2103-11626.html
-
Hi, thanks for sharing this great resource.
I am trying to train this model on my own dataset so that it can perform a somewhat different task.
What I am putting effort into is pretrain…
-
I have collected a lot of GitHub source code and extracted the functions from it into the CodeSearchNet data format. There are 3 TB of data, including C, C++, and other languages not available in CodeSearchN…
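To be concrete about the format, each extracted function becomes one JSON line roughly like the sketch below; the field names follow my reading of the CodeSearchNet schema and may be incomplete:

```python
import json

# One record per extracted function, written as JSON Lines.
# Field names follow my reading of the CodeSearchNet schema and may be incomplete.
record = {
    "repo": "user/repo",
    "path": "src/clamp.c",
    "func_name": "clamp",
    "language": "c",
    "original_string": "int clamp(int v, int lo, int hi) { /* ... */ }",
    "code": "int clamp(int v, int lo, int hi) { /* ... */ }",
    "code_tokens": ["int", "clamp", "(", "int", "v", ",", "int", "lo", ",", "int", "hi", ")", "{", "}"],
    "docstring": "Clamp v to the range [lo, hi].",
    "docstring_tokens": ["Clamp", "v", "to", "the", "range", "[", "lo", ",", "hi", "]", "."],
}

with open("c_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```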
-
Hi,
regarding fine-tuning CodeBERT:
What parameters should I choose to experiment with?
Should I follow the usual hyperparameters suggested in the original BERT and RoBERTa papers?
Asking…
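For context, the first grid I was planning to try is simply the one recommended in the BERT paper (learning rate in {2e-5, 3e-5, 5e-5}, batch size 16 or 32, 2-4 epochs), expressed here as HuggingFace TrainingArguments; these values are my own starting point, not settings from the CodeBERT authors:

```python
from transformers import TrainingArguments

# Starting grid taken from the BERT paper's fine-tuning recommendations;
# these are my own assumptions for a first experiment, not values from the CodeBERT authors.
args = TrainingArguments(
    output_dir="codebert-finetune",
    learning_rate=2e-5,              # also try 3e-5 and 5e-5
    per_device_train_batch_size=16,  # also try 32
    num_train_epochs=3,              # also try 2 and 4
    weight_decay=0.01,
    warmup_ratio=0.1,
)
```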
-
I am trying to use EmbeddingRetriever to compute embeddings for my documents.
I supply `docs` as a list of dicts with the following fields: `dict_keys(['repo', 'tasks', 'Unnamed: 0', 'repo_name', 'path', 'functi…
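For reference, this is roughly how I am reshaping each dict before indexing; the `text`/`meta` layout is my assumption about what the retriever expects, and the keys on the right come from my own dataframe:

```python
# Reshape my raw dicts into a text + meta layout before indexing.
# Assumption: the retriever expects a main "text" field plus a "meta" dict;
# "function", "repo", etc. are columns from my own dataframe, not part of any API.
def to_document(row: dict) -> dict:
    return {
        "text": row["function"],  # the code body I want to embed
        "meta": {
            "repo": row.get("repo"),
            "repo_name": row.get("repo_name"),
            "path": row.get("path"),
            "tasks": row.get("tasks"),
        },
    }

docs = [to_document(row) for row in raw_rows]  # raw_rows: my original list of dicts
```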
-
Hi.
I want to obtain source code **token embeddings**, and I was wondering if I can use the pre-trained CodeBERT model for this purpose. If so, would you please give me some hints on how I can do it…
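What I have in mind is roughly the sketch below, assuming microsoft/codebert-base and that the per-token vectors I want are the last hidden states; please correct me if this is not the intended usage:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

code = "def max_of(a, b): return a if a > b else b"
inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dim vector per (sub)token, including the <s> and </s> specials.
token_embeddings = outputs.last_hidden_state.squeeze(0)  # (seq_len, 768)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, vec in zip(tokens, token_embeddings):
    print(tok, vec.shape)
```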
-
Hi
I have tried to run defect detection with the CodeBERT model according to the repo. However, when I tried to apply it to my own dataset with sequence lengths larger than 500, the model failed to run. I…
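My understanding is that the encoder only has 512 position embeddings, so longer inputs have to be truncated (or split into windows) before they reach the model; a minimal sketch of hard truncation with the HuggingFace tokenizer, just to illustrate the limit:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

long_source = "..."  # stand-in for a function longer than 512 tokens

# Hard-truncate to the encoder's maximum input length (512 positions for this RoBERTa-style model).
inputs = tokenizer(
    long_source,
    truncation=True,
    max_length=tokenizer.model_max_length,  # 512 for microsoft/codebert-base
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # (1, <=512)
```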
-
I realize that CodeT5 has already seen the code-comment pairs from CodeSearchNet as its input and output during pretraining, as mentioned in the paper: "Specifically, we regard the NL→PL generation an…