Open ljk419511 opened 1 week ago
Hi,
/data, the second command is to re-enter the container after you've left it. You should execute this command in the directory of this repo. The job.sh script needs to be executed in the container. So in the end the directory tree should look something like this: Capybara-BinT5 (bound to /data) -> (job.sh, CodeT5) -> (data) -> (summarize) -> (C, decomC, demiStripped, strippedDecomC)
Please let me know if you need anything else,
-Ali
Thank you very much for your reply. I still have some questions.
1.
git clone https://github.com/salesforce/CodeT5.git
After executing this command above, the working directory tree looks like the following.
I would like to ask whether I need to cd CodeT5 before executing the following command:
wdir=\WORKDIR=\"`pwd`/'CodeT5'\" && sed '1 s#^.*$#'$wdir'#' CodeT5/sh/exp_with_args.sh
If I don't, I get this error:
sed: can't read CodeT5/sh/exp_with_args.sh: No such file or directory
It seems to me that the command should be
sed '1 s#^.*$#'$wdir'#' CodeT5/CodeT5/sh/exp_with_args.sh
I just want to check whether this was an unintentional mistake.
2. Still the same command.
wdir=\WORKDIR=\"`pwd`/'CodeT5'\" && sed '1 s#^.*$#'$wdir'#' CodeT5/sh/exp_with_args.sh
To actually modify the file, the -i (--in-place) option seems necessary, so that sed makes the replacement directly in the source file instead of only printing the result. That is:
sed -i '1 s#^.*$#'$wdir'#' CodeT5/sh/exp_with_args.sh
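The difference is easy to check on a throwaway file. Here is a minimal sketch, assuming GNU sed; the file name demo.sh and the replacement value are made up for illustration and are not taken from the repo:

```shell
# Work on a scratch file so the real exp_with_args.sh is not touched.
tmpdir=$(mktemp -d)
printf 'WORKDIR="old"\necho run\n' > "$tmpdir/demo.sh"

# Without -i: the edited text goes to stdout, the file stays unchanged.
preview=$(sed '1 s#^.*$#WORKDIR="/new/path"#' "$tmpdir/demo.sh")
after_preview=$(head -n 1 "$tmpdir/demo.sh")

# With -i (GNU sed): the file itself is rewritten.
sed -i '1 s#^.*$#WORKDIR="/new/path"#' "$tmpdir/demo.sh"
after_inplace=$(head -n 1 "$tmpdir/demo.sh")

echo "stdout preview : $(printf '%s\n' "$preview" | head -n 1)"
echo "file after sed : $after_preview"
echo "file after -i  : $after_inplace"
```

Only the second call changes the first line on disk; the remaining lines are left alone in both cases.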
3.
In the downloaded CodeT5 repo change this line and add the languages to the subtask list. Finally, edit the language variable in the job.sh file and start training in detached mode:
I'm sorry. I'm still a little confused about what changes I should make. It would be nice if you could give me a few more hints.
Hi,
Sorry for the delay, but I think I figured out the issue: it seems that the CodeT5 repo was updated, so the folder structure changed. I'll update the commands in the repo accordingly.
Yes, you're correct, I've updated the command.
So in the CodeT5/CodeT5/sh/run_exp.py file, you should change the line to include the data you just added:
sub_tasks = ['ruby', 'javascript', 'go', 'python', 'java', 'php', 'C', 'decomC', 'demiStripped', 'strippedDecomC']
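A quick way to confirm the edit took effect is to grep for each new entry. This is only a sketch: it runs against a stand-in file holding the line above, so point pyfile at your own CodeT5/CodeT5/sh/run_exp.py to check the real one.

```shell
# Stand-in for run_exp.py containing only the edited line (illustrative).
pyfile=$(mktemp)
cat > "$pyfile" <<'EOF'
sub_tasks = ['ruby', 'javascript', 'go', 'python', 'java', 'php', 'C', 'decomC', 'demiStripped', 'strippedDecomC']
EOF

# Collect any of the four new datasets missing from the sub_tasks line.
# The surrounding quotes in the pattern keep 'C' from matching 'decomC'.
missing=""
for lang in C decomC demiStripped strippedDecomC; do
    grep -q "sub_tasks = .*'$lang'" "$pyfile" || missing="$missing $lang"
done

if [ -z "$missing" ]; then
    echo "all four datasets are listed"
else
    echo "missing:$missing"
fi
```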
Then, in the run.sh file, you can select the data you want to train on in line 3; the one currently set in the script is decomC.
To run it, make sure you're out of the container again and run the following command:
docker exec -it {containerName} /bin/bash
Thank you sincerely for your reply! I still have some problems.
1.
I seem to have caused some confusion. It should be
wdir=\WORKDIR=\"`pwd`/'CodeT5/CodeT5'\" && sed -i '1 s#^.*$#'$wdir'#' CodeT5/CodeT5/sh/exp_with_args.sh
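For anyone who wants to try this safely first, the same replacement can be sketched with quoting that also survives spaces in the path. This assumes GNU sed and runs in a scratch directory that only imitates the repo layout; the placeholder file contents are made up.

```shell
# Scratch layout standing in for the cloned repo (hypothetical contents).
workroot=$(mktemp -d)
mkdir -p "$workroot/CodeT5/CodeT5/sh"
printf 'WORKDIR="placeholder"\necho placeholder\n' \
    > "$workroot/CodeT5/CodeT5/sh/exp_with_args.sh"
cd "$workroot"

# Build the replacement line; $(pwd) is equivalent to `pwd`.
wdir="WORKDIR=\"$(pwd)/CodeT5/CodeT5\""

# Double quotes keep $wdir as a single word; \$ leaves the end-of-line
# anchor for sed instead of letting the shell expand it.
sed -i "1 s#^.*\$#${wdir}#" CodeT5/CodeT5/sh/exp_with_args.sh

first_line=$(head -n 1 CodeT5/CodeT5/sh/exp_with_args.sh)
echo "$first_line"
```

The path must not contain # or &, since both are special in this sed expression.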
2.
So the purpose of the following command is to use the data to fine-tune the CodeT5-base model into BinT5 and then run some evaluation. Is that understanding correct?
docker exec -d {containerName} /bin/bash "/data/job.sh"
In that case, do I need to download the base model CodeT5-base from Hugging Face first? I don't see you doing that.
If so, I would like to know where I should put the CodeT5-base model folder, or which parameter declares the location of the model. I couldn't find such a parameter.
3. I want to make sure that what is being downloaded here is the model that has already been fine-tuned, i.e. BinT5 in Fig. 6 above, right?
Similarly to download the pretrained BinT5 checkpoints:
wget https://zenodo.org/records/7229913/files/BinT5.zip?download=1
unzip BinT5.zip
rm Capybara.zip
So here we can use the downloaded BinT5 model for other operations such as inference or further training, according to the following quote. I don't know if I understand it correctly.
Select the model that you wish to use from the respective directory. Copy this file and replace the pytorch_model.bin in the local codet5-base directory downloaded in the previous step.
Any help would be greatly appreciated!
~/Capybara-BinT5 to /data. But in job.sh it says cd /data/CodeT5/sh/. I don't get it. In which directory exactly is docker run executed to generate the container? cd ? I'm totally confused. Any help would be greatly appreciated.
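As a rough aid for the path question: with ~/Capybara-BinT5 bound to /data, a host path and a container path differ only in that prefix. The sketch below is plain string manipulation (it does not call docker), and the helper name to_container_path is made up for illustration.

```shell
host_root="$HOME/Capybara-BinT5"   # host directory bound into the container (from the thread)
container_root="/data"             # where that directory appears inside the container

# Translate a host path under host_root into the path seen in the container.
to_container_path() {
    case "$1" in
        "$host_root"|"$host_root"/*)
            printf '%s\n' "${container_root}${1#"$host_root"}" ;;
        *)
            printf 'not under %s: %s\n' "$host_root" "$1" >&2
            return 1 ;;
    esac
}

to_container_path "$HOME/Capybara-BinT5/CodeT5/sh"
```

Under that assumption, cd /data/CodeT5/sh/ in job.sh refers to the CodeT5/sh directory inside the host's ~/Capybara-BinT5.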