I tried to use the my-languages.so available in the CodeT5 repository, but it doesn't seem to be compatible with the latest version of tree-sitter, v0.20.1. Running the following command
$ python main.py --model codet5 --task assert --subset raw
leads to the following error:
...
***** Running training *****
Num examples = 150523
Num Epochs = 30
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 282240
[epoch 0, loss 0.1251]: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9408/9408 [1:11:09<00:00, 2.20it/s]
Start validation
Validating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1176/1176 [57:40<00:00, 2.94s/it]
Traceback (most recent call last):
  File "/tmp/FineTuner/src/main.py", line 113, in <module>
    main()
  File "/tmp/FineTuner/src/main.py", line 109, in main
    run_fine_tune(args, accelerator, run)
  File "/tmp/FineTuner/src/run_fine_tune.py", line 383, in run_fine_tune
    results = run_eval(args,
  File "/tmp/FineTuner/src/run_fine_tune.py", line 244, in run_eval
    results.update(code_bleu(preds=all_preds, golds=all_golds, lang=args.target_lang, prefix=split))
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/calc_code_bleu.py", line 62, in code_bleu
    code_bleu_score, _ = compute_codebleu(hypothesis=preds, references=golds, lang=lang)
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/calc_code_bleu.py", line 48, in compute_codebleu
    syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang)
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/syntax_match.py", line 41, in corpus_syntax_match
    parser.set_language(JAVA_LANGUAGE)
ValueError: Incompatible Language version 11. Must be between 13 and 14
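Note that the error only shows up once CodeBLEU is computed, i.e., after the first epoch and the validation pass (roughly two hours according to the progress bars above). A check along the following lines, run before training starts, might catch the mismatch early. This is only a sketch: `preflight_check` is a hypothetical helper that is not part of the repository, and it assumes the `Language(path, name)` constructor and `Parser.set_language` from py-tree-sitter 0.20.x, the same call that fails in the traceback.

```python
def preflight_check(so_path, lang_name):
    """Return None if the grammar can be loaded, otherwise a short error message.

    Hypothetical helper; assumes the py-tree-sitter 0.20.x API.
    """
    try:
        # Imported lazily so that a missing package is reported as a message too.
        from tree_sitter import Language, Parser
    except ImportError as exc:
        return f"tree_sitter is not importable: {exc}"

    try:
        language = Language(so_path, lang_name)  # 0.20.x-style constructor
        Parser().set_language(language)          # the call that raised ValueError above
    except (ValueError, OSError, AttributeError, TypeError) as exc:
        # ValueError: incompatible language version, as in the traceback
        # OSError / AttributeError: library file or language symbol not found
        # TypeError: a py-tree-sitter release with a different constructor signature
        return f"cannot use {so_path!r} for {lang_name!r}: {exc}"

    return None
```

Calling something like `preflight_check("my-languages.so", "java")` at the top of `main()` (the path is an assumption and needs to point at the actual file) and aborting if it returns a message would avoid waiting for a full epoch before hitting the error.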
In an attempt to address that error, I built my-languages.so from scratch for all available tree-sitter modules, i.e.,
and then ran the following Python script