NougatCA / FineTuner

GNU General Public License v3.0

Added missing file. #5

Open jose opened 1 year ago

jose commented 1 year ago

I tried to use the my-languages.so available in the CodeT5 repository, but it does not seem to be compatible with the latest version of tree-sitter (v0.20.1). For example, running the following command

$ python main.py --model codet5 --task assert --subset raw

fails during validation, after the first training epoch, with the following error:

...
***** Running training *****
  Num examples = 150523
  Num Epochs = 30
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 282240
[epoch 0, loss 0.1251]: 100%|██████████| 9408/9408 [1:11:09<00:00,  2.20it/s]
Start validation
Validating: 100%|██████████| 1176/1176 [57:40<00:00,  2.94s/it]
Traceback (most recent call last):
  File "/tmp/FineTuner/src/main.py", line 113, in <module>
    main()
  File "/tmp/FineTuner/src/main.py", line 109, in main
    run_fine_tune(args, accelerator, run)
  File "/tmp/FineTuner/src/run_fine_tune.py", line 383, in run_fine_tune
    results = run_eval(args,
  File "/tmp/FineTuner/src/run_fine_tune.py", line 244, in run_eval
    results.update(code_bleu(preds=all_preds, golds=all_golds, lang=args.target_lang, prefix=split))
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/calc_code_bleu.py", line 62, in code_bleu
    code_bleu_score, _ = compute_codebleu(hypothesis=preds, references=golds, lang=lang)
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/calc_code_bleu.py", line 48, in compute_codebleu
    syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang)
  File "/tmp/FineTuner/src/evaluation/CodeBLEU/syntax_match.py", line 41, in corpus_syntax_match
    parser.set_language(JAVA_LANGUAGE)
ValueError: Incompatible Language version 11. Must be between 13 and 14
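The message suggests that the prebuilt library was generated for language (ABI) version 11, while the installed tree-sitter only accepts versions 13 to 14. As a rough sketch, the check could be made more explicit by wrapping the `parser.set_language` call seen in the traceback. The helper names `explain_abi_error` and `try_set_language` are hypothetical and not part of FineTuner or tree-sitter; the message pattern is assumed to match the text above.

```python
import re

# Pattern assumed from the error text in the traceback above.
_ABI_ERROR = re.compile(
    r"Incompatible Language version (\d+)\. Must be between (\d+) and (\d+)"
)


def explain_abi_error(message):
    """Return (found, minimum, maximum) parsed from the message, or None."""
    match = _ABI_ERROR.search(message)
    if match is None:
        return None
    found, minimum, maximum = (int(group) for group in match.groups())
    return found, minimum, maximum


def try_set_language(parser, language):
    """Call parser.set_language; return a hint string on a version mismatch.

    Returns None when the language is accepted. Any ValueError that does not
    look like a version mismatch is re-raised unchanged.
    """
    try:
        parser.set_language(language)
    except ValueError as err:
        parsed = explain_abi_error(str(err))
        if parsed is None:
            raise
        found, minimum, maximum = parsed
        return (
            f"grammar version {found} is outside the supported range "
            f"{minimum}-{maximum}; rebuild my-languages.so"
        )
    return None
```

Calling `try_set_language(parser, JAVA_LANGUAGE)` once before training starts would surface the mismatch early instead of after the first epoch and validation pass.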

To address that error, I rebuilt my-languages.so from scratch for all available tree-sitter grammars. First, I cloned each grammar and checked out a pinned tag or commit:

rm -rf vendor; mkdir vendor

# Clone one grammar into vendor/ and pin it to the given tag or commit.
clone_at() {
  git clone "https://github.com/tree-sitter/$1.git" "vendor/$1"
  git -C "vendor/$1" checkout "$2"
}

clone_at tree-sitter-java       v0.20.1
clone_at tree-sitter-c-sharp    v0.20.0
clone_at tree-sitter-cpp        v0.20.0
clone_at tree-sitter-haskell    v0.13.0
clone_at tree-sitter-python     v0.20.0
clone_at tree-sitter-javascript rust-0.20.0
clone_at tree-sitter-rust       v0.20.3
clone_at tree-sitter-c          v0.20.2
clone_at tree-sitter-scala      v0.19.1
clone_at tree-sitter-php        v0.19.0
clone_at tree-sitter-ruby       v0.19.0
clone_at tree-sitter-julia      v0.19.0
clone_at tree-sitter-bash       v0.19.0
clone_at tree-sitter-go         rust-0.19.1
clone_at tree-sitter-jsdoc      v0.19.0
clone_at tree-sitter-css        v0.19.0
clone_at tree-sitter-html       v0.19.0
clone_at tree-sitter-json       v0.19.0
clone_at tree-sitter-ql         v0.19.0
clone_at tree-sitter-tsq        0.19.0
clone_at tree-sitter-toml       v0.5.1
clone_at tree-sitter-swift      db675450dcc1478ee128c96ecc61c13272431aab
clone_at tree-sitter-agda       v1.2.1

and then ran the following Python script:

from tree_sitter import Language

# Compile every grammar cloned into vendor/ into a single shared library.
Language.build_library(
  'src/evaluation/CodeBLEU/parser/my-languages.so',
  [
    'vendor/tree-sitter-agda',
    'vendor/tree-sitter-bash',
    'vendor/tree-sitter-c',
    'vendor/tree-sitter-c-sharp',
    'vendor/tree-sitter-cpp',
    'vendor/tree-sitter-css',
    'vendor/tree-sitter-go',
    'vendor/tree-sitter-haskell',
    'vendor/tree-sitter-html',
    'vendor/tree-sitter-java',
    'vendor/tree-sitter-javascript',
    'vendor/tree-sitter-jsdoc',
    'vendor/tree-sitter-json',
    'vendor/tree-sitter-julia',
    'vendor/tree-sitter-php',
    'vendor/tree-sitter-python',
    'vendor/tree-sitter-ql',
    'vendor/tree-sitter-ruby',
    'vendor/tree-sitter-rust',
    'vendor/tree-sitter-scala',
    'vendor/tree-sitter-swift',
    'vendor/tree-sitter-toml',
    'vendor/tree-sitter-tsq'
  ]
)
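Before running the build script above, it may help to confirm that every checkout actually contains a generated parser. The sketch below assumes each grammar keeps its generated source at `src/parser.c` inside its repository directory. The helper `missing_parsers` is a hypothetical name, not part of tree-sitter or FineTuner.

```python
from pathlib import Path


def missing_parsers(vendor_dir, grammar_names):
    """Return the grammar names whose checkout lacks src/parser.c.

    Assumption: each grammar's generated parser lives at
    <vendor_dir>/<name>/src/parser.c.
    """
    vendor = Path(vendor_dir)
    return [
        name
        for name in grammar_names
        if not (vendor / name / "src" / "parser.c").is_file()
    ]


if __name__ == "__main__":
    # Illustrative subset; extend with the other grammars from the list above.
    grammars = ["tree-sitter-java", "tree-sitter-python", "tree-sitter-go"]
    for name in missing_parsers("vendor", grammars):
        print(f"no src/parser.c found for {name}")
```

An empty result means every listed directory has a parser source to compile. It does not guarantee that the resulting library uses a language version the installed tree-sitter accepts.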