chenyangh / DSLP

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation
MIT License

imputer error while running train.py for GLAT with DSLP #17

Open SylasreeKS opened 11 months ago

SylasreeKS commented 11 months ago

The train command I used:

python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_glat --criterion glat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 --glat-mode glat --length-loss-factor 0.1 --pred-length-offset

The installation was done following the instructions on the page. While executing, the following error occurs:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/kaggle/working/DSLP/train.py", line 10, in <module>
    from fairseq_cli.train import cli_main
  File "/kaggle/working/DSLP/fairseq_cli/train.py", line 19, in <module>
    from fairseq import (
  File "/kaggle/working/DSLP/fairseq/__init__.py", line 30, in <module>
    import fairseq.criterions  # noqa
  File "/kaggle/working/DSLP/fairseq/criterions/__init__.py", line 36, in <module>
    importlib.import_module("fairseq.criterions." + file_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/criterions/ctc.py", line 19, in <module>
    from fairseq.tasks import FairseqTask
  File "/kaggle/working/DSLP/fairseq/tasks/__init__.py", line 116, in <module>
    module = importlib.import_module("fairseq.tasks." + task_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/tasks/multilingual_translation.py", line 19, in <module>
    from fairseq.models import FairseqMultiModel
  File "/kaggle/working/DSLP/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/models/nat/__init__.py", line 27, in <module>
    from .nat_ctc_sd_ss import *
  File "/kaggle/working/DSLP/fairseq/models/nat/nat_ctc_sd_ss.py", line 18, in <module>
    from fairseq.torch_imputer import best_alignment, imputer_loss
  File "/kaggle/working/DSLP/fairseq/torch_imputer/__init__.py", line 1, in <module>
    from .imputer import imputer_loss, ImputerLoss, best_alignment, ctc_decode
  File "/kaggle/working/DSLP/fairseq/torch_imputer/imputer.py", line 11, in <module>
    imputer = load(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'imputer_fn': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=imputer_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_60,code=sm_60 --compiler-options '-fPIC' -std=c++17 -c /kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu -o imputer.cuda.o
FAILED: imputer.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=imputer_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_60,code=sm_60 --compiler-options '-fPIC' -std=c++17 -c /kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu -o imputer.cuda.o
/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(332): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(753): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(817): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(842): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(859): error: identifier "THCudaCheck" is undefined

5 errors detected in the compilation of "/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu".
ninja: build stopped: subcommand failed.

I ran this code on Google Colab and on a Kaggle Notebook with the following environment: Python 3.10, NumPy 1.22.0.
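From what I can tell (this is my own guess, not something stated in the repo's instructions), THCudaCheck comes from the legacy THC headers that newer PyTorch releases no longer ship, so the JIT build of the imputer_fn extension fails on the PyTorch versions that Colab/Kaggle provide. Below is a minimal sketch of the kind of local patch I think might work, assuming the macro is only used for CUDA runtime error checks in imputer.cu; the include and macro names are standard PyTorch/c10 ones, but the patch itself is only an illustration, not the repository's official fix.

// Hypothetical local patch near the top of fairseq/torch_imputer/imputer.cu.
// Assumes THCudaCheck is only used as a runtime-error check, e.g.
// THCudaCheck(cudaGetLastError()).

#include <c10/cuda/CUDAException.h>  // provides C10_CUDA_CHECK
#include <cuda_runtime.h>

// Newer PyTorch removed the THC headers that used to define THCudaCheck.
// Defining it in terms of C10_CUDA_CHECK lets the existing call sites
// (the ones nvcc reports at lines 332, 753, 817, 842, 859) compile unchanged.
#ifndef THCudaCheck
#define THCudaCheck(err) C10_CUDA_CHECK(err)
#endif

// Example call site as it appears in the file:
//   THCudaCheck(cudaGetLastError());
// With the define above it expands to C10_CUDA_CHECK(cudaGetLastError()),
// which raises a c10::Error carrying the CUDA error string.

The only alternative I can think of is installing a PyTorch version that still ships the THC headers (roughly 1.10 or earlier, if I remember correctly), but that would likely conflict with Python 3.10 on these notebooks.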

Please help me solve this error, as I have tried every way I can think of to resolve this issue.