facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Levenshtein Transformer throws cuda error: CUDA error: invalid device function #3289

Open EdieLu opened 3 years ago

EdieLu commented 3 years ago

The following script fails with "CUDA error: invalid device function". When I swap the architecture to nonautoregressive_transformer, there is no CUDA error.

export CUDA_VISIBLE_DEVICES=0

fairseq-train \
    $base/data-bin/gec-clc \
    --save-dir $expdir/exp-lavt/checkpoints \
    --ddp-backend=legacy_ddp \
    --task translation_lev \
    --criterion nat_loss \
    --arch levenshtein_transformer \
    --noise random_delete \
    --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9,0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt \
    --stop-min-lr '1e-09' --warmup-updates 10000 \
    --warmup-init-lr '1e-07' --label-smoothing 0.1 \
    --dropout 0.3 --weight-decay 0.01 \
    --decoder-learned-pos \
    --encoder-learned-pos \
    --apply-bert-init \
    --log-format 'simple' --log-interval 100 \
    --fixed-validation-seed 7 \
    --max-tokens 8000 \
    --save-interval-updates 10000 \
    --max-update 300000

chenweihua91 commented 3 years ago

I got the same error, did you solve it?

EdieLu commented 3 years ago

> I got the same error, did you solve it?

Most probably this is because the libnat CUDA extension was not properly installed. I traced the error back to _get_ins_targets_cuda in fairseq/models/nat/levenshtein_utils.py.
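
For anyone hitting the same error, a quick diagnostic along these lines may help narrow it down. It is only a sketch, not an official fairseq tool. The module names fairseq.libnat and fairseq.libnat_cuda are assumptions based on how fairseq's build names its extensions, and probe_module and describe_gpu are hypothetical helpers written for this example. "Invalid device function" usually suggests that a kernel was compiled for a different GPU architecture than the one it is running on, so the script also prints the device's compute capability next to the architectures the installed PyTorch was built for. That list describes PyTorch itself, not the libnat build.

```python
import importlib


def probe_module(name):
    """Return 'ok' if `name` imports, else a short failure description."""
    try:
        importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return f"import failed: {exc}"
    return "ok"


def describe_gpu():
    """Report the GPU compute capability and the archs PyTorch was built for."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    # get_arch_list() describes the PyTorch binary, not the libnat extension.
    return f"device sm_{major}{minor}; torch built for {torch.cuda.get_arch_list()}"


if __name__ == "__main__":
    # Assumed extension names; adjust if your fairseq version differs.
    for mod in ("fairseq.libnat", "fairseq.libnat_cuda"):
        print(mod, "->", probe_module(mod))
    print(describe_gpu())
```

If the CUDA extension imports but the device capability is not one the extension was compiled for, rebuilding fairseq's extensions on the machine that has the target GPU is a reasonable next step to try.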

stale[bot] commented 3 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!