MikeyBeez closed this issue 3 years ago.
Can you try again with CUDA_LAUNCH_BLOCKING=1? Python doesn't give accurate stack traces for CUDA errors except in this slower, synchronous debug mode.
Typically this error occurs when the model is asked to read/write a sentence that's too long, but we have protections for that. Seeing the true stack trace may help identify the cause.
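For context, the assertion that fires in logs like this is just a bounds check on the indices fed to an embedding/index_select lookup on the GPU. A minimal pure-Python sketch of that check (the real one lives in PyTorch's Indexing.cu; the function and names here are illustrative, not ParlAI or PyTorch API):

```python
def index_select_check(src_select_dim_size: int, indices: list[int]) -> None:
    """Mimic the bounds check behind `Assertion srcIndex < srcSelectDimSize`.

    src_select_dim_size is the size of the dimension being indexed,
    e.g. the vocabulary size or the position-embedding table size.
    """
    for src_index in indices:
        if not (0 <= src_index < src_select_dim_size):
            raise IndexError(
                f"srcIndex {src_index} out of range for dim size {src_select_dim_size}"
            )

# Any token or position id >= the table size trips the check.
index_select_check(128, [0, 5, 127])      # fine: all ids < 128
try:
    index_select_check(128, [0, 5, 130])  # 130 >= 128: fails, like the CUDA assert
except IndexError as e:
    print(e)
```

On CPU PyTorch raises a Python IndexError for the same mistake, which is why the error is so much easier to read off the GPU.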
I'm guessing that's an environment variable:
export CUDA_LAUNCH_BLOCKING=1
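It is. A quick sketch of the two ways to set it: export for the whole shell session, or prefix a single command so only that run pays the synchronization cost (the echo is just to show the child process sees it):

```shell
# Persist for the whole shell session:
export CUDA_LAUNCH_BLOCKING=1

# Or scope it to a single command; only that child process sees the variable:
CUDA_LAUNCH_BLOCKING=1 sh -c 'echo "child sees CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"'
```

With either form in place, re-running the same safe_interactive.py command should produce a synchronous, accurate stack trace.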
❯ python parlai/scripts/safe_interactive.py -t blended_skill_talk -mf zoo:blender/blender_3B/model -bs=1
12:21:30 | Overriding opt["task"] to blended_skill_talk (previously: internal:blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues)
12:21:30 | Overriding opt["model_file"] to /home/bard/ParlAI/data/models/blender/blender_3B/model (previously: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/de6/model)
12:21:30 | Loading model with --beam-block-full-context false
12:21:30 | Using CUDA
12:21:30 | loading dictionary from /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
12:21:30 | num words = 8008
12:21:30 | TransformerGenerator: full interactive mode on.
12:21:59 | Total parameters: 2,696,268,800 (2,695,613,440 trainable)
12:21:59 | Loading existing model params from /home/bard/ParlAI/data/models/blender/blender_3B/model
12:22:03 | Opt:
12:22:03 | activation: gelu
12:22:03 | adafactor_eps: '[1e-30, 0.001]'
12:22:03 | adam_eps: 1e-08
12:22:03 | add_p1_after_newln: False
12:22:03 | aggregate_micro: False
12:22:03 | allow_missing_init_opts: False
12:22:03 | attention_dropout: 0.0
12:22:03 | batchsize: 128
12:22:03 | beam_block_full_context: False
12:22:03 | beam_block_list_filename: None
12:22:03 | beam_block_ngram: 3
12:22:03 | beam_context_block_ngram: 3
12:22:03 | beam_delay: 30
12:22:03 | beam_length_penalty: 0.65
12:22:03 | beam_min_length: 20
12:22:03 | beam_size: 10
12:22:03 | betas: '[0.9, 0.999]'
12:22:03 | bpe_add_prefix_space: True
12:22:03 | bpe_debug: False
12:22:03 | bpe_dropout: None
12:22:03 | bpe_merge: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-merges.txt
12:22:03 | bpe_vocab: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-vocab.json
12:22:03 | compute_tokenized_bleu: False
12:22:03 | datapath: /home/bard/ParlAI/data
12:22:03 | datatype: train
12:22:03 | delimiter: ' '
12:22:03 | dict_class: parlai.core.dict:DictionaryAgent
12:22:03 | dict_endtoken: end
12:22:03 | dict_file: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
12:22:03 | dict_include_test: False
12:22:03 | dict_include_valid: False
12:22:03 | dict_initpath: None
12:22:03 | dict_language: english
12:22:03 | dict_loaded: True
12:22:03 | dict_lower: False
12:22:03 | dict_max_ngram_size: -1
12:22:03 | dict_maxexs: -1
12:22:03 | dict_maxtokens: -1
12:22:03 | dict_minfreq: 0
12:22:03 | dict_nulltoken: null
12:22:03 | dict_starttoken: start
12:22:03 | dict_textfields: text,labels
12:22:03 | dict_tokenizer: bytelevelbpe
12:22:03 | dict_unktoken: unk
12:22:03 | display_add_fields:
12:22:03 | display_examples: False
12:22:03 | display_partner_persona: True
12:22:03 | display_prettify: False
12:22:03 | download_path: None
12:22:03 | dropout: 0.1
12:22:03 | dynamic_batching: None
12:22:03 | embedding_projection: random
12:22:03 | embedding_size: 2560
12:22:03 | embedding_type: random
12:22:03 | embeddings_scale: True
12:22:03 | eval_batchsize: None
12:22:03 | evaltask: None
12:22:03 | ffn_size: 10240
12:22:03 | force_fp16_tokens: True
12:22:03 | fp16: True
12:22:03 | fp16_impl: mem_efficient
12:22:03 | gpu: -1
12:22:03 | gradient_clip: 0.1
12:22:03 | hide_labels: False
12:22:03 | history_add_global_end_token: end
12:22:03 | history_reversed: False
12:22:03 | history_size: -1
12:22:03 | image_cropsize: 224
12:22:03 | image_mode: raw
12:22:03 | image_size: 256
12:22:03 | include_checked_sentence: True
12:22:03 | include_initial_utterances: False
12:22:03 | include_knowledge: True
12:22:03 | include_knowledge_separator: False
12:22:03 | include_personas: True
12:22:03 | inference: beam
12:22:03 | init_model: /checkpoint/parlai/zoo/meena/20200319_meenav0data_tall_2.7B_adamoptimizer/20200319_13.3ppl_200kupdates/model
12:22:03 | init_opt: None
12:22:03 | interactive_mode: True
12:22:03 | interactive_task: True
12:22:03 | invsqrt_lr_decay_gamma: -1
12:22:03 | label_truncate: 128
12:22:03 | label_type: response
12:22:03 | learn_positional_embeddings: False
12:22:03 | learningrate: 7e-06
12:22:03 | local_human_candidates_file: None
12:22:03 | log_every_n_secs: 10.0
12:22:03 | loglevel: info
12:22:03 | lr_scheduler: reduceonplateau
12:22:03 | lr_scheduler_decay: 0.5
12:22:03 | lr_scheduler_patience: 3
12:22:03 | max_lr_steps: -1
12:22:03 | max_train_time: 27647.999999999996
12:22:03 | metrics: default
12:22:03 | model: transformer/generator
12:22:03 | model_file: /home/bard/ParlAI/data/models/blender/blender_3B/model
12:22:03 | model_parallel: True
12:22:03 | momentum: 0
12:22:03 | multitask_weights: '[1.0, 3.0, 3.0, 3.0]'
12:22:03 | n_decoder_layers: 24
12:22:03 | n_encoder_layers: 2
12:22:03 | n_heads: 32
12:22:03 | n_layers: 2
12:22:03 | n_positions: 128
12:22:03 | n_segments: 0
12:22:03 | nesterov: True
12:22:03 | no_cuda: False
12:22:03 | num_epochs: -1
12:22:03 | num_topics: 5
12:22:03 | numthreads: 1
12:22:03 | nus: [0.7]
12:22:03 | optimizer: mem_eff_adam
12:22:03 | output_scaling: 1.0
12:22:03 | override: "{'task': 'blended_skill_talk', 'model_file': '/home/bard/ParlAI/data/models/blender/blender_3B/model'}"
12:22:03 | parlai_home: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/ParlAI
12:22:03 | person_tokens: False
12:22:03 | rank_candidates: False
12:22:03 | relu_dropout: 0.0
12:22:03 | remove_political_convos: False
12:22:03 | safe_personas_only: True
12:22:03 | safety: all
12:22:03 | save_after_valid: True
12:22:03 | save_every_n_secs: -1
12:22:03 | share_word_embeddings: True
12:22:03 | short_final_eval: False
12:22:03 | show_advanced_args: False
12:22:03 | single_turn: False
12:22:03 | skip_generation: False
12:22:03 | special_tok_lst: None
12:22:03 | split_lines: False
12:22:03 | starttime: Mar31_06-04
12:22:03 | task: blended_skill_talk
12:22:03 | temperature: 1.0
12:22:03 | tensorboard_log: False
12:22:03 | text_truncate: 128
12:22:03 | topk: 10
12:22:03 | topp: 0.9
12:22:03 | train_experiencer_only: False
12:22:03 | truncate: 128
12:22:03 | update_freq: 2
12:22:03 | use_reply: label
12:22:03 | validation_cutoff: 1.0
12:22:03 | validation_every_n_epochs: 0.25
12:22:03 | validation_every_n_secs: -1
12:22:03 | validation_max_exs: -1
12:22:03 | validation_metric: ppl
12:22:03 | validation_metric_mode: min
12:22:03 | validation_patience: 10
12:22:03 | validation_share_agent: False
12:22:03 | variant: prelayernorm
12:22:03 | verbose: False
12:22:03 | warmup_rate: 0.0001
12:22:03 | warmup_updates: 100
12:22:03 | weight_decay: None
12:22:03 | Current ParlAI commit: 5104b2b954808ba4d0b92271dea0e771ace2924f
Enter [DONE] if you want to end the episode, [EXIT] to quit.
12:22:03 | Overriding opt["model"] to transformer/classifier (previously: transformer_classifier)
12:22:03 | Overriding opt["model_file"] to /home/bard/ParlAI/data/models/dialogue_safety/single_turn/model (previously: /checkpoint/edinan/20190828/safety_reddit/contiguous-dropout=0_multitask-weights=0.5,0.1,0.1,0.4,0.2_lr=5e-05_lr-scheduler-patience=3_lr-scheduler-decay=0.9_warmupupdates=1000/model)
12:22:03 | Overriding opt["print_scores"] to True (previously: False)
12:22:03 | Overriding opt["data_parallel"] to False (previously: True)
12:22:03 | Using CUDA
12:22:03 | loading dictionary from /home/bard/ParlAI/data/models/dialogue_safety/single_turn/model.dict
12:22:03 | num words = 54944
12:22:05 | Loading existing model parameters from /home/bard/ParlAI/data/models/dialogue_safety/single_turn/model
12:22:06 | Total parameters: 128,042,498 (128,042,498 trainable)
12:22:06 | creating task(s): blended_skill_talk
[ loading personas.. ]
[NOTE: In the BST paper both partners have a persona. You can choose to ignore yours, the model never sees it. In the Blender paper, this was not used for humans. You can also turn personas off with --include-personas False]
[context]: your persona: i am a registered nurse.
your persona: my favorite movie is pretty woman.
Enter Your Message: Hello
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [35,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
[... the same assertion repeated for threads [33,0,0] through [63,0,0] of block [35,0,0], threads [96,0,0] through [127,0,0] of block [20,0,0], and threads [96,0,0] through [127,0,0] of block [23,0,0] ...]
Traceback (most recent call last):
File "parlai/scripts/safe_interactive.py", line 87, in <module>
Thanks, that's a very different part of the code, so I'm glad we have the true stack trace.
It's my pleasure. Please let me know if I start becoming a pain. I don't have certain filters or restraints.
Can you try just a run of display_model with the same arguments instead of safe_interactive?
Do you have a second GPU by chance? If so, can you try adding --model-parallel true?
I have a strange setup with three GPUs. I have a 1050 Ti with 4 GB of memory, and I have a Tesla K80 with 24 GB. The K80 has two GPU chips. I set export CUDA_VISIBLE_DEVICES=2,1, so I don't use the 1050. display_model.py works. I have to be careful with some parallel settings because the 1050 gets grabbed and I get an out-of-memory error, but --model-parallel true works fine here. Here's the output:
❯ python parlai/scripts/display_model.py -t blended_skill_talk -mf zoo:blender/blender_3B/model -bs=1
11:49:40 | Overriding opt["task"] to blended_skill_talk (previously: internal:blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues)
11:49:40 | Overriding opt["model_file"] to /home/bard/ParlAI/data/models/blender/blender_3B/model (previously: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/de6/model)
11:49:40 | Loading model with --beam-block-full-context false
11:49:40 | Using CUDA
11:49:40 | loading dictionary from /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
11:49:40 | num words = 8008
11:50:10 | Total parameters: 2,696,268,800 (2,695,613,440 trainable)
11:50:11 | Loading existing model params from /home/bard/ParlAI/data/models/blender/blender_3B/model
11:50:14 | creating task(s): blended_skill_talk
11:50:14 | Loading ParlAI text data: /home/bard/ParlAI/data/blended_skill_talk/valid.txt
11:50:14 | Opt:
11:50:14 | activation: gelu
11:50:14 | adafactor_eps: '[1e-30, 0.001]'
11:50:14 | adam_eps: 1e-08
11:50:14 | add_p1_after_newln: False
11:50:14 | aggregate_micro: False
11:50:14 | allow_missing_init_opts: False
11:50:14 | attention_dropout: 0.0
11:50:14 | batchsize: 128
11:50:14 | beam_block_full_context: False
11:50:14 | beam_block_list_filename: None
11:50:14 | beam_block_ngram: 3
11:50:14 | beam_context_block_ngram: 3
11:50:14 | beam_delay: 30
11:50:14 | beam_length_penalty: 0.65
11:50:14 | beam_min_length: 20
11:50:14 | beam_size: 10
11:50:14 | betas: '[0.9, 0.999]'
11:50:14 | bpe_add_prefix_space: True
11:50:14 | bpe_debug: False
11:50:14 | bpe_dropout: None
11:50:14 | bpe_merge: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-merges.txt
11:50:14 | bpe_vocab: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-vocab.json
11:50:14 | compute_tokenized_bleu: False
11:50:14 | datapath: /home/bard/ParlAI/data
11:50:14 | datatype: train
11:50:14 | delimiter: ' '
11:50:14 | dict_class: parlai.core.dict:DictionaryAgent
11:50:14 | dict_endtoken: end
11:50:14 | dict_file: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
11:50:14 | dict_include_test: False
11:50:14 | dict_include_valid: False
11:50:14 | dict_initpath: None
11:50:14 | dict_language: english
11:50:14 | dict_loaded: True
11:50:14 | dict_lower: False
11:50:14 | dict_max_ngram_size: -1
11:50:14 | dict_maxexs: -1
11:50:14 | dict_maxtokens: -1
11:50:14 | dict_minfreq: 0
11:50:14 | dict_nulltoken: null
11:50:14 | dict_starttoken: start
11:50:14 | dict_textfields: text,labels
11:50:14 | dict_tokenizer: bytelevelbpe
11:50:14 | dict_unktoken: unk
11:50:14 | display_add_fields:
11:50:14 | display_examples: False
11:50:14 | download_path: None
11:50:14 | dropout: 0.1
11:50:14 | dynamic_batching: None
11:50:14 | embedding_projection: random
11:50:14 | embedding_size: 2560
11:50:14 | embedding_type: random
11:50:14 | embeddings_scale: True
11:50:14 | eval_batchsize: None
11:50:14 | evaltask: None
11:50:14 | ffn_size: 10240
11:50:14 | force_fp16_tokens: True
11:50:14 | fp16: True
11:50:14 | fp16_impl: mem_efficient
11:50:14 | gpu: -1
11:50:14 | gradient_clip: 0.1
11:50:14 | hide_labels: False
11:50:14 | history_add_global_end_token: end
11:50:14 | history_reversed: False
11:50:14 | history_size: -1
11:50:14 | image_cropsize: 224
11:50:14 | image_mode: raw
11:50:14 | image_size: 256
11:50:14 | include_checked_sentence: True
11:50:14 | include_knowledge: True
11:50:14 | include_knowledge_separator: False
11:50:14 | inference: beam
11:50:14 | init_model: /checkpoint/parlai/zoo/meena/20200319_meenav0data_tall_2.7B_adamoptimizer/20200319_13.3ppl_200kupdates/model
11:50:14 | init_opt: None
11:50:14 | interactive_mode: False
11:50:14 | invsqrt_lr_decay_gamma: -1
11:50:14 | label_truncate: 128
11:50:14 | label_type: response
11:50:14 | learn_positional_embeddings: False
11:50:14 | learningrate: 7e-06
11:50:14 | log_every_n_secs: 10.0
11:50:14 | loglevel: info
11:50:14 | lr_scheduler: reduceonplateau
11:50:14 | lr_scheduler_decay: 0.5
11:50:14 | lr_scheduler_patience: 3
11:50:14 | max_lr_steps: -1
11:50:14 | max_train_time: 27647.999999999996
11:50:14 | metrics: default
11:50:14 | model: transformer/generator
11:50:14 | model_file: /home/bard/ParlAI/data/models/blender/blender_3B/model
11:50:14 | model_parallel: True
11:50:14 | momentum: 0
11:50:14 | multitask_weights: '[1.0, 3.0, 3.0, 3.0]'
11:50:14 | n_decoder_layers: 24
11:50:14 | n_encoder_layers: 2
11:50:14 | n_heads: 32
11:50:14 | n_layers: 2
11:50:14 | n_positions: 128
11:50:14 | n_segments: 0
11:50:14 | nesterov: True
11:50:14 | no_cuda: False
11:50:14 | num_epochs: -1
11:50:14 | num_examples: 10
11:50:14 | num_topics: 5
11:50:14 | numthreads: 1
11:50:14 | nus: [0.7]
11:50:14 | optimizer: mem_eff_adam
11:50:14 | output_scaling: 1.0
11:50:14 | override: "{'task': 'blended_skill_talk', 'model_file': '/home/bard/ParlAI/data/models/blender/blender_3B/model'}"
11:50:14 | parlai_home: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/ParlAI
11:50:14 | person_tokens: False
11:50:14 | rank_candidates: False
11:50:14 | relu_dropout: 0.0
11:50:14 | remove_political_convos: False
11:50:14 | save_after_valid: True
11:50:14 | save_every_n_secs: -1
11:50:14 | share_word_embeddings: True
11:50:14 | short_final_eval: False
11:50:14 | show_advanced_args: False
11:50:14 | skip_generation: False
11:50:14 | special_tok_lst: None
11:50:14 | split_lines: False
11:50:14 | starttime: Mar31_06-04
11:50:14 | task: blended_skill_talk
11:50:14 | temperature: 1.0
11:50:14 | tensorboard_log: False
11:50:14 | text_truncate: 128
11:50:14 | topk: 10
11:50:14 | topp: 0.9
11:50:14 | train_experiencer_only: False
11:50:14 | truncate: 128
11:50:14 | update_freq: 2
11:50:14 | use_reply: label
11:50:14 | validation_cutoff: 1.0
11:50:14 | validation_every_n_epochs: 0.25
11:50:14 | validation_every_n_secs: -1
11:50:14 | validation_max_exs: -1
11:50:14 | validation_metric: ppl
11:50:14 | validation_metric_mode: min
11:50:14 | validation_patience: 10
11:50:14 | validation_share_agent: False
11:50:14 | variant: prelayernorm
11:50:14 | verbose: False
11:50:14 | warmup_rate: 0.0001
11:50:14 | warmup_updates: 100
11:50:14 | weight_decay: None
11:50:14 | Current ParlAI commit: 4fd58a3ed7ea9dac692abf6a9981219c8ef5b7bd
BTW, I interrupted that run. Then I re-ran it with --model-parallel true, and it ran fine to the end.
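One thing worth noting about that CUDA_VISIBLE_DEVICES=2,1 setup: CUDA renumbers the surviving devices from 0 in the order listed, so frameworks only ever see logical devices 0 and 1 and the hidden 1050 Ti can't be grabbed. A toy illustration of the remapping (pure Python, no CUDA required; the device list is hypothetical and matches the description above, and note real physical ordinals can also depend on CUDA_DEVICE_ORDER):

```python
def visible_devices(all_devices: list[str], env: str) -> dict[int, str]:
    """Map logical CUDA ordinals (what PyTorch sees) to physical devices,
    following CUDA_VISIBLE_DEVICES semantics: the listed order defines the
    new logical numbering, and unlisted devices are hidden entirely."""
    physical_ids = [int(x) for x in env.split(",") if x.strip()]
    return {logical: all_devices[physical]
            for logical, physical in enumerate(physical_ids)}

# Hypothetical machine: physical 0 = 1050 Ti, physical 1 and 2 = the K80 halves.
gpus = ["GTX 1050 Ti", "Tesla K80 (chip A)", "Tesla K80 (chip B)"]
print(visible_devices(gpus, "2,1"))
# logical 0 -> Tesla K80 (chip B), logical 1 -> Tesla K80 (chip A); 1050 Ti hidden
```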
Here's the same job with export CUDA_LAUNCH_BLOCKING=1
❯ python parlai/scripts/display_model.py -t blended_skill_talk -mf zoo:blender/blender_3B/model -bs=1 --model-parallel true
12:05:55 | Overriding opt["task"] to blended_skill_talk (previously: internal:blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues)
12:05:55 | Overriding opt["model_file"] to /home/bard/ParlAI/data/models/blender/blender_3B/model (previously: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/de6/model)
12:05:55 | Loading model with --beam-block-full-context false
12:05:55 | Using CUDA
12:05:55 | loading dictionary from /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
12:05:55 | num words = 8008
12:06:24 | Total parameters: 2,696,268,800 (2,695,613,440 trainable)
12:06:25 | Loading existing model params from /home/bard/ParlAI/data/models/blender/blender_3B/model
12:06:27 | creating task(s): blended_skill_talk
12:06:27 | Loading ParlAI text data: /home/bard/ParlAI/data/blended_skill_talk/valid.txt
12:06:27 | Opt:
12:06:27 | activation: gelu
12:06:27 | adafactor_eps: '[1e-30, 0.001]'
12:06:27 | adam_eps: 1e-08
12:06:27 | add_p1_after_newln: False
12:06:27 | aggregate_micro: False
12:06:27 | allow_missing_init_opts: False
12:06:27 | attention_dropout: 0.0
12:06:27 | batchsize: 128
12:06:27 | beam_block_full_context: False
12:06:27 | beam_block_list_filename: None
12:06:27 | beam_block_ngram: 3
12:06:27 | beam_context_block_ngram: 3
12:06:27 | beam_delay: 30
12:06:27 | beam_length_penalty: 0.65
12:06:27 | beam_min_length: 20
12:06:27 | beam_size: 10
12:06:27 | betas: '[0.9, 0.999]'
12:06:27 | bpe_add_prefix_space: True
12:06:27 | bpe_debug: False
12:06:27 | bpe_dropout: None
12:06:27 | bpe_merge: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-merges.txt
12:06:27 | bpe_vocab: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict-vocab.json
12:06:27 | compute_tokenized_bleu: False
12:06:27 | datapath: /home/bard/ParlAI/data
12:06:27 | datatype: train
12:06:27 | delimiter: ' '
12:06:27 | dict_class: parlai.core.dict:DictionaryAgent
12:06:27 | dict_endtoken: end
12:06:27 | dict_file: /home/bard/ParlAI/data/models/blender/blender_3B/model.dict
12:06:27 | dict_include_test: False
12:06:27 | dict_include_valid: False
12:06:27 | dict_initpath: None
12:06:27 | dict_language: english
12:06:27 | dict_loaded: True
12:06:27 | dict_lower: False
12:06:27 | dict_max_ngram_size: -1
12:06:27 | dict_maxexs: -1
12:06:27 | dict_maxtokens: -1
12:06:27 | dict_minfreq: 0
12:06:27 | dict_nulltoken: null
12:06:27 | dict_starttoken: start
12:06:27 | dict_textfields: text,labels
12:06:27 | dict_tokenizer: bytelevelbpe
12:06:27 | dict_unktoken: unk
12:06:27 | display_add_fields:
12:06:27 | display_examples: False
12:06:27 | download_path: None
12:06:27 | dropout: 0.1
12:06:27 | dynamic_batching: None
12:06:27 | embedding_projection: random
12:06:27 | embedding_size: 2560
12:06:27 | embedding_type: random
12:06:27 | embeddings_scale: True
12:06:27 | eval_batchsize: None
12:06:27 | evaltask: None
12:06:27 | ffn_size: 10240
12:06:27 | force_fp16_tokens: True
12:06:27 | fp16: True
12:06:27 | fp16_impl: mem_efficient
12:06:27 | gpu: -1
12:06:27 | gradient_clip: 0.1
12:06:27 | hide_labels: False
12:06:27 | history_add_global_end_token: end
12:06:27 | history_reversed: False
12:06:27 | history_size: -1
12:06:27 | image_cropsize: 224
12:06:27 | image_mode: raw
12:06:27 | image_size: 256
12:06:27 | include_checked_sentence: True
12:06:27 | include_knowledge: True
12:06:27 | include_knowledge_separator: False
12:06:27 | inference: beam
12:06:27 | init_model: /checkpoint/parlai/zoo/meena/20200319_meenav0data_tall_2.7B_adamoptimizer/20200319_13.3ppl_200kupdates/model
12:06:27 | init_opt: None
12:06:27 | interactive_mode: False
12:06:27 | invsqrt_lr_decay_gamma: -1
12:06:27 | label_truncate: 128
12:06:27 | label_type: response
12:06:27 | learn_positional_embeddings: False
12:06:27 | learningrate: 7e-06
12:06:27 | log_every_n_secs: 10.0
12:06:27 | loglevel: info
12:06:27 | lr_scheduler: reduceonplateau
12:06:27 | lr_scheduler_decay: 0.5
12:06:27 | lr_scheduler_patience: 3
12:06:27 | max_lr_steps: -1
12:06:27 | max_train_time: 27647.999999999996
12:06:27 | metrics: default
12:06:27 | model: transformer/generator
12:06:27 | model_file: /home/bard/ParlAI/data/models/blender/blender_3B/model
12:06:27 | model_parallel: True
12:06:27 | momentum: 0
12:06:27 | multitask_weights: '[1.0, 3.0, 3.0, 3.0]'
12:06:27 | n_decoder_layers: 24
12:06:27 | n_encoder_layers: 2
12:06:27 | n_heads: 32
12:06:27 | n_layers: 2
12:06:27 | n_positions: 128
12:06:27 | n_segments: 0
12:06:27 | nesterov: True
12:06:27 | no_cuda: False
12:06:27 | num_epochs: -1
12:06:27 | num_examples: 10
12:06:27 | num_topics: 5
12:06:27 | numthreads: 1
12:06:27 | nus: [0.7]
12:06:27 | optimizer: mem_eff_adam
12:06:27 | output_scaling: 1.0
12:06:27 | override: "{'task': 'blended_skill_talk', 'model_file': '/home/bard/ParlAI/data/models/blender/blender_3B/model', 'model_parallel': True}"
12:06:27 | parlai_home: /checkpoint/edinan/20200331/finetune_bst_gen_baseline_convai2_normal/ParlAI
12:06:27 | person_tokens: False
12:06:27 | rank_candidates: False
12:06:27 | relu_dropout: 0.0
12:06:27 | remove_political_convos: False
12:06:27 | save_after_valid: True
12:06:27 | save_every_n_secs: -1
12:06:27 | share_word_embeddings: True
12:06:27 | short_final_eval: False
12:06:27 | show_advanced_args: False
12:06:27 | skip_generation: False
12:06:27 | special_tok_lst: None
12:06:27 | split_lines: False
12:06:27 | starttime: Mar31_06-04
12:06:27 | task: blended_skill_talk
12:06:27 | temperature: 1.0
12:06:27 | tensorboard_log: False
12:06:27 | text_truncate: 128
12:06:27 | topk: 10
12:06:27 | topp: 0.9
12:06:27 | train_experiencer_only: False
12:06:27 | truncate: 128
12:06:27 | update_freq: 2
12:06:27 | use_reply: label
12:06:27 | validation_cutoff: 1.0
12:06:27 | validation_every_n_epochs: 0.25
12:06:27 | validation_every_n_secs: -1
12:06:27 | validation_max_exs: -1
12:06:27 | validation_metric: ppl
12:06:27 | validation_metric_mode: min
12:06:27 | validation_patience: 10
12:06:27 | validation_share_agent: False
12:06:27 | variant: prelayernorm
12:06:27 | verbose: False
12:06:27 | warmup_rate: 0.0001
12:06:27 | warmup_updates: 100
12:06:27 | weight_decay: None
12:06:28 | Current ParlAI commit: 4fd58a3ed7ea9dac692abf6a9981219c8ef5b7bd
I'm going to close this issue since it seems like --model-parallel true with careful options works well. Reopen if you have further questions.
While a heterogeneous GPU setup is fine, our implementation assumes a homogeneous one, and may therefore distribute weights suboptimally across the devices.
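To see how uneven a split can be, you can inspect the visible devices with plain PyTorch (a generic sketch, not ParlAI code — an even per-layer split only balances memory when every entry looks the same):

```python
import torch

def gpu_inventory():
    """Return (name, total MiB) for each visible CUDA device."""
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append((props.name, props.total_memory // 2**20))
    return devices

# On a heterogeneous box this prints different names/memory sizes;
# on a machine with no CUDA devices it prints an empty list.
print(gpu_inventory())
```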
Thank you, Stephen. I just returned today from a business trip to Arkansas. I think it's fine to close this.
Cheers all. It's always something with me. ;)
Bug description
Running python parlai/scripts/safe_interactive.py -t blended_skill_talk -mf zoo:blender/blender_3B/model, I get the prompt, enter Hello, and it fails with:
RuntimeError: CUDA error: device-side assert triggered
Reproduction steps
Run the command above and enter any message at the prompt.
Expected behavior
The model generates a reply instead of crashing.
Logs [ loading personas.. ]
[NOTE: In the BST paper both partners have a persona. You can choose to ignore yours, the model never sees it. In the Blender paper, this was not used for humans. You can also turn personas off with --include-personas False]
[context]: your persona: i now live in new mexico.
your persona: i grew up in nevada.
Enter Your Message: Hello
Traceback (most recent call last):
File "parlai/scripts/safe_interactive.py", line 87, in <module>
SafeInteractive.main()
File "/home/bard/ParlAI/parlai/core/script.py", line 111, in main
return cls._run_args(None)
File "/home/bard/ParlAI/parlai/core/script.py", line 84, in _run_args
return cls._run_from_parser_and_opt(opt, parser)
File "/home/bard/ParlAI/parlai/core/script.py", line 90, in _run_from_parser_and_opt
return script.run()
File "parlai/scripts/safe_interactive.py", line 82, in run
return safe_interactive(self.opt)
File "parlai/scripts/safe_interactive.py", line 62, in safe_interactive
world.parley()
File "/home/bard/ParlAI/parlai/tasks/interactive/worlds.py", line 78, in parley
acts[1] = agents[1].act()
File "/home/bard/ParlAI/parlai/core/torch_agent.py", line 1946, in act
response = self.batch_act([self.observation])[0]
File "/home/bard/ParlAI/parlai/core/torch_agent.py", line 2007, in batch_act
output = self.eval_step(batch)
File "/home/bard/ParlAI/parlai/core/torch_generator_agent.py", line 891, in eval_step
beam_preds_scores, beams = self._generate(batch, self.beam_size, maxlen)
File "/home/bard/ParlAI/parlai/core/torch_generator_agent.py", line 1135, in _generate
score, incr_state = model.decoder(decoder_input, encoder_states, incr_state)
File "/home/bard/miniconda3/envs/parlai/lib/python3.7/site-packages/torch-1.7.1-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/bard/ParlAI/parlai/agents/transformer/modules.py", line 888, in forward
tensor = self.forward_embedding(input, positions)
File "/home/bard/ParlAI/parlai/agents/transformer/modules.py", line 810, in forward_embedding
if positions.max().item() > self.n_positions:
RuntimeError: CUDA error: device-side assert triggered
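For what it's worth, the failing check (`positions.max().item() > self.n_positions`) points at a position index past the positional-embedding table. A minimal, ParlAI-independent sketch of the same failure mode on CPU, where the error is synchronous and readable:

```python
import torch
import torch.nn as nn

n_positions = 128                       # mirrors the model's n_positions
pos_emb = nn.Embedding(n_positions, 16)

# An input longer than n_positions yields position ids >= 128.
# On CPU this raises IndexError immediately; on CUDA the same lookup
# surfaces later as "device-side assert triggered" at a sync point.
positions = torch.arange(n_positions + 2)
try:
    pos_emb(positions)
except IndexError as err:
    print("out-of-range position id:", err)
```

This is why CUDA_LAUNCH_BLOCKING=1 helps: it forces the assert to fire at the offending call instead of at an unrelated later line.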
Additional context
https://i.ytimg.com/vi/CEVaHj73s5g/maxresdefault.jpg