facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Fairseq-generate giving me the error: 'RuntimeError: Mask Type should be defined' on Colab #4899

Open FleetAdmiral opened 1 year ago

FleetAdmiral commented 1 year ago

Some background:

I'm working on a translation problem. I can get through fairseq-preprocess and fairseq-train without issues, but fairseq-generate fails partway through.

I have not found any mention of this error message online as an issue or in any documentation.

What I've attempted from my end:

- Reducing the train/test size.
- Increasing the train and/or test size.
- Making sure the test dataset has no unknown tokens (a sketch of one way to check this is below).

I'm a novice, so this may look elementary, but I'd really appreciate any help here.
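For the unknown-token check, something along these lines works (a minimal sketch, not my exact script: it assumes the raw tokenized test file and the dictionary written by fairseq-preprocess, and the paths are placeholders):

from fairseq.data import Dictionary

# Source dictionary produced by fairseq-preprocess (placeholder path).
src_dict = Dictionary.load("data-bin_mt_pe/dict.mt.txt")

# Count tokens in the raw tokenized test file that map to <unk>.
unk_count = 0
with open("test.mt", encoding="utf-8") as f:
    for line in f:
        for tok in line.split():
            if src_dict.index(tok) == src_dict.unk():
                unk_count += 1
print(f"unknown tokens in test source: {unk_count}")

The exact fairseq-generate command I'm running: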

!fairseq-generate drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ \
    --path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --seed 1 \
    --source-lang mt --target-lang pe \
    --results-path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text \
    --scoring bleu \
    --wandb-project "Hi to Hi" 

This is the error that is then presented:

2022-12-08 14:55:12 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': None, 'wandb_project': 'Hi to Hi', 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt', 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text'}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 128, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 128, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 
'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_token': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2', 'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_type': 'transformer', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 0, 'layer_norm_first': False, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'conv_bias': False, 'logit_temp': 0.1, 'quantize_targets': False, 'quantize_input': False, 'same_quantizer': False, 'target_glu': False, 'feature_grad_mult': 1.0, 'quantizer_depth': 1, 'quantizer_factor': 3, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'mask_length': 10, 'mask_prob': 0.65, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'require_same_masks': True, 'mask_dropout': 0.0, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_before': False, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'num_negatives': 100, 'negatives_from_everywhere': False, 'cross_sample_negatives': 0, 'codebook_negatives': 0, 'conv_pos': 128, 'conv_pos_groups': 16, 'pos_conv_depth': 1, 'latent_temp': [2.0, 0.5, 0.999995], 'max_positions': 100000, 'checkpoint_activations': False, 'required_seq_len_multiple': 1, 'crop_seq_to_multiple': 1, 'depthwise_conv_kernel_size': 31, 
'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}, 'task': {'_name': 'translation', 'data': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/', 'source_lang': 'mt', 'target_lang': 'pe', 'load_alignments': False, 'left_pad_source': True, 'left_pad_target': False, 'max_source_positions': 1024, 'max_target_positions': 1024, 'upsample_primary': -1, 'truncate_source': False, 'num_batch_buckets': 0, 'train_subset': 'train', 'dataset_impl': None, 'required_seq_len_multiple': 1, 'eval_bleu': False, 'eval_bleu_args': '{}', 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': '{}', 'eval_tokenized_bleu': False, 'eval_bleu_remove_bpe': None, 'eval_bleu_print_samples': False}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [mt] dictionary: 130200 types
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [pe] dictionary: 62888 types
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | loading model(s) from drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.mt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.pe
2022-12-08 14:55:16 | INFO | fairseq.tasks.translation | drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ test mt-pe 54997 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 413, in cli_main
    main(args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 48, in main
    return _main(cfg, h)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 201, in _main
    hypos = task.inference_step(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/tasks/fairseq_task.py", line 540, in inference_step
    return generator.generate(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 204, in generate
    return self._generate(sample, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 274, in _generate
    encoder_outs = self.model.forward_encoder(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in forward_encoder
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in <listcomp>
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 55, in forward_torchscript
    return self.forward_non_torchscript(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
    return self.forward(**encoder_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 165, in forward
    return self.forward_scriptable(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 294, in forward_scriptable
    lr = layer(x, encoder_padding_mask=encoder_padding_mask_out)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/modules/transformer_layer.py", line 319, in forward
    output = torch._transformer_encoder_layer_fwd(
RuntimeError: Mask Type should be defined
OmarAshrafFathy commented 1 year ago

I got the same error

geehaad commented 1 year ago

Try downgrading fairseq to the previous version.

OmarAshrafFathy commented 1 year ago

You can try the following lines:

!pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!pip install fairseq==0.12.2

This solved the issue for me. The problem is the newer version of torch that Colab now installs by default, so installing the previous torch version resolves it.
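After reinstalling, a quick way to confirm which versions are actually active in the runtime (assuming both packages expose __version__, which recent releases do):

import torch
import fairseq

print(torch.__version__)    # should report 1.12.1+cu113
print(fairseq.__version__)  # should report 0.12.2

You may need to restart the Colab runtime after the reinstall before the new torch build is picked up.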

arnavmehta7 commented 1 year ago

@OmarAshrafFathy Thank you for this. This error gave me a shock on a piece of code we hadn't touched in months. 😂

jcheigh commented 1 year ago

Love you @OmarAshrafFathy, you saved me.

boolmriver commented 1 year ago

I also ran into this with a model trained on fairseq 0.12.2 and torch 2.1.0. If I downgrade torch, does the model need to be retrained? @OmarAshrafFathy Thank you!

OmarAshrafFathy commented 1 year ago

@boolmriver No, you don't need to retrain the model.

krgy12138 commented 11 months ago

Sorry, but I don't think downgrading is the optimal way to fix this. It looks like a problem with newer versions of torch. Is there a proper solution?

krgy12138 commented 11 months ago

I have a simpler workaround: skip _transformer_encoder_layer_fwd by setting can_use_fastpath to False at generation time, but that doesn't look like a clean fix.
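In case it helps, a rough sketch of that workaround (it assumes the Python API rather than the fairseq-generate CLI, and that the encoder layer modules expose the can_use_fastpath attribute mentioned above; the checkpoint path is a placeholder):

from fairseq import checkpoint_utils

# Load the trained ensemble through the Python API (placeholder path).
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["checkpoint_best.pt"]
)

# Flip the flag on every module that has it, so forward() falls back to the
# regular code path instead of torch._transformer_encoder_layer_fwd.
for model in models:
    for module in model.modules():
        if hasattr(module, "can_use_fastpath"):
            module.can_use_fastpath = False

Decoding then has to go through the Python API (e.g. task.inference_step, as in the traceback above) instead of the CLI, which is part of why it doesn't look like a clean fix.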

udiboy1209 commented 10 months ago

This issue seems to be fixed on the latest main branch.
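If you want to pick up that fix without waiting for a new PyPI release, one option (untested here) is installing fairseq straight from the main branch, e.g. on Colab:

!pip install git+https://github.com/facebookresearch/fairseq.git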