facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

After extracting features with extract_features_vmb.py and then training with mcan, training fails with an error that the feature dimensions do not match. #1239

Open clearlove7-s11 opened 2 years ago

clearlove7-s11 commented 2 years ago

❓ Questions and Help

After using extract_features_vmb.py to extract features and then training with mcan, training fails with an error saying the feature dimensions do not match.

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [256, 2048, 1, 1], but got 3-dimensional input of size [32, 100, 2048] instead
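The shapes in the error line up with the difference between region-style and grid-style image features: extract_features_vmb.py emits one 2048-d vector per detected box (batched: [32, 100, 2048], 3-D), while the failing layer is a Conv2d with weight [256, 2048, 1, 1], which expects grid features of shape [batch, 2048, H, W] (4-D). A minimal sketch of that distinction (`feature_kind` is a hypothetical helper for illustration, not MMF API):

```python
def feature_kind(shape):
    """Classify image-feature tensors by rank (hypothetical helper, not MMF API)."""
    if len(shape) == 3:
        # (batch, num_boxes, feat_dim): region features, one vector per detected box
        return "region"
    if len(shape) == 4:
        # (batch, channels, height, width): grid features, what Conv2d layers expect
        return "grid"
    raise ValueError(f"unexpected feature rank: {len(shape)}")

# What extract_features_vmb.py produces, once batched by the dataloader:
assert feature_kind((32, 100, 2048)) == "region"
# What the Conv2d with weight [256, 2048, 1, 1] expects:
assert feature_kind((32, 2048, 7, 7)) == "grid"
```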

'''
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option config to /home/cvpr/vqa/mmf-main/projects/movie_mcan/configs/vqa2/defaults.yaml
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option model to movie_mcan
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option datasets to vqa2
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option run_type to train_val
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_LOG_DIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_REPORT_DIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_TENSORBOARD_LOGDIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_WANDB_LOGDIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
2022-04-24T08:37:01 | mmf.utils.distributed: XLA Mode:False
2022-04-24T08:37:01 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14677
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
2022-04-24T08:37:01 | mmf.utils.distributed: XLA Mode:False
2022-04-24T08:37:01 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14677
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Added key: store_based_barrier_key:1 to store for rank: 1
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Added key: store_based_barrier_key:1 to store for rank: 0
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Rank 0: Completed store-based barrier for 2 nodes.
2022-04-24T08:37:02 | mmf.utils.distributed: Initialized Host dgx5 as Rank 0
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Rank 1: Completed store-based barrier for 2 nodes.
2022-04-24T08:37:02 | mmf.utils.distributed: Initialized Host dgx5 as Rank 1
2022-04-24T08:37:23 | mmf: Logging to: ./save/train.log
2022-04-24T08:37:23 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=/home/cvpr/vqa/mmf-main/projects/movie_mcan/configs/vqa2/defaults.yaml', 'model=movie_mcan', 'datasets=vqa2', 'run_type=train_val'])
2022-04-24T08:37:23 | mmf_cli.run: Torch version: 1.9.0+cu102
2022-04-24T08:37:23 | mmf.utils.general: CUDA Device 0 is: Tesla V100-SXM2-32GB
2022-04-24T08:37:23 | mmf_cli.run: Using seed 23836977
2022-04-24T08:37:23 | mmf.trainers.mmf_trainer: Loading datasets
qqqqqqqqqwqwqwwwwwwwwwwwwwwwwwwwwwwwwwwaaaaaaaaaaaaaaaaa
qqqqqqqqqwqwqwwwwwwwwwwwwwwwwwwwwwwwwwwaaaaaaaaaaaaaaaaa
2022-04-24T08:37:24 | torchtext.vocab: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt.pt
2022-04-24T08:37:25 | torchtext.vocab: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt.pt
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.trainers.mmf_trainer: Loading model
2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: Loading optimizer
2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: Loading metrics
2022-04-24T08:37:29 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel
WARNING 2022-04-24T08:37:29 | py.warnings: /home/cvpr/vqa/mmf-main/mmf/utils/distributed.py:412: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
  builtin_warn(*args, **kwargs)

WARNING 2022-04-24T08:37:29 | py.warnings: /home/cvpr/vqa/mmf-main/mmf/utils/distributed.py:412: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
  builtin_warn(*args, **kwargs)

2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: ===== Model ===== 2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: DistributedDataParallel( (module): MoVieMcan( (word_embedding): Embedding(75505, 300) (text_embeddings): TextEmbedding( (module): SAEmbedding( (lstm): LSTM(300, 1024, batch_first=True) (self_attns): ModuleList( (0): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (1): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (2): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( 
(0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (3): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (4): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, 
inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (5): SelfAttention( (multi_head_attn): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): Dropout(p=0.1, inplace=False) (ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) (attn_pool): AttnPool1d( (linear): Sequential( (0): Linear(in_features=1024, out_features=512, bias=True) (1): ReLU() (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=512, out_features=2, bias=True) ) ) ) ) (image_feature_encoders): Identity() (image_feature_embeddings_list): TwoBranchEmbedding( (sga): SGAEmbedding( (linear): Linear(in_features=2048, out_features=1024, bias=True) (self_guided_attns): ModuleList( (0): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, 
out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (1): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): 
LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (2): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (3): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): 
MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (4): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): 
ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (5): SelfGuidedAttention( (multi_head_attn): ModuleList( (0): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) (1): MovieMcanMultiHeadAttention( (linears): ModuleList( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): Linear(in_features=1024, out_features=1024, bias=True) (2): Linear(in_features=1024, out_features=1024, bias=True) (3): Linear(in_features=1024, out_features=1024, bias=True) ) (dropout): Dropout(p=0.1, inplace=False) ) ) (fcn): Sequential( (0): Linear(in_features=1024, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=4096, out_features=1024, bias=True) ) (drop_mha): ModuleList( (0): Dropout(p=0.1, inplace=False) (1): Dropout(p=0.1, inplace=False) ) (ln_mha): ModuleList( (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (drop_fcn): Dropout(p=0.1, inplace=False) (ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) ) (sga_pool): AttnPool1d( (linear): Sequential( (0): Linear(in_features=1024, out_features=512, bias=True) (1): ReLU() (2): Dropout(p=0.1, inplace=False) (3): Linear(in_features=512, out_features=1, bias=True) ) ) (cbn): CBNEmbedding( (layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (cbns): ModuleList( (0): 
MovieBottleneck( (conv1): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) (downsample): Conv2d(2048, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (cond): Modulation( (linear): Linear(in_features=1024, out_features=2048, bias=True) (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) ) (se): SEModule( (se): Sequential( (0): AdaptiveAvgPool2d(output_size=(1, 1)) (1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): ReLU(inplace=True) (3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): Sigmoid() ) (attn): Sequential( (0): ChannelPool() (1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False) (2): Sigmoid() ) ) ) (1): MovieBottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) (cond): Modulation( (linear): Linear(in_features=1024, out_features=1024, bias=True) (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) ) (se): SEModule( (se): Sequential( (0): AdaptiveAvgPool2d(output_size=(1, 1)) (1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): ReLU(inplace=True) (3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): Sigmoid() ) (attn): Sequential( (0): ChannelPool() (1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False) (2): Sigmoid() ) ) ) (2): 
MovieBottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) (cond): Modulation( (linear): Linear(in_features=1024, out_features=1024, bias=True) (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) ) (se): SEModule( (se): Sequential( (0): AdaptiveAvgPool2d(output_size=(1, 1)) (1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): ReLU(inplace=True) (3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): Sigmoid() ) (attn): Sequential( (0): ChannelPool() (1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False) (2): Sigmoid() ) ) ) (3): MovieBottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): FrozenBatchNorm2d(256, eps=1e-05) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): FrozenBatchNorm2d(256, eps=1e-05) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): FrozenBatchNorm2d(1024, eps=1e-05) (relu): ReLU(inplace=True) (cond): Modulation( (linear): Linear(in_features=1024, out_features=1024, bias=True) (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) ) (se): SEModule( (se): Sequential( (0): AdaptiveAvgPool2d(output_size=(1, 1)) (1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): ReLU(inplace=True) (3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): Sigmoid() ) (attn): Sequential( (0): ChannelPool() (1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False) (2): Sigmoid() ) ) ) ) ) ) (image_text_multi_modal_combine_layer): BranchCombineLayer( (linear_cga): 
ModuleList( (0): Linear(in_features=1024, out_features=2048, bias=True) (1): Linear(in_features=1024, out_features=2048, bias=True) ) (linear_cbn): ModuleList( (0): Linear(in_features=1024, out_features=2048, bias=True) (1): Linear(in_features=1024, out_features=2048, bias=True) ) (linear_ques): ModuleList( (0): Linear(in_features=1024, out_features=2048, bias=True) (1): Linear(in_features=1024, out_features=2048, bias=True) ) (layer_norm): ModuleList( (0): LayerNorm((2048,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((2048,), eps=1e-05, elementwise_affine=True) ) ) (classifier): ClassifierLayer( (module): TripleLinear( (linears): ModuleList( (0): Linear(in_features=2048, out_features=3129, bias=True) (1): Linear(in_features=2048, out_features=3129, bias=True) (2): Linear(in_features=2048, out_features=3129, bias=True) ) ) ) (losses): Losses( (losses): ModuleList( (0): MMFLoss( (loss_criterion): TripleLogitBinaryCrossEntropy() ) ) ) ) ) 2022-04-24T08:37:29 | mmf.utils.general: Total Parameters: 254918110. Trained Parameters: 254918110 2022-04-24T08:37:29 | mmf.trainers.core.training_loop: Starting training... 
Traceback (most recent call last):
  File "/root/anaconda3/envs/mm/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf==1.0.0rc12', 'console_scripts', 'mmf_run')())
  File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 129, in run
    nprocs=config.distributed.world_size,
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/cvpr/vqa/mmf-main/mmf/trainers/mmf_trainer.py", line 145, in train
    self.training_loop()
  File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 33, in training_loop
    self.run_training_epoch()
  File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 91, in run_training_epoch
    report = self.run_training_batch(batch, num_batches_for_this_update)
  File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 166, in run_training_batch
    report = self._forward(batch)
  File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 200, in _forward
    model_output = self.model(prepared_batch)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/cvpr/vqa/mmf-main/mmf/models/base_model.py", line 309, in __call__
    model_output = super().__call__(sample_list, *args, **kwargs)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cvpr/vqa/mmf-main/mmf/models/movie_mcan.py", line 266, in forward
    "image", sample_list, text_embedding_total, text_embedding_vec[:, 0]
  File "/home/cvpr/vqa/mmf-main/mmf/models/movie_mcan.py", line 243, in process_feature_embedding
    sample_list.text_mask,
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cvpr/vqa/mmf-main/mmf/modules/embeddings.py", line 621, in forward
    x_cbn = self.cbn(x, v)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cvpr/vqa/mmf-main/mmf/modules/embeddings.py", line 589, in forward
    x, _ = cbn(x, v)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cvpr/vqa/mmf-main/mmf/modules/bottleneck.py", line 137, in forward
    x = self.conv1(x) + self.cond(x, cond)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [256, 2048, 1, 1], but got 3-dimensional input of size [32, 100, 2048] instead

(mm) root@dgx5:/home/cvpr/vqa/mmf-main# Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/anaconda3/envs/mm/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/root/anaconda3/envs/mm/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
'''
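For what it's worth, the shapes in the traceback point at a feature-type mismatch rather than a bug in the model code: the CBN branch of movie_mcan convolves grid features (batch, 2048, H, W), while extract_features_vmb.py produces a bag of per-box region features (batch, 100, 2048). The tensor can be reshaped mechanically into the 4-D layout (sketch below, with numpy standing in for torch; the 10×10 "grid" is purely an assumption since 100 = 10 × 10), but because detected boxes have no spatial ordering, convolutions with spatial kernels would mix unrelated regions, so extracting genuine grid features for movie_mcan is the sound fix rather than reshaping.

```python
import numpy as np

# Region features as produced by extract_features_vmb.py, once batched:
region = np.zeros((32, 100, 2048), dtype=np.float32)

# Mechanical reshape into the 4-D layout the Conv2d expects:
# move channels to axis 1, then fold the 100 boxes into a fake 10x10 grid.
grid_like = region.transpose(0, 2, 1).reshape(32, 2048, 10, 10)
assert grid_like.shape == (32, 2048, 10, 10)
```

This only makes the shapes line up; it does not give the features any real spatial meaning.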