YiLunLee / missing_aware_prompts

Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
https://yilunlee.github.io/missing_aware_prompts/

loss does not drop #8

Closed · jasonchengqs closed this issue 1 year ago

jasonchengqs commented 1 year ago

Hi, thank you for releasing the code. I was trying to reproduce the experiments, but the loss does not seem to drop at all. Maybe I was not running it correctly. Could you please help me check the training procedure?

I was training on the mmimdb dataset (the same issue was observed on the other two datasets as well).

CUDA_VISIBLE_DEVICES=0 python run.py with data_root=PATH_TO_DATA/mmimdb \
        num_gpus=1 \
        num_nodes=1 \
        per_gpu_batchsize=16 \
        task_finetune_mmimdb \
        load_path=PATH_TO_PRETRAIN/vilt_200k_mlm_itm.ckpt \
        exp_name=finetune_mmimdb \
        missing_table_root=PATH_TO_RESULT/missing_table \
        log_dir=PATH_TO_RESULT/log \
        prompt_type=input
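
One quick check when the loss is flat is to confirm which parameters actually receive gradients: with prompt tuning, only the prompts and the task classifier should be trainable while the ViLT backbone stays frozen. A minimal sketch in plain PyTorch (`summarize_trainable` is an illustrative helper, not part of this repo):

```python
from torch import nn

def summarize_trainable(model: nn.Module) -> None:
    """Print which parameters require gradients, plus running totals."""
    trainable, frozen = 0, 0
    for name, p in model.named_parameters():
        if p.requires_grad:
            trainable += p.numel()
            print(f"trainable: {name} ({p.numel():,})")
        else:
            frozen += p.numel()
    print(f"trainable total: {trainable:,}  frozen total: {frozen:,}")
```

If the setup is correct, the reported count should line up with the "2.0 M Trainable params" in the model summary below (the prompts plus the mmimdb classifier).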

Here is the console printout:

WARNING - root - Changed type of config entry "max_steps" from int to NoneType
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - torch.distributed.nn.jit.instantiator - Created a temporary directory at /tmp/tmp3iqqs12z
INFO - torch.distributed.nn.jit.instantiator - Writing /tmp/tmp3iqqs12z/_remote_module_non_scriptable.py
here -- multilayer enabled=True
Learning of Prompt is enabled
256 16 1 1
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:147: UserWarning: You passed `deterministic=True` and `benchmark=True`. Note that PyTorch ignores torch.backends.cudnn.deterministic=True when torch.backends.cudnn.benchmark=True.
  rank_zero_warn(
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:455: UserWarning: The flag `devices=gpu` will be ignored, instead the device specific number 1 will be used
  rank_zero_warn(
Using 16bit native Automatic Mixed Precision (AMP)
/dev38/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:52: LightningDeprecationWarning: Setting `max_steps = None` is deprecated in v1.5 and will no longer be supported in v1.7. Use `max_steps = -1` instead.
  rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:61: LightningDeprecationWarning: Setting `Trainer(flush_logs_every_n_steps=10)` is deprecated in v1.5 and will be removed in v1.7. Please configure flushing in the logger instead.
  rank_zero_deprecation(
Global seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/dev38/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(

  | Name                   | Type              | Params
-------------------------------------------------------------
0 | text_embeddings        | BertEmbeddings    | 24.2 M
1 | token_type_embeddings  | Embedding         | 1.5 K 
2 | transformer            | VisionTransformer | 87.5 M
3 | pooler                 | Pooler            | 590 K 
4 | mmimdb_classifier      | Sequential        | 1.2 M 
5 | train_mmimdb_F1_scores | F1_Score          | 0     
6 | train_mmimdb_loss      | Scalar            | 0     
7 | val_mmimdb_F1_scores   | F1_Score          | 0     
8 | val_mmimdb_loss        | Scalar            | 0     
-------------------------------------------------------------
2.0 M     Trainable params
111 M     Non-trainable params
113 M     Total params
227.583   Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
  rank_zero_warn(
Sanity Checking DataLoader 0:   0%|                                              | 0/2 [00:00<?, ?it/s]/dev38/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/dev38/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:72: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 16. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`
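
The warnings above are mostly cosmetic and not obviously related to the loss. The last one, for instance, is Lightning failing to infer the batch size from a dict-shaped batch; it can be silenced by passing `batch_size` to `self.log` explicitly, as the message itself suggests. A minimal sketch of that pattern (a toy module, not this repo's code):

```python
import torch
from torch import nn
import pytorch_lightning as pl

class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch["x"], batch["y"]  # dict-shaped batch
        loss = nn.functional.mse_loss(self.layer(x).squeeze(-1), y)
        # Passing batch_size explicitly stops Lightning from guessing
        # it from the ambiguous dict collection.
        self.log("train_loss", loss, batch_size=x.size(0))
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)
```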