facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

VisualBERT hateful memes model key: how do I load a fine-tuned VisualBERT instead of the pretrained one? #710

Closed MikeQ-hash closed 1 year ago

MikeQ-hash commented 3 years ago

❓ Questions and Help

Hi, I have further trained the VisualBERT model on the Hateful Memes dataset via the command line, as shown in the documentation. However, I am having difficulty understanding how to load that model from Python code.

For example

model = MMBT.from_pretrained("mmbt.hateful_memes.images")

This works, but:

model = VisualBERT.from_pretrained('visual_bert.pretrained.hateful_memes')

This does not work. In addition, I tried using the trained file 'visual_bert_final.pth', but I am not sure how to load it.

I have not found any documentation on this. The error message I get is:

Traceback (most recent call last):
  File "try1.py", line 13, in <module>
    model = VisualBERT.from_pretrained('visual_bert.pretrained.hateful_memes')
  File "/home/michael/hackathon/mmf/mmf/models/base_model.py", line 223, in from_pretrained
    output = load_pretrained_model(model_name_or_path, *args, **kwargs)
  File "/home/michael/hackathon/mmf/mmf/utils/checkpoint.py", line 117, in load_pretrained_model
    return _load_pretrained_model(model_name_or_path_or_checkpoint, *args, **kwargs)
  File "/home/michael/hackathon/mmf/mmf/utils/checkpoint.py", line 71, in _load_pretrained_model
    download_path = download_pretrained_model(model_name_or_path, *args, **kwargs)
  File "/home/michael/hackathon/mmf/mmf/utils/download.py", line 352, in download_pretrained_model
    if "version" not in model_config or "resources" not in model_config:
TypeError: argument of type 'NoneType' is not iterable
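The final frame of this traceback suggests that the lookup for 'visual_bert.pretrained.hateful_memes' returned None, and that the `in` test on None then raised the TypeError. The sketch below reproduces that failure mode in plain Python and shows a guarded lookup. It is not MMF's code: `ZOO`, its single entry's contents, and `resolve_model` are hypothetical names made up for illustration, and the path-first check is only an assumption based on the parameter name `model_name_or_path`.

```python
import os
import tempfile

# Hypothetical stand-in for a model zoo (zoo key -> config dict).
# The entry's contents are placeholders, not MMF's real zoo data.
ZOO = {
    "mmbt.hateful_memes.images": {"version": "v1", "resources": ["weights"]},
}


def resolve_model(name_or_path, zoo=ZOO):
    """Return ("local", path) or ("zoo", config), or raise a readable error."""
    # Assumption: an existing path is used as-is and never looked up in the zoo.
    if os.path.exists(name_or_path):
        return "local", name_or_path

    model_config = zoo.get(name_or_path)  # None when the key is unknown
    if model_config is None:
        # Without this guard, the membership test below would raise
        # TypeError: argument of type 'NoneType' is not iterable
        raise LookupError(
            f"{name_or_path} is neither an existing path nor a known zoo key"
        )
    if "version" not in model_config or "resources" not in model_config:
        raise ValueError(f"zoo entry {name_or_path} lacks 'version' or 'resources'")
    return "zoo", model_config


# A known key resolves through the zoo.
kind, _ = resolve_model("mmbt.hateful_memes.images")

# An existing directory resolves as a local path.
with tempfile.TemporaryDirectory() as tmp:
    local_kind, local_path = resolve_model(tmp)

# An unknown key now fails with a message that names the key.
try:
    resolve_model("visual_bert.pretrained.hateful_memes")
    error_message = ""
except LookupError as err:
    error_message = str(err)

print(kind, local_kind, error_message)
```

The point of the guard is only the error message: an unknown key is reported by name instead of surfacing as a TypeError several calls deep.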

hackgoofer commented 3 years ago

Hi @MikeQ-hash, thank you for using mmf :)

We are currently working to consolidate all model interfaces. Do you mind sharing what your model folder under $DATA_DIR contains (typically ~/.cache/torch/mmf/data/models/visual_bert.pretrained.hateful_memes)?
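One quick way to answer that is to list the folder from Python. This is a minimal sketch using only the standard library; the helper name `summarize_model_dir` and the two file names in the demo are placeholders, not names MMF requires.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def summarize_model_dir(model_dir):
    """Group the names of the files directly under model_dir by extension."""
    by_suffix = defaultdict(list)
    for path in sorted(Path(model_dir).expanduser().iterdir()):
        if path.is_file():
            by_suffix[path.suffix].append(path.name)
    return dict(by_suffix)


# Demo on a throwaway folder; both file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.yaml").write_text("placeholder: true\n")
    (Path(tmp) / "model.pth").write_bytes(b"")
    summary = summarize_model_dir(tmp)

print(summary)
```

For the real folder, the call would be summarize_model_dir("~/.cache/torch/mmf/data/models/visual_bert.pretrained.hateful_memes"), assuming that directory exists on your machine.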

In that folder, you'd need both the model file and also a config - for example:

MikeQ-hash commented 3 years ago

Hi @ytsheng, thank you for your reply. I have both of these files available. I am just not certain how to load them properly in Python code. I am trying this with MMBT first because it is easier to check directly from an image (VisualBERT requires additional pre-processing). My code is based on the posted notebook:

from mmf.utils.env import setup_imports
setup_imports()

import matplotlib.pyplot as plt
import requests
import torch
from PIL import Image
from mmf.common.registry import registry
import pdb
from mmf.models.mmbt import MMBT
#from mmf.models.visual_bert import VisualBERT
filename = 'mmbt_stuff/save/mmbt_final.pth'
model = MMBT.from_pretrained("mmbt.hateful_memes.images") 
checkpoint = torch.load('mmbt_stuff/save/mmbt_final.pth')
model.load_state_dict(checkpoint)
optimizer.load_state_dict(checkpoint)
image_url = "https://i.imgur.com/tEcsk5q.jpg" 
text = "Something"

output = model.classify(image_url, text)

plt.imshow(Image.open(requests.get(image_url, stream=True).raw))
plt.axis("off")
plt.show()
hateful = "Yes" if output["label"] == 1 else "No"
print("Hateful as per the model?", hateful)
print(f"Model's confidence: {output['confidence'] * 100:.3f}%")
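The error output reported for this script names the same weight under two prefixes: MMF's loader mentions `model.bert.mmbt...`, while `load_state_dict` on `MMBTGridHMInterface` asks for `model.model.bert.mmbt...`. One possible reading is that the saved checkpoint's keys lack one leading `model.` level that the interface wrapper adds. That is an assumption, since the checkpoint's own keys are not shown here. The sketch below uses plain dicts with placeholder values instead of tensors; `add_prefix` and `diff_keys` are hypothetical helpers, and the two key names are copied from the error message.

```python
def add_prefix(state_dict, prefix):
    """Return a copy of state_dict with prefix prepended to every key."""
    return {prefix + name: value for name, value in state_dict.items()}


def diff_keys(expected, provided):
    """Return (missing, unexpected) key lists, as load_state_dict reports them."""
    expected, provided = set(expected), set(provided)
    return sorted(expected - provided), sorted(provided - expected)


# Keys the wrapper asks for, copied from the error message.
expected_keys = [
    "model.model.bert.mmbt.transformer.pooler.dense.weight",
    "model.model.bert.mmbt.transformer.pooler.dense.bias",
]

# Assumed checkpoint layout: one "model." level fewer. Values are placeholders.
checkpoint = {
    "model.bert.mmbt.transformer.pooler.dense.weight": 0.0,
    "model.bert.mmbt.transformer.pooler.dense.bias": 0.0,
}

# Loaded as-is, every expected key is missing and every provided key is unexpected.
missing_before, unexpected_before = diff_keys(expected_keys, checkpoint)

# After adding the extra level, the two key sets line up.
remapped = add_prefix(checkpoint, "model.")
missing_after, unexpected_after = diff_keys(expected_keys, remapped)

print(len(missing_before), len(unexpected_before), missing_after, unexpected_after)
```

Running `diff_keys(model.state_dict().keys(), checkpoint.keys())` on the real objects would confirm or rule out this reading before anything is remapped. Separately, `optimizer` is never defined in the script, so the `optimizer.load_state_dict(checkpoint)` line would raise a NameError once the model load succeeds; it is not needed for inference.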

Running my script, final.py, produces this error message:

Missing keys ['model.bert.mmbt.transformer.embeddings.position_ids'] in the checkpoint.
If this is not your checkpoint, please open up an issue on MMF GitHub. 
Unexpected keys if any: []
Traceback (most recent call last):
  File "final.py", line 15, in <module>
    model.load_state_dict(checkpoint)
  File "/home/michael/miniconda3/envs/hackathon2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MMBTGridHMInterface:
    Missing key(s) in state_dict: "model.model.bert.mmbt.transformer.embeddings.position_ids", "model.model.bert.mmbt.transformer.embeddings.word_embeddings.weight", "model.model.bert.mmbt.transformer.embeddings.position_embeddings.weight", "model.model.bert.mmbt.transformer.embeddings.token_type_embeddings.weight", "model.model.bert.mmbt.transformer.embeddings.LayerNorm.weight", "model.model.bert.mmbt.transformer.embeddings.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.0.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.0.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.key.weight", 
"model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.1.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.1.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.intermediate.dense.bias", 
"model.model.bert.mmbt.transformer.encoder.layer.2.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.2.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.2.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.3.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.3.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.value.weight", 
"model.model.bert.mmbt.transformer.encoder.layer.4.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.4.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.4.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.5.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.output.dense.bias", 
"model.model.bert.mmbt.transformer.encoder.layer.5.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.5.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.6.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.6.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.output.dense.weight", 
"model.model.bert.mmbt.transformer.encoder.layer.7.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.7.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.7.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.8.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.8.output.LayerNorm.bias", 
"model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.9.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.9.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.weight", 
"model.model.bert.mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.10.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.10.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.query.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.query.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.key.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.key.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.value.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.self.value.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.intermediate.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.intermediate.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.output.dense.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.output.dense.bias", "model.model.bert.mmbt.transformer.encoder.layer.11.output.LayerNorm.weight", "model.model.bert.mmbt.transformer.encoder.layer.11.output.LayerNorm.bias", "model.model.bert.mmbt.transformer.pooler.dense.weight", "model.model.bert.mmbt.transformer.pooler.dense.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.0.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.0.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.0.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.5.5.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.bias", 
"model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.0.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.6.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.bias", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.conv1.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.15.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.bias", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.conv2.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.25.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.bias", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.conv3.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.running_mean", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.weight", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.6.34.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.bias", 
"model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.0.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.running_var", 
"model.model.bert.mmbt.modal_encoder.encoder.model.7.1.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.conv1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.conv2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.running_var", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.conv3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.weight", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.bias", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.running_mean", "model.model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.running_var", "model.model.bert.mmbt.modal_encoder.proj_embeddings.weight", "model.model.bert.mmbt.modal_encoder.proj_embeddings.bias", "model.model.bert.mmbt.modal_encoder.position_embeddings.weight", "model.model.bert.mmbt.modal_encoder.token_type_embeddings.weight", "model.model.bert.mmbt.modal_encoder.word_embeddings.weight", "model.model.bert.mmbt.modal_encoder.LayerNorm.weight", "model.model.bert.mmbt.modal_encoder.LayerNorm.bias", "model.model.classifier.0.dense.weight", "model.model.classifier.0.dense.bias", "model.model.classifier.0.LayerNorm.weight", "model.model.classifier.0.LayerNorm.bias", 
"model.model.classifier.1.weight", "model.model.classifier.1.bias". 
    Unexpected key(s) in state_dict: "model.bert.mmbt.transformer.embeddings.position_ids", "model.bert.mmbt.transformer.embeddings.word_embeddings.weight", "model.bert.mmbt.transformer.embeddings.position_embeddings.weight", "model.bert.mmbt.transformer.embeddings.token_type_embeddings.weight", "model.bert.mmbt.transformer.embeddings.LayerNorm.weight", "model.bert.mmbt.transformer.embeddings.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.0.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.0.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.0.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.0.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.0.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.0.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.0.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.0.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.0.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.1.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.1.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.1.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.1.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.1.attention.self.value.weight", 
"model.bert.mmbt.transformer.encoder.layer.1.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.1.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.1.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.1.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.1.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.1.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.1.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.1.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.1.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.2.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.2.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.2.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.2.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.2.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.2.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.2.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.2.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.2.output.LayerNorm.bias", 
"model.bert.mmbt.transformer.encoder.layer.3.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.3.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.3.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.3.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.3.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.3.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.3.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.3.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.3.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.3.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.3.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.3.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.3.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.3.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.4.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.4.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.4.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.4.intermediate.dense.weight", 
"model.bert.mmbt.transformer.encoder.layer.4.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.4.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.4.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.4.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.4.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.5.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.5.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.5.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.5.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.5.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.5.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.5.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.5.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.5.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.6.attention.self.value.bias", 
"model.bert.mmbt.transformer.encoder.layer.6.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.6.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.6.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.6.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.6.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.6.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.6.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.6.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.7.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.7.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.7.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.7.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.7.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.7.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.7.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.7.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.7.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.8.attention.self.query.weight", 
"model.bert.mmbt.transformer.encoder.layer.8.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.8.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.8.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.8.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.8.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.8.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.8.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.8.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.8.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.8.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.8.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.8.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.8.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.9.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.9.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.9.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.9.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.9.intermediate.dense.bias", 
"model.bert.mmbt.transformer.encoder.layer.9.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.9.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.9.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.9.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.10.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.10.attention.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.10.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.10.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.10.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.10.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.10.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.10.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.10.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.query.weight", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.query.bias", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.key.weight", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.key.bias", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.value.weight", "model.bert.mmbt.transformer.encoder.layer.11.attention.self.value.bias", "model.bert.mmbt.transformer.encoder.layer.11.attention.output.dense.weight", 
"model.bert.mmbt.transformer.encoder.layer.11.attention.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.bias", "model.bert.mmbt.transformer.encoder.layer.11.intermediate.dense.weight", "model.bert.mmbt.transformer.encoder.layer.11.intermediate.dense.bias", "model.bert.mmbt.transformer.encoder.layer.11.output.dense.weight", "model.bert.mmbt.transformer.encoder.layer.11.output.dense.bias", "model.bert.mmbt.transformer.encoder.layer.11.output.LayerNorm.weight", "model.bert.mmbt.transformer.encoder.layer.11.output.LayerNorm.bias", "model.bert.mmbt.transformer.pooler.dense.weight", "model.bert.mmbt.transformer.pooler.dense.bias", "model.bert.mmbt.modal_encoder.encoder.model.0.weight", "model.bert.mmbt.modal_encoder.encoder.model.1.weight", "model.bert.mmbt.modal_encoder.encoder.model.1.bias", "model.bert.mmbt.modal_encoder.encoder.model.1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.0.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.0.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.0.conv3.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.0.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.0.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.0.downsample.1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.1.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.1.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.1.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.1.bn3.num_batches_tracked", 
"model.bert.mmbt.modal_encoder.encoder.model.4.2.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.2.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.4.2.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.4.2.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.0.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.0.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn2.num_batches_tracked", 
"model.bert.mmbt.modal_encoder.encoder.model.5.0.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.0.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.0.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.0.downsample.1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.1.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.1.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.1.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.5.1.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.2.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.2.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.2.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.2.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.3.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.3.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.5.3.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.3.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.3.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.4.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.4.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.4.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.4.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.5.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.5.5.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.5.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.5.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.5.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.6.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.6.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.6.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.5.6.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.7.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.7.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.5.7.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.5.7.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.0.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.0.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.6.0.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.0.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.0.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.0.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.0.downsample.1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.1.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.1.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.1.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.1.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.2.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.2.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.2.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.2.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.3.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.3.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.3.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.3.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.4.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.4.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.4.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.4.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.5.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.5.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.5.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.5.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.6.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.6.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.6.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.6.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.7.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.7.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.7.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.7.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.8.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.8.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.8.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.8.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.9.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.9.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.9.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.9.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.10.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.10.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.10.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.10.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.11.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.11.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.11.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.bias", 
"model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.11.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.12.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.12.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.12.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.12.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.13.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.13.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.13.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.13.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.14.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.14.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.14.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.14.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.15.conv1.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.15.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.15.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.15.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.16.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.16.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn2.num_batches_tracked", 
"model.bert.mmbt.modal_encoder.encoder.model.6.16.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.16.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.17.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.17.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.17.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.17.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.18.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.6.18.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.18.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.18.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.18.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.19.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.19.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.19.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.19.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.20.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.20.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.20.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.20.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.21.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.21.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.bias", 
"model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.21.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.21.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.22.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.22.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.22.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.22.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.23.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.23.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.23.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.23.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.24.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.24.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.24.conv3.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.24.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.25.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.25.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.25.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.25.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.26.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn1.num_batches_tracked", 
"model.bert.mmbt.modal_encoder.encoder.model.6.26.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.26.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.26.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.27.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.27.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.27.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.6.27.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.28.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.28.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.28.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.28.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.29.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.29.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.running_mean", 
"model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.29.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.29.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.30.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.30.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.30.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.30.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.31.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.bias", 
"model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.31.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.31.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.31.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.32.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.32.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.32.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.32.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.33.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.33.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.33.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.33.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.34.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.34.conv2.weight", 
"model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.34.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.34.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.35.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.35.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.6.35.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.6.35.bn3.num_batches_tracked", 
"model.bert.mmbt.modal_encoder.encoder.model.7.0.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.0.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.0.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.0.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.0.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.0.downsample.1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.1.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.7.1.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.1.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.1.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.1.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.2.conv1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn1.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.2.conv2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.running_var", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn2.num_batches_tracked", "model.bert.mmbt.modal_encoder.encoder.model.7.2.conv3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.weight", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.bias", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.running_mean", "model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.running_var", 
"model.bert.mmbt.modal_encoder.encoder.model.7.2.bn3.num_batches_tracked", "model.bert.mmbt.modal_encoder.proj_embeddings.weight", "model.bert.mmbt.modal_encoder.proj_embeddings.bias", "model.bert.mmbt.modal_encoder.position_embeddings.weight", "model.bert.mmbt.modal_encoder.token_type_embeddings.weight", "model.bert.mmbt.modal_encoder.word_embeddings.weight", "model.bert.mmbt.modal_encoder.LayerNorm.weight", "model.bert.mmbt.modal_encoder.LayerNorm.bias", "model.classifier.0.dense.weight", "model.classifier.0.dense.bias", "model.classifier.0.LayerNorm.weight", "model.classifier.0.LayerNorm.bias", "model.classifier.1.weight", "model.classifier.1.bias". 
apsdehal commented 3 years ago

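Can you try doing:

model.model.load_state_dict(torch.load("your_path"))

instead of

model.load_state_dict(torch.load("your_path"))

in your code? Note that load_state_dict expects the loaded state dict, not the path string itself.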
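
To see why the extra `.model` matters, here is a minimal plain-Python sketch (no MMF or PyTorch needed) that sorts checkpoint keys by whether they match the model's keys directly, only after adding a prefix, or not at all. The helper name `classify_checkpoint_keys` and the `optimizer.step` key are made up for illustration; the two classifier keys are taken from the list above.

```python
def classify_checkpoint_keys(ckpt_keys, model_keys, prefix="model."):
    """Sort checkpoint keys by how they line up with the model's keys."""
    model_keys = set(model_keys)
    direct, needs_prefix, unmatched = [], [], []
    for key in ckpt_keys:
        if key in model_keys:
            direct.append(key)          # loads as-is
        elif prefix + key in model_keys:
            needs_prefix.append(key)    # loads only after adding the prefix
        else:
            unmatched.append(key)       # no counterpart in the model
    return direct, needs_prefix, unmatched


# Key names shortened from the list above; "optimizer.step" is hypothetical.
model_keys = ["model.classifier.1.weight", "model.classifier.1.bias"]
ckpt_keys = ["classifier.1.weight", "classifier.1.bias", "optimizer.step"]

direct, needs_prefix, unmatched = classify_checkpoint_keys(ckpt_keys, model_keys)
print("direct:", direct)
print("needs prefix:", needs_prefix)
print("unmatched:", unmatched)
```

If most keys end up in `needs_prefix`, the checkpoint was probably saved from the inner module, so loading into `model.model` (or prefixing the keys) should line things up.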

MikeQ-hash commented 3 years ago

Hi all,

I managed to make it work (I double-checked the performance on the test set).

Here is the code for MMBT (VisualBERT requires its own pre-processing):

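import matplotlib.pyplot as plt
import torch
from PIL import Image

from mmf.models.mmbt import MMBT

filename_pth = "./save/mmbt_final.pth"  # your fine-tuned .pth
# Note your config path.
model = MMBT("./save/config.yaml").from_pretrained("mmbt.hateful_memes.images")
ckpt = torch.load(filename_pth)
own_state = model.state_dict()

# The checkpoint keys lack the "model." prefix used by the wrapper's
# state dict, so add it before copying each tensor in place.
copied = 0
for name, param in ckpt.items():
    name = "model." + name
    if name not in own_state:
        print("fail:", name)
        continue
    copied += 1
    print("success")
    own_state[name].copy_(param)
print(copied)  # number of tensors copied from the checkpoint

filename = "something.jpg"  # placeholder: path to your image
text = "something"  # placeholder: text that goes with the image

output = model.classify(filename, text)
img = Image.open(filename)  # assumes a local image file
plt.imshow(img)
plt.axis("off")
plt.show()
hateful = "Yes" if output["label"] == 1 else "No"
print("Hateful as per the model?", hateful)
print(f"Model's confidence: {output['confidence'] * 100:.3f}%")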
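
For anyone who wants to reuse the key-renaming step, the loop above can be pulled into a small helper. This is only a sketch under assumptions: `remap_checkpoint` is a name invented here, and the demo uses plain numbers in place of tensors so it runs without MMF or PyTorch.

```python
def remap_checkpoint(ckpt, own_keys, prefix="model."):
    """Prefix checkpoint keys and report what does not line up.

    Returns (remapped, skipped, missing):
      remapped: prefixed entries that exist in the model
      skipped:  checkpoint keys with no counterpart in the model
      missing:  model keys the checkpoint did not provide
    """
    own_keys = set(own_keys)
    remapped, skipped = {}, []
    for name, value in ckpt.items():
        target = prefix + name
        if target in own_keys:
            remapped[target] = value
        else:
            skipped.append(name)
    missing = sorted(own_keys - set(remapped))
    return remapped, skipped, missing


# Toy stand-ins: plain numbers instead of tensors, hypothetical extra keys.
ckpt = {"classifier.1.weight": 1.0, "classifier.1.bias": 0.5, "extra.key": 9}
own_keys = [
    "model.classifier.1.weight",
    "model.classifier.1.bias",
    "model.classifier.0.dense.weight",
]

remapped, skipped, missing = remap_checkpoint(ckpt, own_keys)
print(len(remapped), "matched;", "skipped:", skipped, "missing:", missing)
```

With a real model, the `remapped` dict could presumably be passed to `model.load_state_dict(remapped, strict=False)` instead of copying tensor by tensor, and a non-empty `missing` list would be the thing to check before trusting the predictions.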