ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

Bug with multi-modal classification and another problem #1473

Open MirageTurtle opened 1 year ago

MirageTurtle commented 1 year ago

Describe the bug

I'll describe what I think is a bug, and also ask for help with multi-modal classification in simpletransformers.

First, I got this exception while training the model (see #1115): AttributeError: 'MMBTConfig' object has no attribute 'use_return_dict'

After working around it as @simepy suggested in #1115, I got the next one, which #1115 also shows and which I think is a bug: if isinstance(data, tuple): UnboundLocalError: local variable 'data' referenced before assignment

After fixing that, loading the trained model for evaluation failed with: Can't set use_return_dict with value True for BertConfig { AttributeError: can't set attribute

So I deleted use_return_dict: true from my config.json, and then got another warning when loading the model: Some weights of BertModel were not initialized from the model checkpoint at outputs and are newly initialized:

As I expected, the evaluation result is bad. Am I using it incorrectly? I would expect simpletransformers to load a model that simpletransformers itself trained without problems.
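One possible reading of the two use_return_dict errors, sketched in plain Python. This assumes the wrapper config shares only the instance `__dict__` of the underlying config and that `use_return_dict` is a read-only property on the underlying class; both are assumptions about the libraries, not something verified in this thread. `BaseConfig`, `WrapperConfig`, and `try_set` are hypothetical names used only for illustration.

```python
class BaseConfig:
    """Hypothetical stand-in for a config class with a read-only property."""

    def __init__(self, return_dict=True):
        self.return_dict = return_dict

    @property
    def use_return_dict(self):
        # Read-only: derived from return_dict, no setter is defined.
        return self.return_dict


class WrapperConfig:
    """Hypothetical wrapper that shares only the instance attributes."""

    def __init__(self, config):
        # Properties are defined on the class, so sharing __dict__ skips them.
        self.__dict__ = config.__dict__


def try_set(obj, name, value):
    """Try to set an attribute and report whether it worked."""
    try:
        setattr(obj, name, value)
        return "ok"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


base = BaseConfig()
wrapped = WrapperConfig(base)

# The wrapper has no such attribute until it is set by hand.
print(hasattr(wrapped, "use_return_dict"))

# Setting it on the wrapper succeeds and lands in the shared __dict__ ...
print(try_set(wrapped, "use_return_dict", True))
print("use_return_dict" in base.__dict__)

# ... but setting the same key on a fresh base config fails, because the
# property has no setter.
print(try_set(BaseConfig(), "use_return_dict", True))
```

Under these assumptions, the key written by the workaround would end up in the saved config, which would explain why loading only proceeds once it is deleted from config.json.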

To Reproduce

This is what I run to load the model for evaluation after training:

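import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from simpletransformers.classification import MultiModalClassificationModel
from simpletransformers.config.model_args import MultiModalClassificationArgs

sample_df = pd.read_csv("/path/to/my_data.csv")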
label_list = list(set(sample_df["labels"].to_list()))
train_df, eval_df = train_test_split(sample_df, test_size=0.2)
cuda_available = torch.cuda.is_available()
model_args = MultiModalClassificationArgs(
    num_train_epochs=1,
    fp16=False,  # I got this config in #1115
)
model = MultiModalClassificationModel(
    model_type="bert",
    # model_name="bert-base-multilingual-cased",  # used for the first run, to load BERT and train
    model_name="outputs/checkpoint-4-epoch-1",  # used to load the trained model for evaluation
    use_cuda=cuda_available,
    label_list=label_list,
    args=model_args,
)
model.config.use_return_dict = True  # workaround from #1115; set to True because I think that is the default value
# model.train_model(train_df, image_path="/path/to/image/dir/")
result, model_outputs = model.eval_model(eval_df, image_path="/path/to/image/dir/")
print(result)

About the suspected bug

The error occurs at https://github.com/ThilinaRajapakse/simpletransformers/blob/4e40ca1659b2e7851d9c350144fe6de6c18be612/simpletransformers/classification/multi_modal_classification_model.py#L1013-L1014, which is the first occurrence of the variable data in the function, apart from the function's docstring and comments. The function parameters only include eval_data: https://github.com/ThilinaRajapakse/simpletransformers/blob/4e40ca1659b2e7851d9c350144fe6de6c18be612/simpletransformers/classification/multi_modal_classification_model.py#L944-L958 So I think this may be an omission from an earlier revision, and renaming eval_data to data fixes it.
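The failure mode described here can be reproduced in a few lines. This is a simplified sketch, not the library's actual function: `evaluate_buggy` and `evaluate_fixed` are hypothetical names, and the bodies only mimic the pattern of reading `data` when the parameter is called `eval_data`.

```python
def evaluate_buggy(eval_data):
    # `data` is assigned further down, so Python treats it as a local name.
    # Reading it before that assignment raises UnboundLocalError.
    if isinstance(data, tuple):
        data, _ = data
    return data


def evaluate_fixed(data):
    # With the parameter renamed, `data` is bound on entry to the function.
    if isinstance(data, tuple):
        data, _ = data
    return data


try:
    evaluate_buggy(("rows", "images"))
except UnboundLocalError as exc:
    print(f"UnboundLocalError: {exc}")

print(evaluate_fixed(("rows", "images")))
```

Renaming the parameter, as proposed above, is the smallest change; assigning `data = eval_data` at the top of the function would be an alternative that keeps the public parameter name.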


MirageTurtle commented 1 year ago

After testing, my guess is that the BertModel weights reported as not initialized from the model checkpoint at outputs correspond to the checkpoint weights reported as not used when initializing BertModel, for example mmbt.transformer.encoder.layer.0.intermediate.dense.weight and encoder.layer.0.intermediate.dense.weight. The warning below lists the unused checkpoint weights.
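The correspondence guessed above can be checked by comparing key names with the `mmbt.transformer.` prefix removed. The sketch below uses a handful of keys copied from the warning; the `bert_keys` set is an assumed sample of the names a plain BertModel expects, and `strip_prefix` is a hypothetical helper.

```python
def strip_prefix(keys, prefix):
    """Return the keys that start with `prefix`, with the prefix removed."""
    return {k[len(prefix):] for k in keys if k.startswith(prefix)}


# A few keys copied from the warning text in this comment.
checkpoint_keys = [
    "mmbt.transformer.encoder.layer.0.intermediate.dense.weight",
    "mmbt.transformer.pooler.dense.weight",
    "mmbt.modal_encoder.proj_embeddings.bias",
    "classifier.weight",
]

# Key names a plain BertModel would look for (assumed sample).
bert_keys = {
    "encoder.layer.0.intermediate.dense.weight",
    "pooler.dense.weight",
}

prefix = "mmbt.transformer."
matched = strip_prefix(checkpoint_keys, prefix) & bert_keys
unmatched = [k for k in checkpoint_keys if not k.startswith(prefix)]

print(sorted(matched))
print(unmatched)
```

Keys that match only after the prefix is removed would be consistent with the checkpoint being a full multi-modal model that is being loaded into a plain BertModel, while the `modal_encoder` and `classifier` keys have no counterpart there at all.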

Some weights of the model checkpoint at outputs were not used when initializing BertModel: ['mmbt.transformer.encoder.layer.0.intermediate.dense.weight', 'mmbt.transformer.encoder.layer.7.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.29.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.5.5.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.28.bn1.weight', 'mmbt.modal_encoder.encoder.model.7.0.downsample.1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.7.bn2.running_mean', 'mmbt.transformer.encoder.layer.0.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.7.bn3.running_mean', 'mmbt.transformer.encoder.layer.8.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn2.bias', 'mmbt.transformer.encoder.layer.9.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.24.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.5.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.6.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.15.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.7.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn2.bias', 'mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.8.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.22.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.22.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.24.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.13.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.27.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.5.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.24.bn1.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.6.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.9.bn2.running_var', 
'mmbt.modal_encoder.encoder.model.6.7.bn3.weight', 'mmbt.modal_encoder.encoder.model.4.0.downsample.1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.4.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.22.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.1.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn3.bias', 'mmbt.modal_encoder.encoder.model.4.1.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.22.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.24.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.34.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.6.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.4.0.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.4.1.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.31.conv1.weight', 'mmbt.modal_encoder.encoder.model.4.0.conv1.weight', 'mmbt.transformer.encoder.layer.5.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.5.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.33.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.9.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.33.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.6.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn3.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.23.conv1.weight', 'mmbt.transformer.encoder.layer.0.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.5.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.8.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.12.bn3.bias', 'mmbt.modal_encoder.proj_embeddings.bias', 'mmbt.transformer.encoder.layer.11.output.dense.weight', 
'mmbt.transformer.encoder.layer.2.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.30.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.8.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.31.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.9.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.1.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.16.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.4.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.16.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.19.bn3.weight', 'mmbt.transformer.encoder.layer.5.attention.self.query.weight', 'mmbt.transformer.pooler.dense.weight', 'mmbt.modal_encoder.encoder.model.6.1.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.28.bn3.running_var', 'mmbt.modal_encoder.encoder.model.4.2.conv3.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.11.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.20.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.4.2.bn1.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.2.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.14.bn2.weight', 'mmbt.modal_encoder.encoder.model.4.0.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.19.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.10.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.14.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.30.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.5.bn3.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.35.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.1.conv1.weight', 'mmbt.modal_encoder.encoder.model.1.running_var', 'mmbt.modal_encoder.encoder.model.6.16.bn3.bias', 
'mmbt.modal_encoder.encoder.model.6.35.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.5.4.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.12.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.6.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn3.running_mean', 'mmbt.transformer.encoder.layer.11.output.dense.bias', 'mmbt.transformer.encoder.layer.5.attention.self.value.bias', 'classifier.bias', 'mmbt.transformer.encoder.layer.9.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.30.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.35.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.0.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.4.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.24.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.21.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.4.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.18.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.11.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn3.running_mean', 'mmbt.transformer.encoder.layer.7.output.dense.bias', 'mmbt.modal_encoder.encoder.model.7.2.bn3.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.0.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.11.conv2.weight', 'mmbt.transformer.encoder.layer.8.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.5.0.bn3.running_mean', 'mmbt.transformer.encoder.layer.4.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.0.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn2.weight', 'mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.4.2.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.5.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn2.bias', 
'mmbt.modal_encoder.encoder.model.6.18.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.8.bn2.weight', 'mmbt.transformer.encoder.layer.3.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.5.6.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.5.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.18.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.2.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.2.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.11.bn3.running_var', 'mmbt.transformer.encoder.layer.8.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.2.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.22.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.7.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.31.bn1.bias', 'mmbt.transformer.encoder.layer.4.attention.self.query.bias', 'mmbt.transformer.encoder.layer.0.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.5.4.conv3.weight', 'mmbt.modal_encoder.encoder.model.4.1.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.25.conv3.weight', 'mmbt.transformer.encoder.layer.5.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.5.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.2.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.4.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.7.1.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.34.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.3.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.14.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.31.bn3.num_batches_tracked', 'mmbt.modal_encoder.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.4.1.conv3.weight', 
'mmbt.modal_encoder.encoder.model.6.22.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.14.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.1.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.22.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.7.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.0.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.6.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.0.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.4.conv1.weight', 'mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.12.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.33.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.14.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.33.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.0.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.0.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.5.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.11.conv3.weight', 'mmbt.transformer.encoder.layer.8.attention.output.dense.bias', 'mmbt.transformer.encoder.layer.8.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.15.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.20.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.2.conv1.weight', 'mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_var', 'mmbt.transformer.encoder.layer.10.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.0.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.0.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.20.bn1.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.5.7.conv3.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.8.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.0.conv2.weight', 'mmbt.transformer.encoder.layer.4.attention.self.query.weight', 'mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.15.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn2.running_mean', 'mmbt.transformer.encoder.layer.9.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.17.conv2.weight', 'mmbt.transformer.encoder.layer.1.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.25.bn2.weight', 'mmbt.transformer.encoder.layer.4.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.5.1.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.33.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.18.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.17.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.6.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.0.downsample.0.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.17.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.10.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.1.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.35.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.11.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.9.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.27.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.21.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.27.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.3.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.4.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.2.bn3.running_var', 
'mmbt.modal_encoder.encoder.model.5.3.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.7.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.30.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.16.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.3.bn3.bias', 'mmbt.transformer.encoder.layer.8.attention.self.value.bias', 'mmbt.transformer.encoder.layer.2.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.5.5.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.0.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.0.bn1.weight', 'mmbt.transformer.encoder.layer.7.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.29.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.31.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.9.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.34.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.5.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.23.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.21.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.9.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.3.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.23.conv3.weight', 'mmbt.transformer.embeddings.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.4.bn2.running_var', 'mmbt.transformer.encoder.layer.2.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.19.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.22.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.14.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.16.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn1.num_batches_tracked', 'mmbt.transformer.encoder.layer.1.attention.self.key.bias', 
'mmbt.modal_encoder.encoder.model.6.24.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.16.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.25.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.28.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.6.bn1.running_var', 'mmbt.transformer.encoder.layer.0.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.31.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.12.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.35.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.22.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.0.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.12.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.0.conv3.weight', 'mmbt.transformer.encoder.layer.5.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.19.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.21.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.0.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.2.bn2.bias', 'mmbt.modal_encoder.encoder.model.4.1.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.25.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.25.conv2.weight', 'mmbt.transformer.encoder.layer.8.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.29.conv3.weight', 'mmbt.transformer.encoder.layer.11.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.9.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.30.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn3.running_var', 'mmbt.transformer.encoder.layer.7.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.15.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.5.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.1.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.21.bn3.running_var', 
'mmbt.modal_encoder.encoder.model.7.2.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.17.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.0.downsample.1.bias', 'mmbt.modal_encoder.encoder.model.7.0.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.1.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.3.conv2.weight', 'mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.33.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.0.downsample.1.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.17.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.2.conv1.weight', 'mmbt.transformer.encoder.layer.1.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.17.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.1.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.30.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.6.attention.self.key.bias', 'mmbt.transformer.encoder.layer.7.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.5.bn1.weight', 'mmbt.modal_encoder.encoder.model.0.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn2.running_var', 'mmbt.transformer.encoder.layer.1.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.16.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.0.bn2.running_var', 'mmbt.transformer.encoder.layer.9.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.2.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.21.bn3.bias', 'mmbt.transformer.encoder.layer.4.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.6.15.bn3.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.6.25.conv1.weight', 'mmbt.transformer.encoder.layer.2.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.4.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.34.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.24.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.26.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.4.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.18.bn3.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn1.running_mean', 'mmbt.transformer.encoder.layer.10.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.12.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.30.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.31.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.33.bn3.bias', 'mmbt.transformer.embeddings.word_embeddings.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.35.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.20.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.27.conv3.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.8.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.31.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.2.bn1.running_mean', 'mmbt.transformer.encoder.layer.10.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn2.bias', 'mmbt.transformer.encoder.layer.7.intermediate.dense.weight', 'mmbt.transformer.encoder.layer.10.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.1.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.6.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.29.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.31.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.1.bn1.weight', 
'mmbt.modal_encoder.encoder.model.6.25.bn1.weight', 'mmbt.transformer.encoder.layer.11.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.0.downsample.1.bias', 'classifier.weight', 'mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_mean', 'mmbt.modal_encoder.encoder.model.6.15.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.4.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.5.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.0.downsample.1.running_mean', 'mmbt.modal_encoder.encoder.model.6.28.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.6.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.7.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.8.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.19.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.32.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.0.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.15.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.32.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.6.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.3.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.17.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.19.bn3.bias', 'mmbt.modal_encoder.proj_embeddings.weight', 'mmbt.modal_encoder.encoder.model.6.22.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.1.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.0.downsample.0.weight', 'mmbt.transformer.encoder.layer.0.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.5.0.bn1.bias', 'mmbt.transformer.encoder.layer.2.output.dense.weight', 'mmbt.transformer.encoder.layer.10.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.32.conv3.weight', 
'mmbt.modal_encoder.encoder.model.6.12.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.0.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.weight', 'mmbt.transformer.encoder.layer.10.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.8.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.30.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.1.bn3.running_mean', 'mmbt.transformer.encoder.layer.2.attention.self.value.weight', 'mmbt.transformer.encoder.layer.11.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.6.13.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.21.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.24.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.19.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.7.bn3.running_var', 'mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn1.running_var', 'mmbt.transformer.encoder.layer.6.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.21.bn3.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn3.running_var', 'mmbt.transformer.encoder.layer.10.attention.output.LayerNorm.bias', 'mmbt.transformer.embeddings.position_embeddings.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.29.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.27.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.1.bn2.bias', 'mmbt.modal_encoder.word_embeddings.weight', 'mmbt.modal_encoder.encoder.model.5.1.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn1.running_var', 'mmbt.modal_encoder.encoder.model.7.0.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.4.bn1.num_batches_tracked', 'mmbt.transformer.encoder.layer.2.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.15.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn1.running_mean', 
'mmbt.modal_encoder.encoder.model.6.5.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.5.bn2.weight', 'mmbt.modal_encoder.encoder.model.4.0.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.33.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.14.bn1.running_var', 'mmbt.transformer.encoder.layer.6.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.21.bn1.weight', 'mmbt.transformer.encoder.layer.11.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.21.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.33.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.34.bn1.running_var', 'mmbt.modal_encoder.encoder.model.1.running_mean', 'mmbt.modal_encoder.encoder.model.6.28.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.34.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.19.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.2.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.32.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.12.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.30.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn3.bias', 'mmbt.modal_encoder.encoder.model.5.1.bn1.running_mean', 'mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.4.0.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.10.bn1.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn1.weight', 'mmbt.transformer.encoder.layer.0.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.5.4.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.1.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.35.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.1.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.7.bn2.bias', 
'mmbt.modal_encoder.encoder.model.6.19.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.20.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.18.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.33.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.24.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.29.bn3.weight', 'mmbt.transformer.encoder.layer.7.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn2.running_mean', 'mmbt.transformer.encoder.layer.1.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.5.0.downsample.1.bias', 'mmbt.modal_encoder.encoder.model.6.24.bn2.bias', 'mmbt.modal_encoder.encoder.model.7.0.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.4.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.2.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.18.bn2.running_mean', 'mmbt.transformer.encoder.layer.2.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.26.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.21.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.0.bn1.running_mean', 'mmbt.transformer.encoder.layer.0.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.5.5.bn1.bias', 'mmbt.modal_encoder.encoder.model.7.2.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.4.conv2.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn2.bias', 'mmbt.modal_encoder.encoder.model.5.3.bn1.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.21.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.25.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.0.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn1.running_var', 'mmbt.transformer.encoder.layer.5.attention.self.key.weight', 
'mmbt.transformer.encoder.layer.1.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.12.bn2.running_var', 'mmbt.transformer.encoder.layer.3.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.10.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.1.bn1.weight', 'mmbt.transformer.encoder.layer.4.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.2.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.9.bn1.running_mean', 'mmbt.transformer.encoder.layer.2.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.11.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.35.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.7.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.23.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.17.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.0.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.2.bn3.weight', 'mmbt.modal_encoder.encoder.model.5.0.downsample.1.running_var', 'mmbt.transformer.encoder.layer.3.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.24.bn3.running_mean', 'mmbt.transformer.encoder.layer.6.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.13.bn2.bias', 'mmbt.modal_encoder.encoder.model.7.0.bn2.running_mean', 'mmbt.transformer.encoder.layer.1.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.5.6.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.26.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.23.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.31.bn2.bias', 'mmbt.transformer.encoder.layer.1.attention.self.query.bias', 'mmbt.transformer.encoder.layer.2.intermediate.dense.bias', 'mmbt.transformer.encoder.layer.3.intermediate.dense.weight', 
'mmbt.modal_encoder.encoder.model.6.5.bn2.running_mean', 'mmbt.transformer.encoder.layer.5.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.0.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.33.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.14.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.26.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.21.bn1.running_var', 'mmbt.transformer.encoder.layer.10.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.9.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn3.running_var', 'mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.bias', 'mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.weight', 'mmbt.transformer.encoder.layer.8.attention.self.key.weight', 'mmbt.transformer.encoder.layer.11.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.2.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.1.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.33.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.34.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.15.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.24.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.9.bn1.running_var', 'mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.10.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.9.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.26.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.2.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.10.conv1.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.4.bn2.bias', 
'mmbt.modal_encoder.encoder.model.6.26.bn2.bias', 'mmbt.transformer.encoder.layer.3.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.6.15.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.23.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.5.bn3.weight', 'mmbt.transformer.encoder.layer.11.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.5.0.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.23.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.20.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.6.conv3.weight', 'mmbt.transformer.encoder.layer.4.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.30.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.18.bn1.running_mean', 'mmbt.transformer.encoder.layer.4.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.3.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.4.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.28.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.2.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.23.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.22.conv3.weight', 'mmbt.transformer.embeddings.token_type_embeddings.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.3.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.10.bn1.bias', 'mmbt.transformer.encoder.layer.6.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.27.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.6.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.4.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.1.bn2.running_mean', 'mmbt.transformer.encoder.layer.3.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.5.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn3.running_var', 
'mmbt.modal_encoder.encoder.model.6.28.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.15.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.1.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.19.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.4.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.30.bn3.running_mean', 'mmbt.transformer.encoder.layer.5.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.12.bn2.weight', 'mmbt.modal_encoder.position_embeddings.weight', 'mmbt.transformer.encoder.layer.4.output.LayerNorm.weight', 'mmbt.transformer.encoder.layer.6.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.0.downsample.1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.5.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.7.1.bn2.bias', 'mmbt.modal_encoder.encoder.model.7.2.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.2.bn2.running_var', 'mmbt.transformer.encoder.layer.7.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.17.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.12.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.30.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.18.conv3.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn2.bias', 'mmbt.modal_encoder.encoder.model.4.0.downsample.0.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.2.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.0.bn2.running_var', 'mmbt.modal_encoder.encoder.model.7.1.bn3.bias', 'mmbt.modal_encoder.encoder.model.5.7.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.13.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.29.conv1.weight', 'mmbt.transformer.encoder.layer.10.intermediate.dense.bias', 'mmbt.transformer.encoder.layer.2.attention.output.dense.bias', 
'mmbt.modal_encoder.encoder.model.4.0.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn2.bias', 'mmbt.modal_encoder.encoder.model.5.5.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.0.downsample.1.weight', 'mmbt.modal_encoder.encoder.model.5.4.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.2.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.28.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.0.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.15.bn2.running_mean', 'mmbt.transformer.encoder.layer.6.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.20.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.21.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.11.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.7.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.1.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.18.bn1.weight', 'mmbt.transformer.encoder.layer.4.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.7.0.downsample.0.weight', 'mmbt.modal_encoder.encoder.model.6.31.conv2.weight', 'mmbt.transformer.encoder.layer.5.output.dense.weight', 'mmbt.modal_encoder.encoder.model.5.4.bn3.bias', 'mmbt.transformer.encoder.layer.9.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.27.bn2.weight', 'mmbt.modal_encoder.token_type_embeddings.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn2.running_mean', 'mmbt.transformer.encoder.layer.11.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.33.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.4.bn3.running_mean', 
'mmbt.modal_encoder.encoder.model.5.7.conv2.weight', 'mmbt.transformer.encoder.layer.9.intermediate.dense.bias', 'mmbt.transformer.embeddings.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.4.0.bn2.bias', 'mmbt.transformer.encoder.layer.1.attention.self.value.bias', 'mmbt.transformer.encoder.layer.3.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.14.bn3.weight', 'mmbt.transformer.encoder.layer.4.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.24.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.7.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.17.conv3.weight', 'mmbt.transformer.encoder.layer.8.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.bn1.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn3.running_var', 'mmbt.modal_encoder.encoder.model.4.2.bn1.running_mean', 'mmbt.transformer.encoder.layer.9.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.6.24.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.22.bn1.weight', 'mmbt.transformer.encoder.layer.1.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.13.conv3.weight', 'mmbt.modal_encoder.encoder.model.5.5.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.8.bn3.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.2.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.3.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.10.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.1.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.7.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.17.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn2.running_var', 'mmbt.transformer.encoder.layer.7.intermediate.dense.bias', 
'mmbt.modal_encoder.encoder.model.6.19.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.27.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.14.conv2.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.3.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.2.bn2.bias', 'mmbt.modal_encoder.encoder.model.5.5.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.0.bn3.weight', 'mmbt.modal_encoder.encoder.model.5.4.bn3.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.35.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.1.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.2.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.11.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.26.bn1.bias', 'mmbt.transformer.encoder.layer.9.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.7.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.0.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.7.bn1.running_mean', 'mmbt.transformer.encoder.layer.2.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn3.weight', 'mmbt.modal_encoder.encoder.model.4.2.bn3.bias', 'mmbt.transformer.encoder.layer.8.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.33.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.8.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.4.0.bn2.running_mean', 'mmbt.transformer.encoder.layer.7.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.15.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.16.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn2.bias', 'mmbt.transformer.encoder.layer.9.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.19.bn1.bias', 'mmbt.transformer.encoder.layer.9.attention.self.value.weight', 
'mmbt.transformer.encoder.layer.2.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.12.bn2.running_mean', 'mmbt.transformer.encoder.layer.10.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.14.bn3.bias', 'mmbt.transformer.encoder.layer.10.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.6.29.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.6.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.20.conv1.weight', 'mmbt.transformer.encoder.layer.4.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.6.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.1.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.3.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.6.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.7.conv1.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.18.conv2.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.bn3.bias', 'mmbt.modal_encoder.encoder.model.5.4.bn3.running_var', 'mmbt.modal_encoder.encoder.model.4.0.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.18.bn1.bias', 'mmbt.transformer.encoder.layer.6.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.2.bn2.weight', 'mmbt.transformer.encoder.layer.3.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.2.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.5.5.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.0.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn3.running_var', 'mmbt.modal_encoder.encoder.model.4.1.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.9.attention.output.dense.weight', 'mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.bias', 'mmbt.transformer.encoder.layer.0.output.dense.bias', 
'mmbt.transformer.encoder.layer.10.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.10.bn2.running_var', 'mmbt.transformer.encoder.layer.7.attention.output.dense.bias', 'mmbt.transformer.encoder.layer.0.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.7.0.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.5.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.2.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.21.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.7.2.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.2.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.32.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.16.bn1.num_batches_tracked', 'mmbt.modal_encoder.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.32.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.14.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.23.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn2.running_var', 'mmbt.transformer.encoder.layer.3.attention.self.value.bias', 'mmbt.transformer.encoder.layer.11.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.3.bn3.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.20.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.0.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.2.bn3.bias', 'mmbt.modal_encoder.encoder.model.4.0.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.5.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.8.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.2.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.22.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.1.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.12.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.14.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.1.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn2.running_mean', 
'mmbt.modal_encoder.encoder.model.6.30.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.14.bn2.bias', 'mmbt.modal_encoder.encoder.model.5.3.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.0.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.10.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.9.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn3.bias', 'mmbt.modal_encoder.encoder.model.7.2.conv3.weight', 'mmbt.transformer.encoder.layer.3.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.6.8.bn2.running_var', 'mmbt.modal_encoder.encoder.model.4.2.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.2.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.12.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.2.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.9.bn3.weight', 'mmbt.transformer.encoder.layer.9.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.5.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.1.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.6.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.6.bn2.running_mean', 'mmbt.transformer.encoder.layer.4.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.4.0.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn1.running_mean', 'mmbt.transformer.encoder.layer.6.attention.self.value.bias', 'mmbt.modal_encoder.encoder.model.6.18.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.30.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.30.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.35.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.16.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.0.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.19.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.0.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn1.running_mean', 'mmbt.transformer.encoder.layer.0.attention.output.dense.weight', 
'mmbt.modal_encoder.encoder.model.5.3.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.7.1.bn3.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.11.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.7.0.downsample.1.running_var', 'mmbt.transformer.encoder.layer.5.output.LayerNorm.weight', 'mmbt.transformer.encoder.layer.6.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.6.18.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.8.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.0.downsample.1.weight', 'mmbt.transformer.encoder.layer.8.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.24.bn1.running_var', 'mmbt.transformer.encoder.layer.5.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.6.21.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.31.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_mean', 'mmbt.modal_encoder.encoder.model.6.6.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.16.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.18.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.24.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.1.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn2.weight', 'mmbt.transformer.encoder.layer.1.intermediate.dense.bias', 'mmbt.modal_encoder.encoder.model.7.1.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.30.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.33.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.31.bn3.bias', 'mmbt.transformer.encoder.layer.10.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.27.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.6.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.9.bn2.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.6.0.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.0.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.5.bn3.running_var', 'mmbt.modal_encoder.encoder.model.5.3.conv2.weight', 'mmbt.transformer.encoder.layer.11.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.7.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.32.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.6.bn1.num_batches_tracked', 'mmbt.transformer.encoder.layer.2.attention.self.query.bias', 'mmbt.transformer.encoder.layer.0.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.10.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.12.bn3.num_batches_tracked', 'mmbt.transformer.encoder.layer.1.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.5.2.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.14.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.31.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.26.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.0.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn2.running_var', 'mmbt.transformer.encoder.layer.11.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.31.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.0.downsample.1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.0.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.31.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.5.6.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.6.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.19.bn1.running_mean', 'mmbt.transformer.encoder.layer.1.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.31.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.4.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.8.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.12.bn3.running_mean', 
'mmbt.transformer.encoder.layer.5.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.30.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.28.conv2.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.15.bn2.weight', 'mmbt.modal_encoder.encoder.model.1.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.19.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.29.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.0.downsample.1.running_var', 'mmbt.modal_encoder.encoder.model.6.34.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.3.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.35.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.15.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.7.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn3.bias', 'mmbt.transformer.encoder.layer.9.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.29.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.35.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.14.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.33.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.24.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.29.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.10.bn2.bias', 'mmbt.modal_encoder.encoder.model.4.2.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.5.3.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.15.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.32.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.21.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.1.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.4.1.bn2.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.7.1.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.1.bn3.weight', 'mmbt.transformer.encoder.layer.9.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.3.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.34.bn1.running_mean', 'mmbt.transformer.encoder.layer.7.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn1.num_batches_tracked', 'mmbt.transformer.encoder.layer.3.attention.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn3.bias', 'mmbt.transformer.encoder.layer.3.output.LayerNorm.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.1.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.14.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.2.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.1.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.13.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.25.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.31.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.15.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.11.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.12.bn1.weight', 'mmbt.modal_encoder.encoder.model.5.7.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.33.bn1.running_var', 'mmbt.transformer.encoder.layer.6.intermediate.dense.weight', 'mmbt.modal_encoder.encoder.model.5.5.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn2.running_var', 'mmbt.transformer.encoder.layer.4.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.19.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.22.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.5.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.8.bn2.bias', 'mmbt.transformer.encoder.layer.10.attention.self.query.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn2.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.5.7.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.18.bn2.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn2.running_var', 'mmbt.modal_encoder.encoder.model.7.0.bn1.bias', 'mmbt.modal_encoder.encoder.model.7.0.bn2.weight', 'mmbt.transformer.pooler.dense.bias', 'mmbt.modal_encoder.encoder.model.5.0.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.0.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.9.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.28.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.23.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.10.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.6.2.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.1.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.6.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.24.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.10.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.18.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.27.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.22.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.7.0.downsample.1.weight', 'mmbt.modal_encoder.encoder.model.6.30.bn1.num_batches_tracked', 'mmbt.transformer.encoder.layer.9.output.dense.bias', 'mmbt.modal_encoder.encoder.model.5.0.bn2.running_mean', 'mmbt.transformer.encoder.layer.5.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.5.5.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.0.conv3.weight', 'mmbt.transformer.embeddings.position_ids', 'mmbt.modal_encoder.encoder.model.6.0.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.9.bn1.weight', 'mmbt.transformer.encoder.layer.8.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.17.bn3.running_var', 
'mmbt.modal_encoder.encoder.model.6.15.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.5.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.26.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.0.bn3.bias', 'mmbt.modal_encoder.encoder.model.5.7.bn1.running_var', 'mmbt.modal_encoder.encoder.model.5.6.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.3.conv3.weight', 'mmbt.modal_encoder.encoder.model.5.1.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.23.bn2.running_var', 'mmbt.transformer.encoder.layer.8.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.0.bn1.bias', 'mmbt.transformer.encoder.layer.6.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.20.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.23.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.32.bn1.weight', 'mmbt.modal_encoder.encoder.model.5.0.bn2.bias', 'mmbt.modal_encoder.encoder.model.7.0.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.19.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.6.bn1.bias', 'mmbt.modal_encoder.encoder.model.4.0.downsample.1.bias', 'mmbt.modal_encoder.encoder.model.6.34.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.11.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.21.conv2.weight', 'mmbt.transformer.encoder.layer.3.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.4.2.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.9.conv2.weight', 'mmbt.transformer.encoder.layer.3.output.dense.weight', 'mmbt.modal_encoder.encoder.model.5.3.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.33.bn2.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.20.bn2.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn1.num_batches_tracked', 
'mmbt.modal_encoder.encoder.model.6.5.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.8.bn1.bias', 'mmbt.modal_encoder.encoder.model.5.4.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.11.bn2.bias', 'mmbt.modal_encoder.encoder.model.6.22.bn2.running_mean', 'mmbt.transformer.encoder.layer.0.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.35.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.7.2.conv2.weight', 'mmbt.transformer.encoder.layer.11.attention.output.dense.weight', 'mmbt.modal_encoder.encoder.model.6.7.bn1.running_var', 'mmbt.modal_encoder.encoder.model.6.23.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.5.2.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.3.bn2.num_batches_tracked', 'mmbt.transformer.encoder.layer.5.attention.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.8.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.32.conv1.weight', 'mmbt.transformer.encoder.layer.3.attention.self.query.weight', 'mmbt.transformer.encoder.layer.11.attention.self.key.weight', 'mmbt.modal_encoder.encoder.model.6.32.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.5.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn1.bias', 'mmbt.modal_encoder.encoder.model.6.29.bn1.weight', 'mmbt.transformer.encoder.layer.7.attention.output.LayerNorm.bias', 'mmbt.modal_encoder.encoder.model.6.8.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.13.bn3.weight', 'mmbt.modal_encoder.encoder.model.6.4.bn3.weight', 'mmbt.modal_encoder.encoder.model.1.weight', 'mmbt.modal_encoder.encoder.model.6.22.bn3.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.7.2.bn3.running_var', 'mmbt.modal_encoder.encoder.model.6.9.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.22.bn3.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn3.bias', 'mmbt.modal_encoder.encoder.model.6.4.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.14.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.34.bn2.weight', 
'mmbt.modal_encoder.encoder.model.6.0.bn1.weight', 'mmbt.modal_encoder.encoder.model.7.1.bn2.running_var', 'mmbt.modal_encoder.encoder.model.5.0.conv1.weight', 'mmbt.modal_encoder.encoder.model.5.2.bn1.weight', 'mmbt.modal_encoder.encoder.model.6.28.bn2.running_var', 'mmbt.modal_encoder.encoder.model.6.19.bn2.running_mean', 'mmbt.modal_encoder.encoder.model.6.12.conv1.weight', 'mmbt.modal_encoder.encoder.model.6.34.conv2.weight', 'mmbt.modal_encoder.encoder.model.6.8.conv1.weight', 'mmbt.modal_encoder.encoder.model.7.2.bn1.num_batches_tracked', 'mmbt.modal_encoder.encoder.model.6.18.bn3.running_var', 'mmbt.transformer.encoder.layer.6.attention.self.query.bias', 'mmbt.modal_encoder.encoder.model.6.4.conv3.weight', 'mmbt.modal_encoder.encoder.model.6.25.bn1.running_var', 'mmbt.transformer.encoder.layer.1.attention.self.value.weight', 'mmbt.modal_encoder.encoder.model.5.0.bn1.running_var', 'mmbt.modal_encoder.encoder.model.4.1.bn2.running_mean', 'mmbt.transformer.encoder.layer.5.output.dense.bias', 'mmbt.modal_encoder.encoder.model.6.8.bn1.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn1.bias', 'mmbt.transformer.encoder.layer.7.attention.self.key.bias', 'mmbt.modal_encoder.encoder.model.4.2.bn2.weight', 'mmbt.modal_encoder.encoder.model.6.26.bn1.running_mean', 'mmbt.modal_encoder.encoder.model.6.3.conv1.weight', 'mmbt.modal_encoder.encoder.model.4.1.bn3.running_mean', 'mmbt.modal_encoder.encoder.model.7.2.bn2.running_mean']
Some weights of BertModel were not initialized from the model checkpoint at outputs and are newly initialized: ['encoder.layer.9.attention.output.dense.weight', 'encoder.layer.7.attention.self.query.weight', 'encoder.layer.3.attention.output.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.9.attention.self.value.bias', 'encoder.layer.1.attention.output.LayerNorm.bias', 'encoder.layer.4.attention.self.query.bias', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.1.attention.output.LayerNorm.weight', 'encoder.layer.10.attention.output.dense.bias', 'encoder.layer.10.output.dense.bias', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.10.attention.self.query.weight', 'encoder.layer.7.attention.self.value.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.5.attention.output.dense.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.2.attention.self.key.weight', 'encoder.layer.6.output.dense.bias', 'embeddings.LayerNorm.bias', 'encoder.layer.4.attention.self.query.weight', 'encoder.layer.3.intermediate.dense.weight', 'pooler.dense.bias', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.8.output.dense.weight', 'encoder.layer.11.attention.self.value.bias', 'encoder.layer.1.attention.self.query.bias', 'encoder.layer.8.attention.self.value.weight', 'encoder.layer.3.attention.self.key.bias', 'encoder.layer.11.attention.self.key.bias', 'encoder.layer.2.attention.output.LayerNorm.weight', 'encoder.layer.5.output.dense.weight', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.5.attention.output.LayerNorm.weight', 'encoder.layer.4.attention.output.dense.weight', 'encoder.layer.3.attention.self.key.weight', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.2.attention.self.value.bias', 'encoder.layer.3.attention.output.LayerNorm.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.1.attention.self.value.bias', 
'encoder.layer.0.attention.self.key.bias', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.9.attention.output.dense.bias', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.9.output.LayerNorm.bias', 'embeddings.LayerNorm.weight', 'encoder.layer.2.attention.output.LayerNorm.bias', 'encoder.layer.6.attention.self.query.bias', 'encoder.layer.5.attention.self.key.bias', 'encoder.layer.5.output.dense.bias', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.1.attention.self.key.weight', 'encoder.layer.0.attention.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.9.attention.output.LayerNorm.weight', 'encoder.layer.9.attention.output.LayerNorm.bias', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.1.output.dense.bias', 'encoder.layer.5.attention.self.query.weight', 'encoder.layer.1.attention.output.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.8.attention.self.key.bias', 'encoder.layer.10.attention.output.LayerNorm.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.5.attention.self.value.weight', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.11.output.dense.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.6.attention.output.dense.weight', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.7.attention.output.LayerNorm.weight', 'encoder.layer.4.attention.self.value.bias', 'encoder.layer.2.attention.self.query.bias', 'encoder.layer.2.attention.output.dense.weight', 'encoder.layer.3.attention.output.dense.bias', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.6.attention.self.query.weight', 'encoder.layer.6.attention.output.dense.bias', 'encoder.layer.10.attention.self.query.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.4.attention.output.dense.bias', 'encoder.layer.7.attention.output.dense.bias', 'encoder.layer.11.attention.self.value.weight', 
'encoder.layer.3.intermediate.dense.bias', 'embeddings.word_embeddings.weight', 'encoder.layer.10.attention.self.value.weight', 'encoder.layer.6.attention.self.key.weight', 'encoder.layer.5.attention.output.LayerNorm.bias', 'encoder.layer.1.attention.self.query.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.4.attention.self.key.weight', 'encoder.layer.4.attention.output.LayerNorm.weight', 'encoder.layer.7.intermediate.dense.weight', 'encoder.layer.5.attention.self.query.bias', 'encoder.layer.6.attention.self.value.bias', 'encoder.layer.8.attention.output.LayerNorm.weight', 'encoder.layer.3.attention.output.LayerNorm.weight', 'encoder.layer.5.attention.output.dense.bias', 'embeddings.token_type_embeddings.weight', 'encoder.layer.8.attention.self.value.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.6.output.dense.weight', 'pooler.dense.weight', 'encoder.layer.10.attention.output.LayerNorm.weight', 'encoder.layer.10.attention.self.key.bias', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.10.attention.self.key.weight', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.9.attention.self.value.weight', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.1.attention.self.value.weight', 'encoder.layer.5.attention.self.key.weight', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.4.attention.self.key.bias', 'encoder.layer.7.attention.self.key.weight', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.11.attention.self.key.weight', 'encoder.layer.4.attention.self.value.weight', 'encoder.layer.0.attention.output.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.7.attention.self.query.bias', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.7.output.dense.weight', 
'encoder.layer.9.attention.self.query.bias', 'encoder.layer.2.attention.output.dense.bias', 'encoder.layer.11.attention.output.LayerNorm.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.8.attention.self.query.weight', 'encoder.layer.3.attention.self.value.bias', 'encoder.layer.0.attention.output.LayerNorm.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.3.attention.self.query.bias', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.10.attention.output.dense.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.0.attention.output.LayerNorm.bias', 'encoder.layer.8.attention.self.key.weight', 'encoder.layer.1.attention.output.dense.weight', 'encoder.layer.6.attention.self.value.weight', 'encoder.layer.7.attention.output.LayerNorm.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.8.attention.output.LayerNorm.bias', 'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.3.attention.self.query.weight', 'encoder.layer.4.attention.output.LayerNorm.bias', 'encoder.layer.10.attention.self.value.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.5.attention.self.value.bias', 'encoder.layer.7.attention.self.value.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.9.output.dense.weight', 'encoder.layer.11.attention.self.query.bias', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.attention.self.query.bias', 'encoder.layer.6.attention.self.key.bias', 'encoder.layer.8.attention.output.dense.weight', 'embeddings.position_embeddings.weight', 'encoder.layer.3.attention.self.value.weight', 'encoder.layer.0.output.dense.weight', 'encoder.layer.1.attention.self.key.bias', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.8.output.dense.bias', 'encoder.layer.9.attention.self.query.weight', 'encoder.layer.11.attention.self.query.weight', 'encoder.layer.3.output.dense.bias', 
'encoder.layer.9.attention.self.key.weight', 'encoder.layer.7.attention.self.key.bias', 'encoder.layer.0.attention.self.value.weight', 'encoder.layer.6.attention.output.LayerNorm.weight', 'encoder.layer.11.attention.output.dense.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.8.attention.output.dense.bias', 'encoder.layer.2.attention.self.value.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.3.output.dense.weight', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.7.attention.output.dense.weight', 'encoder.layer.9.attention.self.key.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.2.attention.self.key.bias', 'encoder.layer.11.attention.output.dense.bias', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.6.attention.output.LayerNorm.bias', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.2.attention.self.query.weight', 'encoder.layer.11.attention.output.LayerNorm.weight']
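Comparing the two lists, the checkpoint keys carry an `mmbt.transformer.` prefix (e.g. `mmbt.transformer.encoder.layer.8.output.LayerNorm.bias`), while `BertModel` reports the unprefixed names (e.g. `encoder.layer.8.output.LayerNorm.bias`) as newly initialized. That looks like a key-name mismatch rather than missing weights, though I have not confirmed it against the library code. A minimal sketch for checking this is below. The helper names `strip_prefix` and `compare_keys` are hypothetical and not part of simpletransformers or transformers, and the sample keys are copied from the log above. In practice the checkpoint keys would presumably come from the state dict saved in `outputs`.

```python
def strip_prefix(keys, prefix):
    """Return the keys that start with `prefix`, with the prefix removed."""
    return {k[len(prefix):] for k in keys if k.startswith(prefix)}


def compare_keys(checkpoint_keys, expected_keys, prefix):
    """Compare checkpoint keys (after stripping `prefix`) with the keys the model expects."""
    remapped = strip_prefix(checkpoint_keys, prefix)
    expected = set(expected_keys)
    return {
        "matched": sorted(remapped & expected),   # present under the prefix
        "missing": sorted(expected - remapped),   # expected but not in checkpoint
        "extra": sorted(remapped - expected),     # in checkpoint but not expected
    }


# Sample keys taken from the two warnings above (not the full lists).
checkpoint_keys = [
    "mmbt.transformer.encoder.layer.8.output.LayerNorm.bias",
    "mmbt.transformer.encoder.layer.6.output.dense.weight",
    "mmbt.modal_encoder.encoder.model.1.weight",
]
expected_keys = [
    "encoder.layer.8.output.LayerNorm.bias",
    "encoder.layer.6.output.dense.weight",
    "pooler.dense.bias",
]

report = compare_keys(checkpoint_keys, expected_keys, "mmbt.transformer.")
print(report)
```

If most of the "newly initialized" names show up under `matched`, the trained weights are in the checkpoint but are being looked up under the wrong names when the model is reloaded.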
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.