xuewyang opened this issue 4 years ago (status: Open)
^ doesn't mean complement. It anchors the regex to the start of the string. Here's a lazy version that freezes the first 16 layers (0 through 15). You can probably make better use of regex (look up Python's regex syntax):
no_grad:
- ^bert.encoder.layer.0
- ^bert.encoder.layer.1
- ^bert.encoder.layer.2
- ^bert.encoder.layer.3
# and so on
- ^bert.encoder.layer.15
In YAML, - introduces a list item, so no_grad is just a list of regexes. Likewise, parameter_groups expects a list of groups, where each group is a list containing two objects: the first is a list of regex expressions, and the second is a dictionary of optimizer parameters specific to that group.
In any case, parameter_groups is only useful if you want to specify, say, a different learning rate for each layer. Otherwise you can just ignore it.
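For illustration, here's a minimal sketch of that structure in a YAML config. The optimizer type, learning rates, and regex are placeholders; adapt them to your own setup:

    trainer:
      optimizer:
        type: adam
        lr: 0.0001
        parameter_groups:
          # The top four layers get their own, smaller learning rate;
          # parameters that match no group fall through to the default lr.
          - [['^bert\.encoder\.layer\.2[0-3]\.'], {lr: 0.00002}]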
Gotcha. I actually tried this but it failed. I think the reason is that ^bert.encoder.layer.1 also covers layers 10-19 and ^bert.encoder.layer.2 also covers layers 20-23, since these patterns only match a prefix. So I used a stricter set of patterns instead and it worked. Thank you.
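(The exact snippet isn't preserved in the thread; a hypothetical reconstruction that behaves as described, with escaped dots and a trailing \. so that layer.1 cannot match layer.10 through layer.19:)

    no_grad:
      - '^bert\.encoder\.layer\.([0-9]|1[0-5])\.'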
Hi Alasdair, I know that when my training crashes, I can use recover to resume it. However, that only works when the config file hasn't changed at all. If I want to fine-tune on a new dataset, I have to change the data reader while keeping the model fixed. Do you know how to achieve this? Thank you.
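One possible route, assuming a stock AllenNLP 0.9-era setup (this repo's own training wrapper may differ): the fine-tune command loads the weights from a trained archive and trains them against a new config, whose dataset_reader can point at the new data. The paths below are placeholders:

    # -m: archive of the trained model; -c: new config with the changed
    # data reader; -s: fresh serialization directory for the new run
    allennlp fine-tune \
        -m expt/old_run/model.tar.gz \
        -c configs/new_dataset.yaml \
        -s expt/new_run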
Hi Alasdair, I am finishing my paper in LaTeX. I noticed that in Table 3 you used vertical (rotated, rather than horizontal) text for the GoodNews and NYTimes800k labels. I want to follow that format. Can you explain how to achieve this in LaTeX?
Here's my table:
\usepackage{graphicx} % rotate box
\usepackage{multirow} % merge multiple rows
\usepackage{tabularx} % for 'tabularx' environment
\begin{table*}[t]
\caption {Results on GoodNews (rows 1--10) and NYTimes800k (rows 11--19).
We report BLEU-4, ROUGE, CIDEr, and precision (P) \& recall (R) of
named entities, people's names, and rare proper nouns. Precision and
recall are expressed as percentages. Rows 1--2 contain previous
state-of-the-art results \cite{Biten2019GoodNews}. Rows 3--5 and 11--13
are ablation studies where we swap the Transformer with an LSTM and/or
RoBERTa with GloVe. These models only have the image attention (IA).
Rows 6 \& 14 are our baseline RoBERTa transformer language model that
only has the article text (and not the image) as inputs. Building on
top of this, we first add attention over image patches (rows 7 \& 15).
We then take a weighted sum of the RoBERTa embeddings (rows 8 \& 16)
and attend to the text surrounding the image instead of the first 512
tokens of the article (row 17). Finally we add attention over faces
(rows 9 \& 18) and objects (rows 10 \& 19) in the image.}
\label{tab:results}
\centering
\begin{tabularx}{\textwidth}{llXXX XX XX XX}
\toprule
&
& \multirow{2}{*}{\mbox{\small{BLEU-4}}}
& \multirow{2}{*}{\small{ROUGE}}
& \multirow{2}{*}{\small{CIDEr}}
& \multicolumn{2}{l}{\small{Named entities}}
& \multicolumn{2}{l}{\small{People's names}}
& \multicolumn{2}{l}{\small{Rare proper nouns}} \\
& & & & & \small{P} & \small{R} & \small{P} & \small{R} & \small{P} & \small{R} \\
\midrule
\multirow{10}{*}{\rotatebox[origin=c]{90}{GoodNews}}
& (1) Biten (Avg + CtxIns)~\cite{Biten2019GoodNews} & 0.89 & 12.2 & 13.1 & 8.23 & 6.06 & 9.38 & 6.55 & 1.06 & 12.5 \\
& (2) Biten (TBB + AttIns)~\cite{Biten2019GoodNews} & 0.76 & 12.2 & 12.7 & 8.87 & 5.64 & 11.9 & 6.98 & 1.58 & 12.6 \\
\cmidrule{2-11}
& (3) LSTM + GloVe + IA & 1.97 & 13.6 & 13.9 & 10.7 & 7.09 & 9.07 & 5.36 & 0 & 0 \\
& (4) Transformer + GloVe + IA & 3.48 & 17.0 & 25.2 & 14.3 & 11.1 & 14.5 & 10.5 & 0 & 0 \\
& (5) LSTM + RoBERTa + IA & 3.45 & 17.0 & 28.6 & 15.5 & 12.0 & 16.4 & 12.4 & 2.75 & 8.64 \\
\cmidrule{2-11}
& (6) Transformer + RoBERTa & 4.60 & 18.6 & 40.9 & 19.3 & 16.1 & 24.4 & 18.7 & 10.7 & 18.7 \\
& (7) \quad + image attention & 5.45 & 20.7 & 48.5 & 21.1 & 17.4 & 26.9 & 20.7 & 12.2 & 20.9 \\
& (8) \quad\quad + weighted RoBERTa & 6.0 & 21.2 & 53.1 & 21.8 & 18.5 & 28.8 & 22.8 & 16.2 & 26.0 \\
& (9) \quad\quad\quad + face attention & \textbf{6.05} & \textbf{21.4} & \textbf{54.3} & 22.0 & 18.6 & \textbf{29.3} & \textbf{23.3} & 15.5 & 24.5 \\
& (10) \quad\quad\quad\quad + object attention & \textbf{6.05} & \textbf{21.4} & 53.8 & \textbf{22.2} & \textbf{18.7} & 29.2 & 23.1 & \textbf{15.6} & \textbf{26.3} \\
\midrule
\midrule
\multirow{9}{*}{\rotatebox[origin=c]{90}{NYTimes800k}}
& (11) LSTM + GloVe + IA & 1.77 & 13.1 & 12.1 & 10.2 & 7.24 & 8.83 & 5.73 & 0 & 0 \\
& (12) Transformer + GloVe + IA & 2.75 & 15.9 & 20.3 & 13.2 & 10.8 & 13.2 & 9.66 & 0 & 0 \\
& (13) LSTM + RoBERTa + IA & 3.29 & 16.1 & 24.9 & 15.1 & 12.9 & 17.7 & 14.4 & 7.47 & 9.50 \\
\cmidrule{2-11}
& (14) Transformer + RoBERTa & 4.26 & 17.3 & 33.9 & 17.8 & 16.3 & 23.6 & 19.7 & 21.1 & 16.7 \\
& (15) \quad + image attention & 5.01 & 19.4 & 40.3 & 20.0 & 18.1 & 28.2 & 23.0 & 24.3 & 19.3 \\
& (16) \quad\quad + weighted RoBERTa & 5.75 & 19.9 & 45.1 & 21.1 & 19.6 & 29.7 & 25.4 & 29.6 & 22.8 \\
& (17) \quad\quad\quad + location-aware & 6.36 & 21.4 & 52.8 & 24.0 & 21.9 & 35.4 & 30.2 & 33.8 & \textbf{27.2} \\
& (18) \quad\quad\quad\quad + face attention & 6.26 & 21.5 & 53.9 & 24.2 & 22.1 & 36.5 & 30.8 & 33.4 & 26.4 \\
& (19) \quad\quad\quad\quad\quad + object attention & \textbf{6.30} & \textbf{21.7} & \textbf{54.4} & \textbf{24.6} & \textbf{22.2} & \textbf{37.3} & \textbf{31.1} & \textbf{34.2} & 27.0 \\
\bottomrule
\end{tabularx}
\end{table*}
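The vertical dataset labels come from \rotatebox (provided by graphicx) nested inside \multirow, as in the first cell of each block above. Stripped down to just that technique, a minimal sketch (placeholder rows; relies on the multirow and graphicx packages already loaded above):

    \begin{tabular}{ll}
      % \multirow{3} centers the label vertically across three rows;
      % \rotatebox[origin=c]{90} rotates it to read bottom-to-top
      \multirow{3}{*}{\rotatebox[origin=c]{90}{GoodNews}}
        & first row  \\
        & second row \\
        & third row  \\
    \end{tabular}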
I have a simple question. I tried to solve it myself but didn't manage. I want to update the higher layers of BERT while keeping the lower layers fixed. For example, given the parameter names in the INFO log below, if I want to keep layers 0-15 fixed and fine-tune layers 16-23, how do I do this? I think I have to change something here and here. I know that "parameter_groups" is where we update the parameters and "no_grad" is where we fix the parameters. So my question is: how should I set these two options to achieve this?
bert.encoder.layer.0.attention.self.query.weight
bert.encoder.layer.0.attention.self.query.bias
bert.encoder.layer.0.attention.self.key.weight
bert.encoder.layer.0.attention.self.key.bias
bert.encoder.layer.0.attention.self.value.weight
bert.encoder.layer.0.attention.self.value.bias
bert.encoder.layer.0.attention.output.dense.weight
bert.encoder.layer.0.attention.output.dense.bias
bert.encoder.layer.0.attention.output.LayerNorm.weight
bert.encoder.layer.0.attention.output.LayerNorm.bias
bert.encoder.layer.0.intermediate.dense.weight
bert.encoder.layer.0.intermediate.dense.bias
bert.encoder.layer.0.output.dense.weight
bert.encoder.layer.0.output.dense.bias
bert.encoder.layer.0.output.LayerNorm.weight
bert.encoder.layer.0.output.LayerNorm.bias
(the same 16 parameters repeat for bert.encoder.layer.1 through bert.encoder.layer.23)