xuewyang opened this issue 4 years ago (status: Open)
^ doesn't mean complement. It anchors the regex to the start of the string. Here's a lazy version that freezes the first 16 layers (0 through 15). You can probably make better use of regex (look up Python's regex syntax):
no_grad:
- ^bert.encoder.layer.0
- ^bert.encoder.layer.1
- ^bert.encoder.layer.2
- ^bert.encoder.layer.3
# and so on
- ^bert.encoder.layer.15
In YAML, - introduces a list item, so no_grad is just a list of regexes. Likewise, parameter_groups expects a list of groups, where each group is a list containing two objects: the first is a list of regex expressions, and the second is a dictionary of optimizer parameters specific to that group.
In any case, parameter_groups is only useful if you want to specify, say, a different learning rate for each layer. Otherwise you can just ignore it.
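For illustration, here's a minimal sketch of that structure in a YAML config. The optimizer type, learning rates, and regex are placeholders; adapt them to your own setup:

    trainer:
      optimizer:
        type: adam
        lr: 0.0001
        parameter_groups:
          # The top four layers get their own, smaller learning rate;
          # parameters that match no group fall through to the default lr.
          - [['^bert\.encoder\.layer\.2[0-3]\.'], {lr: 0.00002}]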
Gotcha. I actually tried this but it failed. I think the reason is that ^bert.encoder.layer.1 also covers layers 10-19 and ^bert.encoder.layer.2 also covers layers 20-23, since these patterns only match a prefix. So I used a stricter set of patterns instead and it worked. Thank you.
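(The exact snippet isn't preserved in the thread; a hypothetical reconstruction that behaves as described, with escaped dots and a trailing \. so that layer.1 cannot match layer.10 through layer.19:)

    no_grad:
      - '^bert\.encoder\.layer\.([0-9]|1[0-5])\.'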
Hi Alasdair, I know that when my training crashes, I can use recover to resume it. However, that only works when the config file hasn't changed at all. If I want to fine-tune on a new dataset, I have to change the data reader while keeping the model fixed. Do you know how to achieve this? Thank you.
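One possible route, assuming a stock AllenNLP 0.9-era setup (this repo's own training wrapper may differ): the fine-tune command loads the weights from a trained archive and trains them against a new config, whose dataset_reader can point at the new data. The paths below are placeholders:

    # -m: archive of the trained model; -c: new config with the changed
    # data reader; -s: fresh serialization directory for the new run
    allennlp fine-tune \
        -m expt/old_run/model.tar.gz \
        -c configs/new_dataset.yaml \
        -s expt/new_run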
Hi Alasdair, I am finishing my paper in LaTeX. I noticed that in Table 3 you used vertical (rotated, rather than horizontal) text for the GoodNews and NYTimes800k labels. I want to follow that format. Can you explain how to achieve this in LaTeX?
Here's my table:
\usepackage{graphicx} % rotate box
\usepackage{multirow} % merge multiple rows
\usepackage{tabularx} % for 'tabularx' environment
\begin{table*}[t]
\caption {Results on GoodNews (rows 1--10) and NYTimes800k (rows 11--19).
We report BLEU-4, ROUGE, CIDEr, and precision (P) \& recall (R) of
named entities, people's names, and rare proper nouns. Precision and
recall are expressed as percentages. Rows 1--2 contain previous
state-of-the-art results \cite{Biten2019GoodNews}. Rows 3--5 and 11--13
are ablation studies where we swap the Transformer with an LSTM and/or
RoBERTa with GloVe. These models only have the image attention (IA).
Rows 6 \& 14 are our baseline RoBERTa transformer language model that
only has the article text (and not the image) as inputs. Building on
top of this, we first add attention over image patches (rows 7 \& 15).
We then take a weighted sum of the RoBERTa embeddings (rows 8 \& 16)
and attend to the text surrounding the image instead of the first 512
tokens of the article (row 17). Finally we add attention over faces
(rows 9 \& 18) and objects (rows 10 \& 19) in the image.}
\label{tab:results}
\centering
\begin{tabularx}{\textwidth}{llXXX XX XX XX}
\toprule
&
& \multirow{2}{*}{\mbox{\small{BLEU-4}}}
& \multirow{2}{*}{\small{ROUGE}}
& \multirow{2}{*}{\small{CIDEr}}
& \multicolumn{2}{l}{\small{Named entities}}
& \multicolumn{2}{l}{\small{People's names}}
& \multicolumn{2}{l}{\small{Rare proper nouns}} \\
& & & & & \small{P} & \small{R} & \small{P} & \small{R} & \small{P} & \small{R} \\
\midrule
\multirow{10}{*}{\rotatebox[origin=c]{90}{GoodNews}}
& (1) Biten (Avg + CtxIns)~\cite{Biten2019GoodNews} & 0.89 & 12.2 & 13.1 & 8.23 & 6.06 & 9.38 & 6.55 & 1.06 & 12.5 \\
& (2) Biten (TBB + AttIns)~\cite{Biten2019GoodNews} & 0.76 & 12.2 & 12.7 & 8.87 & 5.64 & 11.9 & 6.98 & 1.58 & 12.6 \\
\cmidrule{2-11}
& (3) LSTM + GloVe + IA & 1.97 & 13.6 & 13.9 & 10.7 & 7.09 & 9.07 & 5.36 & 0 & 0 \\
& (4) Transformer + GloVe + IA & 3.48 & 17.0 & 25.2 & 14.3 & 11.1 & 14.5 & 10.5 & 0 & 0 \\
& (5) LSTM + RoBERTa + IA & 3.45 & 17.0 & 28.6 & 15.5 & 12.0 & 16.4 & 12.4 & 2.75 & 8.64 \\
\cmidrule{2-11}
& (6) Transformer + RoBERTa & 4.60 & 18.6 & 40.9 & 19.3 & 16.1 & 24.4 & 18.7 & 10.7 & 18.7 \\
& (7) \quad + image attention & 5.45 & 20.7 & 48.5 & 21.1 & 17.4 & 26.9 & 20.7 & 12.2 & 20.9 \\
& (8) \quad\quad + weighted RoBERTa & 6.0 & 21.2 & 53.1 & 21.8 & 18.5 & 28.8 & 22.8 & 16.2 & 26.0 \\
& (9) \quad\quad\quad + face attention & \textbf{6.05} & \textbf{21.4} & \textbf{54.3} & 22.0 & 18.6 & \textbf{29.3} & \textbf{23.3} & 15.5 & 24.5 \\
& (10) \quad\quad\quad\quad + object attention & \textbf{6.05} & \textbf{21.4} & 53.8 & \textbf{22.2} & \textbf{18.7} & 29.2 & 23.1 & \textbf{15.6} & \textbf{26.3} \\
\midrule
\midrule
\multirow{9}{*}{\rotatebox[origin=c]{90}{NYTimes800k}}
& (11) LSTM + GloVe + IA & 1.77 & 13.1 & 12.1 & 10.2 & 7.24 & 8.83 & 5.73 & 0 & 0 \\
& (12) Transformer + GloVe + IA & 2.75 & 15.9 & 20.3 & 13.2 & 10.8 & 13.2 & 9.66 & 0 & 0 \\
& (13) LSTM + RoBERTa + IA & 3.29 & 16.1 & 24.9 & 15.1 & 12.9 & 17.7 & 14.4 & 7.47 & 9.50 \\
\cmidrule{2-11}
& (14) Transformer + RoBERTa & 4.26 & 17.3 & 33.9 & 17.8 & 16.3 & 23.6 & 19.7 & 21.1 & 16.7 \\
& (15) \quad + image attention & 5.01 & 19.4 & 40.3 & 20.0 & 18.1 & 28.2 & 23.0 & 24.3 & 19.3 \\
& (16) \quad\quad + weighted RoBERTa & 5.75 & 19.9 & 45.1 & 21.1 & 19.6 & 29.7 & 25.4 & 29.6 & 22.8 \\
& (17) \quad\quad\quad + location-aware & 6.36 & 21.4 & 52.8 & 24.0 & 21.9 & 35.4 & 30.2 & 33.8 & \textbf{27.2} \\
& (18) \quad\quad\quad\quad + face attention & 6.26 & 21.5 & 53.9 & 24.2 & 22.1 & 36.5 & 30.8 & 33.4 & 26.4 \\
& (19) \quad\quad\quad\quad\quad + object attention & \textbf{6.30} & \textbf{21.7} & \textbf{54.4} & \textbf{24.6} & \textbf{22.2} & \textbf{37.3} & \textbf{31.1} & \textbf{34.2} & 27.0 \\
\bottomrule
\end{tabularx}
\end{table*}
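The vertical dataset labels come from \rotatebox (provided by graphicx) nested inside \multirow, as in the first cell of each block above. Stripped down to just that technique, a minimal sketch (placeholder rows; relies on the multirow and graphicx packages already loaded above):

    \begin{tabular}{ll}
      % \multirow{3} centers the label vertically across three rows;
      % \rotatebox[origin=c]{90} rotates it to read bottom-to-top
      \multirow{3}{*}{\rotatebox[origin=c]{90}{GoodNews}}
        & first row  \\
        & second row \\
        & third row  \\
    \end{tabular}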
I have a simple question. I tried to solve it myself but didn't manage. I want to update the higher layers of BERT while keeping the lower layers fixed. For example, given the parameter names in the INFO log below, if I want to keep layers 0-15 fixed and fine-tune layers 16-23, how do I do this? I think I have to change something here and here. I know that "parameter_groups" is where we update the parameters and "no_grad" is where we fix the parameters. So my question is: how should I set these two options to achieve this?
bert.encoder.layer.0.attention.self.query.weight
bert.encoder.layer.0.attention.self.query.bias
bert.encoder.layer.0.attention.self.key.weight
bert.encoder.layer.0.attention.self.key.bias
bert.encoder.layer.0.attention.self.value.weight
bert.encoder.layer.0.attention.self.value.bias
bert.encoder.layer.0.attention.output.dense.weight
bert.encoder.layer.0.attention.output.dense.bias
bert.encoder.layer.0.attention.output.LayerNorm.weight
bert.encoder.layer.0.attention.output.LayerNorm.bias
bert.encoder.layer.0.intermediate.dense.weight
bert.encoder.layer.0.intermediate.dense.bias
bert.encoder.layer.0.output.dense.weight
bert.encoder.layer.0.output.dense.bias
bert.encoder.layer.0.output.LayerNorm.weight
bert.encoder.layer.0.output.LayerNorm.bias
(the same 16 parameters repeat for bert.encoder.layer.1 through bert.encoder.layer.23)