google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0
1.15k stars 217 forks

Question related to variables of Tabfact models #82

Closed NielsRogge closed 4 years ago

NielsRogge commented 4 years ago

Hey!

As I'm further converting TensorFlow checkpoints to their PyTorch counterparts, I have a question related to the TabFact checkpoints. When I print out the variables of the TabFact (base with reset + intermediate pretraining) checkpoint, these are the last ones:

>>> import os
>>> from pprint import pprint
>>> import tensorflow as tf
>>> tf_path = os.path.abspath(r'C:\Users\niels.rogge\Documents\Python projecten\tensorflow\Tensorflow models\tapas_tabfact_inter_masklm_base_reset/model.ckpt')
>>> tf_vars = tf.train.list_variables(tf_path)
>>> pprint(tf_vars)
(...)
 ('bert/encoder/layer_9/output/dense/kernel/adam_m', [3072, 768]),
 ('bert/encoder/layer_9/output/dense/kernel/adam_v', [3072, 768]),
 ('bert/pooler/dense/bias', [768]),
 ('bert/pooler/dense/bias/adam_m', [768]),
 ('bert/pooler/dense/bias/adam_v', [768]),
 ('bert/pooler/dense/kernel', [768, 768]),
 ('bert/pooler/dense/kernel/adam_m', [768, 768]),
 ('bert/pooler/dense/kernel/adam_v', [768, 768]),
 ('global_step', []),
 ('output_bias', []),
 ('output_bias_cls', [2]),
 ('output_bias_cls/adam_m', [2]),
 ('output_bias_cls/adam_v', [2]),
 ('output_weights', [768]),
 ('output_weights_cls', [2, 768]),
 ('output_weights_cls/adam_m', [2, 768]),
 ('output_weights_cls/adam_v', [2, 768])]
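As an aside, the `.../adam_m` and `.../adam_v` entries in the listing above are Adam optimizer slot variables rather than model weights, so a conversion script would typically skip them along with `global_step`. A minimal sketch of that filtering (the small `tf_vars` list here is a stand-in for the real output of `tf.train.list_variables`):

```python
import re

# Stand-in for tf.train.list_variables(tf_path): (name, shape) pairs.
# Names are taken from the checkpoint listing above.
tf_vars = [
    ("bert/pooler/dense/kernel", [768, 768]),
    ("bert/pooler/dense/kernel/adam_m", [768, 768]),
    ("bert/pooler/dense/kernel/adam_v", [768, 768]),
    ("global_step", []),
    ("output_weights_cls", [2, 768]),
]

# Keep only actual model weights: drop Adam optimizer slots and the
# training step counter, which have no PyTorch counterpart.
model_vars = [
    (name, shape)
    for name, shape in tf_vars
    if not re.search(r"/adam_[mv]$", name) and name != "global_step"
]

print(model_vars)
```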

I understand why output_weights_cls and output_bias_cls are here, but why are output_weights and output_bias included as well? Aren't those related to cell selection, which TabFact does not require?

muelletm commented 4 years ago

The output of the TabFact models is a simple classification layer.

It's the equation in Section 2 of "Understanding tables with intermediate pre-training". In the code this happens in compute_classification_logits.

NielsRogge commented 4 years ago

Yes, so as I understand it: first the hidden representation of the [CLS] token is transformed into another vector of size 768 by the pooling layer (whose weights were pretrained), and then this vector is multiplied by output_weights_cls and output_bias_cls is added, as seen in compute_classification_logits.

In other words, output_weights and output_bias (even though they appear in the checkpoint, as shown above) are not used for TabFact, right?

ghost commented 4 years ago

Yes, I misunderstood your question, sorry!

You are correct, output_weights and output_bias are not used by the TabFact model.