google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Adding new aggregation operation #128

Open omrishsu opened 3 years ago

omrishsu commented 3 years ago

I want to add another aggregation operation. I've updated the code (proto enums, calc metrics, the model code, etc.). Adding a new aggregation operation means some variables need to change their shape from [4,] to [5,]:

def _get_classification_outputs

...

output_weights_agg = tf.get_variable(
    "output_weights_agg",
    shape=[config.num_aggregation_labels, hidden_size_agg],
    initializer=_classification_initializer())

output_bias_agg = tf.get_variable(
    "output_bias_agg",
    shape=[config.num_aggregation_labels],
    initializer=tf.zeros_initializer())

and

def _calculate_aggregation_loss_known

...

one_hot_labels = tf.one_hot(
    target_aggregation, depth=config.num_aggregation_labels, dtype=tf.float32)

(I'm not sure if I need to change this one, since depth already comes from config.num_aggregation_labels.)

Is there a way to take an existing fine-tuned TAPAS model (e.g., tapas_wikisql_sqa_masklm_large_reset) and "inject" values for the new operation (let's say 0.5 for the weight and bias, or maybe some other value), so I can fine-tune on my data with the new operation? Or must I train a new TAPAS model on WikiSQL (for example) and only then fine-tune on my data? (I want to save the time of training TAPAS again.)

eisenjulian commented 3 years ago

Hi, I think there are multiple ways of doing what you want. One way would be to manipulate the checkpoint to change the shapes of output_bias_agg and output_weights_agg, appending some extra random floats for the new op. There is probably a way to do this directly when the checkpoint is loaded by the model, but I'm not sure off the top of my head. The advantage of this approach is that you keep the pretrained weights for the other ops, but it's a bit more complicated than the solutions below.
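For what it's worth, a minimal sketch of this checkpoint-surgery idea, assuming TF1.x (as TAPAS uses); the paths are hypothetical, and the new row is zero-initialized here (the 0.5 or random values mentioned elsewhere in this thread would work the same way):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

OLD_CKPT = "tapas_model/model.ckpt"       # hypothetical source checkpoint path
NEW_CKPT = "tapas_model_5ops/model.ckpt"  # hypothetical output checkpoint path

reader = tf.train.load_checkpoint(OLD_CKPT)

tf.reset_default_graph()
new_vars = []
for name, _ in tf.train.list_variables(OLD_CKPT):
    value = reader.get_tensor(name)
    if "output_weights_agg" in name or "output_bias_agg" in name:
        # Append one zero-initialized row/entry for the new op:
        # [4, H] -> [5, H] for the weights, [4] -> [5] for the bias.
        padding = [(0, 1)] + [(0, 0)] * (value.ndim - 1)
        value = np.pad(value, padding)
    new_vars.append(tf.get_variable(name, initializer=value))

saver = tf.train.Saver(new_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, NEW_CKPT)

Fine-tuning with num_aggregation_labels=5 should then start from NEW_CKPT without shape mismatches, while keeping the pretrained weights of the original four ops.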

Alternatively, you can just rename the variables so that they are not loaded and don't produce a shape error when a checkpoint is restored. Another way would be to use the same mechanism we use when the number of classes for classification changes (see here), which ignores the specific variables named output_bias_agg and output_weights_agg.
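A sketch of that skip-the-variables mechanism, assuming TF1.x; the helper name is hypothetical, and the substring match relies on the variable names shown in the snippets above:

import tensorflow.compat.v1 as tf

def init_from_checkpoint_skip_agg(init_checkpoint):
    # Hypothetical helper: build an assignment map that skips the resized
    # aggregation head so everything else is restored from the checkpoint.
    ckpt_names = {name for name, _ in tf.train.list_variables(init_checkpoint)}
    assignment_map = {}
    for var in tf.trainable_variables():
        name = var.name.split(":")[0]
        if "output_weights_agg" in name or "output_bias_agg" in name:
            continue  # leave the new [5, H] / [5] head freshly initialized
        if name in ckpt_names:
            assignment_map[name] = name
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

The trade-off versus the checkpoint-surgery approach is that the whole aggregation head is reinitialized, so the pretrained weights for the existing four ops are lost.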

Let us know how it goes or if you have any further questions.

omrishsu commented 3 years ago

Thank you @eisenjulian. I was actually asking about the first option you suggested, but I don't know how to do it programmatically. Otherwise I'm "losing" the weights that you already trained, and since I plan to add more and more operations, I'd prefer not to reset the cls variables every time. Any idea how to change the shape of the variables in a checkpoint and save it again?

eisenjulian commented 3 years ago

I'm not sure, sadly; consider asking in the TensorFlow repo. Perhaps this can be of help.
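For anyone landing here, a quick way to inspect what a checkpoint actually contains before editing it (TF1.x; the path is hypothetical):

import tensorflow.compat.v1 as tf

# List every variable name and shape in the checkpoint, e.g. to confirm
# that output_weights_agg is [4, hidden_size] before resizing it.
for name, shape in tf.train.list_variables("tapas_model/model.ckpt"):
    print(name, shape)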