Pre-Training TAPAS on new data #143

Open naserahmadi opened 2 years ago

naserahmadi commented 2 years ago

Hello, I am trying to pre-train a TAPAS model on my own data (a set of queries and a table). My first question: is there a function that converts my data into a format readable by create_pretrain_examples_main, or should I convert it myself into a format like interactions_sample.txtpb?

Also, when I run tapas_pretraining_experiment on the generated tfrecords, it fails with the following error:

WARNING:tensorflow:From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.InputPipelineConfig is deprecated. Please use tf.compat.v1.estimator.tpu.InputPipelineConfig instead.

W1026 10:43:37.885343 140090917570368] From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.InputPipelineConfig is deprecated. Please use tf.compat.v1.estimator.tpu.InputPipelineConfig instead.

WARNING:tensorflow:From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.

W1026 10:43:37.885531 140090917570368] From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.

WARNING:tensorflow:From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.

W1026 10:43:37.885647 140090917570368] From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.

WARNING:tensorflow:From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

W1026 10:43:37.885888 140090917570368] From /home/pignal/tapas/tapas/utils/ The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

INFO:tensorflow:Using config: {'_model_dir': '...', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 4.0, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=5000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I1026 10:43:37.886492 140090917570368] Using config: {'_model_dir': '...', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 4.0, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=5000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1026 10:43:37.887225 140090917570368] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W1026 10:43:37.887610 140090917570368] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/ calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W1026 10:43:37.893979 140090917570368] From /home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/ calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/ Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1026 10:43:37.894331 140090917570368] From /home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/ Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2021-10-26 10:43:37.900831: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2021-10-26 10:43:37.900858: E tensorflow/stream_executor/cuda/] failed call to cuInit: UNKNOWN ERROR (303)
2021-10-26 10:43:37.900884: I tensorflow/stream_executor/cuda/] kernel driver does not appear to be running on this host (pignal): /proc/driver/nvidia/version does not exist
WARNING:tensorflow:From /home/pignal/tapas/tapas/datasets/ parallel_interleave (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, cycle_length, block_length,` instead. If sloppy execution is desired, use ``.
W1026 10:43:37.925668 140090917570368] From /home/pignal/tapas/tapas/datasets/ parallel_interleave (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, cycle_length, block_length,` instead. If sloppy execution is desired, use ``.
WARNING:tensorflow:From /home/pignal/tapas/tapas/datasets/ map_and_batch (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, num_parallel_calls)` followed by `, drop_remainder)`. Static optimizations will take care of using the fused implementation.
W1026 10:43:38.022828 140090917570368] From /home/pignal/tapas/tapas/datasets/ map_and_batch (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, num_parallel_calls)` followed by `, drop_remainder)`. Static optimizations will take care of using the fused implementation.
INFO:tensorflow:Calling model_fn.
I1026 10:43:38.306126 140090917570368] Calling model_fn.
INFO:tensorflow:Running train on CPU/GPU
I1026 10:43:38.306341 140090917570368] Running train on CPU/GPU
INFO:tensorflow:*** Features ***
I1026 10:43:38.306876 140090917570368] *** Features ***
INFO:tensorflow:  name = column_ids, shape = (512, 128)
I1026 10:43:38.306996 140090917570368]   name = column_ids, shape = (512, 128)
INFO:tensorflow:  name = column_ranks, shape = (512, 128)
I1026 10:43:38.307089 140090917570368]   name = column_ranks, shape = (512, 128)
INFO:tensorflow:  name = input_ids, shape = (512, 128)
I1026 10:43:38.307202 140090917570368]   name = input_ids, shape = (512, 128)
INFO:tensorflow:  name = input_mask, shape = (512, 128)
I1026 10:43:38.307282 140090917570368]   name = input_mask, shape = (512, 128)
INFO:tensorflow:  name = inv_column_ranks, shape = (512, 128)
I1026 10:43:38.307380 140090917570368]   name = inv_column_ranks, shape = (512, 128)
INFO:tensorflow:  name = masked_lm_ids, shape = (512, 20)
I1026 10:43:38.307460 140090917570368]   name = masked_lm_ids, shape = (512, 20)
INFO:tensorflow:  name = masked_lm_positions, shape = (512, 20)
I1026 10:43:38.307562 140090917570368]   name = masked_lm_positions, shape = (512, 20)
INFO:tensorflow:  name = masked_lm_weights, shape = (512, 20)
I1026 10:43:38.307645 140090917570368]   name = masked_lm_weights, shape = (512, 20)
INFO:tensorflow:  name = next_sentence_labels, shape = (512, 1)
I1026 10:43:38.307728 140090917570368]   name = next_sentence_labels, shape = (512, 1)
INFO:tensorflow:  name = numeric_relations, shape = (512, 128)
I1026 10:43:38.307821 140090917570368]   name = numeric_relations, shape = (512, 128)
INFO:tensorflow:  name = prev_label_ids, shape = (512, 128)
I1026 10:43:38.307901 140090917570368]   name = prev_label_ids, shape = (512, 128)
INFO:tensorflow:  name = row_ids, shape = (512, 128)
I1026 10:43:38.307995 140090917570368]   name = row_ids, shape = (512, 128)
INFO:tensorflow:  name = segment_ids, shape = (512, 128)
I1026 10:43:38.308069 140090917570368]   name = segment_ids, shape = (512, 128)
INFO:tensorflow:training_loop marked as finished
I1026 10:43:38.317348 140090917570368] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1026 10:43:38.317499 140090917570368] Reraising captured error
Traceback (most recent call last):
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/", line 378, in assert_same_structure
ValueError: The two structures don't have the same nested structure.

First structure: type=list str=[<tf.Tensor 'IteratorGetNext:12' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:0' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:11' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:10' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:4' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:9' shape=(512, 128) dtype=int32>]

Second structure: type=int str=2

More specifically: Substructure "type=list str=[<tf.Tensor 'IteratorGetNext:12' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:0' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:11' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:10' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:4' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:9' shape=(512, 128) dtype=int32>]" is a sequence, while substructure "type=int str=2" is not

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tapas/experiments/", line 157, in <module>
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/absl/", line 303, in run
    _run_main(main, args)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/absl/", line 251, in _run_main
  File "tapas/experiments/", line 117, in main
    input_fn=train_input_fn, max_steps=experiment_utils.num_train_steps())
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 3083, in train
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/pignal/.local/lib/python3.6/site-packages/", line 703, in reraise
    raise value
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 3078, in train
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/", line 1182, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/", line 1211, in _train_model_default
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 2915, in _call_model_fn
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/", line 1170, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 3173, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 1700, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/", line 2031, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/home/pignal/tapas/tapas/models/", line 157, in model_fn
  File "/home/pignal/tapas/tapas/models/bert/", line 94, in create_model
  File "/home/pignal/tapas/tapas/models/bert/", line 228, in __init__
  File "/home/pignal/tapas/tapas/models/bert/", line 569, in embedding_postprocessor
    tf.nest.assert_same_structure(token_type_ids, token_type_vocab_size)
  File "/home/pignal/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/", line 385, in assert_same_structure
    % (str(e), str1, str2))
ValueError: The two structures don't have the same nested structure.

First structure: type=list str=[<tf.Tensor 'IteratorGetNext:12' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:0' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:11' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:10' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:4' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:9' shape=(512, 128) dtype=int32>]

Second structure: type=int str=2

More specifically: Substructure "type=list str=[<tf.Tensor 'IteratorGetNext:12' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:0' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:11' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:10' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:4' shape=(512, 128) dtype=int32>, <tf.Tensor 'IteratorGetNext:9' shape=(512, 128) dtype=int32>]" is a sequence, while substructure "type=int str=2" is not
Entire first structure:
[., ., ., ., ., ., .]
Entire second structure:

Can you help me with these questions? Thanks.

I think the error is related to the BERT config file that you are passing to tapas_pretraining_experiment. The script takes a bert_config_file flag, and that file contains a field called type_vocab_size. TAPAS expects this field to be a list of 7 values, one per token type feature: in your traceback the first structure is a list of 7 tensors, while the second is the plain int 2 taken from the config. You need to use something like "type_vocab_size": [3, 256, 256, 2, 256, 256, 10].
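
For illustration, here is a minimal sketch of that config change, assuming the BERT config is a plain JSON file. The helper names `patch_type_vocab_size` and `patch_config_file` and any file paths are hypothetical and not part of the TAPAS codebase; the seven sizes are copied from the suggestion above.

```python
import json

# Seven token-type vocabulary sizes, copied from the suggestion above.
TAPAS_TYPE_VOCAB_SIZE = [3, 256, 256, 2, 256, 256, 10]


def patch_type_vocab_size(config: dict) -> dict:
    """Return a copy of a BERT config dict with a list-valued type_vocab_size."""
    patched = dict(config)
    patched["type_vocab_size"] = list(TAPAS_TYPE_VOCAB_SIZE)
    return patched


def patch_config_file(src_path: str, dst_path: str) -> None:
    """Read a JSON config (hypothetical paths), patch it, and write the result."""
    with open(src_path) as f:
        config = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(patch_type_vocab_size(config), f, indent=2)


if __name__ == "__main__":
    # A config with a scalar type_vocab_size, matching "type=int str=2" in the error.
    stock = {"hidden_size": 768, "type_vocab_size": 2}
    print(json.dumps(patch_type_vocab_size(stock), indent=2))
```

The patched file would then be the one passed through the bert_config_file flag. The other fields in the config are left untouched.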

Thanks, Syrine

