google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0
6.11k stars 753 forks

model.eval() : 'list' object not callable #243

Closed mmcs-work closed 4 years ago

mmcs-work commented 4 years ago

I am trying to train the T5 model from scratch on the IMDB dataset. Instead of using TensorFlow Datasets, I am using the raw dataset and preprocessing it accordingly. Training on this dataset works correctly (the loss curve is reasonable). But when I run model.eval() on the test set, or even on the train set (the same data the model was trained on), I get this error.

INFO:tensorflow:Using config: {'_model_dir': 'gs://fiery-lcm-000001/models1/small', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.14.56.218:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({'worker': ['10.14.56.218:8470']}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.14.56.218:8470', '_evaluation_master': 'grpc://10.14.56.218:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=None, num_cores_per_replica=1, per_host_input_for_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': <tensorflow.python.distribute.cluster_resolver.tpu_cluster_resolver.TPUClusterResolver object at 0x7f7a17bb1240>}
INFO:tensorflow:_TPUContext: eval_on_tpu True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-4b2a7485e77f> in <module>()
      4     mixture_or_task_name="trivia_all",
      5     checkpoint_steps="all",
----> 6     split="test"
      7 )

2 frames
/usr/local/lib/python3.6/dist-packages/t5/models/mtf_model.py in eval(self, mixture_or_task_name, checkpoint_steps, summary_dir, split)
    265     utils.eval_model(self.estimator(vocabulary), vocabulary,
    266                      self._sequence_length, self.batch_size, split,
--> 267                      self._model_dir, dataset_fn, summary_dir, checkpoint_steps)
    268 
    269   def finetune(self, mixture_or_task_name, finetune_steps, pretrained_model_dir,

/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/utils.py in eval_model(estimator, vocabulary, sequence_length, batch_size, dataset_split, model_dir, eval_dataset_fn, eval_summary_dir, eval_checkpoint_step)
   1261                 tf.compat.as_text(ex["targets_plaintext"]),
   1262                 example=ex, is_target=True)
-> 1263             for ex in examples
   1264         ]
   1265         targets_filename = os.path.join(

/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/utils.py in <listcomp>(.0)
   1261                 tf.compat.as_text(ex["targets_plaintext"]),
   1262                 example=ex, is_target=True)
-> 1263             for ex in examples
   1264         ]
   1265         targets_filename = os.path.join(

TypeError: 'list' object is not callable
  1. If there is a problem with the pre-processed data, shouldn't it affect the training process as well?
  2. If anybody has any idea regarding this issue, please let me know.

Here is the notebook link.

adarob commented 4 years ago

The issue is that you have the postprocess_fn in a list. It should just be a single function.
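A minimal, self-contained sketch of the failure mode (the names below are illustrative, not the actual t5 API): per the traceback, eval_model calls `postprocess_fn(tf.compat.as_text(ex["targets_plaintext"]), example=ex, is_target=True)` for each example, so wrapping the function in a list makes that call blow up with the exact error above.

```python
def lower_text(text, **unused_kwargs):
    """An example post-processor: normalize target text to lowercase."""
    return text.lower()

# What the notebook did: a one-element list. A list is not callable.
postprocess_fn = [lower_text]
try:
    postprocess_fn("Some Target", example=None, is_target=True)
except TypeError as e:
    print(e)  # 'list' object is not callable

# The fix: pass the function itself.
postprocess_fn = lower_text
print(postprocess_fn("Some Target", example=None, is_target=True))  # some target
```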

mmcs-work commented 4 years ago

> The issue is that you have the postprocess_fn in a list. It should just be a single function.

Thanks for the reply.

  1. I am surprised that it didn't cause any problem in the training phase (since the post-process function is applied there as well). Or am I missing something?

  2. Also, if one wants to apply multiple post-process functions, do they have to create a wrapper function that does all that work under the hood, and pass that single wrapper as postprocess_fn?

adarob commented 4 years ago
  1. Post-process is only applied during evaluation actually.
  2. That's correct, although we'd happily accept a PR to change the behavior to be more like the preprocessors!
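The wrapper approach confirmed above can be sketched in plain Python (compose_postprocessors and the sample post-processors are hypothetical helpers, not part of the t5 library):

```python
def compose_postprocessors(*fns):
    """Return a single post-processor that applies fns left-to-right."""
    def combined(output, **kwargs):
        for fn in fns:
            output = fn(output, **kwargs)
        return output
    return combined

# Illustrative post-processors with the (text, **kwargs) signature
# that eval_model expects of postprocess_fn:
def strip_text(text, **unused_kwargs):
    return text.strip()

def lower_text(text, **unused_kwargs):
    return text.lower()

# Pass the single composed callable, not a list of functions.
postprocess_fn = compose_postprocessors(strip_text, lower_text)
print(postprocess_fn("  Some Target  ", example=None, is_target=True))  # some target
```

Each wrapped function receives the same keyword arguments (example, is_target), so any function in the chain can still inspect them if needed.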