google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

How can we finetune the pretrained model using tfrecord files? #19

Closed gymbeijing closed 3 years ago

gymbeijing commented 3 years ago

I'm trying to finetune the model on my own text summarization dataset. Before doing that, I tested using TFRecord files as input, so I set /tmp/bigb/tfds/aeslc/1.0.0 as data_dir:

flags.DEFINE_string(
    "data_dir", "/tmp/bigb/tfds/aeslc/1.0.0",
    "The input data dir. Should contain the TFRecord files. "
    "Can be TF Dataset with prefix tfds://")
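For a custom dataset, the TFRecord files placed under data_dir need to store each example's features under the names the input pipeline's parse spec expects; per the error below, run_summarization.py looks for a string feature named `document`. A minimal sketch of writing such records, assuming `document`/`summary` are the expected names (the `make_example` helper and the `train.tfrecord` filename are illustrative, not part of the repo):

```python
import os
import tensorflow as tf

def make_example(document, summary):
    """Serialize one (document, summary) pair as a tf.train.Example."""
    feature = {
        "document": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[document.encode("utf-8")])),
        "summary": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[summary.encode("utf-8")])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Illustrative output location; any directory passed as data_dir works.
out_dir = "/tmp/bigb/tfds/aeslc/1.0.0"
os.makedirs(out_dir, exist_ok=True)
with tf.io.TFRecordWriter(os.path.join(out_dir, "train.tfrecord")) as writer:
    writer.write(make_example("some long input text",
                              "a short summary").SerializeToString())
```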

Then I ran run_summarization.py, but I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Feature: document (data type: string) is required but could not be found.
         [[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext]]
         [[Mean/_19475]]
  (1) Invalid argument: Feature: document (data type: string) is required but could not be found.
         [[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext]]

Could anyone advise me on how to finetune the model using TFRecord files as input?

gymbeijing commented 3 years ago

I solved this problem by replacing the keys in the name_to_features dict with the actual feature names stored in the TFRecord file.
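A sketch of the kind of change involved, assuming the records store the text under the TFDS aeslc field names `email_body` and `subject_line` (substitute the names your file actually contains; the `decode_record` helper is illustrative, not the repo's code):

```python
import tensorflow as tf

# The keys must match the feature names actually stored in the TFRecord;
# "email_body" and "subject_line" are assumed here (check your own file).
name_to_features = {
    "email_body": tf.io.FixedLenFeature([], tf.string),
    "subject_line": tf.io.FixedLenFeature([], tf.string),
}

def decode_record(record):
    """Parse one serialized example and rename fields for the pipeline."""
    example = tf.io.parse_single_example(record, name_to_features)
    # Rename to the keys the rest of the summarization pipeline expects.
    return {"document": example["email_body"],
            "summary": example["subject_line"]}
```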