When I'm reading the pretraining code, the comment says `drop_remainder` should be true during training and false during evaluation, but the code confuses me a lot. I am not sure whether it's a bug or whether I missed something.
The code is here: run_pretraining.py
```python
# For training, we want a lot of parallel reading and shuffling.
# For eval, we want no shuffling and parallel reading doesn't matter.
if is_training:
  d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))
  d = d.repeat()
  d = d.shuffle(buffer_size=len(input_files))

  # `cycle_length` is the number of parallel files that get read.
  cycle_length = min(num_cpu_threads, len(input_files))

  # `sloppy` mode means that the interleaving is not exact. This adds
  # even more randomness to the training pipeline.
  d = d.apply(
      tf.contrib.data.parallel_interleave(
          tf.data.TFRecordDataset,
          sloppy=is_training,
          cycle_length=cycle_length))
  d = d.shuffle(buffer_size=100)
else:
  d = tf.data.TFRecordDataset(input_files)
  # Since we evaluate for a fixed number of steps we don't want to encounter
  # out-of-range exceptions.
  d = d.repeat()

# We must `drop_remainder` on training because the TPU requires fixed
# size dimensions. For eval, we assume we are evaluating on the CPU or GPU
# and we *don't* want to drop the remainder, otherwise we wont cover
# every sample.
d = d.apply(
    tf.contrib.data.map_and_batch(
        lambda record: _decode_record(record, name_to_features),
        batch_size=batch_size,
        num_parallel_batches=num_cpu_threads,
        drop_remainder=True))
return d
```
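To make the effect concrete, here is a minimal, self-contained sketch showing that `drop_remainder=True` silently discards the final partial batch. It is written against the TF 2.x eager API for brevity, whereas the quoted code uses the older TF 1.x `tf.contrib` API:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# drop_remainder=True: 10 records at batch_size=3 -> 3 full batches,
# the last record is silently dropped.
dropped = list(ds.batch(3, drop_remainder=True).as_numpy_iterator())

# drop_remainder=False: a final partial batch keeps all 10 records.
kept = list(ds.batch(3, drop_remainder=False).as_numpy_iterator())

print(len(dropped))  # 3 batches -> only 9 of 10 records seen
print(len(kept))     # 4 batches -> all 10 records seen
```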
Clearly, the statement `d = d.apply(tf.contrib.data.map_and_batch(...))` is always executed regardless of whether it is training or evaluating. But it can drop data when evaluating, which can be a problem. Am I right? Thanks in advance @jacobdevlin-google
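For reference, what I expected from the comment would be something like the following. This is just my sketch of the comment's apparent intent, not code from the repository:

```python
# Hypothetical variant matching the comment: drop the remainder only
# during training (the TPU needs fixed-size batches), but keep the final
# partial batch for eval so every sample is covered.
d = d.apply(
    tf.contrib.data.map_and_batch(
        lambda record: _decode_record(record, name_to_features),
        batch_size=batch_size,
        num_parallel_batches=num_cpu_threads,
        drop_remainder=is_training))
```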