This prevents a crash on TPU when the final chunk of the dev set doesn't fill a full batch.
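A minimal sketch of the idea, not the actual change in librispeech_ctc.py (the helper name and shapes here are made up): pad the short final chunk up to the fixed batch size and carry a per-example weight so the padded rows can be ignored downstream.

```python
import numpy as np

def pad_final_chunk(features, batch_size):
  """Pads a short final chunk of shape [n, ...] up to [batch_size, ...].

  Also returns per-example weights: 1.0 for real rows, 0.0 for padding.
  """
  n = features.shape[0]
  pad = batch_size - n
  weights = np.concatenate([np.ones(n), np.zeros(pad)]).astype(np.float32)
  if pad > 0:
    filler = np.zeros((pad,) + features.shape[1:], dtype=features.dtype)
    features = np.concatenate([features, filler], axis=0)
  return features, weights

# E.g. with a per-core batch of 96, the last dev chunk has 2703 % 96 = 15
# real utterances and 81 padded rows.
chunk = np.zeros((15, 80), dtype=np.float32)  # 15 fake 80-dim feature rows
padded, weights = pad_final_chunk(chunk, 96)
assert padded.shape == (96, 80) and weights.sum() == 15
```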
This is very wonky. I am pretty sure (though less than 90% certain) that we now do a full pass over the dev set once every 50 training steps. Essentially, in librispeech_ctc.py, eval_steps_per_loop is 5 and the batch size is 96. The number of TPUs is 8. Multiplying those together gives 3840, which is greater than 2703, the size of the dev set. I'm not 100% sure this is correct, though.
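The arithmetic spelled out, just as a sanity check of the numbers above (not code from the repo):

```python
eval_steps_per_loop = 5   # from librispeech_ctc.py
batch_size = 96           # per-core batch size
num_tpus = 8

examples_per_eval_loop = eval_steps_per_loop * batch_size * num_tpus  # 3840
dev_set_size = 2703

# 3840 >= 2703, so a single eval loop can cover the whole dev set once.
print(examples_per_eval_loop, examples_per_eval_loop >= dev_set_size)
```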
Change the job type to "executor_tpu". This means that the eval and
train jobs will both run on the TPU. We probably want this long-term,
since CPU instances are not free for us, and we rarely evaluate the
dev set anyway.
Change learning_rate to 1e-4 based on Anajali's experiments.
I'm merging this. Currently we have a biased estimator of things like loss because we divide by the padded batch size rather than the true batch size, but it seems minor enough to ignore for now.
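For reference, a toy NumPy sketch of the bias (not the training code): dividing the summed loss by the padded batch size shrinks the mean, while dividing by the number of real examples, i.e. the sum of the weight mask, does not.

```python
import numpy as np

per_example_loss = np.array([2.0, 3.0, 4.0])       # 3 real examples
weights = np.array([1.0, 1.0, 1.0, 0.0, 0.0])      # padded out to a batch of 5
padded_loss = np.concatenate([per_example_loss, np.zeros(2)])

biased = padded_loss.sum() / len(padded_loss)      # 9 / 5 = 1.8
unbiased = padded_loss.sum() / weights.sum()       # 9 / 3 = 3.0
print(biased, unbiased)
```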
Currently running an experiment here:
gs://the-peoples-speech-west-europe/training_logs/galvez/tpu_ctc_2h