galv / lingvo-copy

Apache License 2.0

- Pad batch dimension. #5

Closed. galv closed this 4 years ago

galv commented 4 years ago

This prevents a crash on TPU from occurring when the final chunk of the dev set doesn't fit into your batch.

This is very wonky. I am pretty sure (though less than 90% certain) that I am now doing a full pass over the dev set once every 50 training steps. Essentially, in librispeech_ctc.py, eval_steps_per_loop is 5 and the batch size is 96. The number of TPUs is 8. Multiplying those together gives 5 × 96 × 8 = 3840, which is greater than 2703, the size of the dev set. I'm not 100% sure this is correct, though.
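For illustration, the padding idea can be sketched roughly like this (a minimal NumPy sketch, not the actual lingvo code; `pad_batch` is a hypothetical helper): TPU programs are compiled for a fixed batch shape, so the final, smaller chunk of the dev set is padded with zeros up to the full batch size.

```python
import numpy as np

def pad_batch(batch: np.ndarray, target_batch_size: int) -> np.ndarray:
    """Pad the leading (batch) dimension with zeros up to target_batch_size.

    Hypothetical helper illustrating the idea in this PR: the last chunk
    of the dev set may be smaller than the fixed batch size the TPU
    program was compiled for, so it is zero-padded to the full size.
    """
    actual = batch.shape[0]
    if actual >= target_batch_size:
        return batch
    # Pad only the batch (first) dimension; leave the rest untouched.
    pad_widths = [(0, target_batch_size - actual)] + [(0, 0)] * (batch.ndim - 1)
    return np.pad(batch, pad_widths, mode="constant")

# Example with the numbers above: the dev set has 2703 examples, while one
# eval loop consumes 5 steps * 96 batch * 8 TPUs = 3840 slots, so the
# final chunk gets padded from 2703 up to 3840.
chunk = np.ones((2703, 10), dtype=np.float32)
padded = pad_batch(chunk, 3840)
print(padded.shape)  # (3840, 10)
```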

Currently running an experiment here:

gs://the-peoples-speech-west-europe/training_logs/galvez/tpu_ctc_2h

galv commented 4 years ago

I'm merging this. Currently we have a biased estimator of things like loss because we divide by the padded batch size rather than the true batch size, but it seems minor enough to ignore for now.