lairikeqiA closed this issue 4 years ago.
This happens because on TPUs the batch size must be constant across all batches, including the last one, so the examples at the end are padded up to the nearest multiple of the batch size, and the padding is discarded after prediction. You can set the `test_batch_size` argument to the exact number of examples you have (if it is small enough to fit) and create a single batch without any padding.
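The padding logic described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code; the function and variable names are made up for the example:

```python
def pad_to_batch_size(examples, batch_size):
    """Pad the example list so its length is a multiple of batch_size.

    TPUs require every batch to have the same static shape, so the last
    (partial) batch is typically filled by repeating the final real
    example; the predictions for the padding are dropped afterwards.
    """
    remainder = len(examples) % batch_size
    if remainder == 0:
        return examples, 0
    num_padding = batch_size - remainder
    padded = examples + [examples[-1]] * num_padding
    return padded, num_padding

examples = list(range(8))                    # 8 real examples
padded, n_pad = pad_to_batch_size(examples, 32)
print(len(padded), n_pad)                    # 32 total examples, 24 of them padding
```

With a batch size of 32 and only 8 real examples, 24 padding copies are needed, which matches the "Padded with 24 examples" message below.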
To be specific, I took 10 examples from the test dataset as a small test set. When I ran the data conversion, I got the following: Num questions processed: 10; Num examples: 8; Num conversion errors: 2; Padded with 24 examples. Why are there padded examples, and how many examples are produced under different conditions?
In addition, I extracted and printed BERT's last-layer outputs. Why are there more outputs than examples in the test dataset, and why are there always some repeated outputs at the end?