Closed cmacdonald closed 5 years ago

Validation for large cutoffs and large numbers of queries can be slower than training. Are there any optimisations that could be made, e.g. tokenising just once rather than on each iteration?
Yes, pre-computing the wordpiece tokens would save time on each validation iteration. However, based on measurements I've taken previously, tokenization is a very small component of the total runtime: the network itself takes considerably more time to run than all the other parts. Upcoming enhancements to cuDNN (and corresponding changes to PyTorch) should improve the performance of BERT's self-attention components, which should in turn improve validation speed.
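If you do want to go the pre-computation route, here is a minimal sketch of caching the wordpiece token ids up front so each validation iteration only assembles cached ids. It assumes the Hugging Face `transformers` tokenizer and hypothetical `query_texts`/`doc_texts` dicts; the project's own tokenizer wrapper and data structures may differ.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Hypothetical containers mapping ids to raw text.
query_texts = {"q1": "what is information retrieval"}
doc_texts = {"d1": "information retrieval is the activity of obtaining relevant documents"}

# Tokenize once, up front, and keep only the wordpiece token ids.
query_tok = {qid: tokenizer.encode(t, add_special_tokens=False)
             for qid, t in query_texts.items()}
doc_tok = {did: tokenizer.encode(t, add_special_tokens=False)
           for did, t in doc_texts.items()}

def encode_pair(qid, did, max_len=512):
    """Assemble [CLS] query [SEP] doc [SEP] input ids from the caches."""
    q, d = query_tok[qid], doc_tok[did]
    budget = max_len - len(q) - 3  # room left for the document tokens
    return ([tokenizer.cls_token_id] + q + [tokenizer.sep_token_id]
            + d[:budget] + [tokenizer.sep_token_id])
```

Each validation pass would then call `encode_pair` (or read the cached ids directly) instead of re-running the tokenizer over the raw text.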
In the meantime, perhaps consider using a smaller subset of the validation data.
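For the smaller-subset suggestion, something along these lines keeps validation deterministic while reducing the number of queries scored; the `valid_qrels` layout (qid -> docid -> label) is just an assumption about how the validation data is stored.

```python
import random

def subsample_validation(valid_qrels, n_queries=50, seed=42):
    """Keep a fixed random subset of validation queries (seeded for reproducibility)."""
    rng = random.Random(seed)
    qids = sorted(valid_qrels)
    keep = set(rng.sample(qids, min(n_queries, len(qids))))
    return {qid: rels for qid, rels in valid_qrels.items() if qid in keep}
```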
Yes, I've gone that route. Thanks for your input on the measurements. Closing.