Pre-cache the DB similarity matrix for DBScheduler

Hi Bailin,

I've implemented caching for the database similarity matrix used by DBScheduler. The current version calls spaCy at every step, which is inefficient. Computing the matrix once at the start of training gives a big speedup during BERT-based training (at least from what I have observed).
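For reference, here is a minimal sketch of the general idea, not the actual code in this PR. The names `SimilarityCache` and `token_overlap` are hypothetical, and the token-overlap function is only a stand-in for the spaCy-based similarity so that the example runs without any model installed.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


class SimilarityCache:
    """Hypothetical helper: build the pairwise similarity matrix once, then reuse it."""

    def __init__(self, similarity_fn: Callable[[str, str], float]) -> None:
        self._similarity_fn = similarity_fn
        self._cache: Dict[Tuple[str, ...], Matrix] = {}
        self.num_computations = 0  # counts how often the matrix is actually built

    def get(self, db_ids: Sequence[str]) -> Matrix:
        key = tuple(db_ids)
        if key not in self._cache:
            # Expensive path: runs only the first time this set of DBs is seen.
            self.num_computations += 1
            self._cache[key] = [
                [self._similarity_fn(a, b) for b in db_ids] for a in db_ids
            ]
        return self._cache[key]


def token_overlap(a: str, b: str) -> float:
    """Stand-in for the spaCy similarity: Jaccard overlap of '_'-separated tokens."""
    tokens_a, tokens_b = set(a.split("_")), set(b.split("_"))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


cache = SimilarityCache(token_overlap)
db_ids = ["concert_singer", "singer", "car_1"]

# Simulated training loop: every step asks for the matrix,
# but it is computed only on the first request.
for _step in range(1000):
    matrix = cache.get(db_ids)
```

In this sketch the cache is keyed on the tuple of database ids, so a different set of databases would trigger a fresh computation rather than silently reusing a stale matrix.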

This might also help with Issue #11.

Let me know if this is helpful, or if you would rather move it to the private version of the code.

Thanks, Tom

berlino / tensor2struct-public

Pre-cache the DB similarity matrix for DBScheduler #12