berlino / tensor2struct-public

Semantic parsers based on encoder-decoder framework
MIT License
90 stars, 23 forks

Pre-cache the DB similarity matrix for DBScheduler #12

Closed: tomsherborne closed this 2 years ago

tomsherborne commented 2 years ago

Hi Bailin,

I've implemented a caching approach for the database similarity matrix used by DBScheduler. The current version calls spaCy at every step, which is inefficient. Computing the matrix once at the start of training gives a big speedup for BERT-based training (at least from what I have observed).
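A minimal sketch of the precompute-once idea, assuming each DB schema is available up front as a list of table/column name tokens. `CachedDBScheduler`, `precompute_similarity` and `jaccard_similarity` are hypothetical names, and Jaccard token overlap stands in for the spaCy-based similarity; this is an illustration of the caching pattern, not the code in this PR or the actual DBScheduler API.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Schema = List[str]
SimFn = Callable[[Schema, Schema], float]


def jaccard_similarity(a: Schema, b: Schema) -> float:
    """Hypothetical stand-in for the spaCy-based DB similarity.

    Jaccard overlap of table/column name tokens; the real scorer differs.
    """
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def precompute_similarity(
    schemas: Dict[str, Schema], sim_fn: SimFn = jaccard_similarity
) -> Dict[Tuple[str, str], float]:
    """Score every unordered DB pair exactly once and store both orderings."""
    matrix: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(schemas), 2):
        score = sim_fn(schemas[a], schemas[b])
        matrix[(a, b)] = score
        matrix[(b, a)] = score  # assumes the similarity is symmetric
    return matrix


class CachedDBScheduler:
    """Hypothetical scheduler that only does dictionary lookups per step."""

    def __init__(self, schemas: Dict[str, Schema], sim_fn: SimFn = jaccard_similarity):
        self.db_ids = sorted(schemas)
        # The expensive similarity computation happens here, once, before training.
        self.sim = precompute_similarity(schemas, sim_fn)

    def most_similar(self, db_id: str) -> str:
        """Return the DB most similar to db_id (ties go to the first in sorted order)."""
        others = [d for d in self.db_ids if d != db_id]
        return max(others, key=lambda d: self.sim[(db_id, d)])


if __name__ == "__main__":
    # Hypothetical toy schemas, for illustration only.
    schemas = {
        "db_a": ["singer", "concert", "name"],
        "db_b": ["singer", "album", "name"],
        "db_c": ["student", "pet"],
    }
    scheduler = CachedDBScheduler(schemas)
    print(scheduler.most_similar("db_a"))  # db_b
```

The design choice is to pay the O(n²) similarity cost once in the constructor, so each training step is a constant-time dictionary lookup instead of a fresh call into the NLP pipeline.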

This might also help with issue #11.

Let me know whether this is helpful, or if you would rather move it to the non-public version of the code.

Thanks, Tom