Since we need to assess the amount of information carried by each individual prediction, it is not possible to use a batch size greater than 1 at inference time.
I have implemented a workaround for FastBERT, which can be found here and should be adaptable to DeeBERT.
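To illustrate why batching is problematic: in entropy-based early exit, each example may leave the network at a different layer, so the exit decision has to be made per example. Below is a minimal sketch of that per-example loop; the `layer_classifiers` list and the threshold value are hypothetical placeholders, not the actual FastBERT or DeeBERT API.

```python
import math

def entropy(probs):
    # Shannon entropy of one prediction's probability distribution;
    # low entropy means the classifier is confident.
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_predict(layer_classifiers, x, threshold=0.3):
    """Run per-layer classifiers in order on a single example and
    stop at the first layer whose prediction entropy is below
    `threshold`. Returns (exit_layer_index, probabilities)."""
    for i, clf in enumerate(layer_classifiers):
        probs = clf(x)
        if entropy(probs) < threshold:
            return i, probs  # confident enough: exit early here
    # No layer was confident enough: fall back to the last prediction.
    return len(layer_classifiers) - 1, probs
```

Because the loop above decides layer by layer for one example, two examples in the same batch could require different exit points, which is why inference runs with batch size 1 unless extra bookkeeping removes already-exited examples from the batch.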