Since we need to assess the amount of information carried by each individual prediction, it is not possible to use a batch size greater than 1 at inference time.
I have implemented a workaround for FastBERT, which can be found here and should be adaptable to DeeBERT.
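To illustrate why batching is problematic: in entropy-based early exit, each example may leave the network at a different layer, so the exit decision has to be made per example. Below is a minimal sketch of that per-example loop; the `layer_classifiers` list and the threshold value are hypothetical placeholders, not the actual FastBERT or DeeBERT API.

```python
import math

def entropy(probs):
    # Shannon entropy of one prediction's probability distribution;
    # low entropy means the classifier is confident.
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_predict(layer_classifiers, x, threshold=0.3):
    """Run per-layer classifiers in order on a single example and
    stop at the first layer whose prediction entropy is below
    `threshold`. Returns (exit_layer_index, probabilities)."""
    for i, clf in enumerate(layer_classifiers):
        probs = clf(x)
        if entropy(probs) < threshold:
            return i, probs  # confident enough: exit early here
    # No layer was confident enough: fall back to the last prediction.
    return len(layer_classifiers) - 1, probs
```

Because the loop above decides layer by layer for one example, two examples in the same batch could require different exit points, which is why inference runs with batch size 1 unless extra bookkeeping removes already-exited examples from the batch.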