DeeBert models need to be fine-tuned in a two-step fashion: first the final layer, then the ramps.
The current implementation requires the user to run two separate trainings. However, this could be done in one shot using a `pl.Callback`, as done for TheseusBert.