Open stefan-it opened 1 year ago
Any updates on this?
Kindly pinging @DaoTranbk and @HyTruongSon.
Thank you, @stefan-it, for your interest in the v3 pretraining of DeBERTa.
In this work, we modified the DeBERTa code to add GDES to pretraining, following the DeBERTaV3 paper. If you are interested in that implementation, you can take a look at the latest v3 pretraining code in the original repository: https://github.com/microsoft/DeBERTa.
I hope this helps.
Regards, Cong Dao
Hi @DaoTranbk and @HyTruongSon,
many thanks for open-sourcing the ViDeBERTa repo!
I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:
https://github.com/HySonLab/ViDeBERTa/blob/8270cceb4833bbfa13b4b4d9c4859968501a96be/pre-training/bash/pre-train_model.sh#L13
However, the publicly available DeBERTa code does not yet include support for Gradient Disentangled Embedding Sharing (GDES); see e.g. https://github.com/microsoft/DeBERTa/issues/93.
Did you modify the code to add support for GDES? I would be very interested in that implementation.
Many thanks and cheers,
Stefan