Question about v3 pretraining code of DeBERTa

stefan-it commented 1 year ago

Hi @DaoTranbk and @HyTruongSon,

many thanks for open sourcing the repo for ViDeBERTa!

I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:

https://github.com/HySonLab/ViDeBERTa/blob/8270cceb4833bbfa13b4b4d9c4859968501a96be/pre-training/bash/pre-train_model.sh#L13

However, the publicly available DeBERTa code does not yet include support for Gradient-Disentangled Embedding Sharing (GDES); see e.g. https://github.com/microsoft/DeBERTa/issues/93.

Did you modify the code to add support for GDES? I would be very interested in that implementation.

Many thanks and cheers,

Stefan

musabgultekin commented 1 year ago

Any updates on this?

musabgultekin commented 1 year ago

Kindly pinging @DaoTranbk and @HyTruongSon.

DaoTranbk commented 1 year ago

Thank you, @stefan-it, for your interest in the v3 pretraining of DeBERTa.

In this work, we modified the DeBERTa code to add GDES to pretraining, following the DeBERTaV3 paper. If you are interested in that implementation, you can take a look at the latest v3 pretraining code in the original repository: https://github.com/microsoft/DeBERTa.

I hope this is helpful.

Regards, Cong Dao
