ccdv-ai / convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing
MIT License

Convert DeBERTa to longDeBERTa #3

Open duyvuleo opened 1 year ago

duyvuleo commented 1 year ago

Hi,

Thanks for the great work.

Is it possible to convert DeBERTa models to longDeBERTa ones? Could you advise on the specific steps I could follow?

Looking forward to your response. Thanks!

ccdv-ai commented 1 year ago

Hi @duyvuleo

Currently, converting DeBERTa to Long DeBERTa is not possible because this model relies on a specific attention mechanism called "disentangled attention", which uses different inputs (content and relative-position representations) together with relative positional embeddings.

To make DeBERTa compatible, some things need to be rethought specifically for this model. I may add DeBERTa in the future.
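To illustrate why a direct conversion does not work, here is a minimal pure-Python sketch of how disentangled attention scores are formed, following the formulation in the DeBERTa paper: each score sums a content-to-content, a content-to-position and a position-to-content term, scaled by 1/sqrt(3d). This is not code from DeBERTa or from this repository; the names `rel_index`, `disentangled_scores` and `random_matrix` are hypothetical, and the sketch omits masking, softmax, multiple heads and the variants used in later DeBERTa versions.

```python
import math
import random

def rel_index(i, j, k):
    """Bucketed relative distance delta(i, j) as defined in the DeBERTa paper.

    Maps i - j into the range [0, 2k): distances at or below -k share bucket 0,
    distances at or above k share bucket 2k - 1.
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def disentangled_scores(qc, kc, qr, kr, k):
    """Unnormalized single-head scores A[i][j] (no mask, no softmax).

    qc, kc: n x d content query/key projections (one row per token).
    qr, kr: 2k x d projections of a shared relative-position embedding table.
    """
    n, d = len(qc), len(qc[0])
    scale = 1.0 / math.sqrt(3 * d)  # three terms are summed, hence 3d
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(qc[i], kc[j])                   # content-to-content
            c2p = dot(qc[i], kr[rel_index(i, j, k)])  # content-to-position
            p2c = dot(kc[j], qr[rel_index(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        scores.append(row)
    return scores

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy example: 6 tokens, hidden size 4, maximum relative distance k = 2.
rng = random.Random(0)
n, d, k = 6, 4, 2
qc, kc = random_matrix(n, d, rng), random_matrix(n, d, rng)
qr, kr = random_matrix(2 * k, d, rng), random_matrix(2 * k, d, rng)
A = disentangled_scores(qc, kc, qr, kr, k)
```

Because every pair (i, j) needs the relative-position row for its bucket in both directions, a sparse pattern such as LSG's local, sparse and global attention would have to supply the matching relative-position rows for every key it keeps, instead of simply reusing the dense attention layer. This is presumably part of what would need to be rethought for this model.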