chenyangh / DSLP

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation
MIT License

Training time cost per epoch in GLAT with DSLP #6

Closed: hemingkx closed this issue 2 years ago

hemingkx commented 2 years ago

Hi all,

Thanks very much for your awesome code!

I noticed there are some differences between your GLAT implementation and the repo here. I tried both and found that the training time per epoch increased rapidly during training (epoch 1 took about 10 minutes, but epoch 50 took 120 minutes). I wonder if you encountered this in your experiments and what might cause it.

Thanks very much! hemingkx

chenyangh commented 2 years ago

Hi,

I did not have that experience; the training time seemed to be linear. Could it be caused by insufficient RAM or something similar? Do you observe this slowdown on my codebase, on the official GLAT, or on both?
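For anyone checking the RAM hypothesis: a quick way to watch whether free memory shrinks as epochs go by is a sketch like the one below (assumes a Linux machine; the interval of 10 seconds is arbitrary).

```bash
# Refresh system memory stats every 10 seconds while training runs.
# For GPU memory, a similar loop with nvidia-smi works.
watch -n 10 free -h
```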

hemingkx commented 2 years ago

Thanks for your quick reply!

I found that this was because I had installed fairseq without the '--editable' option. BTW, do you know whether https://github.com/FLC777/GLAT is the official GLAT implementation?
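For reference, an editable install links pip to the checked-out source rather than copying it into site-packages, so local modifications to the code take effect without reinstalling. A minimal sketch, assuming you run it from the root of the fairseq-based repository you are training with:

```bash
# Install in editable (development) mode so the checked-out code,
# including any local changes, is what gets imported at training time.
pip install --editable ./
```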

chenyangh commented 2 years ago

Yes, it is.