THUDM / Chinese-Transformer-XL


Dropout applied twice to the input embedding? #8

Closed zixiliuUSC closed 2 years ago

zixiliuUSC commented 2 years ago

I stepped through this repo and found an interesting typo (maybe?): the hidden states are dropped out twice. Why did you implement it this way instead of using a single, higher dropout rate? Is it a typo? If so, was the model pre-trained with this code?

https://github.com/THUDM/Chinese-Transformer-XL/blob/0451869ee1c435929fcf5851e4a86a8b228a5e8f/mpu/transformer.py#L534

https://github.com/THUDM/Chinese-Transformer-XL/blob/0451869ee1c435929fcf5851e4a86a8b228a5e8f/mpu/transformer.py#L540
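For context, applying dropout twice with rate p is equivalent in expectation to a single dropout with rate 1 - (1 - p)^2 (e.g. p = 0.1 gives an effective rate of 0.19). A quick pure-Python simulation illustrates this; the helper below is a sketch of inverted dropout, not code from the repo:

```python
import random

def dropout(x, p, rng):
    # Inverted dropout: zero each element with probability p,
    # scale survivors by 1/(1-p) so the expectation is unchanged.
    return [0.0 if rng.random() < p else v / (1 - p) for v in x]

rng = random.Random(0)
p = 0.1
x = [1.0] * 100_000

# Apply dropout twice, mimicking the double application in the linked code.
y = dropout(dropout(x, p, rng), p, rng)

zero_frac = sum(v == 0.0 for v in y) / len(y)
effective = 1 - (1 - p) ** 2  # analytical effective dropout rate = 0.19
print(zero_frac, effective)   # zero_frac is close to 0.19
```

So the two consecutive dropout calls do not cancel or interact in any special way; they simply compound into a higher effective rate.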

duzx16 commented 2 years ago

Yes, it's a typo. Thank you for pointing it out. We did pre-train the model with this code, so the effective dropout rate on the input embedding was higher than intended.