Closed: zhenshiqi1996 closed this issue 3 years ago.
I also have this question.
I think the two dropouts just amount to feeding the same input through the model twice: because each forward pass samples a different dropout mask, the two passes produce slightly different outputs.
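Roughly, in PyTorch terms (a toy sketch to show the effect, not the repo's actual BERT code):

```python
import torch
import torch.nn as nn

# Toy model standing in for BERT; dropout is the only source of randomness here.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(16, 4))
model.train()  # dropout is only active in train mode

x = torch.randn(8, 16)   # one batch, passed through twice
logits1 = model(x)       # first pass: one random dropout mask
logits2 = model(x)       # second pass: a different dropout mask

# Same input, same weights, different masks -> slightly different outputs.
print(torch.allclose(logits1, logits2))  # almost surely False
```

(Some implementations get the same effect by concatenating the batch with itself and running a single forward pass.)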
What about the KL divergence? I couldn't find it in the source code.
The loss computation formula is already written in the Usage section of the article.
It's in modeling_bert.py: the bert_kl function is exactly that.
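For anyone else looking: the objective typically pairs cross-entropy on both passes with a symmetric KL term between them. A minimal sketch under that assumption (the name rdrop_loss and the alpha weight here are illustrative; the repo's bert_kl may differ in reduction and weighting details):

```python
import torch
import torch.nn.functional as F

def rdrop_loss(logits1, logits2, labels, alpha=1.0):
    # Cross-entropy averaged over the two forward passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    p_log = F.log_softmax(logits1, dim=-1)
    q_log = F.log_softmax(logits2, dim=-1)
    # Symmetric KL between the two predictive distributions:
    # F.kl_div(input, target, log_target=True) computes KL(target || input).
    kl = 0.5 * (
        F.kl_div(q_log, p_log, log_target=True, reduction="batchmean")
        + F.kl_div(p_log, q_log, log_target=True, reduction="batchmean")
    )
    return ce + alpha * kl
```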
Can you find the code for the two dropouts?