dropreg/R-Drop
867 stars, 107 forks
issues
Clarification on Using Concatenated Input for R-Drop Training
#33
xyb314
closed
7 months ago
1
How to use data parallel in R-Drop?
#32
xinxinxing
closed
1 year ago
1
Question of the proof
#31
SYSUykLin
closed
2 years ago
1
Some questions about reproducing GLUE
#30
wpwpwpyo
opened
2 years ago
0
kl loss in ViT example supposed to be divided by 2?
#29
sieu-n
closed
2 years ago
1
How do the `warmup steps` affect performance?
#28
Doragd
closed
2 years ago
2
Can I use R-Drop in Semantic Search?
#27
ralgond
closed
2 years ago
1
Update README.md
#26
double22a
closed
2 years ago
0
Cannot reproduce results following the hyperparameters in the paper for fine-tuning ViT on CIFAR-100
#25
NamlessM
closed
2 years ago
4
About the implementation in transformers: why does ce_loss use mean reduction (the default) while KL uses sum reduction?
#24
XiaoqingNLP
closed
2 years ago
1
Cannot reproduce the results following the hyperparameters in the paper
#23
leoozy
closed
2 years ago
1
error: argument --task: invalid choice: 'rdrop_translation'
#22
tairan-w
closed
2 years ago
1
difference between R-Drop and SimCse + Smart
#21
cuixuage
closed
2 years ago
1
Can mseloss replace KL divergence?
#20
18335100284
closed
2 years ago
1
Training configuration for the WMT14 EnDe dataset?
#19
frankang
closed
2 years ago
5
Where is R-Drop code in R-Drop/huggingface_transformer_src/bert_rdrop/run_glue.py?
#18
zhenshiqi1996
closed
2 years ago
6
JS divergence in the research paper?
#17
sieu-n
closed
2 years ago
1
unable to reproduce results on GLUE
#16
1024er
closed
2 years ago
2
pip install --editable . reports an error
#15
Shiwen-Ni
closed
2 years ago
1
Unable to preprocess data for summarization
#14
samiksome92
closed
2 years ago
2
What's Wrong with my TensorFlow (1.14 or 1.15) implementation?
#13
guotong1988
closed
2 years ago
2
R-drop makes my model broken.
#12
MayDomine
closed
2 years ago
9
Readme File for RoBerta example.
#11
ShreyPandit
closed
3 years ago
1
A simple way to double the impact of R-Drop
#10
guotong1988
closed
3 years ago
3
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
#9
paul-chelarescu
closed
3 years ago
2
Summarization task fails with 'Trying to backward through the graph a second time'
#8
paul-chelarescu
closed
3 years ago
2
What should the dropout be set to when we predict or test?
#7
hitwangshuai
closed
3 years ago
2
Inconsistency for KL loss and CE loss hyper-parameters and baselines results in GLUE
#6
zhangzhenyu13
closed
3 years ago
5
Will KLD loss decrease very fast?
#5
snsun
closed
2 years ago
9
Fairseq tasks install work?
#4
kungfu-eric
closed
3 years ago
1
Why do you use (p, q_tec) and (q, p_tec) rather than (p, q) and (q, p) to compute kl-loss?
#3
JaheimLee
closed
3 years ago
0
What are the core code lines of R-Drop? Thank you very much.
#2
guotong1988
closed
3 years ago
1
What are the core code lines of R-Drop? Thank you very much.
#1
guotong1988
closed
3 years ago
0