Closed leehelenah closed 4 years ago
Transformers have more parameters than RCNN, so they need more data to fit. That is also why Transformer-based pretrained LMs need huge corpora. So if you have a large dataset, the result may be somewhat different.
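To make the "more parameters" point concrete, here is a rough back-of-the-envelope sketch of the parameter count of a standard Transformer encoder layer (multi-head attention with Q/K/V/output projections plus a two-layer feed-forward block and two LayerNorms). This assumes the textbook architecture with `d_model = 512` and `d_ff = 2048`; the actual model in this repo may use different sizes, so treat the numbers as illustrative only.

```python
def transformer_encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Approximate trainable parameters in one standard encoder layer."""
    # Q, K, V, and output projections, each d_model x d_model plus bias
    attn = 4 * (d_model * d_model + d_model)
    # Two-layer FFN: d_model -> d_ff -> d_model, with biases
    ffn = 2 * d_model * d_ff + d_ff + d_model
    # Two LayerNorms, each with a scale and shift vector of size d_model
    norms = 2 * (2 * d_model)
    return attn + ffn + norms

per_layer = transformer_encoder_layer_params(512, 2048)
print(per_layer)      # ~3.15M parameters per layer
print(6 * per_layer)  # ~18.9M for a 6-layer stack
```

So going from `n_layers = 1` to `n_layers = 6` multiplies the encoder's parameter count roughly sixfold, which is why a larger stack tends to need more training data before it pays off.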
Hello,
Thanks for the nice implementation. I noticed you set n_layers = 1 in
conf/train.json
I thought that, most of the time, people set n_layers to 6 or even higher in their experiments. Could that be a reason the Transformer model doesn't outperform RCNN in your results? Thank you.