Thanks for your interest in our work! About "StackPropagation+BERT", I hope the following suggestions help:

1) We only replace the self-attentive encoder with the BERT-base model and keep the other components unchanged.
2) We adopt the representation of the first word piece as the whole-word representation.
3) We keep the BERT model in fine-tuning mode, i.e., its weights are updated during training.
4) We carefully tune the learning rate of the BERT model in the range [1e-6, 1e-5].

Hope it helps. A minimal sketch of these four steps follows below.
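Since the authors did not release this code, here is a rough sketch of the four points above, assuming the HuggingFace `transformers` library; the names `BertEncoder`, `first_piece_idx`, and the stand-in decoder are illustrative, not the authors' actual implementation:

```python
import torch
import torch.nn as nn
from transformers import BertModel


class BertEncoder(nn.Module):
    """Drop-in replacement for the self-attentive encoder (point 1)."""

    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        # Fine-tuning mode (point 3): parameters stay trainable by default.
        self.bert = BertModel.from_pretrained(model_name)

    def forward(self, input_ids, attention_mask, first_piece_idx):
        # hidden: [batch, num_subwords, hidden_size]
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Point 2: take the first word piece of each word as the
        # whole-word representation; first_piece_idx is [batch, num_words].
        idx = first_piece_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        return hidden.gather(1, idx)  # [batch, num_words, hidden_size]


encoder = BertEncoder()
decoder = nn.Linear(768, 128)  # stand-in for the unchanged decoder components

# Point 4: give the BERT parameters their own, smaller learning rate,
# tuned in [1e-6, 1e-5]; the other components keep their original rate.
optimizer = torch.optim.Adam([
    {"params": encoder.bert.parameters(), "lr": 1e-5},
    {"params": decoder.parameters(), "lr": 1e-3},
])
```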
Could you just release your code for BERT? The data preprocessing is painful to do. I also find that the encoder and decoder outputs are not formatted as [batch, embedding_size], which is troublesome to change for BERT.
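For anyone hitting the same alignment problem: below is one way the word-to-first-subpiece index used in the sketch above might be built, assuming `transformers`' fast tokenizer. This is a guess at the preprocessing, not code from the repository.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
words = ["play", "taylor", "swift", "songs"]  # one pre-tokenized utterance

enc = tokenizer(words, is_split_into_words=True)
word_ids = enc.word_ids()  # e.g. [None, 0, 1, 2, 3, None] (None = special tokens)

# Position of the first subword piece of each original word.
first_piece_idx, seen = [], set()
for pos, wid in enumerate(word_ids):
    if wid is not None and wid not in seen:
        seen.add(wid)
        first_piece_idx.append(pos)

# Gathering BERT's [batch, num_subwords, hidden] output at these positions
# yields a [batch, num_words, hidden] tensor, matching the word-level
# shape the downstream decoders expect.
```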
Hi, I also encountered this problem. Did you succeed in implementing it?
Hello, I see that your paper reports test results using BERT as word embeddings, but I could not find the relevant code under `module`. Could you add it?