g-jing opened this issue 5 years ago
CRF should be better.
@LiangYuHai Thanks for your answer. Have you tried such a comparison before?
CRF is the best model for NER tasks. I guess it will definitely improve the performance. I'm trying that as well.
@geyingli I think CRF is a good decoder for sequence tasks, but BERT is very powerful and already captures sequence information. Did you get a good result with CRF? Besides, I find that fine-tuning BERT on the NER task helps a lot.
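For concreteness, here is a minimal sketch of what a CRF decoder on top of BERT can look like. It uses the pytorch-crf package together with Hugging Face transformers; nothing here comes from the thread, and the model name, tag count, and shapes are illustrative assumptions.

```python
# Sketch of a CRF decoder over BERT token representations.
# Assumes: pip install torch transformers pytorch-crf
# `num_tags` and the model name are illustrative, not from this thread.
import torch.nn as nn
from torchcrf import CRF
from transformers import AutoModel

class BertCrfTagger(nn.Module):
    def __init__(self, model_name="bert-base-cased", num_tags=9):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)            # (batch, seq_len, num_tags)
        mask = attention_mask.bool()
        if tags is not None:
            # The CRF returns a log-likelihood; negate it to get a loss.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)   # Viterbi-best tag sequences
```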
Hi! Could you share some details of your training, like the learning rate, batch size, or other tricks? I fine-tuned BERT on a Chinese NER dataset and didn't get a better result than a traditional BiLSTM-CRF model. It would be helpful if you could share some details. Thanks~
There are not many tricks to BERT fine-tuning, but I can share some details if that helps: batch size is 32, and the optimizer is SGD instead of Adam. The model is fine-tuned directly on the NER task.
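A minimal sketch of that setup (batch size 32, SGD, the standard softmax token-classification head). The learning rate, momentum, and dummy tensors are illustrative assumptions, not values from this thread.

```python
# Sketch of the setup described above: batch size 32, SGD instead of Adam,
# standard softmax token-classification head.
import torch
from transformers import AutoModelForTokenClassification

input_ids = torch.randint(0, 20000, (32, 64))   # batch of 32 dummy sequences
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 9, (32, 64))          # 9 NER tags, e.g. CoNLL BIO scheme

model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)  # lr is an assumption

model.train()
out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
out.loss.backward()   # cross-entropy loss over the softmax head
optimizer.step()
```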
In my case, I got the best result (92.1 ~ 92.23 F1 score by conlleval) on the CoNLL (English) data with a CRF layer.
Without the CRF, the F1 scores are in the range 91.2 ~ 91.8.
But 92.23 is the best run, not the average, and it is still behind the score reported in the BERT paper.
I think ELMo + GloVe embeddings are more powerful for NER (92.5 ~ 92.8 F1 score).
@dsindex I noticed that you added an LSTM layer on top of BERT. Do you think it performs better than without the LSTM? Thanks
@RoderickGu
I think the difference is not that significant, but it is better to use it. My experiments show that the LSTM gives a 0.1 ~ 0.2% gain over fine-tuned BERT alone.
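A minimal sketch of this variant, with a BiLSTM re-encoding BERT's token outputs before the classifier. The hidden size and model name are illustrative assumptions.

```python
# Sketch of a BiLSTM inserted between BERT and the tag classifier.
# Hidden size and model name are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel

class BertLstmTagger(nn.Module):
    def __init__(self, model_name="bert-base-cased", num_tags=9, lstm_hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(hidden)      # BiLSTM re-encodes BERT's outputs
        return self.classifier(lstm_out)     # emissions for softmax or a CRF head
```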
@dsindex Thanks for your suggestions
@dsindex I noticed that you added an LSTM layer on top of BERT. Do you think it performs better than without the LSTM? Thanks
For all my NER tasks, an LSTM on top of BERT consistently boosts performance.
Those could be interesting results.
In my NER task, BERT-CRF got a better F1 score than BERT-softmax, about 2% higher.
I know that in the original paper they use softmax at the end of the model, but I wonder whether using a CRF would improve the performance? Thanks