ericput / bert-ner

This is a named entity recognizer based on pytorch-pretrained-bert.
MIT License
33 stars · 8 forks

How to reproduce the performance result #1

Open lxy444 opened 5 years ago

lxy444 commented 5 years ago

Could you please add some guide on how to reproduce the performance result?

For example, how to reproduce the result on the MSRA dataset?

Thanks.

ericput commented 5 years ago
1. Download the pretrained Chinese BERT and convert it to the PyTorch format.

2. Preprocess the MSRA dataset:

   - Split sentences to avoid too much truncation. Especially in the test phase, truncation will hurt the scores.
   - Convert chunk-level labels to BERT-token-level labels, for example:
     - 希望工程/o -> 希/O 望/O 工/O 程/O
     - 北京市/ns -> 北/B-NS 京/I-NS 市/I-NS
   - You can refer to `preprocess_msra.py`.

3. Just follow `task_config.yaml`.
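The preprocessing in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the actual `preprocess_msra.py`; the sentence-ending punctuation set and the lowercase source tags (`o`, `ns`) are assumptions based on the examples above:

```python
import re

def split_sentences(text):
    """Split on Chinese sentence-ending punctuation to keep examples
    short enough to avoid truncation at BERT's max sequence length."""
    parts = re.split(r'(?<=[。！？])', text)
    return [p for p in parts if p]

def chunk_to_token_labels(chunks):
    """Convert chunk-level labels like ('北京市', 'ns') into
    per-character BIO tags, one tag per BERT character token."""
    tokens, labels = [], []
    for text, tag in chunks:
        for i, ch in enumerate(text):
            tokens.append(ch)
            if tag == 'o':
                labels.append('O')
            elif i == 0:
                labels.append('B-' + tag.upper())
            else:
                labels.append('I-' + tag.upper())
    return tokens, labels

tokens, labels = chunk_to_token_labels([('希望工程', 'o'), ('北京市', 'ns')])
# tokens: ['希', '望', '工', '程', '北', '京', '市']
# labels: ['O', 'O', 'O', 'O', 'B-NS', 'I-NS', 'I-NS']
```

Each Chinese character becomes its own token because the Chinese BERT vocabulary is character-level, so chunk tags have to be expanded into one BIO tag per character.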


Louis-udm commented 5 years ago

I implemented a new version using BERT+CRF. If you think my version is good, please give it a star. @lxy444

lxy444 commented 5 years ago

> I implemented a new version using BERT+CRF. If you think my version is good, please give it a star. @lxy444

OK, thanks.

lxy444 commented 5 years ago

Thanks. I followed your instructions, and after training finished I got a prediction file named "test.predict". I guess it contains the predicted labels for the test data.

However, I don't see any evaluation metrics; even during training there is no accuracy output. How can I get the performance result?
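One common way to get scores is to compare "test.predict" against the gold labels with an entity-level F1 script. Below is a hedged sketch of such an evaluator; the repo may not ship one, and loading the two label files is left out since their exact format is not shown here (tools like seqeval or the standard conlleval script are the usual alternatives):

```python
def extract_spans(labels):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(labels + ['O']):  # sentinel closes a trailing entity
        if tag == 'O' or tag.startswith('B-') or (tag.startswith('I-') and etype != tag[2:]):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
        if tag.startswith('B-'):
            start, etype = i, tag[2:]
    return spans

def f1_score(gold, pred):
    """Micro-averaged entity-level precision, recall, and F1 over
    parallel lists of gold and predicted tag sequences."""
    g, p = set(), set()
    for idx, (gs, ps) in enumerate(zip(gold, pred)):
        g.update((idx,) + s for s in extract_spans(gs))
        p.update((idx,) + s for s in extract_spans(ps))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

An entity counts as correct only when its boundaries and type both match, which is the standard CoNLL-style scoring for NER, so token-level accuracy alone would overstate performance.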