crownpku / Information-Extraction-Chinese

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT (Chinese entity recognition and relation extraction)
2.22k stars 814 forks

Questions about data #69

Open zhiyuanhubj opened 6 years ago

zhiyuanhubj commented 6 years ago

Hi, I've read the description of the corpus on your blog, but I still have some questions about it.

(1) It seems that you haven't tried to reduce the noise in the dataset, which was generated by distant supervision. Have you used any prior knowledge to handle this dataset? Or can the two attention mechanisms, over characters and over sentences, reduce the noise?

(2) Does the full training set consist of train.txt in your GitHub repo (1,000 sentences) plus the data from the open-source project Roshanson/TextInfoExp (89,183 sentences)?

I hope to hear from you. Thank you in advance!

MrRace commented 5 years ago

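I'd also like to know about the denoising part; I find that noise is common in this dataset. For example:

赵一荻 张学良 夫妻 与赵一荻其实对张学良、赵一荻来说,他们真正最坏的结果,就是自由的丧失。

(Entities: Zhao Yidi and Zhang Xueliang; relation label: 夫妻, "husband and wife"; sentence, roughly: "In fact, for Zhang Xueliang and Zhao Yidi, the truly worst outcome was the loss of freedom.")

The spousal relationship between "赵一荻" and "张学良" cannot be extracted from the source text.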
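
To make this kind of noise easier to spot, here is a minimal Python sketch that parses a row like the example above and flags it when the sentence contains no cue word for its label. It assumes each row is whitespace-separated as `entity1 entity2 relation sentence`, which is only inferred from the example row. The names `Sample`, `parse_line`, `looks_noisy` and the `CUE_WORDS` table are hypothetical and are not part of this repository. A keyword check is a crude heuristic, not the denoising method used by the project.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    head: str      # first entity
    tail: str      # second entity
    relation: str  # distantly supervised label
    sentence: str  # sentence mentioning both entities


def parse_line(line: str) -> Optional[Sample]:
    """Split a row into 4 fields; return None if the row is malformed."""
    # Assumed format: entity1 entity2 relation sentence (whitespace-separated).
    parts = line.strip().split(maxsplit=3)
    if len(parts) != 4:
        return None
    return Sample(*parts)


# Hypothetical cue words per relation label; extend by hand for other labels.
CUE_WORDS = {
    "夫妻": ("夫妻", "妻子", "丈夫", "结婚", "夫人"),
}


def looks_noisy(sample: Sample) -> bool:
    """Flag a sample whose sentence contains no cue word for its label."""
    cues = CUE_WORDS.get(sample.relation)
    if cues is None:
        return False  # no cue list for this label, so no judgment is made
    return not any(cue in sample.sentence for cue in cues)


line = "赵一荻 张学良 夫妻 与赵一荻其实对张学良、赵一荻来说,他们真正最坏的结果,就是自由的丧失。"
sample = parse_line(line)
if sample is not None:
    print(sample.head, sample.tail, sample.relation, looks_noisy(sample))
```

Rows flagged this way could be reviewed manually or dropped before training. The check will also flag valid sentences that express the relation implicitly, so it is better used for estimating the noise rate than as a hard filter.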