Open zhiyuanhubj opened 6 years ago
I'd like to know about the denoising part. I find that noise is common in this dataset. For example:
赵一荻 张学良 夫妻 与赵一荻其实对张学良、赵一荻来说,他们真正最坏的结果,就是自由的丧失。
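We cannot extract the spousal (夫妻) relationship between "赵一荻" and "张学良" from the source text, which says roughly: "In fact, for Zhang Xueliang and Zhao Yidi, the truly worst outcome was the loss of freedom."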
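To make the concern concrete, here is a minimal sketch of how distant-supervision labeling can produce such a sample. It assumes the usual heuristic (any sentence that mentions both entities of a knowledge-base pair inherits that pair's relation); the `KB` dictionary and the `distant_label` function are hypothetical names for illustration, not code from this repository.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB: Dict[Tuple[str, str], str] = {("赵一荻", "张学良"): "夫妻"}


def distant_label(
    sentence: str, kb: Dict[Tuple[str, str], str] = KB
) -> List[Tuple[str, str, str, str]]:
    """Attach a KB relation to the sentence whenever both entities occur in it.

    This is the assumed distant-supervision heuristic: it checks only for
    co-occurrence, never whether the sentence actually expresses the relation.
    """
    samples = []
    for (head, tail), relation in kb.items():
        if head in sentence and tail in sentence:
            samples.append((head, tail, relation, sentence))
    return samples


sentence = "其实对张学良、赵一荻来说,他们真正最坏的结果,就是自由的丧失。"
samples = distant_label(sentence)
print(samples)
```

The sentence is labeled 夫妻 only because both names appear in it, which is exactly the kind of noisy sample shown above.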
Hi,

I've read the description of the corpus in your blog, but I still have some questions about it.

(1) It seems that you haven't tried to reduce the noise in the dataset, which was generated by distant supervision. Have you used any prior knowledge to handle this? Or can the two attention mechanisms, over characters and over sentences, reduce the noise?

(2) Does the full training set consist of train.txt in your GitHub repository (1000 sentences) plus the open-source project Roshanson/TextInfoExp (89183 sentences)?

Hope to get your reply. Thank you in advance!