duterscmy / ccks2019-ckbqa-4th-codes

Code for Chinese knowledge base question answering: the 4th-place solution in the CCKS2019 CKBQA evaluation

prop_extractor.py #7

Closed zhengxiaoxuer closed 4 years ago

zhengxiaoxuer commented 4 years ago

@duterscmy In prop_extractor.py, around line 48, in the part that extracts property values: why is question2mention used to extract entities here?

    try:
        max_props = self.question2mention[QUES][1]
        for p in max_props:
            mark_props[p] = p
    except:
        print('this question dont have long props')
        pass

duterscmy commented 4 years ago

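Here is the reasoning: both entity recognition and property-value extraction exist to keep entity recall high enough. If a candidate entity is never found, the answer will be wrong no matter how well the entity-linking stage performs. So we each wrote our own mention-extraction method and then merged the results. question2mention holds the entities extracted by another teammate, saved to a file. At the time I felt that including it made the program structure messy, so I did not upload it; I have now put it in data/. Also, once property-value extraction has run, the next step is entity linking. Our entity linking is only mediocre, so you may want to refer to the first-place team's paper and improve the features.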
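
To illustrate the merge described above, here is a minimal sketch, not the repository's actual code. The function name `collect_props` and the sample data are hypothetical, and the layout of `question2mention` (a mapping from question text to a pair whose second item is a list of long property mentions) is an assumption inferred only from the quoted `self.question2mention[QUES][1]` lookup.

```python
def collect_props(question, extracted_props, question2mention):
    """Union locally extracted property mentions with precomputed ones.

    Assumption: question2mention maps a question string to a pair whose
    second item (index 1) is a list of long property mentions.
    """
    # Start from the mentions found by the local extractor.
    mark_props = {p: p for p in extracted_props}

    # Add the teammate's precomputed mentions, if this question has any.
    entry = question2mention.get(question)
    if entry is not None:
        for p in entry[1]:
            mark_props[p] = p
    return mark_props


# Hypothetical sample data, for illustration only.
question2mention = {"q1": (["entity mention"], ["a long property value"])}

with_extra = collect_props("q1", ["short prop"], question2mention)
without_extra = collect_props("unknown question", ["short prop"], question2mention)
```

Using `dict.get` instead of a bare `except` keeps the "no precomputed mentions for this question" case explicit, so unrelated errors are not silently swallowed.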

zhangyanbo2007 commented 4 years ago

Add me, friend. I am also trying to reproduce the original poster's code: 15821444815

MrRace commented 4 years ago

There is one spot in this code file:

            for x in gold_entitys:
                if x[0] == '\"':
                    gold_props.append(x)

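I don't understand this logic. Why does an item that starts with a quotation mark count as a gold_props entry, while one that does not start with a quotation mark doesn't?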

duterscmy commented 4 years ago

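Because in the pkubase knowledge base, entities are written inside <> and textual property values are written inside double quotes, so the check looks for a leading double quote. I did not write the pkubase preprocessing part and that code is somewhat messy; I will tidy it up over the weekend and upload it.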
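
The convention described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the repository's code: the helper name `split_gold` and the sample items are hypothetical, and the only rule taken from the thread is that entities are wrapped in `<>` and textual property values in double quotes.

```python
def split_gold(gold_items):
    """Separate gold items into entities (<...>) and textual property values ("...")."""
    gold_entities, gold_props = [], []
    for item in gold_items:
        if item.startswith('"'):
            # Textual property values are written inside double quotes.
            gold_props.append(item)
        elif item.startswith('<'):
            # Entities are written inside angle brackets.
            gold_entities.append(item)
    return gold_entities, gold_props


# Hypothetical sample items, for illustration only.
entities, props = split_gold(['<SomeEntity>', '"some literal value"', ''])
```

Unlike indexing with `x[0]`, `str.startswith` does not raise on an empty string, so empty items are simply skipped.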
