Question about tuple_filter.py

MrRace commented 4 years ago

In tuple_filter.py, the GetData_train function contains the following code:

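        # Build (X, Y) for the tuple classifier from the candidate tuples.
        for t in candidate_tuples:
            features = candidate_tuples[t]
            # Positive: the candidate tuple contains every element of the gold tuple.
            if len(gold_tuple) == len(set(gold_tuple).intersection(set(t))):
                X.append([features[9][0][1]])  # BERT similarity feature only
                Y.append([1])
            else:
                # Negative: keep each one with probability 0.5 (random down-sampling).
                prop = random.random()
                if prop < 0.5:
                    X.append([features[9][0][1]])
                    Y.append([0])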

Why take [features[9][0][1]]? Could you explain the reasoning behind it? Thanks!

duterscmy commented 4 years ago


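It's like this: originally this step used not only the BERT similarity feature but also some literal string-matching features. Those didn't help much and were dropped, but the data still keeps them. Index 9 is where the BERT feature sits, and the [0][1] is needed because that is how the output of the BERT package I used has to be indexed.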
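To make the indexing concrete, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the BERT package returns class probabilities for a single input as a nested list shaped like [[p_negative, p_positive]], and that this output is stored at position 9 of a candidate's feature list. Under that assumption, [0] selects the only row and [1] the positive-class probability. The helper name `bert_score` and the example feature values are hypothetical.

```python
from typing import Any, List

BERT_FEATURE_INDEX = 9  # position of the BERT output in the feature list (per the reply above)


def bert_score(features: List[Any]) -> float:
    """Return the BERT positive-class probability from a candidate's feature list.

    Assumes features[9] holds a batch-of-one probability matrix
    [[p_negative, p_positive]]: [0] picks the single row, [1] the positive class.
    """
    return features[BERT_FEATURE_INDEX][0][1]


# Hypothetical feature list: nine leftover literal-match features, then the BERT output.
example_features = [0.0] * 9 + [[[0.12, 0.88]]]
print(bert_score(example_features))  # 0.88
```

If the uploaded version no longer stores the unused literal-match features, index 9 would no longer point at the BERT output, which is one possible reason the hard-coded index stops working.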

On December 11, 2019 at 09:14, JaonLiu<notifications@github.com> wrote:


ZainZhou commented 4 years ago

@MrRace After running entity_filter.py, what entity recall did you get?

MrRace commented 4 years ago

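@duterscmy So the version currently uploaded actually uses only the BERT feature? In the uploaded version, the features data looks like this:
(1) In this case, how should X.append() be written?
(2) When generating negative samples, why would this random-number approach guarantee a negative-sample ratio of 0.05? Thanks~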

ZainZhou commented 4 years ago

@MrRace I used X.append([features[2]]) directly.

MrRace commented 4 years ago

@ZainZhou Is your features list structured similarly?

duterscmy commented 4 years ago

Then just use append(x[-1]). As for the ratio, it can't be guaranteed; it only gives an approximate proportion of negatives.
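The point about the negative ratio can be checked with a small simulation. Keeping each negative independently when `random.random() < keep_prob` is Bernoulli sampling: the expected fraction kept equals `keep_prob`, but the actual count varies from run to run. If an exact count is wanted, `random.sample` draws a fixed number instead. This is a sketch with made-up sizes; `keep_prob`, the seed, and both function names are illustrative, not taken from the repository.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_downsample(items: Sequence[T], keep_prob: float, rng: random.Random) -> List[T]:
    """Keep each item independently with probability keep_prob (as in GetData_train)."""
    return [x for x in items if rng.random() < keep_prob]


def exact_downsample(items: Sequence[T], keep_prob: float, rng: random.Random) -> List[T]:
    """Keep exactly round(len(items) * keep_prob) items, chosen uniformly without replacement."""
    k = round(len(items) * keep_prob)
    return rng.sample(list(items), k)


if __name__ == "__main__":
    negatives = list(range(10_000))
    rng = random.Random(0)
    for run in range(3):
        kept = bernoulli_downsample(negatives, 0.5, rng)
        print(f"Bernoulli run {run}: kept {len(kept)} of {len(negatives)}")  # near 5000, varies
    print("exact:", len(exact_downsample(negatives, 0.5, rng)))  # always 5000
```

With many negatives the Bernoulli count stays close to the target fraction, which is why an approximate ratio is usually good enough for training the filter.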

---Original message--- From: "JaonLiu"<notifications@github.com> Sent: Wednesday, December 11, 2019, 3:28 PM To: "duterscmy/ccks2019-ckbqa-4th-codes"<ccks2019-ckbqa-4th-codes@noreply.github.com>; Cc: "Mention"<mention@noreply.github.com>;"Caomingyu"<1054527636@qq.com>; Subject: Re: [duterscmy/ccks2019-ckbqa-4th-codes] Question about tuple_filter.py (#18)


MrRace commented 4 years ago

@duterscmy Then in SaveFilterCandiT, should new_features = features[0:2]+[features[9][0][1]] be changed to new_features = features, or to something else?

MrRace commented 4 years ago

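Among single-entity questions, proportion recallable by the candidate answers: 0.730
Proportion where the candidate answers cover the gold query path: 0.461

Top-10 recall after logistic-regression filtering on the validation set: 0.72
Among single-entity questions, proportion recallable by the candidate answers: 0.731
Proportion where the candidate answers cover the gold query path: 0.560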

@ZainZhou What about yours?
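For readers comparing these numbers: a figure like "top-10 recall" can be computed as the fraction of questions for which at least one of the top k ranked candidates is a positive. The sketch below reuses the positive criterion from GetData_train (the candidate tuple contains every element of the gold tuple). Whether tuple_filter.py defines its printed ratios exactly this way is not confirmed in the thread; the function names and sample data are invented for illustration.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, ...]


def is_positive(candidate: Candidate, gold: Candidate) -> bool:
    """Same criterion as GetData_train: every gold element appears in the candidate."""
    return len(gold) == len(set(gold).intersection(set(candidate)))


def recall_at_k(ranked: Dict[str, List[Candidate]],
                gold: Dict[str, Candidate], k: int) -> float:
    """Fraction of questions whose top-k ranked candidates include at least one positive."""
    hits = sum(
        any(is_positive(c, gold[q]) for c in cands[:k])
        for q, cands in ranked.items()
    )
    return hits / len(ranked)


if __name__ == "__main__":
    # Invented example: two questions with already-ranked candidate tuples.
    gold = {"q1": ("<A>", "<rel>"), "q2": ("<B>", "<rel2>")}
    ranked = {
        "q1": [("<A>", "<other>"), ("<A>", "<rel>")],  # positive at rank 2
        "q2": [("<C>", "<rel2>")],                     # no positive candidate
    }
    print(recall_at_k(ranked, gold, k=1))   # 0.0
    print(recall_at_k(ranked, gold, k=10))  # 0.5
```

Because recall@k can only be as high as the share of questions that have any positive candidate at all, a low upstream entity recall caps every number computed after it.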

1234560o commented 4 years ago

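Does the second logistic-regression model use only the BERT feature, without the earlier word-frequency, length, and character-overlap features? My understanding is that the BERT feature is a single number, i.e. the probability of the positive class?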

ZainZhou commented 4 years ago

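@MrRace My tuple_filter results are much lower than yours, because my entity-extraction recall upstream is already low. That's why I asked how many entities you recall with entity_filter.py.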

MrRace commented 4 years ago

With entity_filter.py, after top-5 logistic-regression filtering on the validation set, entity recall is 0.774 over all questions and 0.820 on single-entity questions. On the training set it's around 0.8.

ZainZhou commented 4 years ago

@MrRace Then they're not that far apart, yet I don't know why my tuple_filter results are 20 points lower. I'll keep digging.

duterscmy commented 4 years ago

Right, I didn't add them. BERT alone already works well.
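For illustration only: with a single input feature (the BERT positive-class probability), a logistic regression reduces to fitting one weight and one bias, after which candidates are ranked by predicted probability and the top k are kept. The sketch uses plain-Python gradient descent so it has no dependencies; the implementation the repository actually uses is not shown in this thread, and the training data, learning rate, epoch count, and k are made-up assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg_1d(xs: Sequence[float], ys: Sequence[int],
                  lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(w*x + b) by full-batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # d(log loss)/d(logit) for one sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def top_k(names: Sequence[str], scores: Sequence[float],
          w: float, b: float, k: int) -> List[str]:
    """Rank candidates by predicted probability and keep the k highest."""
    ranked = sorted(zip(names, scores),
                    key=lambda pair: sigmoid(w * pair[1] + b), reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Invented data: x = BERT score, y = 1 if the candidate matches the gold tuple.
    X = [0.95, 0.90, 0.80, 0.60, 0.30, 0.20, 0.10, 0.05]
    Y = [1, 1, 1, 1, 0, 0, 0, 0]
    w, b = fit_logreg_1d(X, Y)
    print(top_k(["t1", "t2", "t3", "t4"], [0.2, 0.9, 0.4, 0.7], w, b, k=2))  # ['t2', 't4']
```

With one feature and a positive fitted weight, this ranking is identical to sorting by the raw BERT score; the logistic regression mainly adds a calibrated probability (and a possible threshold) on top of it, which is consistent with BERT alone being enough here.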

---Original message--- From: "zwj"<notifications@github.com> Sent: Wednesday, December 11, 2019, 5:28 PM To: "duterscmy/ccks2019-ckbqa-4th-codes"<ccks2019-ckbqa-4th-codes@noreply.github.com>; Cc: "Mention"<mention@noreply.github.com>;"Caomingyu"<1054527636@qq.com>; Subject: Re: [duterscmy/ccks2019-ckbqa-4th-codes] Question about tuple_filter.py (#18)


MrRace commented 4 years ago

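@duterscmy My results from running tuple_filter.py:

Among single-entity questions, proportion recallable by the candidate answers: 0.730
Proportion where the candidate answers cover the gold query path: 0.461
Among single-entity questions, proportion recallable by the candidate answers: 0.772
Proportion where the candidate answers cover the gold query path: 0.638

Top-10 recall after logistic-regression filtering on the validation set: 0.72
Among single-entity questions, proportion recallable by the candidate answers: 0.731
Proportion where the candidate answers cover the gold query path: 0.560

Are these results on the low side? Roughly what do you get?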

duterscmy commented 4 years ago

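Yes, that's low. On my side, single-entity recall is 0.92, and 0.902 after filtering to the top 5. Maybe I uploaded the wrong version of the code, but I have no free GPU for the next couple of days; once I've confirmed a correct version I'll upload it.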

---Original message--- From: "JaonLiu"<notifications@github.com> Sent: Thursday, December 12, 2019, 8:57 AM To: "duterscmy/ccks2019-ckbqa-4th-codes"<ccks2019-ckbqa-4th-codes@noreply.github.com>; Cc: "Mention"<mention@noreply.github.com>;"Caomingyu"<1054527636@qq.com>; Subject: Re: [duterscmy/ccks2019-ckbqa-4th-codes] Question about tuple_filter.py (#18)


duterscmy commented 4 years ago

I misunderstood: these are the candidate-answer numbers. I'll rerun the whole pipeline tonight and let you know.


Keerlsm commented 4 years ago


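My tuple_filter.py results are close to the ones above. Did some parameter or model change? I've been working on related research recently and hope to reproduce the results you submitted.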

counten commented 4 years ago


Friend, did you solve this? My results are about the same; any pointers would be appreciated:

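Among single-entity questions, proportion recallable by the candidate answers: 0.745
Proportion where the candidate answers cover the gold query path: 0.471
Among single-entity questions, proportion recallable by the candidate answers: 0.755
Proportion where the candidate answers cover the gold query path: 0.579

Top-10 recall after logistic-regression filtering on the validation set: 0.74
Among single-entity questions, proportion recallable by the candidate answers: 0.748
Proportion where the candidate answers cover the gold query path: 0.573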

liupenggg commented 4 years ago


Why am I getting all zeros? Did something go wrong somewhere?

binglinchengxiash commented 3 years ago


How should new_features in SaveFilterCandiT be written? Was this ever solved?

JeffSuu commented 3 years ago


Has this features issue been solved? Writing new_features = features gives very poor results.

duterscmy / ccks2019-ckbqa-4th-codes

Question about tuple_filter.py #18