JasonForJoy / MPC-BERT

GIFT (ACL 2023) & MPC-BERT (ACL 2021) for Multi-Party Conversation Understanding

Anomalies in the EMNLP2016 dataset? #4

Closed by iseesaw 2 years ago

iseesaw commented 2 years ago

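Hello, and thank you for providing the dataset. I noticed that in the EMNLP2016 dataset, some sentences in the answer and context fields end with a digit such as 0 or 1. Many of these digits have nothing to do with the meaning of the sentence, which does not look normal. Were they already present in the original data?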

[
    {
        "context": [
            "i do n t have enough time to give it what it needs",
            "i m french living in mauritius currently working in paris for a week dspace will be the end of me",
            "i do n t have enough time to give it what it needs",
            "enjoy ur week there i have n t used dspace properly but having installed it locally and seen some of"
        ],
        "relation_at": [[3, 2]],
        "ctx_spk": [1, 1, 1, 2],
        "ctx_adr": [-1, 2, -1, 1],
        "answer": "as far as tomcat apps goes it s really nice hehe 0",
        "ans_idx": 3,
        "ans_spk": 1,
        "ans_adr": 2
    },
    {
        "context": [
            "i do n t have enough time to give it what it needs",
            "enjoy ur week there i have n t used dspace properly but having installed it locally and seen some of",
            "as far as tomcat apps goes it s really nice hehe 0",
            "i m just drowning in the workload dspace is 10 of my salary but takes like half of my time"
        ],
        "relation_at": [[1, 0], [2, 1]],
        "ctx_spk": [1, 2, 1, 1],
        "ctx_adr": [-1, 1, 2, -1],
        "answer": "yes as far as tomcat apps goes it s really nice 1",
        "ans_idx": 1,
        "ans_spk": 1,
        "ans_adr": 2
    },
    {
        "context": [
            "you can see who is where at https wiki ubuntu com africanteams preview",
            "o", "hello africa", "welcome padroni"
        ],
        "relation_at": [],
        "ctx_spk": [1, 2, 3, 3],
        "ctx_adr": [-1, -1, -1, -1],
        "answer": "and get us some chicks lol i missed bq flash today 1",
        "ans_idx": 0,
        "ans_spk": 3,
        "ans_adr": 1
    },
    {
        "context": [
            "it s more like the city of terrible weather", "lol",
            "got a bq for my friend", "yeey"
        ],
        "relation_at": [],
        "ctx_spk": [1, 2, 3, 3],
        "ctx_adr": [-1, -1, -1, -1],
        "answer": "i m here flash sale is still open i think 1",
        "ans_idx": 1,
        "ans_spk": 3,
        "ans_adr": 2
    },
    {
        "context": [
            "lol", "got a bq for my friend", "yeey",
            "i m here flash sale is still open i think 1"
        ],
        "relation_at": [[3, 0]],
        "ctx_spk": [1, 2, 2, 2],
        "ctx_adr": [-1, -1, -1, 1],
        "answer": "workload is heavy in all of africa looks like do u believe all the shit about paris haha 1",
        "ans_idx": 0,
        "ans_spk": 2,
        "ans_adr": 1
    }
]
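
A quick way to see how widespread the trailing digits are is a short script like the sketch below. It assumes the data is a JSON list of records shaped like the sample above. The names split_trailing_flag and count_trailing_flags are hypothetical helpers, not part of the MPC-BERT code, and the two records in sample are abbreviated copies of entries from the sample. The script only counts the trailing 0/1 tokens; it does not tell you whether they are labels or noise.

```python
from collections import Counter
import re

# Matches an utterance whose last whitespace-separated token is a lone 0 or 1.
# Requiring text before the digit keeps utterances such as "10" or "0" intact.
TRAILING_FLAG = re.compile(r"^(?P<text>.*\S)\s+(?P<flag>[01])$")


def split_trailing_flag(utterance):
    """Return (text, flag); flag is None when there is no trailing 0/1 token."""
    match = TRAILING_FLAG.match(utterance)
    if match is None:
        return utterance, None
    return match.group("text"), int(match.group("flag"))


def count_trailing_flags(records):
    """Count trailing flags per field, keyed by (field_name, flag)."""
    counts = Counter()
    for record in records:
        _, flag = split_trailing_flag(record["answer"])
        counts["answer", flag] += 1
        for utterance in record["context"]:
            _, flag = split_trailing_flag(utterance)
            counts["context", flag] += 1
    return counts


# Two abbreviated records copied from the sample above (illustration only).
sample = [
    {
        "context": ["i do n t have enough time to give it what it needs"],
        "answer": "as far as tomcat apps goes it s really nice hehe 0",
    },
    {
        "context": ["lol", "i m here flash sale is still open i think 1"],
        "answer": "workload is heavy in all of africa looks like haha 1",
    },
]

if __name__ == "__main__":
    for key, n in sorted(count_trailing_flags(sample).items(), key=str):
        print(key, n)
```

To run it on the full file, load the records with json.load and pass them to count_trailing_flags. Comparing the counts for the answer and context fields shows whether the digit appears on every answer or only on some of them.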
JasonForJoy commented 2 years ago

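@iseesaw Hello, and sorry for the late reply. The dataset was indeed like this to begin with. To make a fair comparison, we used the version of the data used in Le et al. (2019). Le et al. (2019) processed the original Ouchi and Tsuboi (2016) dataset, but we do not know what their processing script looks like. We did not use the original Ouchi and Tsuboi (2016) dataset, so we cannot say whether it has the same issue.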

Le et al. Who Is Speaking to Whom? Learning to identify utterance addressee in multi-party conversations. EMNLP 2019.
Ouchi and Tsuboi. Addressee and response selection for multi-party conversation. EMNLP 2016.

iseesaw commented 2 years ago

OK, thank you!!

One more question: when do you expect to release the ACL 2022 work HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations? Looking forward to seeing it ^_^

JasonForJoy commented 2 years ago

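@iseesaw Thank you for your interest in our work. The preprint of the HeterMPC paper has been submitted to arXiv and should become accessible within the next day or two, once it passes moderation. We expect to have the code cleaned up by early May and will release it at https://github.com/lxchtan/HeterMPC You are welcome to follow the repository and get in touch.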

iseesaw commented 2 years ago

Great, thanks~