Does the EMNLP2016 dataset contain anomalies?

iseesaw commented 2 years ago

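Hello, and thank you for providing the dataset. I noticed that in the EMNLP2016 dataset, some sentences in answer and context end with digits such as 0 and 1. Many of these are unrelated to the meaning of the sentence, which does not look quite right. Were they there originally?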

[
    {
        "context": [
            "i do n t have enough time to give it what it needs",
            "i m french living in mauritius currently working in paris for a week dspace will be the end of me",
            "i do n t have enough time to give it what it needs",
            "enjoy ur week there i have n t used dspace properly but having installed it locally and seen some of"
        ],
        "relation_at": [[3, 2]],
        "ctx_spk": [1, 1, 1, 2],
        "ctx_adr": [-1, 2, -1, 1],
        "answer": "as far as tomcat apps goes it s really nice hehe 0",
        "ans_idx": 3,
        "ans_spk": 1,
        "ans_adr": 2
    },
    {
        "context": [
            "i do n t have enough time to give it what it needs",
            "enjoy ur week there i have n t used dspace properly but having installed it locally and seen some of",
            "as far as tomcat apps goes it s really nice hehe 0",
            "i m just drowning in the workload dspace is 10 of my salary but takes like half of my time"
        ],
        "relation_at": [[1, 0], [2, 1]],
        "ctx_spk": [1, 2, 1, 1],
        "ctx_adr": [-1, 1, 2, -1],
        "answer": "yes as far as tomcat apps goes it s really nice 1",
        "ans_idx": 1,
        "ans_spk": 1,
        "ans_adr": 2
    },
    {
        "context": [
            "you can see who is where at https wiki ubuntu com africanteams preview",
            "o", "hello africa", "welcome padroni"
        ],
        "relation_at": [],
        "ctx_spk": [1, 2, 3, 3],
        "ctx_adr": [-1, -1, -1, -1],
        "answer": "and get us some chicks lol i missed bq flash today 1",
        "ans_idx": 0,
        "ans_spk": 3,
        "ans_adr": 1
    },
    {
        "context": [
            "it s more like the city of terrible weather", "lol",
            "got a bq for my friend", "yeey"
        ],
        "relation_at": [],
        "ctx_spk": [1, 2, 3, 3],
        "ctx_adr": [-1, -1, -1, -1],
        "answer": "i m here flash sale is still open i think 1",
        "ans_idx": 1,
        "ans_spk": 3,
        "ans_adr": 2
    },
    {
        "context": [
            "lol", "got a bq for my friend", "yeey",
            "i m here flash sale is still open i think 1"
        ],
        "relation_at": [[3, 0]],
        "ctx_spk": [1, 2, 2, 2],
        "ctx_adr": [-1, -1, -1, 1],
        "answer": "workload is heavy in all of africa looks like do u believe all the shit about paris haha 1",
        "ans_idx": 0,
        "ans_spk": 2,
        "ans_adr": 1
    }
]
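
For anyone who wants to measure how common these trailing digits are, here is a minimal sketch. It assumes the records are dicts with `answer` and `context` keys, as in the sample above, and treats an utterance as flagged when its last whitespace-separated token is a bare `0` or `1`. The helper names `trailing_flag` and `count_trailing_flags` are hypothetical and not part of MPC-BERT, and the two-record `sample` is trimmed from the excerpt above purely for illustration.

```python
import re

# Assumption: a "suspicious" ending is a standalone 0 or 1 as the final token.
# A number such as "10" at the end of a sentence is deliberately not matched.
TRAILING_FLAG = re.compile(r"\s([01])$")


def trailing_flag(utterance):
    """Return the trailing '0' or '1' token of an utterance, or None."""
    match = TRAILING_FLAG.search(utterance)
    return match.group(1) if match else None


def count_trailing_flags(records):
    """Count answers and context utterances that end in a bare 0/1 token."""
    counts = {"answer": 0, "context": 0}
    for record in records:
        if trailing_flag(record["answer"]) is not None:
            counts["answer"] += 1
        counts["context"] += sum(
            trailing_flag(utterance) is not None
            for utterance in record["context"]
        )
    return counts


# Illustrative records, shortened from the excerpt in this thread.
sample = [
    {
        "context": ["lol", "i m here flash sale is still open i think 1"],
        "answer": "as far as tomcat apps goes it s really nice hehe 0",
    },
    {"context": ["yeey"], "answer": "welcome padroni"},
]

print(count_trailing_flags(sample))
```

To run this over a full split, the records could be loaded with `json.load` first, assuming the file is a single JSON array like the one shown here.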

JasonForJoy commented 2 years ago

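@iseesaw Hello, and sorry for the late reply. Yes, the dataset was originally like this. To make a fair comparison, we used the version of the data used in Le et al. (2019). Le et al. (2019) processed the original Ouchi and Tsuboi (2016) dataset, but the exact processing script is unknown. We did not use the original dataset, so we cannot say whether the original Ouchi and Tsuboi (2016) data also looks like this.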

Le et al. Who Is Speaking to Whom? Learning to identify utterance addressee in multi-party conversations. EMNLP 2019.
Ouchi and Tsuboi. Addressee and response selection for multi-party conversation. EMNLP 2016.

iseesaw commented 2 years ago

OK, thank you!!

One more question: when do you expect to release the ACL2022 paper HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations? Looking forward to seeing this work ^_^

JasonForJoy commented 2 years ago

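@iseesaw Thank you for your interest in our work. The preprint of the HeterMPC paper has been submitted to arXiv and should become accessible once it passes moderation in the next day or two. We expect to have the code cleaned up and released at https://github.com/lxchtan/HeterMPC in early May. You are welcome to follow it and get in touch.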

iseesaw commented 2 years ago

Great, thanks~

JasonForJoy / MPC-BERT

Does the EMNLP2016 dataset contain anomalies? #4