alibaba / EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
Apache License 2.0

Are the two datasets (dkplm/train_corpus.txt etc.) downloaded via the _Knowledge Pre-training Practice_ tutorial wrong? #311

Open LemonWade opened 1 year ago

LemonWade commented 1 year ago

Every line of the downloaded training set is the following sentence.

{'text': '通常来说,人类想获得针对某种的[ENT]特异性抗体[ENT]有两种方式,要么是通过自然感染,要么是通过[ENT]疫苗接种[ENT]。但是,我们显然不会让婴幼儿冒着生病的危险去主动感染某个病毒,而对于 3 岁以下婴幼儿,目前各国尚没有[ENT]新冠疫苗[ENT]获批使用。', 'relation_id': [1, 2, 3], 'replced_entity_id': [1, 2, 3]}。

Is there a fix for this? Thanks in advance!

Or is the training set actually supposed to be this one sentence repeated over and over?

LemonWade commented 1 year ago

These are the download commands I ran. Did something go wrong somewhere? Thanks in advance.

wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/language_modeling/dkplm/train_corpus.txt
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/language_modeling/dkplm/dev_corpus.txt
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/language_modeling/dkplm/entity_emb.txt
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/language_modeling/dkplm/rel_emb.txt
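To confirm whether a downloaded corpus file really contains a single record repeated on every line, one option is to count distinct lines. The following is a minimal Python sketch, not part of EasyNLP; the helper names `summarize_lines` and `summarize_file` are hypothetical, and it assumes the file is UTF-8 text with one record per line.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Summary = Tuple[int, int, Optional[Tuple[str, int]]]


def summarize_lines(lines: Iterable[str]) -> Summary:
    """Return (non-empty line count, distinct line count, (most frequent line, its count))."""
    # Strip whitespace and skip blank lines so trailing newlines do not create fake variants.
    counts = Counter(line.strip() for line in lines if line.strip())
    total = sum(counts.values())
    top = counts.most_common(1)[0] if counts else None
    return total, len(counts), top


def summarize_file(path: str) -> Summary:
    """Hypothetical helper: apply summarize_lines to a UTF-8 text file (assumed encoding)."""
    # The file is streamed line by line; memory use grows with the number of distinct lines.
    with open(path, encoding="utf-8") as f:
        return summarize_lines(f)
```

Calling `summarize_file("train_corpus.txt")` after the download would return a distinct-line count of 1 if every record is identical, which would match the behavior described above.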

chywang commented 1 year ago

@ztl-35