ShannonAI / mrc-for-flat-nested-ner

Code for ACL 2020 paper `A Unified MRC Framework for Named Entity Recognition`
643 stars 117 forks

How does everyone prepare their own datasets? #126

Open YowFung opened 4 months ago

YowFung commented 4 months ago

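Hi everyone, I'm still a beginner. How do you all prepare your own datasets?

Right now I can't even get the program to run. I downloaded some files following the instructions in README.md (see the screenshot below), but I don't know where to put them or how to rename them.

image

A lot of places in the code use absolute paths. I assume these all need to be changed to my own paths, but how exactly should I change them, and where can I find the corresponding files?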

/data2/wangshuhe/gpt3_ner/gpt3-data/ontonotes5_mrc
/data2/wangshuhe/gpt3_ner/gpt3-data/ontonotes5_mrc/test.100.simcse.dev.32.knn.jsonl
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/mrc-ner.test.100
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/test.100.simcse.32.knn.jsonl
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/test.random.32.knn.jsonl
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/low_resource
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/low_resource/test.10000.simcse.32.knn.jsonl
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/100-results/
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/100-results/openai.32.knn.sequence.fullprompt
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/100-results/openai.32.entity.knn.sequence.fullprompt
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/100-results/openai.32.entity.rectify.knn.sequence.fullprompt
/data2/wangshuhe/gpt3_ner/gpt3-data/conll_mrc/100-results/openai.32.knn.sequence.fullprompt.verified
/nfs1/shuhe/gpt3-ner/features/conll03
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/test.100.verify.knn.jsonl
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/test.verify.knn.jsonl
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/mrc-ner.train.dev
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/text-3/
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/text-3/openai.17.knn.train.dev.sequence.fullprompt
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_2003/text-full/openai.15.knn.train.dev.sequence.fullprompt
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll_bert
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll
/nfs1/shuhe/gpt3-ner/gpt3-data/en_conll/results.tmp
/nfs1/shuhe/gpt3-ner/origin_data/conll03_mrc
/nfs1/shuhe/gpt3-nmt/sup-simcse-roberta-large
/nfs1/shuhe/gpt3-nmt/data/en-fr/dev.en
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/start_word_embedding
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/start_word_embedding/test.100.full.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/start_word_embedding_sorted
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/start_word_embedding_sorted/test.full.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/low_resource
/nfs/shuhe/gpt3-ner/gpt3-data/conll_mrc/low_resource/low_resource_1_knn/test.simcse.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/ontonotes5_mrc/
/nfs/shuhe/gpt3-ner/gpt3-data/zh_onto4/
/nfs/shuhe/gpt3-ner/gpt3-data/zh_onto4/test.embedding.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/zh_onto4/start_word_embedding
/nfs/shuhe/gpt3-ner/gpt3-data/zh_onto4/start_word_embedding/test.mrc.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/zh_msra
/nfs/shuhe/gpt3-ner/gpt3-data/zh_msra/test.embedding.knn.jsonl
/nfs/shuhe/gpt3-ner/gpt3-data/ace2004/
/nfs/shuhe/gpt3-ner/gpt3-data/ace2005/
/nfs/shuhe/gpt3-ner/gpt3-data/genia/
/nfs/shuhe/gpt3-ner/models/text2vec-base-chinese
/home/wangshuhe/gpt-ner/openai_access/low_resource_data/conll_en
/home/wangshuhe/gpt-ner/openai_access/low_resource_data/conll_en/test.8.embedding.knn.jsonl
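
One possible way to handle the "change them to my own paths" part is to remap the hard-coded prefixes to a single local directory. The following is only a minimal sketch: the prefixes are copied from the list above, while `remap_path` and the local root `/my/data` are hypothetical names chosen for illustration. It does not answer where the files themselves come from.

```python
from pathlib import PurePosixPath

# Prefixes taken from the hard-coded paths listed above.
HARDCODED_PREFIXES = (
    "/data2/wangshuhe/gpt3_ner",
    "/nfs1/shuhe/gpt3-ner",
    "/nfs1/shuhe/gpt3-nmt",
    "/nfs/shuhe/gpt3-ner",
    "/home/wangshuhe/gpt-ner",
)


def remap_path(path: str, local_root: str) -> str:
    """Swap a known hard-coded prefix for local_root (hypothetical helper).

    Paths that do not start with one of the known prefixes are returned unchanged.
    """
    for prefix in HARDCODED_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            # Keep everything after the prefix as a relative path.
            rest = path[len(prefix):].lstrip("/")
            return str(PurePosixPath(local_root) / rest)
    return path


# "/my/data" is an assumed local directory, not something defined by the repository.
print(remap_path("/nfs/shuhe/gpt3-ner/gpt3-data/zh_msra/test.embedding.knn.jsonl", "/my/data"))
# -> /my/data/gpt3-data/zh_msra/test.embedding.knn.jsonl
```

Note that this sketch folds all prefixes into one root, so directories that share a relative name under different prefixes (for example `gpt3-data/conll_mrc`) would end up in the same place; use one root per prefix if they need to stay separate.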
25qx-qx commented 2 weeks ago

Hi, I've recently been trying to reproduce this code and I'm a beginner too. Could we discuss it?

yangguoer commented 2 weeks ago

I'm a beginner too. Could we discuss it?

25qx-qx commented 1 week ago

Of course!