ShuheWang1998 / GPT-NER

Can not reproduce on your code. #2

Open sangmandu opened 1 year ago

sangmandu commented 1 year ago

First, thank you for your effort. However, a few things block me from reproducing your experiments. The "ontonotes5_mrc" data is missing, and it is hard to work out which paths each script expects. It would help if the data and paths were easier to configure. A file path tree would also make the layout easier to grasp.
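One possible way to make the paths easier to set is to derive every file location from a single configurable root. The sketch below only illustrates that idea: the environment variable `GPT_NER_DATA` and the helper `dataset_file` are hypothetical names and are not part of the repository.

```python
import os
from pathlib import Path

# Hypothetical: a single root directory, overridable through an environment
# variable. The repository itself does not define GPT_NER_DATA.
DATA_ROOT = Path(os.environ.get("GPT_NER_DATA", "./data"))


def dataset_file(dataset: str, prefix: str, suffix: str = "") -> Path:
    """Build a path of the form <root>/<dataset>/<prefix><suffix>."""
    return DATA_ROOT / dataset / f"{prefix}{suffix}"


# Example: the kNN file mentioned later in this thread, relative to the root.
knn_path = dataset_file("ontonotes5_mrc", "test.100", ".simcse.dev.32.knn.jsonl")
print(knn_path)
```

With a layout like this, switching from one dataset directory to another means changing one argument rather than editing absolute paths in several scripts.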

ShuheWang1998 commented 1 year ago

Sorry about that. The English Conll2003 dataset is available from the Hugging Face community and some other places, for example:

https://huggingface.co/datasets/conll2003

For the English Ontonotes5 dataset, however, we are not sure whether it can be released publicly. As for your question, we use the same shell file for both the English Conll2003 and the English Ontonotes5 datasets, so you can first test our script with the public Conll2003 dataset and then use your own authorized copy of Ontonotes5 to reproduce the results.

vrunm commented 1 year ago

Where can I find the file conll_en/test.8.embedding.knn.jsonl?

Cansal7159 commented 8 months ago

Hi, the paper is wonderful.

But I see this code:

test_mrc_data = read_mrc_data(dir_="/data2/wangshuhe/gpt3_ner/gpt3-data/ontonotes5_mrc", prefix="test.100")
train_mrc_data = read_mrc_data(dir_="/data2/wangshuhe/gpt3_ner/gpt3-data/ontonotes5_mrc", prefix="dev")
index_, value_ = compute_simcse_knn(test_mrc_data=test_mrc_data, train_mrc_data=train_mrc_data, knn_num=32)
write_file(dir_="/data2/wangshuhe/gpt3_ner/gpt3-data/ontonotes5_mrc/test.100.simcse.dev.32.knn.jsonl", data=index_)

It does not seem to explain how to obtain the test and train data or the test.100.simcse.dev.32.knn.jsonl file.

It would be great if you could add this to the GitHub repository. I look forward to your reply.
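The code quoted in this comment does not show what compute_simcse_knn does internally. As a rough illustration only, assuming it ranks the training examples by the cosine similarity of their sentence embeddings to each test example, a nearest-neighbour lookup might look like the sketch below. The function names `cosine` and `knn_indices` are hypothetical, and the vectors are toy values rather than real SimCSE embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_indices(
    test_vecs: Sequence[Sequence[float]],
    train_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[List[int]]:
    """For each test vector, return the indices of its k most similar train vectors."""
    results = []
    for query in test_vecs:
        ranked = sorted(
            range(len(train_vecs)),
            key=lambda i: cosine(query, train_vecs[i]),
            reverse=True,
        )
        results.append(ranked[:k])
    return results


# Toy embeddings standing in for encoder outputs (assumption, not real data).
train = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
test = [[0.9, 0.1]]
print(knn_indices(test, train, k=2))
```

Under this assumption, a file such as test.100.simcse.dev.32.knn.jsonl would hold one list of neighbour indices per test example, but the actual file format is not confirmed anywhere in this thread.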

Cansal7159 commented 8 months ago

Oh, that was my mistake. The data can actually be found in another project.

You can find it here:

https://github.com/ShannonAI/mrc-for-flat-nested-ner

Then read the "MRC-NER: Prepare Datasets" section there.