Closed yanzhangnlp closed 2 years ago
I have the same problem with the ACE 05 NER dataset.
I downloaded the ACE 05 NER dataset from the link provided in datasets.py and renamed the files to {split}.ner.json, but it does not work :(
@iambabao
Yes, but I believe simply adding the following could work:
    if 'label' not in x:
        x['label'] = {
            x['entity_label']: x['span_position'],
        }
However, @giove91, please add links for all the datasets used in tanl, if available. Most of the datasets reported in the paper and defined in datasets.py
currently come with no acquisition method, preprocessing scripts, or instructions. I would really appreciate it if you could complete them.
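For anyone who wants to apply the suggested fix to a whole file rather than inside the loader, here is a minimal sketch. It assumes each split file is a JSON list of records carrying `entity_label` and `span_position` keys, as the snippet above implies; the helper names `add_label_field` and `fix_file` are made up for illustration and are not part of tanl.

```python
import json


def add_label_field(records):
    """Return copies of the records, adding a 'label' dict where it is missing.

    Assumption: every record without 'label' has 'entity_label' and
    'span_position' keys, as in the fix suggested in this thread.
    """
    fixed = []
    for x in records:
        x = dict(x)  # shallow copy so the caller's records stay untouched
        if 'label' not in x:
            x['label'] = {x['entity_label']: x['span_position']}
        fixed.append(x)
    return fixed


def fix_file(src_path, dst_path):
    """Read a JSON list of records, add missing 'label' fields, write it back out."""
    with open(src_path, encoding='utf-8') as f:
        records = json.load(f)
    with open(dst_path, 'w', encoding='utf-8') as f:
        json.dump(add_label_field(records), f, ensure_ascii=False)
```

Records that already have a `label` key are left unchanged, so running it twice on the same file should be harmless.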
Hi, thanks for your interest in this project!
@yanzhangnlp We added the instructions to process the Multiwoz dataset (thanks @jasonkrone). Hope this helps!
@iambabao Apparently the version I downloaded from that link is not available anymore (it is different from the version that can be currently downloaded). Thanks @Magolor for providing a possible fix. I'll check and update the instructions.
Hi,
The data files provided for the ACE2005 dataset have .test, .train, and .dev extensions. @iambabao how did you obtain .json files?
Here is where I am attempting to obtain the ACE2005 data: https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/master/ner2mrc/download.md https://drive.google.com/file/d/1iodaJ92dTAjUWnkMyYm8aLEi5hj3cseY/view
Thanks,
The files are already in JSON format, so you can simply rename them.
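A rough sketch of the renaming step, in case it helps. It assumes the download directory holds exactly one file per split ending in `.train`, `.dev`, or `.test`, and it copies each one to `{split}.ner.json` after checking that it parses as JSON. The function name `rename_splits` and the directory arguments are hypothetical.

```python
import json
import shutil
from pathlib import Path


def rename_splits(src_dir, dst_dir, splits=('train', 'dev', 'test')):
    """Copy each '*.{split}' file in src_dir to dst_dir/'{split}.ner.json'."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for split in splits:
        matches = sorted(src_dir.glob(f'*.{split}'))
        if len(matches) != 1:
            raise FileNotFoundError(
                f'expected exactly one *.{split} file in {src_dir}, found {len(matches)}'
            )
        # Sanity check: json.load raises json.JSONDecodeError if the file is not JSON.
        with matches[0].open(encoding='utf-8') as f:
            json.load(f)
        target = dst_dir / f'{split}.ner.json'
        shutil.copyfile(matches[0], target)
        copied.append(target)
    return copied
```

Copying instead of renaming in place keeps the original download intact if something goes wrong later.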
Hey guys, after preprocessing the ACE2005 NER dataset following the guidance here and running tanl, I get F1 = 88.3 (the tanl paper reports 84.9). Is there a bug, or something else?
Interesting! Are the splits correct and have you used the same hyperparameters as in the paper? (50 epochs, initial learning rate 0.0005, ...)
Hi Giovanni,
Nice work, and thanks for sharing. I am reproducing the results of the DST task. However, I found that the format of the MultiWOZ 2.1 data processed with the script from https://github.com/jasonwu0731/trade-dst does not match your code. May I ask whether you apply any additional preprocessing? If so, would you mind sharing the script?
Sincerely, Yan