Can you also provide your train.txt, dev.txt, test.txt file as well?

glample / tagger

Named Entity Recognition Tool

Apache License 2.0


Can you also provide your train.txt, dev.txt, test.txt file as well? #2

Closed: leefionglee closed this issue 7 years ago

leefionglee commented 8 years ago

Hi,

Good work!! Can you also provide the train.txt, dev.txt and test.txt files you used to train the model in your paper? Thanks a lot!!

zero76114 commented 8 years ago

@glample: Can you help us? Your work is completely new and interesting. I am really new to Python and Theano, so I don't know how to rerun your work. Could you explain in more detail? Thanks

a455bcd9 commented 8 years ago

@leefionglee I think it's this data set: http://www.cnts.ua.ac.be/conll2003/ner/

The English data is a collection of news wire articles from the Reuters Corpus. The annotation has been done by people of the University of Antwerp. Because of copyright reasons we only make available the annotations. In order to build the complete data sets you will need access to the Reuters Corpus. It can be obtained for research purposes without any charge from NIST.

zero76114 commented 8 years ago

@a455bcd9 Thank you. I downloaded and preprocessed the CoNLL 2003 data and ended up with 3 files for the English Reuters data: eng.train, eng.testa and eng.testb. @glample can you show me how I can turn these into train.txt, dev.txt and test.txt?
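For anyone stuck at the same step: under the usual CoNLL-2003 convention, eng.train is the training set, eng.testa the development set and eng.testb the test set. The files are already in one-token-per-line column format (token first, NER tag last, blank line between sentences, `-DOCSTART-` lines between documents). Below is a minimal Python sketch, assuming the tagger reads that same format unchanged. It copies the three files under the train.txt/dev.txt/test.txt names and counts sentences as a sanity check. `read_conll`, `build_splits` and the directory arguments are hypothetical names, not part of glample/tagger.

```python
import os
import shutil

# Usual CoNLL-2003 convention: eng.testa is the development set,
# eng.testb is the held-out test set.
SPLITS = {
    "eng.train": "train.txt",
    "eng.testa": "dev.txt",
    "eng.testb": "test.txt",
}


def read_conll(path):
    """Return a list of sentences, each a list of (token, tag) pairs.

    Assumes whitespace-separated columns with the token first and the NER tag
    last, blank lines between sentences, and -DOCSTART- document markers.
    """
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # A blank line closes the current sentence (if any).
                if current:
                    sentences.append(current)
                    current = []
                continue
            columns = line.split()
            if columns[0] == "-DOCSTART-":
                continue  # document separator, not a real token
            current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def build_splits(src_dir, dst_dir):
    """Hypothetical helper: copy eng.* files to train/dev/test.txt names.

    Returns the number of sentences found in each output file.
    """
    os.makedirs(dst_dir, exist_ok=True)
    counts = {}
    for src_name, dst_name in SPLITS.items():
        src = os.path.join(src_dir, src_name)
        dst = os.path.join(dst_dir, dst_name)
        shutil.copyfile(src, dst)  # contents unchanged; only the name differs
        counts[dst_name] = len(read_conll(dst))
    return counts
```

For example, `build_splits("dataset", "data")` would produce data/train.txt, data/dev.txt and data/test.txt (both directory names are placeholders). Renaming is not strictly necessary if the training script accepts arbitrary file paths; the copy only matches the file names asked about in this thread.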

prashant-puri commented 8 years ago

@glample Hey, can you help me with creating train.txt, dev.txt and test.txt? @zero76114 Hey, can you help me with creating train.txt, dev.txt and test.txt?

Zhangzirui commented 8 years ago

@zero76114 Hi, can you help me with creating train.txt, dev.txt and test.txt? @prashant-puri Hi, can you help me with creating train.txt, dev.txt and test.txt?

glample commented 7 years ago

The dataset is now available on the repo.

kewlcoder commented 7 years ago

@glample - Sir, if I am not wrong, the dataset you provided, namely eng.train, eng.testa and eng.testb, is free but copyrighted by Reuters. I would suggest adding this warning to that particular commit so that people take proper precautions before using it.