XuezheMax / LasagneNLP

NLP tools on Lasagne
Apache License 2.0
61 stars, 31 forks

Data directory needed #4

Closed: umas closed this issue 8 years ago

umas commented 8 years ago

Hi,

Thanks a lot for making this code available. Can you please also add the data folder? I am getting errors such as:

IOError: [Errno 2] No such file or directory: 'data/POS-penn/wsj/split1/wsj1.train.original'

Cheers,
Uma

XuezheMax commented 8 years ago

Hi Uma,

Sorry, because of the Penn Treebank licence, I cannot publish the data folder. But you can get the data from the WSJ section of the Penn Treebank. The training/validation/test split is described in my paper.

The format of the data follows the CoNLL format, with one token per line:

    1 Mr. _ NN NNP _ 2 NMOD
    2 Vinken _ NN NNP _ 3 SUB
    3 is _ VB VBZ _ 0 ROOT
    4 chairman _ NN NN _ 3 PRD
    5 of _ IN IN _ 4 NMOD
    6 Elsevier _ NN NNP _ 7 NMOD
    7 N.V. _ NN NNP _ 12 NMOD
    8 , _ , , _ 12 P
    9 the _ DT DT _ 12 NMOD
    10 Dutch _ NN NNP _ 12 NMOD
    11 publishing _ VB VBG _ 12 NMOD
    12 group _ NN NN _ 5 PMOD
    13 . _ . . _ 3 P

The first column is the position index and the second is the word. The coarse and fine-grained POS tags are in the 4th and 5th columns. Columns 6 to 8 are for dependency parsing. Since we only do POS tagging, we only need to provide the 1st, 2nd and 5th columns. For the other columns, we can just write an underscore "_".
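For anyone preparing their own files, the column layout described above could be produced with something like the following. This is only a rough sketch, not code from the repository: the helper names `to_conll_lines` and `read_tagged_sentences` are made up for illustration, and the single-space separator and the blank line between sentences are assumptions, so check them against what the data reader in this repo actually expects.

```python
def to_conll_lines(sentence):
    """Format one sentence, given as (word, fine_pos) pairs, as 8-column rows.

    Hypothetical helper: only columns 1, 2 and 5 carry data, every other
    column is filled with the "_" placeholder.
    """
    rows = []
    for index, (word, pos) in enumerate(sentence, start=1):
        # Columns: 1 index, 2 word, 3 "_", 4 coarse POS (left as "_"),
        # 5 fine-grained POS, 6 to 8 dependency columns (left as "_").
        columns = [str(index), word, "_", "_", pos, "_", "_", "_"]
        # Assumption: columns are separated by a single space.
        rows.append(" ".join(columns))
    return rows


def read_tagged_sentences(lines):
    """Recover (word, fine_pos) pairs from 8-column rows.

    Assumption: a blank line marks the end of a sentence.
    """
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        # The word is the 2nd column, the fine-grained tag is the 5th.
        current.append((fields[1], fields[4]))
    if current:
        sentences.append(current)
    return sentences


sentence = [("Mr.", "NNP"), ("Vinken", "NNP"), ("is", "VBZ")]
rows = to_conll_lines(sentence)
print("\n".join(rows))
```

Reading the rows back with `read_tagged_sentences(rows)` should give the original (word, tag) pairs, which is a quick way to sanity-check a converted file before training.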

Hopefully, the above information is helpful. If you have any other questions, please feel free to ask me. Thanks.


Best regards, Ma, Xuezhe Language Technologies Institute, School of Computer Science, Carnegie Mellon University Tel: +1 206-512-5977

umas commented 8 years ago

Thank you for the quick response. I am able to run the code now.