erickrf / nlpnet

A neural network architecture for NLP tasks, using Cython for fast performance. Currently, it can perform POS tagging, SRL and dependency parsing.
http://nilc.icmc.usp.br/nlpnet/
MIT License

"This version of Mac-Morpho is contained in a single file for training and another for testing..." Where can I find it? #13

Closed: owlmsj closed this issue 9 years ago

owlmsj commented 9 years ago

Hi!

I read in the docs that:

 "This version of Mac-Morpho is contained in a single file for training and another for testing, unlike the original distribution. Many errors were corrected and many sentences were discarded, providing a more reliable resource. Also unlike the original, each line contains a sentence, and POS tags are appended to tokens after a '_'."
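To illustrate the line format quoted above, here is a minimal Python sketch of a reader. It assumes tokens are separated by whitespace, that the tag follows the last underscore of each token, and that the files are UTF-8 encoded (the encoding is an assumption; adjust it if the files differ). The names `parse_tagged_line` and `read_corpus` are hypothetical and are not part of nlpnet's API.

```python
from typing import List, Tuple


def parse_tagged_line(line: str) -> List[Tuple[str, str]]:
    """Split one sentence line into (token, tag) pairs.

    Assumes each whitespace-separated item looks like 'token_TAG'.
    rpartition splits on the LAST underscore, so a token that itself
    contains '_' is kept intact.
    """
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition("_")
        if not sep:
            # No underscore at all: the item carries no tag.
            raise ValueError(f"token without a tag: {item!r}")
        pairs.append((token, tag))
    return pairs


def read_corpus(path: str) -> List[List[Tuple[str, str]]]:
    """Read a whole file, one sentence per non-empty line (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [parse_tagged_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    # A made-up sentence written in the same token_TAG style.
    sample = "O_ART gato_N dorme_V ._PU"
    print(parse_tagged_line(sample))
```

Splitting on the last underscore rather than the first is a defensive choice: it costs nothing for ordinary tokens and avoids mangling any token that happens to contain an underscore.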

I see that those files are no longer in the repo. Where can I find them?

Thanks, and great job on nlpnet!

PS: http://www.nilc.icmc.usp.br/macmorpho/ is offline

erickrf commented 9 years ago

Thank you!

The corpus should be available at the URL you mentioned. There seems to be a problem with the server, though. I'll see if I can get it back online tomorrow.

owlmsj commented 9 years ago

Great! Waiting for it. Thanks again!

erickrf commented 9 years ago

The site is up now.

tgalery commented 9 years ago

Probably not the best place to ask this, but is there a particular reason the annotations don't use the Penn Treebank tags?

erickrf commented 9 years ago

Penn Treebank tags were specifically designed for English; it wouldn't make sense to use them for another language. Mac-Morpho uses a tagset tailored for Portuguese.

But to make it clear: nlpnet is tagset-agnostic.
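To make "tagset-agnostic" concrete, here is a small hypothetical Python sketch (not nlpnet's actual code) in which the label vocabulary is derived entirely from the training data. It assumes the same `token_TAG` line format as the corpus, with the tag after the last underscore. Because nothing is hard-coded, the same function handles Penn Treebank-style English tags and Mac-Morpho-style Portuguese tags alike.

```python
from typing import Dict, Iterable


def build_tag_index(lines: Iterable[str]) -> Dict[str, int]:
    """Map every tag seen in 'token_TAG' lines to an integer id.

    No tagset is assumed: the label set is whatever appears in the
    data, sorted so that the ids are reproducible between runs.
    """
    tags = set()
    for line in lines:
        for item in line.split():
            # The tag is whatever follows the last underscore.
            tags.add(item.rpartition("_")[2])
    return {tag: i for i, tag in enumerate(sorted(tags))}


if __name__ == "__main__":
    # Made-up examples in two different tagsets.
    english = ["The_DT cat_NN sleeps_VBZ ._."]
    portuguese = ["O_ART gato_N dorme_V ._PU"]
    print(build_tag_index(english))
    print(build_tag_index(portuguese))
```

A model's output layer would then simply be sized to `len(build_tag_index(...))`, so switching tagsets only means training on differently annotated data.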