andreaskuster / ner-and-pos-when-nothing-is-capitalized

ner and pos when nothing is capitalized - paper reproduction
GNU General Public License v3.0

Import twitter dataset #8

Closed andreaskuster closed 4 years ago

andreaskuster commented 4 years ago

Include the dataset splits and label extraction (i.e. for POS) as described in the paper. Also include export of the cased, uncased, and truecased variants of the data.

Fetch the Twitter corpus from here: https://github.com/GateNLP/broad_twitter_corpus
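A minimal sketch of the cased/uncased export, not the repository's actual code. It assumes the corpus files are two-column CoNLL-style text (one token and one label per line, with a blank line between tweets); the function names `read_conll`, `lowercase`, and `export_variants` are hypothetical. Truecasing is left out because it needs a trained truecaser.

```python
from typing import Dict, List, Tuple

# A sentence (tweet) is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse two-column CoNLL-style text (assumed format, see lead-in)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current tweet.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split off the last whitespace-separated field as the label.
        token, label = line.rsplit(maxsplit=1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def lowercase(sentences: List[Sentence]) -> List[Sentence]:
    """Lowercase every token; labels are left untouched."""
    return [[(tok.lower(), lab) for tok, lab in sent] for sent in sentences]


def export_variants(sentences: List[Sentence]) -> Dict[str, List[Sentence]]:
    """Return the cased and uncased variants keyed by name."""
    return {"cased": sentences, "uncased": lowercase(sentences)}


# Tiny inline sample standing in for a real corpus file.
sample = "Obama\tB-PER\nvisits\tO\nParis\tB-LOC\n\nhello\tO\n"
variants = export_variants(read_conll(sample))
```

A truecased variant could be added to the same dictionary once a truecaser is available; the labels would stay aligned because only the token strings change.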

balbok0 commented 4 years ago

The Twitter dataset is now loaded as NumPy arrays. Once we figure out which vectors are assigned to which NER labels in CoNLL, we should wrap it in a torch Dataset, or whatever TensorFlow/Keras requires.
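One possible shape for that wrapper, as a rough sketch only. It assumes the loader yields parallel sequences of tokens and labels per tweet, and the alphabetical label index is a placeholder until the actual CoNLL label assignment is settled. The class name `TaggedTweets` and the helper `build_label_index` are hypothetical. The class implements only `__len__` and `__getitem__`, which is the map-style interface a torch `DataLoader` expects, so it runs without torch installed.

```python
from typing import Dict, List, Sequence, Tuple


def build_label_index(label_seqs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to each label (placeholder: alphabetical order)."""
    labels = sorted({lab for seq in label_seqs for lab in seq})
    return {lab: i for i, lab in enumerate(labels)}


class TaggedTweets:
    """Map-style dataset over parallel token and label sequences."""

    def __init__(
        self,
        tokens: Sequence[Sequence[str]],
        labels: Sequence[Sequence[str]],
        label_index: Dict[str, int],
    ) -> None:
        if len(tokens) != len(labels):
            raise ValueError("tokens and labels must have the same length")
        self.tokens = tokens
        self.labels = labels
        self.label_index = label_index

    def __len__(self) -> int:
        return len(self.tokens)

    def __getitem__(self, i: int) -> Tuple[List[str], List[int]]:
        # Return the tokens unchanged and the labels as integer ids.
        ids = [self.label_index[lab] for lab in self.labels[i]]
        return list(self.tokens[i]), ids


# Tiny inline sample standing in for the loaded arrays.
tokens = [["Obama", "visits", "Paris"], ["hello"]]
labels = [["B-PER", "O", "B-LOC"], ["O"]]
label_index = build_label_index(labels)
dataset = TaggedTweets(tokens, labels, label_index)
first_tokens, first_ids = dataset[0]
```

Token-to-vector lookup and padding are deliberately omitted, since they depend on the embedding and framework chosen later.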

andreaskuster commented 4 years ago

The implementation has been tested and works fine for my purpose.