Closed: andreaskuster closed this issue 4 years ago.
The Twitter dataset is loaded as NumPy arrays. Once we figure out which vectors are assigned to which labels for NER in CoNLL, we should create a torch Dataset, or whatever is needed for TensorFlow/Keras.
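A minimal sketch of that step follows. It assumes each sentence is a list of tokens with one CoNLL-style NER tag (e.g. `B-PER`, `O`) per token. The names `build_label_vocab` and `NerDataset` are hypothetical. The class only implements `__len__` and `__getitem__`, the map-style protocol used by `torch.utils.data.DataLoader`, so it runs without torch installed. Mapping tokens to vectors is left out, since that assignment is still open.

```python
from typing import Dict, List, Sequence, Tuple


def build_label_vocab(tag_sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign a stable integer id to every tag, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tags in tag_sequences:
        for tag in tags:
            if tag not in vocab:
                vocab[tag] = len(vocab)
    return vocab


class NerDataset:
    """Hypothetical map-style dataset of (tokens, label_ids) pairs.

    To use it with PyTorch, subclass torch.utils.data.Dataset instead of object.
    """

    def __init__(
        self,
        sentences: Sequence[Sequence[str]],
        tag_sequences: Sequence[Sequence[str]],
        label_vocab: Dict[str, int],
    ) -> None:
        if len(sentences) != len(tag_sequences):
            raise ValueError("need one tag sequence per sentence")
        for tokens, tags in zip(sentences, tag_sequences):
            if len(tokens) != len(tags):
                raise ValueError("need one tag per token")
        self.sentences: List[List[str]] = [list(s) for s in sentences]
        # Convert string tags to integer ids once, up front.
        self.labels: List[List[int]] = [
            [label_vocab[t] for t in tags] for tags in tag_sequences
        ]

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> Tuple[List[str], List[int]]:
        return self.sentences[idx], self.labels[idx]
```

For example, `build_label_vocab([["B-PER", "O", "B-LOC"], ["O", "O"]])` yields `{"B-PER": 0, "O": 1, "B-LOC": 2}`. Ids follow first appearance, so the vocabulary should be built from the training split and reused for the other splits.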
The implementation has been tested and works fine for my purposes.
Include the splits and label extraction (e.g. for POS) as described in the paper. Include cased, uncased, and truecased data export.
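The three export variants could be produced roughly as sketched below. Cased keeps tokens as-is and uncased lowercases them. For truecasing, this sketch swaps in a simple frequency heuristic, which is an assumption and not necessarily what the paper uses: each token takes its most frequent surface form among non-sentence-initial occurrences, and tokens never seen in that position stay unchanged. The function names are hypothetical, and the paper's splits are not reproduced here.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List


def build_truecase_table(sentences: List[List[str]]) -> Dict[str, str]:
    """Map each lowercased token to its most frequent non-initial surface form."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        # Sentence-initial capitalization says little about the true case.
        for tok in sent[1:]:
            counts[tok.lower()][tok] += 1
    return {low: c.most_common(1)[0][0] for low, c in counts.items()}


def export_variants(sentences: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Return cased, uncased and (heuristically) truecased copies of the corpus."""
    table = build_truecase_table(sentences)
    return {
        "cased": [list(s) for s in sentences],
        "uncased": [[t.lower() for t in s] for s in sentences],
        # Tokens never seen in non-initial position keep their original form.
        "truecased": [[table.get(t.lower(), t) for t in s] for s in sentences],
    }
```

The table is built from the whole input, so in practice it should be fitted on the training split only to avoid leaking statistics from the test data.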
Fetch the Twitter corpus from here: https://github.com/GateNLP/broad_twitter_corpus
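A reader for the downloaded files might look like the sketch below. It assumes a common CoNLL layout: one token per line, the NER tag in the last whitespace-separated column, blank lines between sentences, and optional `-DOCSTART-` lines. Check the repository's README for the actual file format before relying on it. `read_conll` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def read_conll(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Parse CoNLL-style lines into parallel token and tag sequences."""
    sentences: List[List[str]] = []
    tag_sequences: List[List[str]] = []
    tokens: List[str] = []
    tags: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("-DOCSTART-"):
            continue  # document separator, not a token
        if not line:
            # Blank line closes the current sentence, if any.
            if tokens:
                sentences.append(tokens)
                tag_sequences.append(tags)
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: token
        tags.append(parts[-1])    # last column: tag
    if tokens:  # file may not end with a blank line
        sentences.append(tokens)
        tag_sequences.append(tags)
    return sentences, tag_sequences
```

Usage: `with open(path, encoding="utf-8") as f: sentences, tags = read_conll(f)`, where `path` points at one of the corpus files.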