GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through PyTorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

BIO encoder seems to generate targets for overlaps where there are none #64

Closed: johann-petrak closed this issue 6 years ago

johann-petrak commented 6 years ago

This happens with "Organization" annotations in the GATE training course corpus (module-2-ml, chunking hands-on) when exporting to ARFF.

johann-petrak commented 6 years ago

Turns out this happens because there are three "Organization" annotations within a single Token (instance annotation).
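To illustrate the situation, here is a minimal Python sketch. It is not the plugin's actual code (the plugin is written in Java), and the function names, offsets, and the `B-`/`I-`/`O` label strings are all assumptions made for the example. It shows how an encoder that emits one label per (token, chunk) pair produces several targets for one Token when three chunk annotations sit inside it, and how assigning exactly one label per instance avoids that.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start offset, end offset), end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two offset ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def naive_labels(tokens: List[Span], chunks: List[Span], name: str) -> List[List[str]]:
    """Hypothetical naive encoder: one label per (token, chunk) pair.

    A token that contains several chunks ends up with several labels,
    which looks like overlapping chunks even though none overlap.
    """
    out: List[List[str]] = [[] for _ in tokens]
    for chunk in chunks:
        first = True
        for i, tok in enumerate(tokens):
            if overlaps(tok, chunk):
                out[i].append(("B-" if first else "I-") + name)
                first = False
    return out


def bio_labels(tokens: List[Span], chunks: List[Span], name: str) -> List[str]:
    """Assign exactly one BIO label per token (instance).

    A token is B-<name> if some chunk starts at it, I-<name> if it only
    continues a chunk, and O otherwise.
    """
    labels = ["O"] * len(tokens)
    for chunk in sorted(chunks):
        first = True
        for i, tok in enumerate(tokens):
            if not overlaps(tok, chunk):
                continue
            if first:
                labels[i] = f"B-{name}"  # B takes precedence over I and O
                first = False
            elif labels[i] == "O":
                labels[i] = f"I-{name}"
    return labels


# Three tokens; all three chunks lie inside the middle token.
tokens = [(0, 3), (4, 15), (16, 20)]
chunks = [(4, 7), (8, 11), (12, 15)]

naive = naive_labels(tokens, chunks, "Organization")
fixed = bio_labels(tokens, chunks, "Organization")
```

With these inputs the naive encoder attaches three `B-Organization` labels to the middle token, while `bio_labels` yields a single label per token.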

johann-petrak commented 6 years ago

This was actually fixed by commit a07c96a5b7a1e80f12e829b8cb509668f3fc0430

johann-petrak commented 6 years ago

This still seems to be buggy: it apparently introduces an additional label, the empty string. Reopening.
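A quick way to spot such a stray label is to count every target value that falls outside the expected BIO set. The sketch below is only an illustration in Python: the helper name `unexpected_labels` is hypothetical, and the exact label strings the exporter writes are an assumption, since the thread does not show them.

```python
from collections import Counter
from typing import Iterable


def unexpected_labels(labels: Iterable[str], name: str) -> Counter:
    """Count target values that are not O, B-<name> or I-<name>.

    An empty string in the result points at the kind of spurious
    extra label described above.
    """
    allowed = {"O", f"B-{name}", f"I-{name}"}
    return Counter(label for label in labels if label not in allowed)


# Example target column with two empty-string labels mixed in.
column = ["O", "B-Organization", "", "I-Organization", ""]
bad = unexpected_labels(column, "Organization")
```

An empty `Counter` means every target belongs to the expected label set.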