GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

ArrayIndexOutOfBoundsException during application #112

Closed johann-petrak closed 5 years ago

johann-petrak commented 5 years ago

With the PyTorch backend, applying a model for chunking fails with the following error:

Controller ended with error 1
java.lang.ArrayIndexOutOfBoundsException: 1
    at gate.plugin.learningframework.ModelApplication.addSurroundingAnnotations(ModelApplication.java:298)
    at gate.plugin.learningframework.LF_ApplyChunking.process(LF_ApplyChunking.java:156)
    at gate.plugin.learningframework.AbstractDocumentProcessor.execute(AbstractDocumentProcessor.java:259)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:172)
    at gate.creole.SerialController.executeImpl(SerialController.java:157)
    at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
    at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1759)
    at java.lang.Thread.run(Thread.java:748)

johann-petrak commented 5 years ago

What seems to happen is that the target we get back from the model is just an empty string. This in turn seems to be caused by the fact that some targets in the training set are empty strings. So the real bug is that such targets get created in the first place, for a reason that is not yet clear.
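
To illustrate how an empty target could produce `ArrayIndexOutOfBoundsException: 1`, here is a minimal sketch. It is not the code at `ModelApplication.java:298`. It assumes, hypothetically, that the target string is split on a separator and the second part is read without a length check. The class `TargetGuard`, the separator `"-"`, and the example target `"Person-B"` are all made up for illustration; the plugin's real target encoding may differ.

```java
import java.util.Optional;

public class TargetGuard {

    // Hypothetical separator, NOT taken from the plugin source.
    public static final String SEP = "-";

    // Unguarded access: "".split("-") yields a one-element array [""],
    // so reading index 1 throws ArrayIndexOutOfBoundsException.
    public static String secondPartUnguarded(String target) {
        return target.split(SEP)[1];
    }

    // Guarded access: null, empty, or malformed targets give Optional.empty()
    // instead of an exception, so the caller can skip or log them.
    public static Optional<String> secondPart(String target) {
        if (target == null || target.isEmpty()) {
            return Optional.empty();
        }
        String[] parts = target.split(SEP);
        if (parts.length < 2) {
            return Optional.empty();
        }
        return Optional.of(parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(secondPart("Person-B").isPresent());
        System.out.println(secondPart("").isPresent());
        try {
            secondPartUnguarded("");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access failed on empty target");
        }
    }
}
```

A guard like this would only hide the symptom at application time. If the diagnosis above is right, the empty targets would still need to be prevented or filtered when the training set is created.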