GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

NPE in LF_ApplyChunking #118

Closed: johann-petrak closed this issue 4 years ago

johann-petrak commented 4 years ago

Reported on the mailing list: https://groups.io/g/gate-users/message/636

Exception:

at gate.plugin.learningframework.LF_ApplyChunking.process(LF_ApplyChunking.java:138)
at gate.plugin.learningframework.AbstractDocumentProcessor.execute(AbstractDocumentProcessor.java:259)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:172)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:288)
at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
at gate.composite.impl.SegmentProcessingPR.execute(SegmentProcessingPR.java:184)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:172)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1777)
at java.base/java.lang.Thread.run(Thread.java:835)

Jacky-Miu commented 4 years ago

I now believe the issue lies in the interaction between the Segment Processing PR and the LF_ApplyChunking PR. It is no longer an issue with saving the application state: even a freshly created Main Application fails to run as long as it contains a Segment Processing PR that in turn contains an LF_ApplyChunking PR. That said, the same Main Application sometimes still runs without any error, which is quite strange!

The following screenshot shows the PRs of the Main Application: [screenshot: PR]

I added the Segment Processing PR because LF_ApplyChunking takes too long to run on the entire document, so I use it to apply LF_ApplyChunking only to the Liquidated-Damages-Related section.

The following screenshot shows the PRs run by the Segment Processing PR, including the parameters of LF_ApplyChunking: [screenshot: LF_ApplyChunking-entries]

Attached is the .xgapp file for this Application. I think you will get an error running this Application: Sample.xgapp.zip

However, if I don't use the Segment Processing PR in my Main Application, as in the Application below, LF_ApplyChunking runs without any error: [screenshot: PR-no-segment-processing]

If needed, I can upload this error-free .xgapp file, but it should be quite easy to reproduce this by just moving the PRs around.

Given the above, I think the issue lies between the Segment Processing PR and the Learning Framework PR. Thank you again for your help!

Jacky

ianroberts commented 4 years ago

The LearningFramework PRs depend critically on the ControllerAwarePR callbacks being fired in the right way at the right time, but this doesn’t happen the way the segment PR calls its child analyser (the segment logic predates the introduction of ControllerAwarePR and the alignment plugin hasn’t been updated).

The fix would need to be in the segment PR to make that controller aware and have it feed through the callbacks to its analyser if required - we should make an issue in the alignment plugin for this as I’m not sure what the exact logic should be.
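
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It does not use the GATE API: `LifecycleAware`, `Step`, `ChunkingStep`, `NaiveSegmentWrapper` and `ForwardingSegmentWrapper` are all hypothetical stand-ins, chosen only to mirror the idea of a PR that relies on controller callbacks and a wrapper that may or may not pass them on. It is not the actual fix made in the Alignment plugin, whose exact logic is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: none of these types are GATE classes.
class CallbackForwardingSketch {

    /** Stand-in for a lifecycle callback interface in the spirit of ControllerAwarePR. */
    interface LifecycleAware {
        void executionStarted();
        void executionFinished();
    }

    /** Stand-in for a processing resource that is run once per document. */
    interface Step {
        void process(String document);
    }

    /** A step that only works between its start and finish callbacks. */
    static class ChunkingStep implements Step, LifecycleAware {
        private List<String> labels; // stays null until executionStarted() runs
        final List<String> output = new ArrayList<>();

        @Override public void executionStarted() { labels = Arrays.asList("B", "I", "O"); }
        @Override public void executionFinished() { labels = null; }

        @Override public void process(String document) {
            // Throws NullPointerException if the start callback was never fired.
            output.add(document + ":" + labels.size());
        }
    }

    /** Wrapper that runs its child on a segment but is not LifecycleAware itself. */
    static class NaiveSegmentWrapper implements Step {
        protected final Step child;
        NaiveSegmentWrapper(Step child) { this.child = child; }

        @Override public void process(String document) {
            // Arbitrary "segment": the first 10 characters of the document.
            child.process(document.substring(0, Math.min(10, document.length())));
        }
    }

    /** Same wrapper, but it passes every callback it receives on to its child. */
    static class ForwardingSegmentWrapper extends NaiveSegmentWrapper implements LifecycleAware {
        ForwardingSegmentWrapper(Step child) { super(child); }

        @Override public void executionStarted() {
            if (child instanceof LifecycleAware) ((LifecycleAware) child).executionStarted();
        }
        @Override public void executionFinished() {
            if (child instanceof LifecycleAware) ((LifecycleAware) child).executionFinished();
        }
    }

    /** Minimal "controller": fires callbacks on its direct children, then runs them. */
    static void run(List<? extends Step> steps, String document) {
        for (Step s : steps) {
            if (s instanceof LifecycleAware) ((LifecycleAware) s).executionStarted();
        }
        try {
            for (Step s : steps) s.process(document);
        } finally {
            for (Step s : steps) {
                if (s instanceof LifecycleAware) ((LifecycleAware) s).executionFinished();
            }
        }
    }
}
```

In this sketch, running a `ChunkingStep` inside a `NaiveSegmentWrapper` fails with a `NullPointerException`, because the controller only notifies its direct children and the wrapper does not relay anything. Wrapping the same step in a `ForwardingSegmentWrapper` lets the callbacks reach it, so `process` succeeds.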

johann-petrak commented 4 years ago

Thanks Ian, this was what I was thinking too, I will try to have a look at it!

The other thing here is that I will try to look into whether, and why, large documents slow down the application PR more than they should. Ideally it should be possible to avoid the segment PR completely here.

ianroberts commented 4 years ago

The underlying issue with the controller callbacks in the segment PR has now been fixed in the latest snapshot of the Alignment plugin. If you upgrade to version 8.6.1-SNAPSHOT of that plugin then this issue should go away. @johann-petrak shall we split off the large document slowness thing into a separate issue?

Jacky-Miu commented 4 years ago

I have tested running Alignment 8.6.1-SNAPSHOT on my documents, and the result looks good. Thank you, Ian and Johann, for your great help and effort here! I really appreciate it!

Regards, Jacky