dkpro / dkpro-tc

UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
https://dkpro.github.io/dkpro-tc/

Preprocessing demo throws NPE #379

Closed Horsmann closed 8 years ago

Horsmann commented 8 years ago

The demo org.dkpro.tc.examples.single.document.WekaTwentyNewsgroupsPreprocessing is supposed to show how to vary the preprocessing within an experiment. The demo currently fails with an NPE.

The receiving lab-variable is null. I am not sure when this broke.

Horsmann commented 8 years ago

@reckart I think I have a Lab issue in this demo. It tries to show how to use various preprocessing settings and does this:

A dimension to iterate over the preprocessing components is provided: Dimension<String> dimSegmenter = Dimension.create("segmenter", "break", "opennlp");

but when the if-statement is reached, the receiving variable is (still?) null. What would this have to look like to work? I am not sure when this demo broke, probably quite some time ago. Any ideas?

ExperimentCrossValidation batch = new ExperimentCrossValidation(
                "TwentyNewsgroupsCV-preprocessing", WekaClassificationAdapter.class, NUM_FOLDS)
        {
            @Discriminator
            String segmenter;

            @Override
            public AnalysisEngineDescription getPreprocessing()
            {

                try {
                    if (segmenter.equals("break")) {
                        return createEngineDescription(BreakIteratorSegmenter.class);
                    }
                    else if (segmenter.equals("opennlp")) {
                        return createEngineDescription(OpenNlpSegmenter.class);
                    }
                    else {
                        throw new RuntimeException("unexpected discriminator value: " + segmenter);
                    }
                }
                catch (ResourceInitializationException e) {
                    throw new RuntimeException(e);
                }

            }
        };

reckart commented 8 years ago

Could it be that there is no dimension called "segmenter" anymore?

reckart commented 8 years ago

Ah, no you say it is there.

Horsmann commented 8 years ago

It is this example: https://github.com/dkpro/dkpro-tc/blob/master/dkpro-tc-examples/src/main/java/org/dkpro/tc/examples/single/document/WekaTwentyNewsgroupsPreprocessing.java

Horsmann commented 8 years ago

@reckart do you have a suggestion for how to find out what could be wrong with this demo case? Any pointer to where I could start looking?

reckart commented 8 years ago

Check where LifeCycleManager.configure(TaskContext, Task, Map<String, Object>) is called and why it is not called before your batch.

Horsmann commented 8 years ago

@reckart I just had a look: getPreprocessing() is already called during initialize() of the DefaultLifeCycleManager, long before the configure method is called. I don't see how this ever worked. Have there been many changes to the execution order?
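The ordering problem described above can be reproduced in isolation. The following is a minimal sketch in plain Java, not DKPro Lab code: ToyExperiment, its configure method, and the two helper methods are hypothetical stand-ins, and the engine names are returned as strings only for illustration. It shows why reading the discriminator field before the injection step yields an NPE.

```java
import java.util.Map;

// Toy model of the ordering problem. None of these types are DKPro Lab classes;
// all names here are hypothetical and exist only for illustration.
public class LifecycleOrderSketch {

    /** Stand-in for the anonymous experiment subclass with a discriminator field. */
    static class ToyExperiment {
        String segmenter; // would carry @Discriminator in the real demo

        /** Mirrors the demo's getPreprocessing(): dereferences the field directly. */
        String getPreprocessing() {
            if (segmenter.equals("break")) {
                return "BreakIteratorSegmenter";
            }
            else if (segmenter.equals("opennlp")) {
                return "OpenNlpSegmenter";
            }
            throw new IllegalStateException("unexpected discriminator value: " + segmenter);
        }

        /** Stand-in for the injection that a configure step would perform. */
        void configure(Map<String, Object> config) {
            segmenter = (String) config.get("segmenter");
        }
    }

    /** Order reported in the thread: preprocessing is requested before configure. */
    static String initializeThenConfigure(ToyExperiment e, Map<String, Object> config) {
        String engine = e.getPreprocessing(); // segmenter is still null here
        e.configure(config);
        return engine;
    }

    /** Order the demo implicitly relies on: configure first, then preprocessing. */
    static String configureThenInitialize(ToyExperiment e, Map<String, Object> config) {
        e.configure(config);
        return e.getPreprocessing();
    }

    public static void main(String[] args) {
        Map<String, Object> config = Map.of("segmenter", "opennlp");
        System.out.println(configureThenInitialize(new ToyExperiment(), config));
        try {
            initializeThenConfigure(new ToyExperiment(), config);
        }
        catch (NullPointerException ex) {
            System.out.println("NPE: discriminator not injected yet");
        }
    }
}
```

The sketch only models the sequence of calls; it makes no claim about how DKPro Lab performs the injection internally.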

reckart commented 8 years ago

There was a change to the lifecycle in 0.12.0: https://github.com/dkpro/dkpro-lab/issues/85

Horsmann commented 8 years ago

Hm, ok. Sorry to keep asking, but how open is Lab to configuration from outside, i.e. before a task runs? I basically need to have configure called by the time the CrossValidationTask runs, right? Is there a way to do that?

By the way, I don't think this is a particularly severe problem. I would be fine with simply deleting the test case and giving up on this feature, which has seemingly been broken since 0.7.0 without anybody noticing. (@daxenberger any objections :o?)

reckart commented 8 years ago

What do you mean "open to configurations from the outside"? What do you need to configure?

Horsmann commented 8 years ago

For this to work:

ExperimentCrossValidation batch = new ExperimentCrossValidation(
                "TwentyNewsgroupsCV-preprocessing", WekaClassificationAdapter.class, NUM_FOLDS)
        {
            @Discriminator
            String segmenter;

I need to have configure called before the preprocessing is executed. As things stand, this no longer happens, which seems to be the problem. I somehow need to have configure called. This is what I meant by configuring from the outside, e.g. hacking in this information to make it available when needed during initialization.
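Whatever is decided about the call order, the selection logic could at least report the missing value explicitly instead of failing with a bare NPE. A minimal sketch, assuming a hypothetical helper selectSegmenter that returns the component name as a string (the real demo returns an AnalysisEngineDescription instead):

```java
import java.util.Objects;

// Hypothetical helper for illustration; not part of DKPro TC or DKPro Lab.
public class DiscriminatorGuardSketch {

    /**
     * Maps a discriminator value to a segmenter name and fails with a
     * descriptive message if the value was never injected.
     */
    static String selectSegmenter(String segmenter) {
        // Fail fast with a message that points at the likely cause.
        Objects.requireNonNull(segmenter,
                "discriminator 'segmenter' is null; was it injected before getPreprocessing() ran?");
        switch (segmenter) {
        case "break":
            return "BreakIteratorSegmenter";
        case "opennlp":
            return "OpenNlpSegmenter";
        default:
            throw new IllegalArgumentException("unexpected discriminator value: " + segmenter);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectSegmenter("break"));
        try {
            selectSegmenter(null);
        }
        catch (NullPointerException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

This does not fix the ordering; it only makes the failure easier to diagnose.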

reckart commented 8 years ago

The order in which the lifecycle events are triggered is defined in the BatchTaskEngine. If you change that, other things may stop working.

The order is:

If I remember correctly, these events are only triggered for subtasks of a batch task. Those are the only tasks that live in a parameter space. The root task cannot have discriminators and cannot have a parameter space affecting itself.

Horsmann commented 8 years ago

I think I will delete this test case then. This is something I don't want to touch, at least not for a piece of functionality like this one. Thx for the info.