dkpro / dkpro-tc

UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
https://dkpro.github.io/dkpro-tc/

Improve code quality in reports #371

Closed. zesch closed this issue 8 years ago.

zesch commented 8 years ago

Our reports are a mess of duplicated code and almost impossible to understand (even for me, who wrote parts of that stuff). This needs to be improved.

Horsmann commented 8 years ago

When running WekaTwentyNewsgroupsDemo, the resulting evaluation_results.xls in the Evaluation folder seems to be broken now. This seems to be a general problem, probably affecting all created Excel files?

zesch commented 8 years ago

I can confirm that some (not all) .xls files are broken, but I could not yet find out what the actual problem is.

Horsmann commented 8 years ago

The broken .xls files (which only break occasionally, not always) are probably related to the unresolved status of #353: when the entire toString of the CollectionReaderDescription is written into the Excel file, it apparently can break something.
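Just to illustrate the suspicion: .xls (Excel 97-2003) cells are limited to 32,767 characters, so dumping the full toString of a CollectionReaderDescription into a cell can exceed that. A rough sketch of a guard, assuming the breakage is due to overly long cell values and an Apache POI HSSF writer (not necessarily what our reports actually use):

    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.ss.SpreadsheetVersion;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;

    public class SafeXlsCellWriter
    {
        // .xls cells cannot hold more than 32,767 characters; writing a longer value
        // (e.g. a full CollectionReaderDescription toString) can break the sheet
        private static final int MAX_LEN = SpreadsheetVersion.EXCEL97.getMaxTextLength();

        public static void writeCell(Row row, int column, String value)
        {
            Cell cell = row.createCell(column);
            if (value != null && value.length() > MAX_LEN) {
                // truncate instead of writing the full value
                value = value.substring(0, MAX_LEN);
            }
            cell.setCellValue(value);
        }

        public static void main(String[] args) throws IOException
        {
            Workbook wb = new HSSFWorkbook();
            Sheet sheet = wb.createSheet("Evaluation");
            Row row = sheet.createRow(0);
            writeCell(row, 0, "readerDescription: ...");
            try (FileOutputStream out = new FileOutputStream("evaluation_results.xls")) {
                wb.write(out);
            }
            wb.close();
        }
    }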

Horsmann commented 8 years ago

The broken .xls files are fixed now that we have a proper CollectionReaderDescription toString().

@zesch @reckart Any suggestions on how to get rid of the ugly context checks, i.e. if (subcontext.getType().contains("TestTask")) {}? I think almost every report we use in TC has such a line.

reckart commented 8 years ago

@Horsmann what kind of check would look less ugly to you?

Horsmann commented 8 years ago

Something like a boolean check, i.e. subcontext.isMachineLearningAdapter().

The naming is probably too TC-specific, but what I want here is some flag that we set when launching an experiment and that lets a report identify the *TestTask* subcontext.

daxenberger commented 8 years ago

I'd suggest introducing a generic flag that is stored with a task context in DKPro Lab. DKPro TC could make use of this flag by setting it to something like "Constants.isMlTask" when instantiating the task.

Horsmann commented 8 years ago

The tasks we have are rather fixed, I would say. It would be quite convenient if you could retrieve an ENUM or something. Then you could do rather simple and easy-to-read checks such as:

    if (subcontext.getNewFlag() == TcTasks.MachineLearningAdapter)
    if (subcontext.getNewFlag() == TcTasks.Meta)
    if (subcontext.getNewFlag() == TcTasks.ExtractFeatureTrain)
    if (subcontext.getNewFlag() == TcTasks.ExtractFeatureTest)

This would probably make navigating through the subcontexts a lot easier. @reckart Is there some feature in Lab that would allow giving a task a nickname or something which we could use for that?
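To make that concrete, a rough sketch of what I have in mind (TcTasks, the attribute key, and the helper are just hypothetical names from this discussion, not existing TC or Lab API):

    // hypothetical enum covering the fixed set of TC tasks (not existing API)
    enum TcTasks
    {
        MACHINE_LEARNING_ADAPTER, META, EXTRACT_FEATURES_TRAIN, EXTRACT_FEATURES_TEST;

        // hypothetical attribute key under which the flag would be stored with a task
        public static final String KEY = "TcTaskType";
    }

    // report-side check replacing getType().contains("TestTask"); assumes the flag was
    // stored as a task attribute and that a subcontext's attribute map can be read back
    class TaskFlagCheck
    {
        static boolean isMachineLearningTask(java.util.Map<String, String> attributes)
        {
            return TcTasks.MACHINE_LEARNING_ADAPTER.name().equals(attributes.get(TcTasks.KEY));
        }
    }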

reckart commented 8 years ago

The "nickname" of a task is its type (constructor: TaskBase(type), methods: Task.setType(type), Task.getType()).

The way to maintain auxiliary non-discriminator information with tasks is through "attributes" (methods Task.setAttribute(name, value), Task.getAttribute(name), Task.getAttributes()).

Horsmann commented 8 years ago

Sounds close to what I am looking for. I just don't see how I can access this information from within a report. The information is located in TaskBase, but all I have is a TaskContext, which does not give me access to the attributes?

reckart commented 8 years ago

The type is in the TaskContextMetadata which is accessible from the TaskContext.

The attributes can be retrieved in the same way as the discriminators. There is getProperties() in ReportBase which should return the map of attributes. At some point I had to rename get/setProperty to get/setAttribute, but apparently I forgot to update ReportBase.
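For the record, a minimal sketch of that report-side access (package names may differ between Lab versions; getProperties() is the possibly misnamed method mentioned above):

    import java.util.Map;

    import de.tudarmstadt.ukp.dkpro.lab.reporting.ReportBase;

    // sketch only: reading the task type and the attribute map from within a report
    public class ExampleReport extends ReportBase
    {
        @Override
        public void execute() throws Exception
        {
            // the task type is part of the context metadata
            String type = getContext().getMetadata().getType();

            // old name; may become getAttributes() once ReportBase is updated
            Map<String, String> attributes = getProperties();

            if (type.contains("TestTask")) {
                // the string-based check this issue wants to replace
            }
        }
    }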

Horsmann commented 8 years ago

When I want to set an attribute, it is kind of fatal that the properties map is reinitialized during the analysis step.

@Override
    public final void analyze()
    {
        // both maps are reset on every call, discarding any values set beforehand
        properties = new HashMap<String, String>();
        discriminators = new HashMap<String, String>();
        analyze(getClass(), Property.class, properties);
        analyze(getClass(), Discriminator.class, discriminators);
    }

I thought of setting an attribute/property before the experiment runs, e.g.:

 // collecting meta features only on the training data (numFolds times)
        metaTask = new MetaInfoTask();
        metaTask.setOperativeViews(operativeViews);
        metaTask.setType(metaTask.getType() + "-" + experimentName);
        metaTask.setAttribute("KEY", "VAL");

but then the analyze step clears the properties. Might be a bug - do you remember the intention behind the reinitialization?

reckart commented 8 years ago

The reinitialization is performed to return the Task to a defined state when it is reused - i.e. to avoid "leaking" information between different runs of a task.

reckart commented 8 years ago

See: https://github.com/dkpro/dkpro-lab/issues/97

Horsmann commented 8 years ago

@reckart Is something like the loadAttributes() by context id worth adding to Lab, or is this already too specific? https://github.com/dkpro/dkpro-tc/commit/75d0c5bed2ec665748adc6ff350ac4864f29bfff
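For reference, the helper boils down to something like this (a sketch; the exact names and storage key may differ from the actual commit): it loads the attribute map persisted for a subtask from the Lab storage service by context id, analogous to how discriminators are retrieved.

    import java.util.Map;

    import de.tudarmstadt.ukp.dkpro.lab.storage.StorageService;
    import de.tudarmstadt.ukp.dkpro.lab.storage.impl.PropertiesAdapter;
    import de.tudarmstadt.ukp.dkpro.lab.task.Task;

    // sketch: a helper inside a report that loads the attribute map persisted for a
    // given task context id; assumes attributes are stored under Task.ATTRIBUTES_KEY,
    // analogous to Task.DISCRIMINATORS_KEY (verify against the actual commit)
    private Map<String, String> loadAttributes(String contextId)
    {
        StorageService store = getContext().getStorageService();
        return store.retrieveBinary(contextId, Task.ATTRIBUTES_KEY,
                new PropertiesAdapter()).getMap();
    }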

Horsmann commented 8 years ago

@daxenberger Do we still need ExperimentPrediction? It doesn't seem to be used anywhere. Is this a leftover?

daxenberger commented 8 years ago

Do we still need ExperimentPrediction? It doesn't seem to be used anywhere. Is this a leftover?

Not needed anymore; superseded by Save/LoadModelTasks.

reckart commented 8 years ago

I believe adding a getDiscriminators() and getAttributes() makes sense. How about adding them to the TaskContext?
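If it goes into the TaskContext, the addition might look roughly like this (a sketch of the proposed API, not existing Lab code):

    import java.util.Map;

    // sketch of the proposed additions to the TaskContext interface (not existing API)
    public interface TaskContext
    {
        // ... existing methods ...

        /** Discriminator values of the task this context belongs to. */
        Map<String, String> getDiscriminators();

        /** Auxiliary, non-discriminator attributes of the task. */
        Map<String, String> getAttributes();
    }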