dkpro / dkpro-argumentation

DKPro Argumentation is a general framework for argumentation mining
Apache License 2.0
4 stars 3 forks source link

Attribute map used for "SpanTextLabel" is too inflexible #21

Closed errantlinguist closed 7 years ago

errantlinguist commented 8 years ago

Currently, the map of attributes of SpanTextLabel (which is filled from various UIMA-supplied information at runtime) is too inflexible when it allows only keys of type de.tudarmstadt.ukp.dkpro.argumentation.io.annotations.Attribute; This should be changed to e.g. String so that arbitrary attributes can be added after the annotations have been converted from UIMA format (e.g. further downstream, such as in applications which use dkpro-argumentation as a library).

reckart commented 8 years ago

Can you remind me again: why is it not sufficient to work with the UIMA types?

errantlinguist commented 8 years ago

Because changing everything now to directly use UIMA types would probably take about 2-3 weeks?

reckart commented 8 years ago

Change where? In DKPro Argumentation? Why was a decision taken to depart from using the UIMA types anyway?

errantlinguist commented 8 years ago

I assumed that I didn't need all the huge amount of baggage from UIMA and so I decided a small wrapper class was appropriate. My main mistake was including the code directly in dkpro-argumentation because it is extremely difficult to make changes to this project.

reckart commented 8 years ago

Then how about factoring out your changes again from this project and keeping them in whatever downstream application you are developing?

errantlinguist commented 8 years ago

That would also take a number of weeks.

reckart commented 8 years ago

That sounds like you have a serious problem with your architecture that should be resolved as soon as possible before the effort of undoing it starts taking months or years...

errantlinguist commented 8 years ago

I am unsure how the effort of merging one commit (one change to one file in this case) would be greater than me re-arranging a project which is a proof of concept anyway.

As I said, my biggest mistake was incorporating my own JSON deserialization logic (which is necessary for indexing with Elasticsearch) directly into this project instead of creating my own. I can either fix this issue now (and delay the other project which depends on this) or I can fix it later, when there is more time for refactoring.

reckart commented 8 years ago

Projects should be focussed. I see no use and no reason of maintaining serialization logic for a downstream application in the codebase of DKPro Argumentation. Please remove it. Hopefully you can solve the problem "simply" by moving the respective classes out of DKPro Argumentation and into your code (wherever/whatever it is). Contributions to the project should improve the project or fix bugs - they should not mutate it to become a part of your application.

reckart commented 8 years ago

@habernal please comment if you see this differently.

errantlinguist commented 8 years ago

If adding logic for reading generic UIMA into a JSON structure specifically for argumentation is not part of the focus of dkpro-argumentation, why is there the package de.tudarmstadt.ukp.dkpro.argumentation.io in the first place?-- the only other thing the package can do is print out some debugging information to the console via de.tudarmstadt.ukp.dkpro.argumentation.io.writer.ArgumentDumpWriter.

reckart commented 8 years ago

Serialization (i.e. reading/writing) argumentation-related data to/from CAS is certainly good thing to have. But the change mentioned in this issue does not seem to be related to supporting specific input/output formats but rather "so that arbitrary attributes can be added after the annotations have been converted from UIMA format (e.g. further downstream, such as in applications which use dkpro-argumentation as a library)." - which seems to be like bleeding concerns from your downstream application into the framework. Maybe I am misunderstanding something - happy to be corrected.

errantlinguist commented 8 years ago

I agree that the "cleanest" separation would be to (still) have an ImmutableSpanTextLabel class which represents nothing that is not in the original UIMA files, (e.g. Map<Attribute,Object> getAttributes()) and then e.g. to have the downstream app use something like a generic factory class like e.g. SpanTextLabelFactory<T extends SpanTextLabel> and then pass its own implementation of SpanTextLabel to this factory used to create object from the UIMA data (e.g. MutableSpanTextLabel, which has Map<String,Object> getAttributes()), but this would also require some non-insignificant refactoring, and it's very likely that some other logic either inside dkpro-argumentation or outside of it could also make use of a MutableSpanTextLabel class anyway.

Granted, debating these topics here don't actually save much time over actually doing the refactoring..

reckart commented 8 years ago

I still have the feeling that this is going into an odd direction, but I would have to look at the code in more detail. I somehow doubt that a factory pattern is required here to allow an outside application to tell DKPro Argumentation how to create its objects. Sounds a bit like over-engineering...

habernal commented 7 years ago

Closed by #23