Closed errantlinguist closed 7 years ago
Can you remind me again: why is it not sufficient to work with the UIMA types?
Because changing everything now to directly use UIMA types would probably take about 2-3 weeks?
Change where? In DKPro Argumentation? Why was a decision taken to depart from using the UIMA types anyway?
I assumed that I didn't need all the huge amount of baggage from UIMA and so I decided a small wrapper class was appropriate. My main mistake was including the code directly in dkpro-argumentation because it is extremely difficult to make changes to this project.
Then how about factoring out your changes again from this project and keeping them in whatever downstream application you are developing?
That would also take a number of weeks.
That sounds like you have a serious problem with your architecture that should be resolved as soon as possible before the effort of undoing it starts taking months or years...
I am unsure how the effort of merging one commit (one change to one file in this case) would be greater than me re-arranging a project which is a proof of concept anyway.
As I said, my biggest mistake was incorporating my own JSON deserialization logic (which is necessary for indexing with Elasticsearch) directly into this project instead of creating my own. I can either fix this issue now (and delay the other project which depends on this) or I can fix it later, when there is more time for refactoring.
Projects should be focussed. I see no use and no reason of maintaining serialization logic for a downstream application in the codebase of DKPro Argumentation. Please remove it. Hopefully you can solve the problem "simply" by moving the respective classes out of DKPro Argumentation and into your code (wherever/whatever it is). Contributions to the project should improve the project or fix bugs - they should not mutate it to become a part of your application.
@habernal please comment if you see this differently.
If adding logic for reading generic UIMA into a JSON structure specifically for argumentation is not part of the focus of dkpro-argumentation, why is there the package de.tudarmstadt.ukp.dkpro.argumentation.io
in the first place?-- the only other thing the package can do is print out some debugging information to the console via de.tudarmstadt.ukp.dkpro.argumentation.io.writer.ArgumentDumpWriter
.
Serialization (i.e. reading/writing) argumentation-related data to/from CAS is certainly good thing to have. But the change mentioned in this issue does not seem to be related to supporting specific input/output formats but rather "so that arbitrary attributes can be added after the annotations have been converted from UIMA format (e.g. further downstream, such as in applications which use dkpro-argumentation as a library)." - which seems to be like bleeding concerns from your downstream application into the framework. Maybe I am misunderstanding something - happy to be corrected.
I agree that the "cleanest" separation would be to (still) have an ImmutableSpanTextLabel
class which represents nothing that is not in the original UIMA files, (e.g. Map<Attribute,Object> getAttributes()
) and then e.g. to have the downstream app use something like a generic factory class like e.g. SpanTextLabelFactory<T extends SpanTextLabel>
and then pass its own implementation of SpanTextLabel
to this factory used to create object from the UIMA data (e.g. MutableSpanTextLabel
, which has Map<String,Object> getAttributes()
), but this would also require some non-insignificant refactoring, and it's very likely that some other logic either inside dkpro-argumentation or outside of it could also make use of a MutableSpanTextLabel
class anyway.
Granted, debating these topics here don't actually save much time over actually doing the refactoring..
I still have the feeling that this is going into an odd direction, but I would have to look at the code in more detail. I somehow doubt that a factory pattern is required here to allow an outside application to tell DKPro Argumentation how to create its objects. Sounds a bit like over-engineering...
Closed by #23
Currently, the map of attributes of
SpanTextLabel
(which is filled from various UIMA-supplied information at runtime) is too inflexible when it allows only keys of typede.tudarmstadt.ukp.dkpro.argumentation.io.annotations.Attribute
; This should be changed to e.g.String
so that arbitrary attributes can be added after the annotations have been converted from UIMA format (e.g. further downstream, such as in applications which use dkpro-argumentation as a library).