Closed ramonziai closed 4 years ago
Pull request #107 implements the option to pass an already existing type system to the deserializer. @jcklie rightly objected that the conceptually more correct way of achieving this is to add a merge() function for type systems. However, it seems to me that this involves rather more code than the change I submitted, but I might be wrong. At the end of the day, it's not that important to me how this is implemented, so I leave it to the maintainers to decide :-)
I would agree with @jcklie - instead of chaining one typesystem into the next via the deserializer, it would be better to have a method which takes multiple type system descriptions and merges them.
Ok, I can see that this is the favored approach, and I understand why. I'm interested in getting this functionality in there soon, so I'd be willing to put in the work. I'm assuming merge() would basically traverse the types and features of one type system, create clones with identical values and add them to the other type system. Is that correct or is there a better way?
I think that is the way. One also needs to make sure that redefines are identical and that inheritance stays correct. I wanted to have a look at it today and over the weekend if that is fast enough for you.
Yes it is, thanks a lot :-)
UIMA has the concept of type merging.
The input is n type systems, none of which is modified during the merge.
The output is a new type system.
The merge has to look into every individual type. If a type is defined in more than one source type system, the features of all its definitions are joined together in the target type system.
If a feature is defined in both but not identically (e.g. an integer in one and a float in the other), an error is generated.
The same applies if the inheritance of a type differs across type systems.
Official documentation on type merging is here: https://uima.apache.org/d/uimaj-2.10.4/references.html#ugr.ref.cas.typemerging
The relevant method in UIMA is: org.apache.uima.util.CasCreationUtils.mergeTypeSystems(Collection<? extends TypeSystemDescription>)
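To make the rules above concrete, here is a minimal Python sketch of such a merge. It is not the uimaj or cassis implementation: `TypeDef`, `TypeMergeError` and `merge_type_systems` are hypothetical names made up for illustration, a type system is modelled as a plain dict from type name to definition, and the strict "differing inheritance is an error" rule is taken directly from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


class TypeMergeError(Exception):
    """Raised when two definitions of the same type cannot be reconciled."""


@dataclass
class TypeDef:
    # Hypothetical stand-in for a type description (not a cassis or UIMA class).
    name: str
    supertype: str = "uima.cas.TOP"
    # Maps feature name -> name of the feature's range type.
    features: Dict[str, str] = field(default_factory=dict)


def merge_type_systems(*systems: Dict[str, TypeDef]) -> Dict[str, TypeDef]:
    """Merge n type systems into a new one without modifying the inputs."""
    merged: Dict[str, TypeDef] = {}
    for system in systems:
        for name, typ in system.items():
            existing = merged.get(name)
            if existing is None:
                # First definition seen: copy it so the source stays untouched.
                merged[name] = TypeDef(typ.name, typ.supertype, dict(typ.features))
                continue
            # Redefinition: inheritance must agree across type systems.
            if existing.supertype != typ.supertype:
                raise TypeMergeError(
                    f"{name}: supertype {existing.supertype} != {typ.supertype}"
                )
            # Join the features; a feature defined twice must be identical.
            for feat, range_type in typ.features.items():
                known = existing.features.get(feat)
                if known is not None and known != range_type:
                    raise TypeMergeError(
                        f"{name}.{feat}: range {known} != {range_type}"
                    )
                existing.features[feat] = range_type
    return merged


# Usage: two definitions of the same type contribute different features.
a = {"my.Token": TypeDef("my.Token", "uima.tcas.Annotation", {"pos": "uima.cas.String"})}
b = {"my.Token": TypeDef("my.Token", "uima.tcas.Annotation", {"lemma": "uima.cas.String"})}
merged = merge_type_systems(a, b)
print(sorted(merged["my.Token"].features))
```

Building a fresh result instead of extending one of the inputs keeps the sources reusable, which matches the "inputs are not modified" requirement.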
@ramonziai Do you have an example type system that should be merged with DKPro?
@jcklie Here's a (part of the) type system I use, with some references to DKPro types in it: https://unitc-my.sharepoint.com/:u:/g/personal/nnszi01_cloud_uni-tuebingen_de/EZe4a17Bs-xDpsEyASsOOp4Ba72FKDw8F8g94WNDGRxI6w?e=oAdQAa
@ramonziai I implemented the merging logic from uimaj. I will try to release a new version this week or next. Until then, you can install the current master with pip: python -m pip install git+https://github.com/dkpro/dkpro-cassis
Please close this issue if it works for you.
@jcklie Thanks a lot, merging seems to work just fine. Closing issue.
Is your feature request related to a problem? Please describe.
I was looking for a way to import my own typesystem which builds on top of DKPro types. However, currently there is only the possibility to deserialize one XML file into one TypeSystem object, and references to other type systems in the XML files are ignored.
Describe the solution you'd like
I'd like a way to combine my type system and others (e.g. the DKPro type system).
Describe alternatives you've considered
The long way around would be to load both type systems into separate objects, then traverse all types of one of them and add them to the other. This seems clumsy and error-prone.