Support custom data type (de)serialization with EFactory

hallvard commented 2 years ago

The implementation uses EFactory (de)serialization for (built-in) types that are not known to be handled well in (emf)json, e.g. types implemented in the ecore package or involving java.lang classes.

Tests are implemented for a custom data type for LocalDate.

martin-fleck-at commented 2 years ago

@hallvard Thank you very much for your contribution! I'll have a look at this a little later today.

eneufeld commented 2 years ago

I added your test to the master branch and run it. The error pointed me to use com.fasterxml.jackson.datatype:jackson-datatype-jsr310 . Thus I added:

<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-jsr310</artifactId>
    <version>2.11.3</version>
</dependency>

to the pom.xml . In the StandardFixture class I added: mapper.registerModule(new JavaTimeModule());

With this the test succeeds for me. I'm wondering do we need additional custom code or does this solve your problem?

eneufeld commented 2 years ago

I think it would be cleaner to add deserializer and serializer similar to how it is done in the module for the cases you need. If we find multiple cases missing we could create a new optional module.

hallvard commented 2 years ago

My contribution is an attempt to find a middle ground between emfjson's approach, and EMF approach to serialisation. The latter assumes all data value serialisation is handled by some EFactory, the basic cases are delegated to the default Ecore factory while any model may define its own datatypes with custom serialisation. You may argue that it doesn't really matter who does the serialization, as long as what you write is what you get back, but one thing you will miss is validation of values. A custom datatype based on String may e.g. limit the size and syntax, and one based on int can limit the range, and you will miss this if you don't use the EFactory. Hence, in my opinion, if the goal of emfjson is a json-based replacement of EMF serialization, it should delegate all serialisation of data values to the EFactory, to ensure compatible behavior.

An argument for emfjson's approach is when you want to consume/produce json from/to other sources, e.g. REST services. Then you (may argue that you) need to ensure that you are able to read the format. In my opinion, this should rather be done by defining custom EMF model with datatypes and serialisation suitable for that format. Emfjson could help by defining basic data types that can read/write the standard json data types, in the cases where Ecore's EFactory doesn't work.

The solution above where support for time types is provided by an existing library is a very special case that will not work in general. In my models, there are other types that will require using the EFactory. I don't consider it a solution to write a jackson-specific module with serialisers and deserialisers, I already have the necessary logic in my EFactory and want to use that.

As mentioned, my code tries to find a middle ground, a better approach would be to have some feature identifiers which allows you to select between various approaches with a "pure" EFactory approach being one option, a default approach which is similar to what I suggest above and maybe other relevant variations.

eneufeld commented 2 years ago

So after thinking about what you wrote, I see your point.

I agree that we should support the transformation of custom datatypes if it is supported by the EFactory without the need to rely on external modules.

Could you look at the review comments @martin-fleck-at wrote?

martin-fleck-at commented 2 years ago

@hallvard Please excuse the delayed reply. First of all, thank you very much for taking the time to elaborate on your thoughts and the contribution, specifically. We had a deeper discussion about this and think the the extra annotation to mark which strategy should be used for datatype serialization/deserialization is a good middle ground to take here.

Our main concern was breaking the current behavior for adopters and also adding somewhat arbitrary checks for when to use the EFactory. However, using the annotation approach we can keep backwards compatibility while also extending the usage of the EFactory to a degree that feels more appropriate. As you mentioned, we should be able to rely on the convert/create EFactory contract baked into EMF and actually should use only the EFactory for serialization/deserialization. Adopters can then switch between the new and old strategy based on the new annotation.

What do you think? Would you have some time to implement that approach as part of this PR as well as have a look at the how to get the serializer/deserializer?

maho7791 commented 2 years ago

I think EMFJson should be able to support built in serializers and de-serializers out of the box. Forcing EMFJ to use just the EMF to string method should be activated by an annotation. Increasing a version in a right way, and noting this change in the release note can point users to the change. We e.g. want to serialize Double[] wrapped in a EDatatype. Serializing these array types is oob supported by Jackson, why not using this? Regards, Mark

martin-fleck-at commented 2 years ago

@maho7791 We totally agree that both options (serializing through Jackson and serializing through EMF Factory) should be supported properly and managed through an annotation. This is basically what we are missing in this PR for it to be merged.

koegel commented 2 years ago

This PR seems to have stalled. I will now close it, feel free to reopen if you want to address the remaining issues as indicated by @martin-fleck-at

eclipse-emfcloud / emfjson-jackson

Support custom data type (de)serialization with EFactory #26