eclipse-emfcloud / emfjson-jackson

emfjson-jackson
Other
16 stars 15 forks source link

'mixed' FeatureMapEntries are not serialized to JSON #8

Closed reuteran closed 2 months ago

reuteran commented 3 years ago

Hi,

it seems that org.emfjson.jackson.databind.ser.EObjectSerializer only serializes FeatureMap entries if they have a 'group' key in their EAnnotation details map. Entries of 'mixed' FeatureMaps are skipped, as they don't have a 'group' entry in their annotation details. Such entries are not an edge case, for example if you have an XSD File with multiple top-level complex types, the resulting Ecore will have a 'DocumentRoot' EClass with a mixed feature map containing the top level types/comments/whitespace etc.

Here is the function responsible + the callstack: image

To reproduce this, I've attached an archive with a dummy model. It includes an Ecore model, the corresponding genmodel and a model instance. The model instance is serialized to Json like this:

{ "type": "success", "data": { "eClass": "http://tempuri.org/PurchaseOrderSchema.xsd#//DocumentRoot" } }

, but it should be like this:

{ "type": "success", "data": { "eClass": "http://tempuri.org/PurchaseOrderSchema.xsd#//DocumentRoot", "PurchaseOrder": { "ShipTo": [ { "name": "ff", "street": "dsdsd", "city": "ff", "state": "ff", "zip": 111 } ], "BillTo": { "name": "dasd", "street": "asd", "city": "dasd", "state": "dasd", "zip": 11 } } } }

eneufeld commented 2 years ago

Sorry for the delayed reply. We gladly accept a PR for this.

vhemery commented 10 months ago

I've been looking at it this morning and it's a bit more complicated than that. Actually, I'm not satisfied with the current implementation with 'group' either.

Usually, we have a featureMap when we have a "choice" in XSD. But when there are several values, their order matter.

For example: in https://www.omg.org/spec/XTCE/20180204/SpaceSystem.xsd, there is

<complexType name="EntryListType" mixed="false">
<annotation>
<documentation xml:lang="en">Contains an ordered list of Entries. Used in Sequence Container</documentation>
</annotation>
<choice minOccurs="0" maxOccurs="unbounded">
<element name="ParameterRefEntry" type="xtce:ParameterRefEntryType">
<annotation>
<documentation xml:lang="en">Specify a Parameter to be a part of this container layout definition.</documentation>
</annotation>
</element>
<element name="ParameterSegmentRefEntry" type="xtce:ParameterSegmentRefEntryType">
<annotation>
<documentation xml:lang="en">Specify a portion of a Parameter to be a part of this container layout definition. This is used when the Parameter is reported in fractional parts in the container before being fully updated.</documentation>
</annotation>
</element>
<element name="ContainerRefEntry" type="xtce:ContainerRefEntryType">
<annotation>
<documentation xml:lang="en">Specify the content of another Container to be a part of this container layout definition.</documentation>
</annotation>
</element>
<element name="ContainerSegmentRefEntry" type="xtce:ContainerSegmentRefEntryType">
<annotation>
<documentation xml:lang="en">Specify a portion of another Container to be a part of this container layout definition.</documentation>
</annotation>
</element>
<element name="StreamSegmentEntry" type="xtce:StreamSegmentEntryType">
<annotation>
<documentation xml:lang="en">Specify a portion of a Stream to be a part of this container layout definition.</documentation>
</annotation>
</element>
<element name="IndirectParameterRefEntry" type="xtce:IndirectParameterRefEntryType">
<annotation>
<documentation xml:lang="en">Specify a previous (not last reported) value of a Parmeter to be a part of this container layout definition.</documentation>
</annotation>
</element>
<element name="ArrayParameterRefEntry" type="xtce:ArrayParameterRefEntryType">
<annotation>
<documentation xml:lang="en">Specify an Array Type Parameter to be a part of this container layout definition when the Container does not populate the entire space of the Array contents. If the entire space of the Array is populated, a tolerant implementation will accept ParameterRefEntry also.</documentation>
</annotation>
</element>
</choice>
</complexType>

This is a sequence of entries, in a specific order, which can be of various nature. When I take a look at FeatureMapTest#testSaveFeatureMap, any sense of "featureMapAttributeCollection" is lost. The order is json does not matter and we have lost the order between "Hello" and "World".

I tried and replaced

      primaryObject.getFeatureMapAttributeType1().add("Hello");
      primaryObject.getFeatureMapAttributeType2().add("World");

with

primaryObject.getFeatureMapAttributeCollection().add(ModelPackage.Literals.PRIMARY_OBJECT__FEATURE_MAP_ATTRIBUTE_TYPE1, "Hello");
primaryObject.getFeatureMapAttributeCollection().add(ModelPackage.Literals.PRIMARY_OBJECT__FEATURE_MAP_ATTRIBUTE_TYPE2, "World");
primaryObject.getFeatureMapAttributeCollection().add(ModelPackage.Literals.PRIMARY_OBJECT__FEATURE_MAP_ATTRIBUTE_TYPE1, "!");

and the result in json is

{
  "eClass":"http://www.emfjson.org/jackson/model#//PrimaryObject",
  "name":"junit",
  "featureMapAttributeType1":["Hello","!"],
  "featureMapAttributeType2":["World"]
}

The closest thing to the XML/XMI would be

{
  "eClass":"http://www.emfjson.org/jackson/model#//PrimaryObject",
  "name":"junit",
  "featureMapAttributeType1":"Hello",
  "featureMapAttributeType2":"World",
  "featureMapAttributeType1":"!"
}

but this is not a valid json.

I think a better alternative would be something like

{
  "eClass":"http://www.emfjson.org/jackson/model#//PrimaryObject",
  "name":"junit",
  "featureMapAttributeCollection": [
    {
      "featureName":"featureMapAttributeType1",
      "value":"Hello"
    },
    {
      "featureName":"featureMapAttributeType2",
      "value":"World"
    },
    {
      "featureName":"featureMapAttributeType1",
      "value":"!"
    }
  ]
}

which is close to the EMF notion of FeatureMap as a list of FeatureMapEntry.

@ndoschek @martin-fleck-at What do you think of this option ? Do we have actual examples using FeatureMapEntries that would break ?

vhemery commented 10 months ago

For the original case, we could have the same structure with "mixed" instead of "featureMapAttributeCollection".

If I take the ecore and complexifies it just a little bit (adding "address" feature in addition to purchaseOrder), we get

  <eClassifiers xsi:type="ecore:EClass" name="DocumentRoot">
    <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
      <details key="name" value=""/>
      <details key="kind" value="mixed"/>
    </eAnnotations>
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="mixed" unique="false" upperBound="-1"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EFeatureMapEntry">
      <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
        <details key="kind" value="elementWildcard"/>
        <details key="name" value=":mixed"/>
      </eAnnotations>
    </eStructuralFeatures>
    <eStructuralFeatures xsi:type="ecore:EReference" name="xMLNSPrefixMap" upperBound="-1"
        eType="ecore:EClass http://www.eclipse.org/emf/2002/Ecore#//EStringToStringMapEntry"
        transient="true" containment="true" resolveProxies="false">
      <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
        <details key="kind" value="attribute"/>
        <details key="name" value="xmlns:prefix"/>
      </eAnnotations>
    </eStructuralFeatures>
    <eStructuralFeatures xsi:type="ecore:EReference" name="xSISchemaLocation" upperBound="-1"
        eType="ecore:EClass http://www.eclipse.org/emf/2002/Ecore#//EStringToStringMapEntry"
        transient="true" containment="true" resolveProxies="false">
      <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
        <details key="kind" value="attribute"/>
        <details key="name" value="xsi:schemaLocation"/>
      </eAnnotations>
    </eStructuralFeatures>
    <eStructuralFeatures xsi:type="ecore:EReference" name="purchaseOrder" upperBound="-2"
        eType="#//PurchaseOrderType" volatile="true" transient="true" derived="true"
        containment="true" resolveProxies="false">
      <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
        <details key="kind" value="element"/>
        <details key="name" value="PurchaseOrder"/>
        <details key="namespace" value="##targetNamespace"/>
      </eAnnotations>
    </eStructuralFeatures>
    <eStructuralFeatures xsi:type="ecore:EReference" name="address" upperBound="-2"
        eType="#//USAddress" volatile="true" transient="true" derived="true" containment="true"
        resolveProxies="false">
      <eAnnotations source="http:///org/eclipse/emf/ecore/util/ExtendedMetaData">
        <details key="kind" value="element"/>
        <details key="name" value="Address"/>
        <details key="namespace" value="##targetNamespace"/>
      </eAnnotations>
    </eStructuralFeatures>
  </eClassifiers>

In this case, we have to tell which comes first between the "Adress" value and the "PurchaseOrder" value.

So json with values could look like

{
  "eClass": "http://tempuri.org/PurchaseOrderSchema.xsd#//DocumentRoot",
  "mixed": [
    {
      "featureName":"Address",
      "value": {
        "name": "Home"
       }
    },
    {
      "featureName":"PurchaseOrder",
      "value": {
        "OrderDate": "2023-10-26"
       }
    }
  ]
 }

so we still know the intended order between feature entries.

vhemery commented 9 months ago

To better illustrate it, I've created a sample model named "Nora'sCaravan", that we can add later to the wiki. This valid model would be impossible to export to json without the proposed approach. So

<?xml version="1.0" encoding="UTF-8"?>
<caravan:Caravan xmlns:caravan="http://www.example.org/caravan" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <caravan:Person firstName="Nora"/>
  <caravan:DromaderyCamel name="Grumpy"/>
  <caravan:BactrianCamel name="Humpy"/>
  <caravan:DromaderyCamel name="Happy"/>
</caravan:Caravan>

would become (approximatively, to be confirmed after development)

{
  "eClass": "http://www.example.org/caravan#//DocumentRoot",
  "mixed": [
    {
      "featureName":"caravan",
      "value": {
        "group": [
         {
           "featureName":"Person",
           "value" : { "firstName":"Nora" }
         },
         {
           "featureName":"DromaderyCamel",
           "value" : { "name":"Grumpy" }
         },
         {
           "featureName":"BactrianCamel",
           "value" : { "name":"Humpy" }
         },
         {
           "featureName":"DromaderyCamel",
           "value" : { "name":"Happy" }
         }
        ]
       }
    }
  ]
 }

I'm attaching the concerned files (model, xsd and derived ecore). Nora'sCaravan.zip

martin-fleck-at commented 9 months ago

@vhemery Thank you for your suggestions! I'll have a look at it either tomorrow or early next week.

vhemery commented 9 months ago

A valid alternative would be

{
  "eClass": "http://www.example.org/caravan#//DocumentRoot",
  "mixed": [
    {
      "caravan": {
        "group": [
         {
           "Person": { "firstName":"Nora" }
         },
         {
           "DromaderyCamel": { "name":"Grumpy" }
         },
         {
           "BactrianCamel": { "name":"Humpy" }
         },
         {
           "DromaderyCamel": { "name":"Happy" }
         }
        ]
       }
    }
  ]
 }

But it makes harder to access the feature name. Which one do you prefer ? (not a big difference on development side)

martin-fleck-at commented 9 months ago

@vhemery I had a look into your proposal and I believe this is a very good solution! Thank you also for pointing out the problem with the order in the choice options! I just think we need to make sure that also the other side of it, i.e. the EObjectDeserializer, can deal with that. Would you be willing to open a PR for it and maybe add test cases?

Purely from a JSON point of view, I'd really prefer the second option where the feature names are actually the keys on the respective object and we do not have the "featureName" and "value" boilerplate. However, if you believe this will make things much more difficult (especially when deserializing it again), then we can also go with the first option.

As for the impact on the framework I'll double check once we have an implementation. However, if we actually do break the current behavior, I'd rather release a newer version than miss out on the extended functionality. This might also be a good opportunity to double check our version ranges on our dependencies to see if we can upgrade somewhere easily.

Anyway, thank you very much again for taking a look at this!