Json formatted CQL fails to deserialize via JsonCqlLibraryReader

csandersdev commented 4 years ago

I am running on latest from the develop branch. When I try to use JsonCqlLibraryReader to deserialize ELM in JSON format, I get the exception below. Looking at the generated JSON and comparing it to the sample JSON-formatted CQL in the test resources, there is clearly a difference of missing "type" properties in several places in the newly generated JSON.

Attached is a test patch that will reproduce the problem. test-patch.txt

com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Missing type id when trying to resolve subtype of [simple type, class org.cqframework.cql.elm.execution.Library]: missing type id property 'type' (for POJO property 'library')
 at [Source: (StringReader); line: 40, column: 4] (through reference chain: org.opencds.cqf.cql.engine.elm.execution.LibraryWrapper["library"])
    at com.fasterxml.jackson.databind.exc.InvalidTypeIdException.from(InvalidTypeIdException.java:43)
    at com.fasterxml.jackson.databind.DeserializationContext.missingTypeIdException(DeserializationContext.java:1768)
    at com.fasterxml.jackson.databind.DeserializationContext.handleMissingTypeId(DeserializationContext.java:1297)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._handleMissingTypeId(TypeDeserializerBase.java:299)
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:164)
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:105)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithType(BeanDeserializerBase.java:1178)
    at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:138)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3218)
    at org.opencds.cqf.cql.engine.execution.JsonCqlLibraryReader.read(JsonCqlLibraryReader.java:27)
    at org.opencds.cqf.cql.engine.execution.CqlEngineTests.readJson(CqlEngineTests.java:74)
    at org.opencds.cqf.cql.engine.execution.CqlEngineTests.toLibrary(CqlEngineTests.java:62)
    at org.opencds.cqf.cql.engine.execution.CqlEngineTests.toLibrary(CqlEngineTests.java:53)
    at org.opencds.cqf.cql.engine.execution.CqlEngineTests.test_simpleLibraryJson_returnsResult(CqlEngineTests.java:119)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:508)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:542)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:770)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:464)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:210)

JPercival commented 4 years ago

Hi @csandersdev .

There are actually two JSON serialization formats supported by the CQL translator one for Jaxb (JSON) and one for Jackson (JXSON). The CQL Engine supports the Jackson format.

The translator options are here.

csandersdev commented 4 years ago

Thanks, @JPercival. I am trying to import libraries that our users create using the CDS authoring tool codebase and execute those in the CQL engine. The CDS authoring tool uses the cqframework/cql-translation-service docker image to stand up a web service that will translate the CQL and provide results, then it can export the CQL and compiled JSON as a ZIP file. I'm processing the exported ZIP files. The parse errors are coming when I try to read in the JSON libraries from those exported ZIP files. It looks like we can update the CDS code to request XML instead of JSON format, but would be nice if I didn't have to do that. I'd be happy to try to add support for JSON format in the CqlEngine, but it wasn't clear to me how I would do that.

cmoesel commented 4 years ago

I remember the JXSON PR for the translator -- and based on the conversation there, we merged it since it was only an option and did not deprecate JAXB. But we also noted that further regression needs to be done and annotations need to be fixed.

So... it looks like JXSON serialization is a requirement for the Java engine now but the CDS Authoring Tool exports using JAXB serialization. @JPercival , @brynrhodes -- Do you have a better understanding now of any possible regressions and have you addressed (or plan to address) the annotations issue? If so, then I can consider moving AT to JXSON serialization.

JPercival commented 4 years ago

We initially introduced the use of Jackson because of lack of JAXB support on Android. At the time we did the merge we thought it was likely backwards compatible though it was not extensively tested. We've since discovered that there are significant differences between the two formats. I've done some preliminary exploration that sets various options on both the Jackson export and import to make it compatible with JAXB and I haven't managed to rule that out as a possibility but I've not pushed far enough to achieve parity either. Assuming we do achieve parity there wouldn't be the need for both formats.

I've not done any further work on the annotations specifically.

The cql-engine also supports the XML format which is much more extensively tested. Would it be possible to use that instead?

cmoesel commented 4 years ago

The cql-engine also supports the XML format which is much more extensively tested. Would it be possible to use that instead?

I'm not sure if that was a question for me or for @csandersdev -- but we could look into having the CDS Authoring Tool export JSON and XML. I assume it wouldn't be a big deal, but we'd need to look at it more closely. I'm not sure the translator supports exporting both; we might have to run it twice, which is kind of unfortunate.

brynrhodes commented 4 years ago

@cmoesel , the translator component does support producing all the serialization formats off the same translation, but the translation service doesn't expose that. The integration we did with the FHIR Publisher does support that capability though, and we added multiple output formats to the cql-options structure specifically to support that use case, so it should be straightforward to expose that in the service.

Ideally, we wouldn't have to produce both, but as @JPercival noted, we are still working on getting to parity between all the formats and target environments.

csandersdev commented 4 years ago

How would you folks like to move forward with this issue? Should I move the conversation to the CDS authoring repo since it appears that the CqlEngine is working as designed right now? Or maybe the CQL Translation service repo? My naive point of view is that the simplest possible change would be to update the CQL Translation Service to export JXSON vs JSON, but I have no idea what kind of impact that might on downstream processing.

cmoesel commented 4 years ago

Based on @JPercival's comments above, I'm not sure we can/should outright switch the translator to JXSON -- or switch the CDS Authoring Tool to JXSON. My concern is that there are other engines out there (like the JavaScript one) that currently assume the JAXB serialization and may not work correctly w/ the JXSON serialization. This is directly relevant because the CDS Authoring Tool uses the JavaScript engine in its "Testing" tab.

We could update the translation service to support more modes/formats -- so you could ask it for JAXB JSON, JXSON JSON, XML, or some combination of those. After that, we could update the CDS Authoring Tool to invoke the translation service and ask for JAXB JSON and XML -- and you could use the XML w/ the CqlEngine.

Or if you're running your own instance of the CDS AT (not the AHRQ-hosted instance) and you're comfortable w/ the risk of JXSON, we could expose a configuration parameter that allows you to say that your instance of CDS AT should export JXSON JSON.

Last -- I don't know your workflow, but another approach would be for you to take the CQL that the CDS Authoring Tool provides and re-translate it using the JXSON serialization. CDS AT gives you the .cql files and the .json files, so there's no reason you couldn't just re-translate the .cql files (if you have a workflow where you could integrate such a step).

brynrhodes commented 4 years ago

It seems like a reasonable step would be to surface the ability to request multiple translation formats in the translation service. Submit a tracker to that effect on the translation service repository: https://github.com/cqframework/cql-translation-service ?

csandersdev commented 4 years ago

We have our own instance of CDS AT and could certainly test with JXSON if it was available as a configuration option. We could also work with the previous suggestion of exporting both JAXB-JSON and XML. If the JXSON really is that different, maybe the both-format-export option is better. I was basing my suggestion above on my own limited testing where I saw missing "type" attributes that impacted serialization, but were unlikely to change behavior in the inflated objects.

I will log an issue in the CQL translation service repo for multi format export as @brynrhodes suggests and maybe take a look at implementing it. Doesn't seem too hard. CDS AT will also need to change if and when that becomes available.

Regarding @cmoesel 's other suggested workaround to recompile the CQL, that is my current plan barring any changes in the public codebases. It allows for the possibility that authors are using a different version of the translator or a different set of translation options during testing than what is used at runtime, but that seems like a minor risk. I can certainly make things work with double compilation. I was hoping here to make that unnecessary for those that come after me.

JPercival commented 3 years ago

I've dug into this enough to confirm that the JAXB and Jackson formats can't be fully reconciled at this time. I did a fair amount of investigating on getting JAXB working on Android and the general consensus appears to be "don't do that". There's a related issue with translating CQL on Android, which is that the ModelInfos are currently specified in XML. The solution that appears to be the most compatible with the largest spread of Java platforms is Jackson-based JSON (as opposed to JAXB-based XML, JAXB-based JSON, or XML serialized via some other framework).

The next step from the translator perspective is to support JSON-based ModelInfos.These would need to be Jackson-based in order to support translation on Android. The ModelInfo format is not a normative part of the ELM specification so that change could be relatively easily made.

I think it makes sense to have the discussion as to whether to support both JSON and JXSON going forward or to standardize on Jackson. This would be a backwards-incompatible change for all the ELM out in the wild. Unfortunately, we're in that position already as there's not currently a way to distinguish between the two formats based on mime-type, AFAIK.

cmoesel commented 3 years ago

Is it possible to enumerate the differences between JAXB-produced JSON and Jackson-produced JSON (for ELM)? That would be helpful in trying to determine the implications of deprecating JAXB-produced JSON.

brynrhodes commented 3 years ago

We can definitely do a detailed comparison on a body of ELM, I have a whole bunch of ELM in the engine that I use for regression of the reader and we can produce a diff based on that, but at the highest level, the biggest change is to the way the content serializes inside annotations. Which has long been a sore spot anyway, and I'm actually wondering if we would be better served by just turning that into an HTML string, instead of trying to represent as mixed content. How much effort would be involved from your perspective in consuming an HTML string instead of reconstructing it from the JSON mixed-content representation?

cmoesel commented 3 years ago

When you talk about mixed content annotations, do you mean like this?

{
  "type": "Annotation",
  "s": {
    "r": "5",
    "s": [
      {
        "value": [
          "",
          "define ",
          "HelloWorld",
          ": "
        ]
      },
      {
        "r": "4",
        "s": [
          {
            "r": "2",
            "s": [
              {
                "value": [
                  "'Hello'"
                ]
              }
            ]
          },
          {
            "value": [
              " + "
            ]
          },
          {
            "r": "3",
            "s": [
              {
                "value": [
                  "'World'"
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

I don't personally use those at all, but I think that Bonnie and some of the code coverage / highlighting tools might use it. So I'd definitely want them to be involved in any conversation that changes that.

JPercival commented 3 years ago

So are the next steps here to shoot an e-mail to the Bonnie team to discuss the possibility of annotations as HTML?

cmoesel commented 3 years ago

I think so, and I'd also reach out to the Abacus team at MITRE. I think I already pointed a few of them to this before (as a heads up), but I'll let them know that you are looking to talk to them.

mgramigna commented 3 years ago

We (MITRE Abacus) rely pretty heavily on the current JSON structure of the annotations to do clause highlighting of the CQL. We need some way to look back into the execution results for localIds to know if a clause was truthy or falsy, and the r property on an annotation helps us do that.

Assuming that annotations being an HTML string would just be an un-styled nested structure of HTML tags, would there be a way to preserve some ability for the localIds to persist? Otherwise, it could be really difficult to keep the same highlighting functionality.

brynrhodes commented 3 years ago

Hi @mgramigna , the HTML string contents would actually be the XML that comes out in the XML rendering, so:

            <a:s r="4">
               <a:s>define HelloWorld:
  </a:s>
               <a:s r="3">
                  <a:s r="1">
                     <a:s>'Hello'</a:s>
                  </a:s>
                  <a:s> + </a:s>
                  <a:s r="2">
                     <a:s>'World'</a:s>
                  </a:s>
               </a:s>
            </a:s>

Then JSON-escaped of course. So it would still have the localId ties to the expression nodes, it would just be represented as embedded XML, rather than trying to serialize it as mixed-content XML as part of the ELM. Does that help?

JPercival commented 3 years ago

What are the next steps on this? Does the XML structure Bryn posted above work for your use case @mgramigna ?

mgramigna commented 3 years ago

@JPercival yeah, we could update our measure highlighting code to support this.

@brynrhodes -- do you envision the Measure bundles that exist and connectathon/ecqm-content-r4 to have ELM that is updated with the new annotation format once that exists?

brynrhodes commented 3 years ago

@mgramigna , yes, and we'll make the change with a switch to make sure we support a transition phase.

JPercival commented 3 years ago

So, I understand the resolution to be that we're phasing out JAXB in favor of Jackson with a transition period. Post transition period it'll all be Jackson, and it'll include the annotations as Bryn described above.

What's a reasonable transition period? The last cql-engine release was about 6 months ago. Is 6 months from now a good target?

mgramigna commented 3 years ago

@JPercival Sounds good from our side

brynrhodes commented 3 years ago

I'd like to press for quarterly releases, I think 6 months is too long. So once the smoke has cleared from this release, we should plan the next one for end of year.

JPercival commented 2 years ago

There are some pending updates to the cql-translator that supersede some of this discussion: cqframework/clinical_quality_language/pull/752

cqframework / clinical_quality_language

Json formatted CQL fails to deserialize via JsonCqlLibraryReader #950