EXIficient / exificient-grammars

Java Implementation of EXI (grammars part)
http://exificient.github.io/java/
MIT License

Serialize a SchemaInformedGrammars #1

Closed amarant closed 6 years ago

amarant commented 7 years ago

Hi, for now it is possible to serialize a SchemaInformedGrammars to JSON for use by the JS version. It would be nice to be able to load it back to recover the SchemaInformedGrammars. That would be useful because generating grammars can be slow when there are many XSDs. It might also make sense to serialize it to EXI, using an adapted XSD model, and load it back.

Thanks.

danielpeintner commented 7 years ago

Hi,

You bring up an important topic that has also been discussed within the EXI working group. However, there has been no consensus on whether there should be a standardized EXI grammars exchange format.

In EXIficient I see 2 possibilities:

  1. Describe EXI grammars with an XML schema and serialize/deserialize this format to internal EXI grammar classes
  2. Create/generate the Java source file of a com.siemens.ct.exi.grammars.Grammars implementation based on EXI grammars

The latter was explored a bit a while ago (see some snippets below). Do you think 2. would solve your use case, or do you still need a way to exchange grammar files as described in 1.?

Thanks,

-- Daniel

public class StaticSampleGrammar implements Grammars {
    /* GrammarContext ----- */
    final String ns0 = "";
    final QNameContext qnc0 = new QNameContext(0, 0, new QName(ns0, "Note"));
    final QNameContext qnc1 = new QNameContext(0, 1, new QName(ns0, "body"));
    final QNameContext qnc2 = new QNameContext(0, 2, new QName(ns0, "category"));
    final QNameContext qnc3 = new QNameContext(0, 3, new QName(ns0, "date"));
    final QNameContext qnc4 = new QNameContext(0, 4, new QName(ns0, "note"));
    final QNameContext qnc5 = new QNameContext(0, 5, new QName(ns0, "notebook"));
    final QNameContext qnc6 = new QNameContext(0, 6, new QName(ns0, "subject"));
    final QNameContext[] grammarQNames0 = {qnc0, qnc1, qnc2, qnc3, qnc4, qnc5, qnc6};
    final String[] grammarPrefixes0 = {""};
    final GrammarUriContext guc0 = new GrammarUriContext(0, ns0, grammarQNames0, grammarPrefixes0);
...
    /* Grammars ----- */
    com.siemens.ct.exi.grammars.grammar.Document g0 = new com.siemens.ct.exi.grammars.grammar.Document();
    com.siemens.ct.exi.grammars.grammar.SchemaInformedDocContent g1 = new com.siemens.ct.exi.grammars.grammar.SchemaInformedDocContent();
    com.siemens.ct.exi.grammars.grammar.SchemaInformedFirstStartTag g2 = new com.siemens.ct.exi.grammars.grammar.SchemaInformedFirstStartTag();
    com.siemens.ct.exi.grammars.grammar.SchemaInformedStartTag g3 = new com.siemens.ct.exi.grammars.grammar.SchemaInformedStartTag();
....
    /* Grammar Events ----- */
    g0.addProduction(new com.siemens.ct.exi.grammars.event.StartDocument(), g1);
    g1.addProduction(globalSE5, g15);
    g1.addProduction(new com.siemens.ct.exi.grammars.event.StartElementGeneric(), g15);
    g2.addProduction(globalAT3, g3);
...

    public boolean isSchemaInformed() {
        return true;
    }
    public String getSchemaId() {
        return schemaId;
    }
    public void setSchemaId(String schemaId) throws UnsupportedOption {
        this.schemaId = schemaId;
    }
    public boolean isBuiltInXMLSchemaTypesOnly() {
        return false;
    }
    public Grammar getDocumentGrammar() {
        return g0;
    }
    public Grammar getFragmentGrammar() {
        return g16; 
    }
    public GrammarContext getGrammarContext() {
        return gc;
    }
}

amarant commented 7 years ago

I use the java version of exificient in .net using IKVM (a JVM implemented in .NET), so it would be hard to use a generated java file, that I would have to compile. So for me I can only use option 1.

On a side note, I was able to replicate the JSON generation using an object graph in C# similar to the one generated in Grammars2JSON, which I fill from a SchemaInformedGrammar and serialize. But I found that the opposite way is harder: creating a SchemaInformedGrammar from the content of the JSON grammar. My idea was to serialize the object graph to XML and use the associated XSD to transform it to EXI. I don't know Java very well, but I think this is the idea behind JAXB. The main work is to transform the cyclical SchemaInformedGrammar graph into a tree-like object, and back.
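
The graph-to-tree step described above can be sketched roughly as follows. This is only an illustration under assumptions: `Node` and `FlatNode` are hypothetical stand-ins, not EXIficient classes, and the real grammar objects carry much more state. The idea is to replace object references with integer indices, so that cycles disappear in the serialized form and can be re-wired on load.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GrammarGraphFlattener {

    /** Hypothetical stand-in for a grammar state; not an EXIficient class. */
    static final class Node {
        final String label;
        // event name -> next grammar state (may point back to an earlier state)
        final Map<String, Node> productions = new LinkedHashMap<>();
        Node(String label) { this.label = label; }
    }

    /** Flat, cycle-free record: targets are referenced by index, not by object. */
    static final class FlatNode {
        final String label;
        final Map<String, Integer> productions = new LinkedHashMap<>();
        FlatNode(String label) { this.label = label; }
    }

    /** Walks the graph from the root and assigns each reachable node an index. */
    static List<FlatNode> flatten(Node root) {
        Map<Node, Integer> ids = new IdentityHashMap<>();
        List<Node> order = new ArrayList<>();
        ids.put(root, 0);
        order.add(root);
        // 'order' grows while we index into it, which gives a breadth-first walk
        for (int i = 0; i < order.size(); i++) {
            for (Node target : order.get(i).productions.values()) {
                if (!ids.containsKey(target)) {
                    ids.put(target, order.size());
                    order.add(target);
                }
            }
        }
        List<FlatNode> flat = new ArrayList<>();
        for (Node n : order) {
            FlatNode f = new FlatNode(n.label);
            for (Map.Entry<String, Node> e : n.productions.entrySet()) {
                f.productions.put(e.getKey(), ids.get(e.getValue()));
            }
            flat.add(f);
        }
        return flat;
    }

    /** Rebuilds the object graph: create all nodes first, then wire the edges. */
    static Node rebuild(List<FlatNode> flat) {
        List<Node> nodes = new ArrayList<>();
        for (FlatNode f : flat) {
            nodes.add(new Node(f.label));
        }
        for (int i = 0; i < flat.size(); i++) {
            for (Map.Entry<String, Integer> e : flat.get(i).productions.entrySet()) {
                nodes.get(i).productions.put(e.getKey(), nodes.get(e.getValue()));
            }
        }
        return nodes.get(0);
    }

    public static void main(String[] args) {
        Node doc = new Node("Document");
        Node content = new Node("DocContent");
        Node element = new Node("Element");
        doc.productions.put("SD", content);
        content.productions.put("SE(note)", element);
        element.productions.put("SE(note)", element); // self-reference: a cycle
        element.productions.put("EE", content);       // back edge: another cycle

        List<FlatNode> flat = flatten(doc);
        Node copy = rebuild(flat);
        System.out.println(flat.size() + " states, root leads to "
                + copy.productions.get("SD").label);
    }
}
```

The flat list is what would be written to JSON, XML or EXI; any format that can hold a list of records with integer references would do.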

danielpeintner commented 7 years ago

Hi,

> I use the java version of exificient in .net using IKVM (a JVM implemented in .NET), so it would be hard to use a generated java file, that I would have to compile. So for me I can only use option 1.

BTW, do you plan to release your work? I think .NET interfaces and the like by means of IKVM might also help others!

W.r.t. your problem, I wonder why the Java source code is of no help. You can first generate Java source code from the schema, which can then be compiled to a class just like any other Java class you are using in IKVM, can't you?

Maybe I am missing something here?

danielpeintner commented 7 years ago

The commit https://github.com/EXIficient/exificient-grammars/commit/697714faa5f91eb8ae2f0e617797209566543da7 adds initial support for what I described. Maybe you can have a look and report what you find.

Essentially, use the class Grammars2JavaSourceCode to create a Java source file, which you need to compile as part of your project.

Note: The tool still needs to be improved.
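
To illustrate the general idea of emitting grammar source code, here is a minimal sketch. It is not how Grammars2JavaSourceCode actually works; `Production`, `generate` and the `State` type named in the emitted text are all hypothetical. It only shows the pattern visible in the snippet earlier in the thread: declare every grammar object first, then add the productions, so that cyclic references are unproblematic.

```java
import java.util.Arrays;
import java.util.List;

public class GrammarSourceSketch {

    /** Hypothetical edge: state index, Java expression for the event, target index. */
    static final class Production {
        final int from;
        final String eventExpr;
        final int to;
        Production(int from, String eventExpr, int to) {
            this.from = from;
            this.eventExpr = eventExpr;
            this.to = to;
        }
    }

    /** Emits source text; 'State' in the output is a placeholder type name. */
    static String generate(String className, int stateCount, List<Production> productions) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        // 1) declare every state as a field so that all objects exist up front
        for (int i = 0; i < stateCount; i++) {
            sb.append("    final State g").append(i).append(" = new State();\n");
        }
        // 2) wire the productions afterwards; forward and backward references both work
        sb.append("    public ").append(className).append("() {\n");
        for (Production p : productions) {
            sb.append("        g").append(p.from).append(".addProduction(")
              .append(p.eventExpr).append(", g").append(p.to).append(");\n");
        }
        sb.append("    }\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Production> productions = Arrays.asList(
                new Production(0, "new StartDocument()", 1),
                new Production(1, "new StartElementGeneric()", 0));
        System.out.print(generate("StaticSampleGrammar", 2, productions));
    }
}
```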

amarant commented 7 years ago

Yeah, if I find some motivation I will create a NuGet package of EXIficient, because I had to do some workarounds to get it to work under IKVM.

The problem in my case is that I have to accept user-generated schemas, and I don't want to have to distribute a Java compiler. I also find the idea of compiling and executing code from user-generated content a little fragile and risky from a security standpoint.

I will look at Grammars2JavaSourceCode because it will help me understand how to re-create a SchemaInformedGrammars from data, possibly serialized as XML and EXI. I think it is definitely doable, and the best solution from a technical, durability and security point of view.

danielpeintner commented 7 years ago

FYI: I uploaded an initial version for exchanging EXI grammars with XML/EXI (see https://github.com/EXIficient/exificient-grammars/blob/master/src/main/java/com/siemens/ct/exi/grammars/persistency/Grammars2X.java).

The corresponding schema can be found here: https://github.com/EXIficient/exificient-grammars/blob/master/src/main/resources/SchemaForGrammars.xsd.

Comments & feedback are appreciated. Note: some parts are still not fully implemented!
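
Since the exchange format is described by an XML schema, a serialized grammar file could be validated against it before loading. The following is a sketch using only the JDK's `javax.xml.validation` API. The inline toy schema and its element names (`grammars`, `grammar`, `id`) are assumptions made for the example and do not reflect the content of SchemaForGrammars.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class GrammarXmlValidationSketch {

    // Hypothetical toy schema; the real SchemaForGrammars.xsd is different.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='grammars'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='grammar' maxOccurs='unbounded'>"
        + "<xs:complexType>"
        + "<xs:attribute name='id' type='xs:unsignedInt' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the XML text is valid against the toy schema above. */
    static boolean isValid(String xml) throws IOException {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors surface as SAXException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid("<grammars><grammar id='0'/></grammars>"));
        System.out.println(isValid("<grammars><grammar/></grammars>"));
    }
}
```

Validating first means malformed or hostile input is rejected before any grammar objects are constructed, which matters when grammar files come from users.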

amarant commented 7 years ago

I was actually doing the same thing, using JAXB too and a similar schema. I successfully did a round-trip with the notebook example, but got errors with my more extensive schemas. I will try this new code with my schemas. The next steps will be to complete it and then use Grammars2JavaSourceCode to generate the grammars of exiGrammars to bootstrap the whole process. Thanks.

danielpeintner commented 7 years ago

Thanks for your contribution! I will add some more test cases that show that the enum and restricted character set types are missing.

Will try to provide a fix soon.

danielpeintner commented 7 years ago

In Grammars2X I added some more code for handling the Enumeration datatype, but I guess there is a bug in the JAXB generation. In the code I marked 3 places with // TODO JAXB binding seems to be somewhat broken here !?!

There, JAXB expects special classes instead of the normal Boolean, Float/Double, and String, and does not allow setting the corresponding value.

Maybe you also have time to check what is going on here. Honestly, I am a bit puzzled.

danielpeintner commented 7 years ago

I changed the nesting of the XML schema w.r.t. Enumeration and now JAXB works fine (see also https://github.com/EXIficient/exificient-grammars/commit/011d3fc50ea9acfadd381dc4a69336a1106c0311).

danielpeintner commented 7 years ago

@amarant do you think we can close this issue?

amarant commented 6 years ago

I've added some PRs to exificient-grammars:

#7 for Grammars2JavaSourceCode

#8 for Grammars2X

#9 to add the source code generated from the SchemaForGrammars.xsd grammars

in exificient-core, to add equals method overrides for multiple classes (useful for the following tests)

and in exificient to add Grammars2Exi to marshal or unmarshal between SchemaInformedGrammars and EXI with tests
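
The equals overrides matter for round-trip tests: without them, an original object and its unmarshalled copy only compare equal if they are the same instance. A minimal sketch of the pattern follows, using a hypothetical `NameEntry` value class rather than any actual exificient-core class.

```java
import java.util.Objects;

public class EqualsForRoundTripSketch {

    /** Hypothetical value class, loosely modelled on a qualified-name entry. */
    static final class NameEntry {
        final int uriId;
        final int localNameId;
        final String localName;

        NameEntry(int uriId, int localNameId, String localName) {
            this.uriId = uriId;
            this.localNameId = localNameId;
            this.localName = Objects.requireNonNull(localName);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof NameEntry)) {
                return false;
            }
            NameEntry other = (NameEntry) o;
            return uriId == other.uriId
                    && localNameId == other.localNameId
                    && localName.equals(other.localName);
        }

        @Override
        public int hashCode() {
            // must be consistent with equals: same fields, same order
            return Objects.hash(uriId, localNameId, localName);
        }
    }

    public static void main(String[] args) {
        NameEntry original = new NameEntry(0, 4, "note");
        NameEntry reloaded = new NameEntry(0, 4, "note"); // stands in for an unmarshalled copy
        System.out.println(original.equals(reloaded));
    }
}
```

For classes that take part in cyclic references, a field-by-field equals needs extra care to avoid infinite recursion; this sketch only covers plain value objects.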

danielpeintner commented 6 years ago

I am ok with all PRs except

> and in exificient to add Grammars2Exi to marshal or unmarshal between SchemaInformedGrammars and EXI with tests

I am not sure why PR https://github.com/EXIficient/exificient/pull/12 needs to be in the exificient project and can't stay in exificient-grammars? I think this is also important for future Java 9 module support, which (as far as I understand it) requires one package export per project (see https://github.com/EXIficient/exificient/issues/11). Any thoughts?

amarant commented 6 years ago

I tried to put it in exificient-grammars at first but couldn't, because it needs EXIResult and SAXFactory, which are both in exificient. I didn't know about the Java 9 module restriction against using the same package name in different modules, but it seems changing the package name would solve it. I can update the PR with a different package name.

danielpeintner commented 6 years ago

I think adding

<dependency>
   <groupId>com.siemens.ct.exi</groupId>
   <artifactId>exificient</artifactId>
   <version>0.9.7-SNAPSHOT</version>
   <scope>test</scope>
</dependency>

to the POM would allow you to put your files into exificient-grammars. It is somewhat strange, but it is just a test dependency.

amarant commented 6 years ago

This dependency is already present in the exificient-grammars POM, and as it is restricted to the test scope, the imports com.siemens.ct.exi.api.sax.EXIResult and com.siemens.ct.exi.api.sax.SAXFactory cannot be resolved. If we removed the test scope of the exificient dependency in the exificient-grammars POM, exificient, which already depends on exificient-grammars, would become cyclically dependent.

danielpeintner commented 6 years ago

Maybe I am misunderstanding the issue. Hence I simply modified the code slightly and added the corresponding test cases. With the latest commit https://github.com/EXIficient/exificient-grammars/commit/631c759c4ec99316c9bc94671c7ad2b105a17275 one can marshal/unmarshal XML or EXI-based streams. Does this resolve your issue?

amarant commented 6 years ago

Ok, it was about the tests. The user can still reproduce the code that was in Grammars2Exi.