GeethanadhP / xml-avro

Convert XSD -> AVSC and XML -> AVRO
Apache License 2.0
36 stars 26 forks source link

[xsd-to-schema] Only process first root element #13

Closed rlouapre closed 5 years ago

rlouapre commented 5 years ago

There are scenario where with nested xsd where each of them contains root elements. The tool seems to generate an invalid avro schema. Processing only the first root element fixes this issue. It would be useful to add a new parameter to handle this use case.

rlouapre commented 5 years ago

@roycem90 - I am not sure if the best option is to use the first root element or maybe safer explicitly provide the root element QName. Option 1 works for me but happy to implement option 2 instead...

GeethanadhP commented 5 years ago

for my better understanding can you mock up a xsd and paste it here

rlouapre commented 5 years ago

parent.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="ns:Parent" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:c="ns:Child" 
    targetNamespace="ns:Parent" 
    elementFormDefault="qualified" version="1.0">

    <xs:import namespace="ns:Child" schemaLocation="child.xsd"/>
...
<xs:element name="Trade" type="Parent">
</xs:schema>

child.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="ns:Child"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="ns:Child"
    elementFormDefault="qualified"  version="1.0">
...
    <xs:element name="Trade" type="Child">
    </xs:element>
</xs:schema>
rlouapre commented 5 years ago

A better might be to provide the root element QName - b2a817042cfbe0ab199267ea441db8628f482b07

GeethanadhP commented 5 years ago

i believe it already works before, because most of the xsd's we have in our organization are nested like you specified.. can you tell me if you are facing any problem?

rlouapre commented 5 years ago

xml-avro tool generates an invalid avro schema:

{
  "type" : "record",
  "name" : "type0",
  "fields" : [ {
    "name" : "Trade",
    "type" : [ "null", {
...
    } ],
    "source" : "element ns:Parent:Trade"
  }, {
    "name" : "Trade",
    "type" : [ "null", {
...
    } ],
    "source" : "element ns:Child:Trade"
  } ],
  "source" : "document"
}

Output from avro-tool:
org.apache.avro.AvroRuntimeException: Duplicate field Trade in record type0: Trade type:UNION pos:1 and Trade type:UNION pos:0.

GeethanadhP commented 5 years ago

i think i understood the problem after seeing this output and error u added, thanks for helping me understand the problem :)

but i am not exactly sure if this is the best possible solution, but for your solutions seems good enough, if we get any trouble in future we can see then (^_^)

If you have tested it and the avro generation worked all fine then i m good to merge can you re-base the fix as there are merge conflicts, i will do the merge once that is done

Thanks for the help dude

rlouapre commented 5 years ago

Updated merge request provided (also include support for command line).

GeethanadhP commented 5 years ago

i think you can ignore that command line stuff, bcoz seems i haven't updated it for long time and even the basic stuff seems to be erroring on the command line (got 2 or 3 tickes on that), so i removed it from the ReadMe to avoid confusion and let users use config.yml :)