Closed shuch3ng closed 1 year ago
Tried changing Line 182 to
val refQName = e.getRef.getTargetQName
val baseType =
  if (refQName != null)
    getStructField(xmlSchema, xmlSchema.getParent.getElementByQName(refQName).getSchemaType).dataType
  else getStructField(xmlSchema, e.getSchemaType).dataType
and the extracted schema looks like below
StructType(StructField(note,StructType(StructField(null,StringType,false), StructField(null,StringType,false), StructField(null,StringType,false), StructField(null,StringType,false)),false), StructField(heading,StringType,false), StructField(from,StringType,false), StructField(to,StringType,false), StructField(body,StringType,false))
Don't have the corresponding XML to test the schema, but the null names in the StructFields in note don't look right to me.
Right, that isn't supported. Your change looks to be in the right direction, to follow the 'ref', but seems like it needs a different change to be correct.
However, it's reading fields like "to" as both members of the struct and top-level elements. Is that the intent? That's what the schema seems to say too.
Yes, that's the intent, because an XSD can have multiple top-level elements. In this example, "to", "from", "heading" and "body" are all globally defined, so they can be referenced in the schema and also be used as root elements.
I managed to get the correct field names from ref. Will add the test and create a PR.
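For illustration, the idea of following ref to recover a field name can be sketched outside of XSDToSchema. This is only a minimal sketch under stated assumptions: it uses the JDK DOM parser rather than the XmlSchema classes that XSDToSchema relies on, the names RefSketch, globalElements, allElements and fieldName are hypothetical, and it assumes every ref points at a global element declared in the same schema document.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, NodeList}

// Hypothetical helper, not part of spark-xml.
object RefSketch {
  private val XsdNs = "http://www.w3.org/2001/XMLSchema"

  // Small schema with one global element and one element that refers to it.
  val BookXsd: String =
    """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      |  <xsd:element name="book" type="xsd:string"/>
      |  <xsd:element name="bookList">
      |    <xsd:complexType>
      |      <xsd:sequence>
      |        <xsd:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
      |      </xsd:sequence>
      |    </xsd:complexType>
      |  </xsd:element>
      |</xsd:schema>""".stripMargin

  private def toElements(nodes: NodeList): Seq[Element] =
    (0 until nodes.getLength).map(i => nodes.item(i)).collect { case e: Element => e }

  // Parse the XSD text and return its root <xsd:schema> element.
  def parse(xsd: String): Element = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)))
      .getDocumentElement
  }

  // Global element declarations (direct children of the schema root), keyed by name.
  def globalElements(schemaRoot: Element): Map[String, Element] =
    toElements(schemaRoot.getChildNodes)
      .filter(e => e.getNamespaceURI == XsdNs && e.getLocalName == "element")
      .map(e => e.getAttribute("name") -> e)
      .toMap

  // Every xsd:element declaration in the schema, global or nested.
  def allElements(schemaRoot: Element): Seq[Element] =
    toElements(schemaRoot.getElementsByTagNameNS(XsdNs, "element"))

  // An element that uses ref carries no name of its own, so the name
  // is taken from the global declaration it points at.
  def fieldName(el: Element, globals: Map[String, Element]): Option[String] =
    if (el.hasAttribute("ref")) {
      val local = el.getAttribute("ref").split(':').last // drop any prefix
      globals.get(local).map(_.getAttribute("name"))
    } else Option(el.getAttribute("name")).filter(_.nonEmpty)
}
```

Calling RefSketch.fieldName on the element with ref="book" yields the name of the global book declaration instead of an empty name, which is the behaviour the null field names above were missing.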
No, that's not what I mean. The 'global' definitions are part of the schema too. Is that what you intend? That is, does "body" really appear twice in the schema?
Yes, it appears twice, and yes, that's what I'm trying to achieve. A better example is probably the one below, a modified version of an XSD I encountered in my work.
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string" form="qualified"/>
        <xsd:element name="author" type="xsd:string" form="qualified"/>
        <xsd:element name="isbn" type="xsd:string" form="qualified"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="bookList" type="BookList"/>
  <xsd:complexType name="BookList">
    <xsd:sequence>
      <xsd:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
This XSD contains two top-level elements, book and bookList. Depending on demand, bookList or book should be extracted from the following XMLs using the XSD.
<bookList>
  <book>
    <name>Functional Programming in Scala</name>
    <author>Michael Pilquist, Runar Bjarnason, Paul Chiusano</author>
    <isbn>9781617299582</isbn>
  </book>
  <book>
    <name>Spark : The Definitive Guide</name>
    <author>Bill Chambers, Matei Zaharia</author>
    <isbn>9781491912218</isbn>
  </book>
</bookList>

<book>
  <name>Spark : The Definitive Guide</name>
  <author>Bill Chambers, Matei Zaharia</author>
  <isbn>9781491912218</isbn>
</book>
With the current XSDToSchema, the XSD cannot be parsed because it cannot handle the ref attribute and throws an exception. So even the book schema cannot be retrieved.
OK. I don't think that's going to work here without more significant change, but you're welcome to try it. You can of course just write out the desired schema, or infer it from actual data.
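As a rough illustration of the "write out the desired schema" option, the book and bookList schemas from the XSD above could be spelled out by hand. This is a sketch under assumptions, not what XSDToSchema would produce: the object name HandWrittenSchemas and the path argument are hypothetical, the nullability flags are a guess based on the minOccurs values in the XSD, and it presumes Spark SQL and spark-xml are on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical hand-written equivalents of the XSD above.
object HandWrittenSchemas {
  // <book>: three required string children.
  val bookSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("author", StringType, nullable = false),
    StructField("isbn", StringType, nullable = false)
  ))

  // <bookList>: zero or more <book> children (minOccurs="0", maxOccurs="unbounded"),
  // modelled here as a nullable array of book structs (an assumption).
  val bookListSchema: StructType = StructType(Seq(
    StructField("book", ArrayType(bookSchema), nullable = true)
  ))

  // Read each <book> element as one row, using the explicit schema
  // instead of schema inference. `path` is a placeholder.
  def readBooks(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("xml")
      .option("rowTag", "book")
      .schema(bookSchema)
      .load(path)
}
```

Switching the rowTag option to "bookList" and the schema to bookListSchema would read whole lists as rows instead, which matches the "depending on demand" use case described earlier in the thread.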
I did consider these two but there are two problems.
I tried to parse Example 3 from https://www.w3schools.com/xml/el_element.asp and got the following exception
It's caused by the elements inside the complexType not having a schemaType, and hence null is passed into the getStructField function in Line 182 of XSDToSchema.scala