tooptoop4 closed 2 years ago
I can't make out what you're trying to demonstrate here. Describe it, and simplify?
The isdentifier value appears as null in the 2nd example, @srowen.
It doesn't appear in any of your output - please boil down and format the output. The two code snippets look identical too.
It does appear in the 1st output; see I_5.xml.
I see it now, yeah. The problem is this:
<definition>Proper Motion in RA
<footnote>
<para>the relation is pmRA = 15 * pmRAs * cos(DE)
if pmRAs is expressed in s/yr and pmRA in arcsec/yr</para>
</footnote>
</definition>
Is that XML in the body meant to be escaped? Regardless, it's a 'bug' that this ends up throwing off the parser, and it is ultimately related to not supporting mixed content (an element that holds text but also child tags). That's why I wonder whether that content is intended, as it would be unusual for "tabular"-like XML representations (i.e. what is the desired type of definition?)
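To find which elements in an input file have this mixed shape before handing it to the reader, something like the following could help. It is only a rough sketch that uses the JDK's DOM parser, not spark-xml itself; the names MixedContentCheck, isMixed and findMixed are made up for illustration, and the `<field>` wrapper in the sample is an assumption rather than the structure of the real file.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, Node}

// Hypothetical helper: lists elements that directly contain both
// non-blank text and child elements ("mixed content").
object MixedContentCheck {

  // Direct child nodes of `e` as a Scala sequence.
  private def children(e: Element): Seq[Node] = {
    val nodes = e.getChildNodes
    (0 until nodes.getLength).map(i => nodes.item(i))
  }

  // True when `e` has at least one non-blank text child and one element child.
  def isMixed(e: Element): Boolean = {
    val kids = children(e)
    val hasText = kids.exists { n =>
      val t = n.getNodeType
      (t == Node.TEXT_NODE || t == Node.CDATA_SECTION_NODE) &&
        n.getNodeValue.trim.nonEmpty
    }
    val hasElement = kids.exists(_.getNodeType == Node.ELEMENT_NODE)
    hasText && hasElement
  }

  // Tag names of all mixed-content elements, in document order.
  def findMixed(xml: String): Seq[String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(
      new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))

    def walk(e: Element): Seq[String] = {
      val here = if (isMixed(e)) Seq(e.getTagName) else Seq.empty[String]
      here ++ children(e).collect { case c: Element => c }.flatMap(walk)
    }
    walk(doc.getDocumentElement)
  }

  def main(args: Array[String]): Unit = {
    // Sample modelled on the snippet above; <field> is an assumed wrapper.
    val sample =
      """<field>
        |  <definition>Proper Motion in RA
        |    <footnote>
        |      <para>the relation is pmRA = 15 * pmRAs * cos(DE)</para>
        |    </footnote>
        |  </definition>
        |</field>""".stripMargin

    println(findMixed(sample).mkString(", "))
  }
}
```

For the sample, only definition is reported: footnote and the wrapper hold whitespace plus child tags, and para holds text only. Whether removing or escaping such content makes the null go away in a given spark-xml version is not something this check can confirm.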
not fixed
It's a duplicate, really, of other issues. Not fixed, yes.
XML file one shows all values as expected
XML file two shows isdentifier value as null
code used:
scala> val df = spark.read.format("xml").option("rowTag","dataset").load("sii.txt")
df: org.apache.spark.sql.DataFrame = [_subject: string, _xmlns:xlink: string ... 7 more fields]

scala> df.show()
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
| _subject|        _xmlns:xlink|             altname|        descriptions|isdentifier|            keywords|           reference|           tableHead|               title|
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
|astronomy|http://www.w3.org...|[{1005, ADC}, {I/...|{{This catalog, l...|    I_5.xml|{http://messier.g...|{{{[{[J, H], Spen...|{{[{Number 5, ---...|Proper Motions of...|
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
scala> val df = spark.read.format("xml").option("rowTag","dataset").load("sii.txt")
df: org.apache.spark.sql.DataFrame = [_subject: string, _xmlns:xlink: string ... 7 more fields]

scala> df.show()
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
| _subject|        _xmlns:xlink|             altname|        descriptions|isdentifier|            keywords|           reference|           tableHead|               title|
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
|astronomy|http://www.w3.org...|[{1005, ADC}, {I/...|{{This catalog, l...|       null|{http://messier.g...|{{{[{[J, H], Spen...|{{[{Number 5, ---...|Proper Motions of...|
+---------+--------------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+
scala>
cc @srowen