databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Wrapping elements of nested array #678

Closed vitaliyb-adorama closed 6 months ago

vitaliyb-adorama commented 6 months ago

Hi there!

Is there a way to wrap the elements of a nested array in another element? Here is my schema:

root
 |-- pid: string (nullable = true)
 |-- variants: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- skuid: string (nullable = true)
 |    |    |-- new_item_num: string (nullable = true)
 |    |    |-- Student_00_Program: string (nullable = true)
 |    |    |-- mfg_partno: string (nullable = true)
 |    |    |-- shortDescription: string (nullable = true)
 |    |    |-- mfg_upc: string (nullable = true)

It is written out as the following XML:

<Products>
    <Product>
        <pid>AC45ML</pid>
        <variants>
            <skuid>AC45MLTPM</skuid>
            <mfg_partno>MTJF3AM/A</mfg_partno>
            <shortDescription>desc 1</shortDescription>
            <mfg_upc>195949003127</mfg_upc>
        </variants>
        <variants>
            <skuid>AC45MLTPS</skuid>
            <mfg_partno>MTJE3AM/A</mfg_partno>
            <shortDescription>desc 2</shortDescription>
            <mfg_upc>195949003080</mfg_upc>
        </variants>
    </Product>
</Products>

Instead, I would like all the array elements to sit inside a single wrapping tag, something like this:

<Products>
    <Product>
        <pid>AC45ML</pid>
        <variants>
            <variant>
                <skuid>AC45MLTPM</skuid>
                <mfg_partno>MTJF3AM/A</mfg_partno>
                <shortDescription>desc 1</shortDescription>
                <mfg_upc>195949003127</mfg_upc>
            </variant>
            <variant>
                <skuid>AC45MLTPS</skuid>
                <mfg_partno>MTJE3AM/A</mfg_partno>
                <shortDescription>desc 2</shortDescription>
                <mfg_upc>195949003080</mfg_upc>
            </variant>
        </variants>
    </Product>
</Products>

Is it possible?

srowen commented 6 months ago

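Yes, but that is determined by your DataFrame's structure rather than by this library. An array field is written as repeated elements named after the field, which is why you currently get several variants tags. Make variants a struct whose only member is an array field named variant, and the writer will produce the nested output you want.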