Open Matew92 opened 1 week ago
I don't know if one is right-er than the other. They are slightly different situations: a child with nothing in it, vs a parent with no children. That said, I don't think the current behavior is strongly motivated; it's just how it happened.
I would probably not change behavior at this point unless it's demonstrably problematic.
Hi Srowen,
Thanks for your fast reply. I get the same behaviour with fields (if a field in the DataFrame is null, it is not printed in the XML file), so I was expecting the same for an empty array (or at least an option for it?).
I think there's a difference between [] and None, which is sort of mirrored here - that's not a missing array, it's an empty array. I think you could argue behavior either way, neither is that much more reasonable. But I would not change behavior that's stood for so long unless it was clearly wrong.
Yes, I agree with you that an empty array is different from a None (so indeed, I would not change the default behavior). However, for big data purposes, having an option to print or not print empty nested arrays would be really helpful because it optimizes the size of the XML file.
For example, in my case, I have data frames nested 2-3 levels deep, and the result is all these empty tags for the arrays in a 100GB file.
The result is something like this for each row:
<a>
  <b>
    <c/>
    <d/>
  </b>
  <e/>
  <f>
    <g/>
    <h/>
    <i>
      <m/>
      <n/>
    </i>
    <o/>
  </f>
</a>
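To make the requested behaviour concrete, here is a minimal sketch in plain Python (standard library only) of the pruning rule such an option might apply: drop every element that has no text, no attributes, and no remaining children. The function name `prune_empty` is hypothetical and this is not how the library works internally; it only illustrates the rule on the row shown above.

```python
import xml.etree.ElementTree as ET


def prune_empty(elem):
    """Remove, bottom-up, every descendant with no text, attributes, or children."""
    for child in list(elem):  # copy the child list because we mutate elem
        prune_empty(child)    # prune grandchildren first
        has_text = child.text is not None and child.text.strip() != ""
        if len(child) == 0 and not has_text and not child.attrib:
            elem.remove(child)
    return elem


# The row from the example above, collapsed onto one line.
row = "<a><b><c/><d/></b><e/><f><g/><h/><i><m/><n/></i><o/></f></a>"
root = prune_empty(ET.fromstring(row))
pruned = ET.tostring(root, encoding="unicode")
print(pruned)
```

Because every tag in that sample row is empty, the whole row collapses to a single empty `a` element. Note that the rule also removes parents that become empty after their children are dropped, which may or may not be what you want. For a 100GB file, loading everything into memory like this would not be practical; the sketch is only meant to show the rule.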
I'm using the library on a nested DataFrame, for example:
This is my schema:
This is my data:
What I would expect is something like:
But I get:
Has anyone run into the same issue? Is there a way to get the behaviour I want? I tried .option("ignoreNullFields", "true"), but I get the same output described above.
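One possible workaround, sketched here under two assumptions: the rows are available as plain Python dicts before they are turned into a DataFrame, and a null array is skipped by the writer in the same way null fields are reported to be skipped in this thread. The helper `empty_to_none` is hypothetical, and the sample row only borrows field names from the XML above; its values are made up for illustration.

```python
def empty_to_none(value):
    """Recursively replace empty lists with None in nested dicts and lists."""
    if isinstance(value, dict):
        return {key: empty_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        if not value:
            return None  # empty array becomes a missing value
        return [empty_to_none(item) for item in value]
    return value  # scalars pass through unchanged


# Hypothetical row: field names follow the sample XML, values are invented.
row = {"b": {"c": [], "d": []}, "e": [], "f": {"g": [1, 2], "i": {"m": []}}}
cleaned = empty_to_none(row)
print(cleaned)
```

This only turns empty arrays into None. A struct whose fields all end up as None (such as `b` here) is left in place, so it may still produce an empty parent tag depending on how the writer treats it.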