Now that xml2kvp input and output can be roundtripped, the include_xml_prop setting often needs to be True. It might be worth defaulting this flag to True, while still setting it explicitly in spark.jobs for efficiency at scale?
Except that this could be a considerable performance hit for ES indexing, which is common and already a bottleneck. It might be better to guess when the flag is needed, and enable it automatically in those cases even if it is set to False.
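
A rough sketch of the "guess when needed" idea, to make the trade-off concrete. The helper name effective_include_xml_prop and the needs_roundtrip argument are hypothetical and not part of xml2kvp; how a job would actually detect that roundtripping is required is left open here.

```python
def effective_include_xml_prop(configured: bool, needs_roundtrip: bool) -> bool:
    """Hypothetical helper: decide whether include_xml_prop should be on.

    configured      -- the value the caller set for include_xml_prop
    needs_roundtrip -- assumed signal that the output will be converted
                       back to XML later (detection is not shown here)
    """
    # Respect an explicit True; otherwise turn the flag on only when the
    # output has to be roundtripped.
    return configured or needs_roundtrip


# ES indexing only: the flag stays off, so no extra cost is paid.
indexing_only = effective_include_xml_prop(False, needs_roundtrip=False)

# Roundtrip job with the flag left at False: it is enabled automatically.
roundtrip_job = effective_include_xml_prop(False, needs_roundtrip=True)
```

With something like this, the default could stay False for indexing jobs, and only jobs that need roundtripping would pay the cost.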