MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

xml2kvp: `include_xml_prop` should be default when speed not concern #362

Open ghukill opened 5 years ago

ghukill commented 5 years ago

Now that we are round-tripping xml2kvp input and output, the `include_xml_prop` setting often needs to be True. Might be worth defaulting this flag to True, but setting it explicitly in spark.jobs for efficiency at scale?

ghukill commented 5 years ago

Except, this could be a considerable performance hit for ES indexing, which is common and already a bottleneck. Thinking it might be better to guess when this flag is needed, and turn it on in those cases if it is False.
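
A minimal sketch of the "guess when needed" idea, just to make the proposal concrete. Everything here is an assumption: `resolve_include_xml_prop` and `will_roundtrip` are hypothetical names that do not exist in Combine, and how a caller would actually detect that a round trip is coming is left open.

```python
from typing import Optional


def resolve_include_xml_prop(explicit: Optional[bool], will_roundtrip: bool) -> bool:
    """Hypothetical helper: decide the effective include_xml_prop value.

    explicit       -- the value the caller passed (None if unset)
    will_roundtrip -- the caller's guess that the output will be
                      converted back to XML later (assumed input)
    """
    # An explicit True is always honored.
    if explicit is True:
        return True
    # Flag is False or unset: enable it only when a round trip is
    # expected, so indexing-only paths keep the cheaper setting.
    return will_roundtrip


# Indexing-only path: flag stays off.
print(resolve_include_xml_prop(False, will_roundtrip=False))
# Round-trip path: flag is switched on even though it was False.
print(resolve_include_xml_prop(False, will_roundtrip=True))
```

With this shape, ES indexing jobs would not pay for the flag unless they ask for it, while round-trip callers would get it without having to remember to set it.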