Open ghukill opened 5 years ago
Confirmed.
Including this string in an XML record:
<mods:location>
<mods:url usage="primary"
><![CDATA[http://digital.library.wayne.edu/item/wayne:Livingto1876b22354748?goober=tronic&horse=smelt]]></mods:url>
</mods:location>
is returned as this after harvest:
<mods:location>
<mods:url usage="primary">http://digital.library.wayne.edu/item/wayne:Livingto1876b22354748?goober=tronic&amp;horse=smelt</mods:url>
</mods:location>
The <![CDATA[]]> wrapper is gone, and & has been encoded as &amp;.
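A minimal sketch of the same behavior, using Python's standard-library ElementTree as a stand-in (this is not the spark-xml / XmlInputFormat code path, only an illustration). It suggests why a parse-then-serialize step would produce the output above: to an XML parser, a CDATA section and escaped text carry the same character data, so the serializer is free to write & as &amp; and drop the CDATA wrapper. The MODS namespace URI and the helper name url_text are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the mods: prefix (not given in the issue).
MODS_NS = "http://www.loc.gov/mods/v3"
URL = (
    "http://digital.library.wayne.edu/item/"
    "wayne:Livingto1876b22354748?goober=tronic&horse=smelt"
)

# The record as submitted: the URL is wrapped in a CDATA section.
cdata_record = (
    f'<mods:location xmlns:mods="{MODS_NS}">'
    f'<mods:url usage="primary"><![CDATA[{URL}]]></mods:url>'
    "</mods:location>"
)


def url_text(xml_string):
    """Hypothetical helper: return the text of the mods:url element."""
    root = ET.fromstring(xml_string)
    return root.find(f"{{{MODS_NS}}}url").text


# Keep the mods: prefix when serializing instead of a generated ns0: prefix.
ET.register_namespace("mods", MODS_NS)

# Round trip: parse the CDATA form, then serialize it again.
reserialized = ET.tostring(ET.fromstring(cdata_record), encoding="unicode")
print(reserialized)

# The parsed value is unchanged; only the serialized form differs.
print(url_text(cdata_record) == url_text(reserialized))
```

After the round trip, the parsed URL is the same string in both forms, so any downstream consumer that parses the XML still sees a literal &. The difference only matters when the raw serialized bytes are required to keep the CDATA section as written.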
Using this to parse XML for static harvests: https://github.com/databricks/spark-xml#hadoop-inputformat
It appears to use XmlInputFormat from the Apache Mahout project.
It has been reported that ampersands in XML records are replaced with &amp; during static harvest, even when enclosed in <![CDATA[]]> tags. This is not ideal when literal ampersands are required in the output, and it would be beneficial if <![CDATA[]]> sections were left untouched.