Closed by streino 3 weeks ago
I can't reproduce today... :/
Not sure what happened, but I can now reproduce it systematically... and fix it by adding <xsl:output encoding="UTF-8"/> to the XSLT.
However, I'm not exactly sure what's going on, because the input file is UTF-8, the lxml parser doesn't have any specific instructions, and etree.tostring is also set to UTF-8.
We can add <xsl:output> to all XSLTs, but I'd feel better knowing why this happens on only a few files, and apparently only on attributes (text in that same XML contains accents and nothing's wrong with it...).
Turns out lxml expects xsl:output for encodings other than ASCII:
https://lxml.de/xpathxslt.html#xslt-result-objects
The result is always a plain string, encoded as requested by the xsl:output element in the stylesheet. If you want a Python Unicode/Text string instead, you should set this encoding to UTF-8 (unless the ASCII default is sufficient).
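A minimal sketch of what that looks like in practice, assuming a made-up identity stylesheet and a made-up record (neither is taken from ecospheres-xslt). It only illustrates where the xsl:output declaration goes and that accented characters in both an attribute and a text node survive a round trip; it does not reproduce the original failure.

```python
from lxml import etree

# Hypothetical identity stylesheet; only the xsl:output line matters here.
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""

# Made-up record with accents in both an attribute and a text node.
XML_SOURCE = '<record title="Forêt">Données</record>'.encode("utf-8")

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# str() serializes the result as requested by xsl:output and returns text.
text = str(result)

# Re-parse the serialized output to check that the accents came through.
roundtrip = etree.fromstring(text.encode("utf-8"))
print(roundtrip.get("title"), roundtrip.text)
```

Re-parsing the output, rather than comparing raw strings, keeps the check independent of whether the serializer writes accents literally or as character references.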
ecospheres-xslt/xslts/default-record-type.xsl
https://inspire.ternum-bfc.fr/geonetwork/srv/fre/catalog.search#/metadata/f97dec5c-aec2-4c75-9ed6-611ef49fd227
Happened only on a few records out of the 3.5K transformed. I didn't look into the cause, but judging from the example above, it might be related to XML attributes?