ralpha closed this issue 4 years ago.
I think I fixed this in https://github.com/DFHack/scripts/commit/47d9b3c9390d21ecc49ddaba477b16071884ee87. I assume you're running 0.47.04-r1? Can you try upgrading the script and see whether that fixes it?
This was still an old export from a previous version. After updating everything, the fix above did resolve the issue. The export now shows (as expected):
<historical_event>
  <id>219696</id>
  <type>item_stolen</type>
  <item_type>instrument</item_type>
  <item_subtype>thêmnol</item_subtype>
  <mat>dwarf bone</mat>
  <item>949</item>
  <entity>814</entity>
  <histfig>1592</histfig>
  <site>310</site>
  <structure>-1</structure>
  <circumstance>
    <type>233</type>
    <hist_event_collection>9237</hist_event_collection>
  </circumstance>
  <reason>
    <type>none</type>
  </reason>
</historical_event>
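One quick way to confirm that an updated export is well-formed UTF-8 is to run it through a standard XML parser. The Python sketch below uses only `xml.etree.ElementTree` from the standard library and a trimmed copy of the event above. It also shows that substituting the raw 0x88 byte for the UTF-8 encoding of "ê" (as the old export did) makes the parser reject the document. The trimming and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Relevant fields from the fixed export, encoded as UTF-8 bytes.
fixed = """<historical_event>
  <id>219696</id>
  <type>item_stolen</type>
  <item_subtype>thêmnol</item_subtype>
</historical_event>""".encode("utf-8")

# No XML declaration is present, so the parser assumes UTF-8.
event = ET.fromstring(fixed)
print(event.findtext("item_subtype"))  # thêmnol

# Simulate the pre-fix export: the raw CP437 byte 0x88 instead of UTF-8 'ê'.
broken = fixed.replace("ê".encode("utf-8"), b"\x88")
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    # The parser refuses the byte because it is not valid UTF-8.
    print("rejected:", err)
```

The same check applied to a whole legends_plus file (read in binary mode) fails with a `ParseError` that points at the first invalid byte, which locates the problem field faster than a text editor does.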
I was parsing a legends(_plus) file and found a non-UTF-8 byte in the export.
The character is in historical_event.item_subtype; see the <88> below. The byte is 0x88, which is probably a CP437-encoded character, and I think it should be ê (Latin Small Letter E with Circumflex, U+00EA). So the subtype would be thêmnol? That string only appears in [T_WORD:INCONVENIENT:thêmnol].
So this field should be escaped.
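One way to handle the field can be sketched in Python, assuming (as guessed above) that the raw game string is CP437: decode it, XML-escape it, and write it out as UTF-8. The helper name `df_string_to_xml_utf8` is hypothetical, and the sketch uses Python's standard `cp437` codec and `xml.sax.saxutils.escape`; it is not the DFHack script's actual fix.

```python
from xml.sax.saxutils import escape


def df_string_to_xml_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: decode a CP437 game string, XML-escape it, return UTF-8 bytes."""
    text = raw.decode("cp437")  # in CP437, byte 0x88 decodes to 'ê' (U+00EA)
    return escape(text).encode("utf-8")  # escape() replaces &, < and >


raw_subtype = b"th\x88mnol"  # the bytes the old export wrote into <item_subtype>
print(df_string_to_xml_utf8(raw_subtype))  # b'th\xc3\xaamnol'
```

Python prints non-ASCII bytes as hex escapes, so `b'th\xc3\xaamnol'` is "thêmnol" encoded as UTF-8 (ê becomes the two bytes 0xC3 0xAA).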
Part of the file output (my editor displays the raw 0x88 byte as <88>, so that is not the problem; on its own, 0x88 is a UTF-8 continuation byte, not a valid UTF-8 character):