DFHack / dfhack

Memory hacking library for Dwarf Fortress and a set of tools that use it

ExportLegends: non-UTF-8 character in historical_events #1580

ralpha closed this 4 years ago

ralpha commented 4 years ago

I was parsing a legends(_plus) file and found a non-UTF-8 character in the export.

The character is in historical_event.item_subtype (see the <88> below). The byte 0x88 is probably a CP437-encoded character, in which case it is ê (Latin Small Letter E with Circumflex, U+00EA) and the value should read "thêmnol". This word only appears in [T_WORD:INCONVENIENT:thêmnol].
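As a quick check of the CP437 guess, the Python sketch below decodes the raw bytes with the standard `cp437` codec and re-encodes the result as UTF-8. The byte string is copied from the sample in this issue, not read from a real export, and the helper name `cp437_to_utf8` is hypothetical.

```python
def cp437_to_utf8(raw):
    """Reinterpret CP437 bytes as text, then re-encode that text as UTF-8."""
    return raw.decode("cp437").encode("utf-8")

raw = b"th\x88mnol"  # item_subtype bytes as written by the old export
fixed = cp437_to_utf8(raw)

# In CP437, 0x88 maps to U+00EA (ê), which UTF-8 encodes as the two bytes C3 AA.
print(fixed)  # b'th\xc3\xaamnol'
```

ASCII bytes are identical in CP437 and UTF-8, so only bytes at 0x80 and above change.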

So this field should be escaped.
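One way to escape the field is to write every non-ASCII character as an XML numeric character reference, which keeps the file pure ASCII. This is a minimal Python sketch that assumes the raw value is CP437; `escape_for_xml` is a hypothetical helper, not DFHack code.

```python
from xml.sax.saxutils import escape

def escape_for_xml(raw):
    """Decode CP437 bytes, escape XML special characters, and turn any
    non-ASCII character into a numeric character reference."""
    text = raw.decode("cp437")
    # escape() handles &, < and >; xmlcharrefreplace handles code points above 127.
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(escape_for_xml(b"th\x88mnol"))  # th&#234;mnol
```

A conforming XML parser turns `&#234;` back into ê, so consumers see the same text either way.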

Part of the file output (my editor displays the raw 0x88 byte as <88>, so that is not the problem; a standalone 0x88 byte is not valid UTF-8):

<historical_event>
        <id>219696</id>
        <type>item_stolen</type>
        <item_type>instrument</item_type>
        <item_subtype>th<88>mnol</item_subtype>
        <mat>dwarf bone</mat>
        <item>949</item>
        <entity>814</entity>
        <histfig>1592</histfig>
        <site>310</site>
        <structure>-1</structure>
        <circumstance>
                <type>233</type>
                <HistEventCollection>9237</HistEventCollection>
        </circumstance>
        <reason>
                <type>none</type>
        </reason>
</historical_event>
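To find stray bytes like this 0x88 in a large export, one approach is to decode the file as UTF-8 and record where decoding fails. The Python sketch below does that; `find_invalid_utf8` is a hypothetical helper, and in practice the input would be the export read in binary mode rather than the short sample used here.

```python
def find_invalid_utf8(data):
    """Return the byte offset of every invalid UTF-8 sequence in data."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice data[pos:].
            offsets.append(pos + err.start)
            pos += err.end  # skip the bad sequence and keep scanning
    return offsets

sample = b"<item_subtype>th\x88mnol</item_subtype>"
for off in find_invalid_utf8(sample):
    print(off, hex(sample[off]))  # 16 0x88
```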

lethosor commented 4 years ago

I think I fixed this in https://github.com/DFHack/scripts/commit/47d9b3c9390d21ecc49ddaba477b16071884ee87. I assume you're running 0.47.04-r1? Could you try upgrading the script and see whether that fixes it?

ralpha commented 4 years ago

This was still an old export from a previous version. After updating everything, the fix above did resolve the issue. The export now shows (as expected):

<historical_event>
        <id>219696</id>
        <type>item_stolen</type>
        <item_type>instrument</item_type>
        <item_subtype>thêmnol</item_subtype>
        <mat>dwarf bone</mat>
        <item>949</item>
        <entity>814</entity>
        <histfig>1592</histfig>
        <site>310</site>
        <structure>-1</structure>
        <circumstance>
                <type>233</type>
                <hist_event_collection>9237</hist_event_collection>
        </circumstance>
        <reason>
                <type>none</type>
        </reason>
</historical_event>
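To confirm that a regenerated export is valid UTF-8, one option is to parse it with an XML parser that rejects invalid bytes. The Python sketch below uses the standard `xml.etree.ElementTree` module on trimmed copies of the two `<historical_event>` samples in this issue (not full export files); the helper `item_subtype` is hypothetical and not part of DFHack.

```python
import xml.etree.ElementTree as ET

# Trimmed versions of the old (broken) and new (fixed) samples.
old = b"<historical_event><item_subtype>th\x88mnol</item_subtype></historical_event>"
new = "<historical_event><item_subtype>th\u00eamnol</item_subtype></historical_event>".encode("utf-8")

def item_subtype(xml_bytes):
    """Parse bytes as XML (UTF-8 when no declaration is present) and
    return the text of the item_subtype child."""
    return ET.fromstring(xml_bytes).findtext("item_subtype")

try:
    item_subtype(old)
except ET.ParseError as err:
    print("old export rejected:", err)  # the lone 0x88 byte is not valid UTF-8

print(ascii(item_subtype(new)))  # 'th\xeamnol'
```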