Update df.unit_thoughts.xml:unit_thought_type and df.units.xml:emotion_type data

The structure of this info has a rather quirky layout, where emotion_type contains a couple of general purpose fields and unit_thought_type sort of hints at what text some field or other may result in. There's a need to fill in additional info, such as which fields are used for which thoughts and what the types of those fields are, and the current structure isn't very suitable for that.

I can see a few different alternatives for restructuring:

1. Converting emotion_type into a base type where each thought has a derived type. In this scenario you'd get something like emotion_type_needsunfulfilledst, containing the fields "need" of the type df.need_type and "hfid" of a general integer type (with the comment that it only applies to PrayOrMeditate), and emotion_type_spousegavebirthst, containing the fields "offspring_count" of an integer type and "relation" of a new enum type containing the values "married" = 1, "sibling" = 11, "parent" = 12 (although it might be better to have a more general relation type containing "spouse" = 1, "mother" = 2, "father" = 3, "lover" = 9, "sibling" = 11, "child" = 12, "friend" = 13, "still annoying acquaintance" = 14, and "animal training partner" = 18 [with pet=0 removed, as it doesn't seem to work anymore], with the comment that only those 3 values are used).

   The captions of unit_thought_type would be moved into these derived types. Given the size of the results, emotion_type, its derived types, and the contents of df.unit_thoughts.xml should probably be moved into a new df.unit_emotion.xml file.

2. Make a really messy union out of emotion_type, where you'd end up with the fields
   - subthought (the current field unchanged)
   - subthought_need of the df.need_type type as a union overload/alias of subthought
   - subthought_relation of the enum type above as a union overload/alias
   - severity (the current field unchanged)
   - severity_hfid as an integer field union overload/alias of severity
   - severity_child_count, an integer field as an overload/alias of severity
   - plus a bazillion others to cover the various usages (although you may skip union overload/alias fields where the type is an integer anyway, and just use the base fields in those cases).

   The unit_thought_type caption texts would be changed to name the parameters they're using, rather than indicating the topic of the resultant text, and any descriptions of how to get those texts would be added as additional comments and possibly enum types with captions.
3. As 2, but skip the union mess in emotion_type, and instead have the caption parameters refer to either subthought or severity, with comments mentioning the types to be used for field interpretation.
4. All the alternatives I haven't seen, which may very well include the good ones.
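
To make the more general relation type from alternative 1 concrete, here is a minimal Python sketch. It is not part of df-structures: the names `Relation`, `SPOUSE_GAVE_BIRTH_RELATIONS`, and `decode_relation` are hypothetical, and only the numeric values come from the list above.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Hypothetical general relation type; values as listed in alternative 1."""
    SPOUSE = 1
    MOTHER = 2
    FATHER = 3
    LOVER = 9
    SIBLING = 11
    CHILD = 12
    FRIEND = 13
    STILL_ANNOYING_ACQUAINTANCE = 14
    ANIMAL_TRAINING_PARTNER = 18


# The only three values reported as used by SpouseGaveBirth.
SPOUSE_GAVE_BIRTH_RELATIONS = frozenset(
    {Relation.SPOUSE, Relation.SIBLING, Relation.CHILD}
)


def decode_relation(raw):
    """Return the Relation for a raw integer, or None if it is unmapped."""
    try:
        return Relation(raw)
    except ValueError:
        return None
```

Unmapped integers (such as the removed pet = 0) come back as `None` rather than raising, which seems like the safer behaviour while the mapping is still incomplete.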

I've done a fair bit of work mapping out the various combinations, but there's still work remaining to be done (not to mention that my script/note is a rather messy work in progress). I wanted to start a discussion to hopefully get some kind of resolution, rather than go through a pull request containing an implementation, only to have to rework it all when the discussion ends up resulting in a different structure.

There seems to be some parallel content in this adventure mode stuff that might be useful for tracking down parameters, as one would assume they're used in the same way, but nothing that seems to directly involve the types discussed here. Once there's some actual info here I would guess it could be used the other way as well.

- RealizeValue: for fortress mode, the corresponding anon_3 values have the relation "worthlessness" < -10, "value" > 10, and "nuances" in between.
- GaveBirth only has boy/girl for single children, while multiples get twins/triplets, ... quindecaplets, and then "many babies".
- Office qualities are the same (and this also applies to other rooms that use quality levels rather than value).
- GhostHaunt likewise has the same levels (the current XML list lacks one of the 4 values).
- SpouseGaveBirth actually works differently: 1 is getting married, 11 is getting 1 or more siblings, and 12 is becoming a parent.

Note that the comments above are fortress data. I don't know how that relates to adventure mode.
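
The RealizeValue thresholds can be written down as a small Python sketch. The function name and the `level` parameter are hypothetical, and it only encodes the fortress mode relation stated above.

```python
def realize_value_band(level):
    """Classify a RealizeValue level using the fortress mode thresholds.

    Below -10 is "worthlessness", above 10 is "value", and everything
    from -10 to 10 inclusive falls under "nuances".
    """
    if level < -10:
        return "worthlessness"
    if level > 10:
        return "value"
    return "nuances"
```

Note that both boundary values, -10 and 10, land in the "nuances" band under this reading.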

Edit: My script/research notes can be found here https://github.com/PatrikLundell/scripts/blob/own_scripts/thoughts.lua. Note that it may be updated without any other indication than the version number inside it.

We can't "convert" emotion_type and unit_thought_type into a class hierarchy because they are both enums (think integers with fancy names for certain values). We can't make up our own "emotion_type_spousegavebirthst" type, for example, because DF doesn't have that - it has one emotion_type enum, where 1=ADORATION, 2=AFFECTION, etc.
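
As a rough illustration of "integers with fancy names", here is a minimal Python sketch. The class is hypothetical and lists only the two values named above; the real enum has many more.

```python
from enum import IntEnum


class EmotionType(IntEnum):
    """Illustrative subset only; not the full emotion_type enum."""
    ADORATION = 1
    AFFECTION = 2


# An enum value is stored as a plain integer, so there is nowhere
# to hang per-value fields such as "need" or "hfid".
stored = int(EmotionType.AFFECTION)
restored = EmotionType(stored)
```

Any per-thought data therefore has to live in the structure that holds the enum value, not in the enum itself.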

I'm not sure what you're referring to in df.units.xml - there's a thing in unit.personality.emotions that has a field of type emotion_type, but that's it. Again, the layout of that has to match DF's, although we can rename fields if we want.

If by "emotion_type contains a couple of general purpose fields" you mean the enum attributes, those are metadata that we've added (e.g. color and divider) that are useful in some way. You're welcome to add new enum-attr entries to link unit_thought_type and emotion_type somehow if there is an appropriate way to link them.

I know C(++) doesn't have real enums, just named constants that can be defined to look like they make up enums...

Anyway, you're correct in that it's not emotion_type I'm actually after, but the nameless type making up the elements of the unit_personality.emotions vector. However, I guess this type can't be converted into a virtual class with subclasses determined by the "type" field either, although that would definitely be the best alternative if it was possible.

Alternative 2 is a really bad one, as the union the nameless type would be converted into would be a really horrible mess to deal with. It should not be considered further.

For emotion_type, I really meant the attributes. This is what alternative 3 would look like. The element LearnTopic currently looks like this (it's actually from 0.44.07, which I have at hand locally, but I don't think it has changed):

        <enum-item name='LearnTopic'>
            <item-attr name='caption' value='after learning about [topic]'/>
            <item-attr name='xml_caption' value='learned scholar flag'/>
        </enum-item>

and this is what I'd change it to:

        <enum-item name='LearnTopic'>
            <item-attr name='caption' value='after learning about [subthought_severity]'/>
            <item-attr name='subthought_severity' value='knowledge_scholar_category_flag index, flag index'/>
            <item-attr name='xml_caption' value='learned scholar flag'/>
        </enum-item>

There would be three optional attributes, "subthought", "severity", and "subthought_severity" that would be present only if the corresponding token was in the caption. These attributes would try to indicate how the parameters are to be used, although they wouldn't be useful programmatically, only as a guide to people reading the attributes. Apart from these three tokens, the caption would continue to contain "[he]", "[him]", and "[his]". It would be possible to have both [subthought] and [severity] as in "near a [severity] [subthought]" for AdmireBuilding.

I would also add an optional "extended_caption" attribute that would show the data that's actually there, but which DF only displays as the top thought (or not at all). It would use the same parameters, but the "caption" attribute wouldn't contain parameters that aren't displayed. E.g. for MadeArtifact: "caption" = "after creating an artifact" and "extended_caption" = "after creating [subthought]".
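
A minimal Python sketch of how such captions could be expanded, assuming the parameter values have already been turned into display strings. The function, the dictionary layout, and the sample values "fine" and "statue" are hypothetical; only the caption texts and token names come from the description above.

```python
import re

# Pronoun tokens: he/she/it, his/her/its, him/her/it depending on gender.
PRONOUNS = {
    "male": {"he": "he", "his": "his", "him": "him"},
    "female": {"he": "she", "his": "her", "him": "her"},
    "neuter": {"he": "it", "his": "its", "him": "it"},
}


def expand_caption(thought, gender, params, extended=False):
    """Fill the [token] placeholders of a caption.

    Uses extended_caption when requested and present, else caption.
    Tokens without a known value are left untouched.
    """
    template = thought["caption"]
    if extended and "extended_caption" in thought:
        template = thought["extended_caption"]
    values = dict(PRONOUNS[gender])
    values.update(params)

    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\[(\w+)\]", substitute, template)


MADE_ARTIFACT = {
    "caption": "after creating an artifact",
    "extended_caption": "after creating [subthought]",
}
ADMIRE_BUILDING = {"caption": "near a [severity] [subthought]"}
```

For example, `expand_caption(ADMIRE_BUILDING, "male", {"severity": "fine", "subthought": "statue"})` fills both tokens, while the plain MadeArtifact caption ignores the parameters entirely.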

This is basically what I've got in thoughts.lua, although that actually contains functions to extract the appropriate values in it, but I don't see any reasonable way to get that into XML attributes.

It can also be noted that I haven't got a complete mapping of everything, but most of it is there.

Edit: And I obviously still don't understand how to make code sections that actually display like code and get them to end...

I've done some further thinking, and have come to the conclusion that making attributes for subthought, severity, and subthought_severity won't add anything that comments can't provide, so I think it's better to have the caption and optional extended_caption attributes, but provide the parameter information as comments instead.

I fixed your code sections - all of the triple backticks need to be on their own lines. You can edit your comment to see what I did.

I don't think "subthought_severity" is a good name for that, and changing "[topic]" to "[subthought_severity]" in the caption attribute makes it harder to understand. I think at least one of the caption and xml_caption attributes was taken from a string dump or an XML export, so it would be good to keep those intact, but if additional data isn't useful programmatically, a comment might be a better way to go.

I don't know what xml_caption is supposed to be, but suspect it might be the one taken from an XML export. The caption has a small number of spelling errors, which indicates somebody typed those by hand.

These are examples of what I've done so far.

    <enum-type type-name='emotion_type' base-type='int32_t'>
        <enum-attr name='color' type-name='int8_t' default-value='7'/>
        <enum-attr name='divider' type-name='int8_t' default-value='0'/>
        <enum-attr name='feeling_type' type-name='int8_t' default-value='1'/>
        <enum-attr name='text' default-value=''/>
        <enum-attr name='prefix' default-value=''/>
        <comment>
            color: The color the thought is displayed as.
            divider: The strength of a thought is divided by the divider to get
              the effective strength. A negative divider indicates it's
              a positive, stress reducing, emotion.
              A divider of 0 means there is no stress effect of the thought.
            feeling_type: 1 means the feeling is expressed as "is/was", while
              0 means it's expressed as "feels/felt", and -1 means nothing is
              printed.
            text: The text DF prints for the feeling.
            prefix: a prefix that goes in between the feeling_type and the text, but
              printed with the standard color, not the feeling one.
        </comment>

        <enum-item name='ANYTHING' value='-1'>
            <item-attr name='feeling_type' value='0'/>
            <item-attr name='text' value='ANYTHING'/>
        </enum-item>
        <enum-item name='ACCEPTANCE'>
            <item-attr name='color' value='7'/>
            <item-attr name='divider' value='-8'/>
            <item-attr name='feeling_type' value='1'/>
            <item-attr name='text' value='accepting'/>
        </enum-item>
        <enum-item name='ADORATION'>
            <item-attr name='color' value='11'/>
            <item-attr name='divider' value='-1'/>
            <item-attr name='feeling_type' value='0'/>
            <item-attr name='text' value='adoration'/>
        </enum-item>

    <enum-type type-name='request_type'>
        <enum-item name='Job_Scarcity' value='25'/>
        <enum-item name='Work_Allocation' value='26'/>
        <enum-item name='Weapon_Production' value='27'/>
        <enum-item name='Yelling_At_Official' value='28'/>
        <enum-item name='Crying_At_Official' value='29'/>
        <enum-item name='Petitioning_For_Citizenship' value='48'/>
    </enum-type>

    <enum-type type-name='haunt_type'>  <!-- May be defined elsewhere -->
        <enum-item name='Haunted'/>
        <enum-item name='Tormented'/>
        <enum-item name='Possessed'/>
        <enum-item name='Tortured'/>
    </enum-type>

    <enum-type type-name='official_room_type'>  <!-- May be defined elsewhere -->
        <enum-item name='Office'/>
        <enum-item name='Bedroom'/>
        <enum-item name='Dining_Room'/>
        <enum-item name='Tomb'/>
    </enum-type>

    <enum-type type-name='unit_thought_type'>
        <enum-attr name='caption'/>
        <enum-attr name='extended_caption'/>
        <enum-attr name='xml_caption'/>
        <comment>
            caption: The text as displayed by DF.
            extended_caption: The caption text modified to display the information
              that exists for the thought, but isn't displayed normally. It may be
              displayed by DF as the primary thought, and it may be entirely hidden.
              This attribute is not present if it would be identical to caption.

            Tokens used in the captions:
            - [he]: -> he/she/it depending on gender 
            - [his]: -> his/her/its depending on gender
            - [him]: -> him/her/it depending on gender
            - [subthought]: The contents of the subthought field is used to derive this.
            - [severity]: The contents of the severity field is used to derive this.
            - [subthought_severity]: The contents of both the subthought and the severity
                fields are used to jointly derive this.
            The subthought and severity fields are the corresponding fields of the
            anonymous element type of df.unit_personality.emotions [*].
        </comment>

        <enum-item name='None' value='-1'/>

        <enum-item name='Conflict'>
            <item-attr name='caption' value='while in conflict'/>
            <item-attr name='xml_caption' value='conflict'/>
        </enum-item>
        <enum-item name='Trauma'>
            <item-attr name='caption' value='after experiencing trauma'/>
            <item-attr name='xml_caption' value='death and injury'/>
        </enum-item>
        <enum-item name='WitnessDeath'>
            <item-attr name='caption' value='after seeing [subthought] die'/>
            <comment>
                subthought: id of df.global.world.incident.all entry
                  Used as: incident = df.incident.find (subthought)
                    incident.victim_race refers to the raws creature id,
                    incident.victim_hf.hfid refers to the hf to derive its name
                      if the victim is a hf.
            </comment>
            <item-attr name='xml_caption' value='witnessed death in incident'/>
        </enum-item>
        <enum-item name='UnexpectedDeath'>
            <item-attr name='caption' value='at the unexpected death of somebody'/>  <!-- Is this correct, or is it really extended_caption here as well? -->
            <item-attr name='extended_caption' value='at the unexpected death of [subthought]'/>
            <comment>
                subthought: hf id
                  Used as: Derive name from referenced hf.
            </comment>
            <item-attr name='xml_caption' value='hf died unexpectedly'/>
        </enum-item>
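
To show how the emotion_type attributes in the excerpt above might be used, here is a minimal Python sketch. It rests on several assumptions not confirmed by DF itself: that the effective strength is the strength divided by the divider, that plain (non-integer) division is used, and that a feeling_type of -1 simply omits the verb. The names `EmotionAttrs`, `effective_strength`, and `feeling_phrase` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionAttrs:
    """Mirror of the enum-attr defaults in the emotion_type excerpt."""
    color: int = 7
    divider: int = 0
    feeling_type: int = 1
    text: str = ""
    prefix: str = ""


ACCEPTANCE = EmotionAttrs(color=7, divider=-8, feeling_type=1, text="accepting")
ADORATION = EmotionAttrs(color=11, divider=-1, feeling_type=0, text="adoration")


def effective_strength(strength, attrs):
    """Assumed reading: strength / divider, with 0 meaning no stress effect.

    A negative result corresponds to a stress reducing emotion.
    """
    if attrs.divider == 0:
        return 0.0
    return strength / attrs.divider


def feeling_phrase(attrs, past=False):
    """Build "is/was ..." or "feels/felt ..." followed by prefix and text."""
    if attrs.feeling_type == 1:
        verb = "was" if past else "is"
    elif attrs.feeling_type == 0:
        verb = "felt" if past else "feels"
    else:  # -1: assumed to mean that no verb is printed
        verb = ""
    return " ".join(part for part in (verb, attrs.prefix, attrs.text) if part)
```

The color handling is left out on purpose, since it only affects how the text is drawn, not what it says.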

Putting further waste of effort on hold until a decision is made. This is why I wanted a discussion before starting.

A possible, but still very ugly, way of dealing with the issue of the anonymous "emotions" element type's subthought/severity overloading would be to create a struct called e.g. thought_parameters. This struct would then replace the subthought/severity element pair in the anonymous type. The thought_parameters struct would be an incredibly ugly union of single fields, one for each thought enum value, each of which would refer to a struct with two elements, one for subthought and one for severity, but with the elements having (somewhat) descriptive names and being of appropriate types. Unused parameters would be defined with unnamed placeholder fields to ensure all of these types get the same size (and that the second element ends up in the correct place). There is room for these lowest level structs to be merged into a smaller number than the number of unit_thought_type elements: incident parameters are used in several cases and can be shared, for example, and all the cases where no parameter is used can make use of the same "no parameters" version. You'd end up with extra levels of referencing with this scheme, but the horror of the massive union overload can be kept slightly away from the data type itself.
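
A minimal sketch of that layout using Python's ctypes, purely to illustrate the idea. It assumes both fields are 32-bit integers, which the discussion above does not state, and all struct and field names are hypothetical. ctypes needs a name for every field, so the placeholders are named `unused_...` here.

```python
import ctypes


class RawParameters(ctypes.Structure):
    """The current, untyped view of the two fields."""
    _fields_ = [("subthought", ctypes.c_int32), ("severity", ctypes.c_int32)]


class NoParameters(ctypes.Structure):
    """Shared by every thought that uses neither field."""
    _fields_ = [("unused_subthought", ctypes.c_int32),
                ("unused_severity", ctypes.c_int32)]


class IncidentParameters(ctypes.Structure):
    """Shared by thoughts whose subthought is an incident id."""
    _fields_ = [("incident_id", ctypes.c_int32),
                ("unused_severity", ctypes.c_int32)]


class NeedParameters(ctypes.Structure):
    """subthought holds the need, severity holds an hf id."""
    _fields_ = [("need", ctypes.c_int32), ("hfid", ctypes.c_int32)]


class ThoughtParameters(ctypes.Union):
    """One member per thought (or group of thoughts), all the same size."""
    _fields_ = [
        ("raw", RawParameters),
        ("none", NoParameters),
        ("witness_death", IncidentParameters),
        ("needs_unfulfilled", NeedParameters),
    ]


params = ThoughtParameters()
params.raw.subthought = 5
params.raw.severity = 7
```

Because every member overlays the same eight bytes, `params.needs_unfulfilled.need` reads back the value written through `params.raw.subthought`, which is exactly the extra level of referencing described above.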

And some extra pebbles thrown into the mix: the current unit_thought_type captions don't work properly when thoughts are remembered. Some work fine, but those starting with "after" drop that word when remembered, and those starting with "while" don't make sense if used as is when remembered. I haven't seen those, but suspect they may transform "while" into "being" or something similar. As far as I can tell there are 3 different wordings for memories ("remembering", "reliving", and "dwelling upon"), with the one used being tied to the emotion.

The starts I've found being gobbled up when remembered are:
- while in
- after
- at
- when
- upon
- as [he] was caught up in
- near
- to be
- to have
- during
- by
- due to

The way I'd want to handle those in captions is by surrounding the part that may be removed in curly braces, e.g. "{after }seeing [subthought] die". That doesn't quite work on its own, as there's sometimes a replacement for a remembered thought. Thus, I'd use "{to have |having }[his] punishment reduced" for PunishmentReduced, and no vertical bar if the replacement is empty, so the WitnessDeath example above would still look the same.
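
A minimal Python sketch of how the proposed curly brace notation could be resolved. The function name and the `remembered` flag are hypothetical; the two templates are the ones given above.

```python
import re

# Matches "{normal}" or "{normal|replacement}".
_OPTIONAL_PART = re.compile(r"\{([^{}|]*)(?:\|([^{}]*))?\}")


def render_caption(template, remembered=False):
    """Resolve the {normal|replacement} parts of a caption template.

    For a fresh thought the normal text is kept. For a remembered
    thought it is swapped for the replacement, or dropped entirely
    when no replacement is given.
    """
    def pick(match):
        normal, replacement = match.group(1), match.group(2)
        if not remembered:
            return normal
        return replacement or ""

    return _OPTIONAL_PART.sub(pick, template)


WITNESS_DEATH = "{after }seeing [subthought] die"
PUNISHMENT_REDUCED = "{to have |having }[his] punishment reduced"
```

The [subthought] and [his] tokens pass through untouched, so this step can run before or after the parameter substitution.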

DFHack / df-structures

Update df.unit_thoughts.xml:unit_thought_type and df.units.xml:emotion_type data #239