df.world_data.xml:world_region_details.edges oddities

PatrikLundell commented 5 years ago

gui/gm-editor displays inconsistent information about split_x and split_y:
- split_x is shown as coord2d[17], but when expanded the index range is 0-15, each entry being a coord2d[17] (which has 17 entries)
- split_y is shown as coord2d[16], but when expanded the index range is 0-16, each entry being a coord2d[16] (which has 16 entries)

In both cases the actual data matches the XML. I have no idea why gui/gm-editor displays the wrong size for the first dimension, showing the size of the second dimension instead.

Are the dimensions of the entries completely verified, and is there a way to verify them? The reason for the question is that biome_corner would have to be 17 * 17 to allow the complete 16 * 16 set of mid level tiles to be defined, edge-wise. With only 16 * 16, the southern and eastern edges and corners of the southernmost row and easternmost column are left undefined. This, in turn, would make it fairly pointless for the biome and elevation arrays to provide the data that would be used over those undefined edges and corners.
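
The counting argument can be checked with a small sketch. It is plain Python rather than DFHack code, and the names `TILES`, `corners_needed`, and `uncovered_corners` are made up for illustration.

```python
TILES = 16  # mid level tiles per side of one world tile


def corners_needed(tiles):
    """Number of distinct corner points on a tiles x tiles grid."""
    return (tiles + 1) ** 2


def uncovered_corners(tiles, stored):
    """Corner points (x, y) that a stored x stored array cannot index."""
    return [
        (x, y)
        for y in range(tiles + 1)
        for x in range(tiles + 1)
        if x >= stored or y >= stored
    ]


missing = uncovered_corners(TILES, 16)
print(corners_needed(TILES))  # 289
print(len(missing))           # 33
```

The 33 uncovered points are exactly the easternmost column and southernmost row of corner points, which is the set the question above is about.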

As an aside, the following might be a better description of the corresponding part of the file (comment attributes changed and added some comment text, but no data layout change):

            <static-array name='biome_corner' count='16'
                          comment='0=Reference is NW, 1=Reference is N, 2=Reference is W, 3=Reference is current tile'>
                <static-array type-name='int8_t' count='16'/>
            </static-array>
            All 4 corners touching get the same reference, i.e. SE corner of the tile to the NW,
            SW corner of the tile to the N, NE corner of the tile to the W, and the NW corner
            of the current tile, as directed by this value.
            <static-array name='biome_x' count='16' comment='0=Reference is N, 1=Reference is current tile (adopted by S edge to the N)'>
                <static-array type-name='int8_t' count='16'/>
            </static-array>
            <static-array name='biome_y' count='16' comment='0=Reference is W, 1=Reference is current tile (Adopted by E edge to the W)'>
                <static-array type-name='int8_t' count='16'/>
            </static-array>
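
As a rough illustration of how the proposed comments read, the values can be turned into tile offsets. This is a sketch only: the table and function names are hypothetical, and it assumes x grows eastward and y grows southward.

```python
# Offsets are (dx, dy) relative to the current tile.
# Assumption: x increases to the east, y increases to the south.
CORNER_REFERENCE = {
    0: (-1, -1),  # NW tile
    1: (0, -1),   # N tile
    2: (-1, 0),   # W tile
    3: (0, 0),    # current tile
}
EDGE_X_REFERENCE = {0: (0, -1), 1: (0, 0)}  # biome_x: N tile or current tile
EDGE_Y_REFERENCE = {0: (-1, 0), 1: (0, 0)}  # biome_y: W tile or current tile


def reference_tile(x, y, value, table):
    """Return the (x, y) of the tile whose biome the corner or edge adopts."""
    dx, dy = table[value]
    return (x + dx, y + dy)


print(reference_tile(5, 5, 0, CORNER_REFERENCE))  # (4, 4)
print(reference_tile(5, 5, 0, EDGE_X_REFERENCE))  # (5, 4)
print(reference_tile(5, 5, 1, EDGE_Y_REFERENCE))  # (5, 5)
```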

Edit: I've tried to find the logic used for corners that are to be "reversed" because the biome specified should yield to others present. As far as I can tell (by locating corners, embarking at them, and then examining what DF produced, with the help of scripts for location and examination of them), the algorithm selects the lowest corner value that's of the highest precedence level, i.e. first the NW (0) one, then the N (1) one, followed by the W (2) one. I've looked at corners of 3 biomes (never found one with 4 candidates), and have seen this pattern for all but the last (3) one, because I haven't found any world with both the 0 and 3 corner being rejected while the (1) and (2) have different superior biomes.
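
One reading of that observation, written as a sketch: among the remaining candidate tiles, take those whose biome has the highest precedence and pick the lowest corner value among them. The numeric precedence values and the function name are hypothetical, since the thread only reports observed behavior for corners where three biomes meet.

```python
def pick_corner_source(candidates):
    """candidates maps a corner value (0=NW, 1=N, 2=W, 3=current tile)
    to a hypothetical biome precedence, where a higher number wins.

    Returns the lowest corner value among the highest-precedence candidates.
    """
    best = max(candidates.values())
    return min(value for value, prec in candidates.items() if prec == best)


# Corner value 3 was rejected; N and W tie on precedence, so N (1) is chosen.
print(pick_corner_source({0: 2, 1: 5, 2: 5}))  # 1
```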

Edit 2: The corner rules seem to apply, for the most part, to world edges as well. On the eastern and southern sides there are no corner values guiding the process, and corners consistently behave as if there were a corner definition of 0, so the eastern side gets its NE corner data from the north and the southern side gets its SW corner data from the west. The western and northern world edges show more complex corner behavior, as actual corner definitions are available there. On the northern side, corner values of 0 and 2 resulted in the western tile being used, while 1 and 3 selected the eastern one. This is the exception to the rule, as 1 is interpreted as 3 rather than 2, favoring the W/E direction indication (bit?) over the numeric order. On the western side, 0 and 1 resulted in the northern biome being selected, while 2 and 3 selected the southern one.
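
The world-edge observations fit a simple bit test, sketched below. The bit interpretation is only the guess raised above ("bit?"), not confirmed DF behavior, and the function name is made up.

```python
def edge_corner_source(edge, corner_value=0):
    """Which neighbor along a world edge supplies the corner's biome.

    edge is "north", "south", "west" or "east".
    East and south edges have no stored corner value and were observed
    to behave as if the value were 0.
    """
    if edge in ("east", "south"):
        corner_value = 0
    if edge in ("north", "south"):
        # Assumed: bit 0 picks between the western and eastern tile.
        return "east" if corner_value & 1 else "west"
    if edge in ("west", "east"):
        # Assumed: bit 1 picks between the northern and southern tile.
        return "south" if corner_value & 2 else "north"
    raise ValueError("unknown edge: %r" % (edge,))


for value in range(4):
    print(value, edge_corner_source("north", value), edge_corner_source("west", value))
```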

lethosor commented 5 years ago

> gui/gm-editor displays inconsistent information about split_x and split_y:
>
> split_x coord2d[17], while expanded the range is 0-15 of coord2d[17] (which has 17 entries)
>
> split_y coord2d[16], while expanded the range is 0-16, of coord2d[16] (which has 16 entries)

Are you saying that the type name displayed for these fields is giving the wrong array length? If so, that part is probably a core DFHack issue. If you print() these fields, do you see the same issue?

https://github.com/dfhack/dfhack/commit/576174ea0babdb06d05baecbff0719e9ba9201d1 is the change that introduced lengths in string representations of array-like types, by the way. It has known limitations for multi-dimensional arrays that are hard to address.

As for whether the dimensions are correct - if there is anything after split_x and split_y with sensible contents, then the overall size of split_x/y is almost certainly correct (it could maybe be off by 1-2 depending on padding, but not 16-17). That doesn't necessarily mean the dimensions aren't reversed - 16x17 and 17x16 arrays are the same size - but it sounds to me like that's not the issue according to your research.
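
The size argument can be illustrated with `ctypes`. This is a sketch that assumes coord2d is two 16-bit signed integers; the class and type names here are illustrative, not DFHack's.

```python
import ctypes


class Coord2d(ctypes.Structure):
    # Assumed layout: two 16-bit signed fields, 4 bytes in total.
    _fields_ = [("x", ctypes.c_int16), ("y", ctypes.c_int16)]


Array16x17 = (Coord2d * 17) * 16  # 16 entries of Coord2d[17]
Array17x16 = (Coord2d * 16) * 17  # 17 entries of Coord2d[16]
Array17x17 = (Coord2d * 17) * 17

# Swapped dimensions cannot be told apart by total size alone.
print(ctypes.sizeof(Array16x17), ctypes.sizeof(Array17x16))  # 1088 1088
# A whole extra row or column is far more than padding could explain.
print(ctypes.sizeof(Array17x17) - ctypes.sizeof(Array16x17))  # 68
```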

PatrikLundell commented 5 years ago

The asymmetry of the arrays makes logical sense based on how they're used: there's no use for describing how N/S edges should be handled in a 17th column, as that column belongs to the next world tile, but there is a use for a 17th row to define how the N/S border between the 16th and 17th rows should be handled, because the 16th row belongs to the local tile (and vice versa for the other matrix).

print (df.global.world.world_data.region_details [0].edges.split_y, #df.global.world.world_data.region_details [0].edges.split_y) results in "<coord2d[16][]: xxx...> 17" and print (df.global.world.world_data.region_details [0].edges.split_y[0], #df.global.world.world_data.region_details [0].edges.split_y[0]) in "<coord2d[16]: xxxx...> 16"
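
The mismatch in those two outputs can be made explicit by comparing the first bracketed number in each printed type with the length reported by `#`. This is a minimal sketch using only the strings quoted above; `displayed_length` is a made-up helper.

```python
import re


def displayed_length(type_repr):
    """Return the first bracketed number in a printed type, or None."""
    match = re.search(r"\[(\d+)\]", type_repr)
    return int(match.group(1)) if match else None


# (printed representation, length reported by the # operator)
observations = [
    ("<coord2d[16][]: xxx...>", 17),  # split_y
    ("<coord2d[16]: xxxx...>", 16),   # split_y[0]
]

results = [(displayed_length(text), actual) for text, actual in observations]
for shown, actual in results:
    print(shown, actual, "ok" if shown == actual else "mismatch")
```

Only the outer array disagrees with its printed type, which is consistent with the data being right and the type string being wrong.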

And, as mentioned, gui/gm-editor displays the correct (according to the XML) number of items in the arrays, so I'm fairly sure it's some kind of display issue.

I'm not familiar with the classes used in DFHack/dfhack@576174e so this is just a wild guess. "ptr" is used to generate "[]" for the matrix and nothing for the vector and is also used in the call to get length. Could it be that a different pointer has to be used if the elements of a vector/container are vectors/containers themselves because that reference causes data to be retrieved from one level too deep when there's a length available for the lower level?

quietust commented 5 years ago

The problem is that Lua is rendering 2D arrays with the inner length as the first number rather than the second number - for example, world_region_details.edges.split_x is logically of type coord2d[16][17], but Lua renders it as coord2d[17][] (and containing 16 entries of type coord2d[17]) instead of coord2d[][17].
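
The difference between the two renderings can be mimicked in a few lines. This is a sketch of the behavior as described in this thread for 1D and 2D arrays only, not DFHack's actual implementation; both function names are made up.

```python
def render_reported(base, dims):
    """Mimic the reported output: the inner length appears in the first bracket."""
    if len(dims) == 1:
        return "%s[%d]" % (base, dims[0])
    if len(dims) == 2:
        return "%s[%d][]" % (base, dims[1])
    raise ValueError("only 1D and 2D arrays are described in the thread")


def render_logical(base, dims):
    """Render every dimension in declaration order, outermost first."""
    return base + "".join("[%d]" % d for d in dims)


print(render_reported("coord2d", [16, 17]))  # coord2d[17][]
print(render_logical("coord2d", [16, 17]))   # coord2d[16][17]
print(render_reported("coord2d", [17, 16]))  # coord2d[16][]
```

The last line reproduces the string seen for split_y, which has 17 entries of coord2d[16].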

PatrikLundell commented 5 years ago

So, if I understand quietust correctly, the "cid->lua_item_count()" call returns the wrong number because the C(++) code uses a Lua oriented function that thus uses some wacky backwards logic? What happens if the object is 3D, 4D, etc. (to allow us to address the general situation of higher numbers of dimensions)?
