DFHack / dfhack

Memory hacking library for Dwarf Fortress and a set of tools that use it

Prospect pre-embark Blue Diamond overestimation #1772

Closed feelotraveller closed 3 years ago

feelotraveller commented 3 years ago

In my most recent world, many embark locations give greatly overestimated predictions for blue diamonds when using the prospect plugin pre-embark. Once embarked, accurate results are reported.

I saw this using df47.05 with dfhack-0.47.05-alpha0-210131001-Linux-64bit-gcc-7.

Pre-embark savegame

The issue is present in a number of embarks across the world, although many/most embarks do not have the issue. A selection of locations to see/reproduce the issue in the above save game follows. The numbers predicted are reproduced for convenience.

If top left of map is (1,1) and embark rectangle is default 4x4 then:

(1,25) with embark rectangle bottom right corner (diamond_blue 894 diamond_fy 128)
(65,41) with embark rectangle top left corner (diamond_blue 1287 diamond_fy 307)
(14,35) with embark rectangle centred (diamond_blue 704 diamond_fy 128)

The save (world generation) was done with an entirely vanilla Linux Dwarf Fortress (running on up-to-date Arch Linux) with the above version of dfhack entirely unmodified (even using the example init). It was made, however, with a custom advanced worldgen setting, which is attached. (I attempted to reproduce the issue with a standard worldgen, but my patience was tried: after a couple of hours I had not found any predicted single-occurrence diamonds at all.) [The specific seeds are a result of reproducing the save in vanilla df, since I originally noticed the issue while using CLA graphics/tileset, which did modify a few raws.]

world_gen.txt

This was originally reported on the bay12 forums here (and following) which is a better place to get hold of me if more information is needed. (The figures are different from those reported there since I was originally using a 3x3 embark rectangle.)

lethosor commented 3 years ago

Some research from PatrikLundell here: http://www.bay12forums.com/smf/index.php?topic=164123.msg8244278#msg8244278

PatrikLundell commented 3 years ago

Looking at the first embark indicated, using Eyeball Mk I, it seems the faint yellow diamond clusters have a size of 3-8, and all the FY Diamond clusters I found in the lower gabbro layer had a single blue diamond in them, while none of the ones in the higher gabbro layer did (and there shouldn't be any there either).

I don't understand what the scale factors are intended to represent, as CLUSTER_ONE probably means exactly one when such a cluster is present, and 6*7 doesn't make much sense to me for a small cluster either. However, Prospector's prediction for the faint yellow diamonds was reasonably good, and would have been even better for the first embark if it had been scaled back by the Kimberlite size/probability/abundance factor, while the prediction for the blue diamonds would be about a factor of 5 off even if scaled back by the Kimberlite and Faint Yellow Diamond factors.

Hm, the comment for VEIN indicates an estimate of 3 veins in a layer of size 48*48 (layer_size starting value), with the "size" just distributing whatever vein materials there are over those 3 veins. A standard cluster is assumed to be a single cluster in which the possible materials compete for that volume. If you apply the same logic to small clusters, they'd be expected to have an average size of 6, with 7 clusters in a layer. However, small clusters may reside directly in the layer, in a vein in a layer, or in a cluster in a layer. This may sum up to about 7, but different environments ought to have different weights. Maybe the differences are too small to be of interest for a rough estimate, though. When it comes to CLUSTER_ONE, all of the ones defined in the 0.47.05 raws are enclosed by CLUSTER_SMALL, with the diamond clusters being enclosed by Bauxite veins, while the ruby and sapphire ones are enclosed by CLUSTER_SMALL within Bauxite clusters. I think the factor 5 part for CLUSTER_ONE should be removed, or at least be used only if the CLUSTER_ONE is present in one of the other environments (i.e. directly in the layer, in a vein, or in a standard cluster).
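One way to read the scale factors discussed above is "expected tiles = weight share x average feature size x features per layer". The following is a minimal C++ sketch of that reading, not the actual prospector.cpp code: the function names and the example weights are hypothetical, and only the factor pairs 6*7 (small clusters) and 1*5 (CLUSTER_ONE) come from the discussion.

```cpp
#include <cassert>

// Hypothetical helper, not part of DFHack: expected number of tiles of one
// material, given its weight among all inclusions of the same type in a layer.
//   weight         - the material's own weight before scaling
//   sum_of_weights - sum of the weights of all inclusions of that type
//   avg_size       - assumed average tiles per feature (6 for a small cluster)
//   features       - assumed number of such features per layer (7 for small clusters)
double estimate_tiles(double weight, double sum_of_weights,
                      double avg_size, double features) {
    assert(sum_of_weights > 0.0);
    return weight * avg_size * features / sum_of_weights;
}

// The two factor pairs mentioned in the thread.
double small_cluster_tiles(double weight, double sum_of_weights) {
    return estimate_tiles(weight, sum_of_weights, 6.0, 7.0);
}

double cluster_one_tiles(double weight, double sum_of_weights) {
    return estimate_tiles(weight, sum_of_weights, 1.0, 5.0);
}
```

With a single material of its type in a layer (weight equal to the sum), `small_cluster_tiles(100, 100)` gives 42 tiles and `cluster_one_tiles(100, 100)` gives 5. The second figure does not depend on how many enclosing small clusters the layer is actually expected to contain, which is the concern raised above.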

It's also interesting to note that there is nothing in the logic that accounts for the world gen mineral scarcity factor. Instead, the code assumes the same number of veins and clusters (both standard and small) are generated, and that the mineral scarcity parameter just affects the proportions of the ones you get. That doesn't really match my experience with very high scarcity, which results in rather few gems (I can't really say anything about clusters and veins, as the low metal availability may well be achieved by a selectively low probability for metal ores specifically). However, I wouldn't be surprised if the sum of the percentage/size/probability factors at the various levels (vein, cluster, small cluster, at least) affected the total number of these features. That's just conjecture, though (Edit: probably wrong, according to the Edit below).

Edit: I took a look at my last fortress (mineral scarcity 80000, a bit more than "very scarce"). When there are small clusters, the 7 per tile seems to hold up fairly well, but many tiles didn't have any, and the variety was low when there were any. There was a vein specified in one geo biome, but that was below the SMR cutoff. Thus, it seems mineral scarcity is handled by not generating anything in an increasing number of layers, at least for small clusters, and it would make sense if that held for veins and clusters as well.

feelotraveller commented 3 years ago

In my experience - generally from using mineral scarcity 100, i.e. max minerals/gems as in the linked save - the pre-embark prospect reports about minerals are generally accurate as to the type but underestimate the amount of metal (ballpark by 200%, by which I mean the prediction is only about half of what occurs once embarked). The outliers are: the iron-bearing ores, which are sometimes overestimated, but this can be explained by layers sometimes being only partially present near the surface; and, potentially more relevant to the issue reported, minerals that can occur in small clusters (memorably platinum and bismuthinite). I've certainly had platinum present in the embark when it was not predicted pre-embark, but on investigation there were no veins, only small clusters. And a report about the presence or absence of bismuthinite I take with a grain of salt. Leaving the small clusters and terrain erosion aside, I would suggest that the list of minerals in the pre-embark prospect report is accurate. However, I would also suggest that mineral scarcity has a significant effect on the amount of these minerals that is not accounted for by prospect.

Similarly for gems, the types are generally accurate (I am far less certain here since there are often about a hundred; at least the number of different types is generally accurate) but the amount is underestimated by roughly 200%, i.e. about half of what is actually present is predicted. Although that is something I can live with and have become used to over the years, it does suggest again that mineral scarcity may be a relevant factor that is currently missed by prospect.

I mention all this because it was brought up - the scale of the inaccuracy of the blue diamond reports is on another level. I did look at prospector.cpp before posting here but my ability to understand the code there is a complete fail.

(As a definitely off topic aside I would vaguely speculate whether DF chooses one single occurrence diamond type for each world? Just musing ; ...)

PatrikLundell commented 3 years ago

Valuable input, feelotraveller (although I don't know what it will lead to, but at least it's another data point).

Your experience of the presence of something that wasn't predicted can be explained in at least one way, though: the thing I've come to call "incursions", i.e. small bits of the (geo) biome of neighboring Mid Level Tiles that jut into the embark tiles. If those "incursions" belong to a different geo biome, they can contain minerals not present in any of the geo biomes the MLTs belong to. Prospector doesn't deal with incursions, and shouldn't, as the amounts of these extra minerals would be very limited and hard to predict (and after embark it just looks at what's present, rather than what the geo biomes would make it guess was present, and so accounts for incursions naturally).

It would be reasonably easy to write a script that goes through all geo biomes and looks for single cluster diamonds to check if there's only a single type (the Biome Manipulator geo biome handling can serve as a starting point [and that logic owes much to Prospector]).
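As a rough illustration of the scan described above, here is a self-contained C++ sketch. It does not use the DFHack API: `GeoBiome`, `GeoLayer`, `InclusionType` and the `vein_mat` field are stand-ins assumed for the example (only `vein_type` and the inclusion type names appear in the code quoted in this thread), and the set of "diamond" material indices has to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Stand-in types for illustration only; they mimic, but are not, DFHack's structures.
enum class InclusionType { VEIN, CLUSTER, CLUSTER_SMALL, CLUSTER_ONE };

struct GeoLayer {
    std::vector<int> vein_mat;             // assumed field: material index per inclusion
    std::vector<InclusionType> vein_type;  // inclusion type per inclusion
};

struct GeoBiome {
    std::vector<GeoLayer> layers;
};

// Collect every diamond material that occurs as a CLUSTER_ONE inclusion in any
// layer of any geo biome. `diamond_mats` holds the material indices that the
// caller considers to be diamonds.
std::set<int> single_cluster_diamonds(const std::vector<GeoBiome>& geo_biomes,
                                      const std::set<int>& diamond_mats) {
    std::set<int> found;
    for (const GeoBiome& biome : geo_biomes) {
        for (const GeoLayer& layer : biome.layers) {
            assert(layer.vein_mat.size() == layer.vein_type.size());
            for (std::size_t j = 0; j < layer.vein_mat.size(); ++j) {
                if (layer.vein_type[j] == InclusionType::CLUSTER_ONE &&
                    diamond_mats.count(layer.vein_mat[j]) != 0) {
                    found.insert(layer.vein_mat[j]);
                }
            }
        }
    }
    return found;
}
```

If the returned set has at most one element, the world uses a single type of single-occurrence diamond. A real script would read the layers from the game's geo biome data instead of hand-built vectors.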

Edit: Looking a bit more at a 2*2 embark in the world (small enough to not require scrolling), I see one cluster in each MLT, 2-4 veins (with an average slightly below 3, I think), and about 10 clusters (or slightly above 10). This gives me the impression that there's something in DF that uses mineral scarcity to regulate the number of clusters in the range 7-10. I don't know if there might be something to control (large) cluster or vein size. However, the hard coded values are probably good enough.

The "5" factor for single clusters means 5 out of the 7 projected clusters would contain these, which is usually a gross exaggeration.

            case inclusion_type::CLUSTER_ONE:
                size = size * 1 * 5 / sums[type];
                break;

should probably be

            case inclusion_type::CLUSTER_ONE:
                if (layer->vein_nested_in[j] != -1) {
                    if (layer->vein_type[layer->vein_nested_in[j]] == inclusion_type::CLUSTER_SMALL) {
                        // Factor in how many clusters of the containing type you probably have
                        size = size * 1 * layer->vein_unk_38[layer->vein_nested_in[j]] * 7 / sums[layer->vein_type[layer->vein_nested_in[j]]] / sums[type];
                    } else {
                        // Doesn't happen with vanilla raws, so it might be wildly off
                        size = size * 1 * 5 / sums[type];
                    }
                } else {
                    // Doesn't happen with vanilla raws, so it might be wildly off
                    size = size * 1 * 5 / sums[type];
                }
                break;
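To see how far the two variants quoted above can diverge, here is a small standalone C++ sketch that evaluates both expressions. It is an illustration only: the function and parameter names are hypothetical, it uses `double` arithmetic (so it ignores any rounding in the plugin), and the weights in the usage note are made-up values rather than numbers from the raws.

```cpp
#include <cassert>

// Current expression as quoted: a fixed 5 single-tile gems per layer,
// split by the material's share among CLUSTER_ONE inclusions.
double cluster_one_current(double weight, double sum_cluster_one) {
    assert(sum_cluster_one > 0.0);
    return weight * 1 * 5 / sum_cluster_one;
}

// Proposed expression for a CLUSTER_ONE nested in a CLUSTER_SMALL: scale by
// the enclosing small cluster's share of the 7 assumed small clusters.
//   parent_weight     - weight of the enclosing small cluster inclusion
//   sum_cluster_small - sum of the weights of all small cluster inclusions
double cluster_one_proposed(double weight, double sum_cluster_one,
                            double parent_weight, double sum_cluster_small) {
    assert(sum_cluster_one > 0.0 && sum_cluster_small > 0.0);
    return weight * 1 * parent_weight * 7 / sum_cluster_small / sum_cluster_one;
}
```

With hypothetical weights where the gem is the only CLUSTER_ONE in the layer (100 of 100) and its enclosing small cluster holds 100 of a total small cluster weight of 1000, `cluster_one_current(100, 100)` gives 5 while `cluster_one_proposed(100, 100, 100, 1000)` gives 0.7, roughly seven times lower.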

Edit 2: I've investigated abundance across the different standard world gen scarcity settings, and what I've seen is: