GreenInfo-Network / seattle-building-dashboard

Energy benchmarking for Seattle
https://greeninfo-network.github.io/seattle-building-dashboard/
ISC License

Histogram colors do not match map colors, especially 0 #77

Closed tomay closed 2 weeks ago

tomay commented 1 year ago

The histograms that accompany the map layers on the left panel also act as a kind of legend.

For some reason, the colors represented do not always match the full range of colors on the map.

Here is GHG Intensity for example.

Histogram:

image

Map:

image

The blue color in particular is not represented in the histogram, although it is the map color for total_ghg_emissions_intensity = 0.

This is not a new issue, and has probably been present all along. In fact there is a cryptic explanation by way of a comment in the building_bucket_calculator.js colorGradient prototype function:

// This is how we calculate the colors for the dots on the map.
// But they don't line up with the colors in the histogram. Why not?

// The domain is "fieldValues", which is an unordered list of all of the building value for this field.
// But the domain for the histogram color ramp is just linear max and min for the given field.
// And more importantly, it needs to be the max and min that's set according to the config file. That's how the colors get determined in the histogram

The config file seattle.json does set the proper min and max.

Here's total_ghg_emissions_intensity:

        {
            "title": "Seattle GHG Intensity",
            "field_name": "total_ghg_emissions_intensity",
            "display_type": "range",
            "range_slice_count": 18,
            "section": "Greenhouse Gas Emissions",
            "color_range": ["#1f5dbe","#599b67","#ffd552","#da863f","#ab2328"],
            "hatch_null_css": true,
            "unit": "Kilograms CO₂e/ft²",
            "formatter": "fixed-1",
            "filter_range": {"min" : 0, "max" : 10},
            "

So I don't know whether this is an oversight or what exactly is going on.


tomay commented 1 year ago

The way this works is incredibly complicated and circular:

  1. D3 is used to set up a quantile scale, based on the color range specified in seattle.json and the values for the given variable. In the case of total_ghg_emissions_intensity, that amounts to a domain of ~3600 values and a range made up of the original 5 colors spread across 18 derived colors. It is unclear why D3 isn't left to derive the colors in the scale function itself; most likely this gives a bigger spread on the map, which uses hard-wired CartoCSS ranges rather than D3 functions (see below).

  2. But then, when the actual color is applied to a bar on the histogram, the xpos of the bar (a pixel value?) is fed into yet another D3 scale (linear) before being passed to the original quantile colorScale.

  3. This same quantile scale is used indirectly to make the CartoCSS. In the end, this seems to me much more akin to a threshold scale (which directly specifies the cut values that separate the classes) than to the original quantile scale (intervals of similar sizes), as you can see in the resulting CartoCSS statements and the code. Each "stop" is derived from a call to d3.scale invertExtent on each color in the range, which is the equivalent of a threshold.

  4. The end result is that a value of 0 gets the expected color value for 0 on the map, but in the original scale function (xPos passed to a linear scale, passed to a quantile scale), a value of 0 is assigned to the color class #b5bb5b, which is the 7th position of 18 in the defined range, even though the actual data value is 0.

(18) ['#1f5dbe', '#306fa5', '#40808c', '#519273', '#6ba165', '#90ae60', '#b5bb5b', '#dac857', '#ffd552', '#f7c34e', '#efb24a', '#e6a045', '#de8f41', '#d57b3c', '#ca6537', '#c04f32', '#b5392d', '#ab2328']
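To make point 3 concrete, here is a minimal sketch of how a quantile scale turns into fixed cut values once invertExtent is called on each color. It is a hand-rolled stand-in written in plain JavaScript, not the dashboard's code and not D3 itself. The function name quantileScale and the sample values are made up for illustration. Only the five base colors come from seattle.json.

```javascript
// Hand-rolled stand-in for a quantile scale (hypothetical helper, not the dashboard's code).
function quantileScale(values, range) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = range.length;
  // n - 1 cut points, each at the i/n quantile, interpolating linearly between neighbors.
  const thresholds = [];
  for (let i = 1; i < n; i++) {
    const pos = (sorted.length - 1) * (i / n);
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, sorted.length - 1);
    thresholds.push(sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo));
  }
  // A value gets the class whose index equals the number of cut points <= value.
  const scale = (x) => range[thresholds.filter((t) => t <= x).length];
  // The value interval [lower, upper] covered by a single color.
  scale.invertExtent = (color) => {
    const i = range.indexOf(color);
    return [
      i > 0 ? thresholds[i - 1] : sorted[0],
      i < thresholds.length ? thresholds[i] : sorted[sorted.length - 1],
    ];
  };
  return scale;
}

// Made-up, right-skewed sample values (illustrative only, not real building data).
const sampleValues = [0, 0, 0.2, 0.5, 0.8, 1, 1.5, 2, 3, 9];
// The five base colors from the color_range in seattle.json.
const baseColors = ["#1f5dbe", "#599b67", "#ffd552", "#da863f", "#ab2328"];

const colorScale = quantileScale(sampleValues, baseColors);
// One fixed interval per color: this is what ends up behaving like a threshold scale.
const stops = baseColors.map((c) => ({ color: c, extent: colorScale.invertExtent(c) }));

console.log(colorScale(0)); // the class a raw value of 0 falls into
console.log(stops);
```

In this sketch a raw value of 0 lands in the first (blue) class, which matches what the map does. Whatever value the histogram path feeds into the same scale is a separate question, since it goes through the extra linear scale first.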

Questions and observations

tomay commented 1 year ago

After a bit more thought, I think this is a wontfix

Each bar in the histogram represents a large range, not a discrete value. That's what the linear scale on the xpos is trying to model

The first bar in Total Seattle GHG Emissions, for example, isn't just representing 0, it's representing a range of values, something like 0 - 20 (there are no scales printed, so it's not easy to say exactly):

image

Buildings, on the other hand, are painted according to a specific value for that one building.

There is no way to "fix" this. The histogram is not a legend, it is a broad brush stroke showing the range and distribution of all the values
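As a rough illustration of the "bar is a range" point, the sketch below cuts a configured range into equal-width bins. It assumes the histogram bins evenly across filter_range, which I have not verified in the code. The helper name binExtents is hypothetical. The numbers are the filter_range (0 to 10) and range_slice_count (18) shown above for total_ghg_emissions_intensity.

```javascript
// Hypothetical helper: equal-width bins across a configured [min, max] range.
function binExtents(min, max, sliceCount) {
  const width = (max - min) / sliceCount;
  return Array.from({ length: sliceCount }, (_, i) => [
    min + i * width,
    min + (i + 1) * width,
  ]);
}

// filter_range 0..10 and range_slice_count 18, as in the config excerpt for
// total_ghg_emissions_intensity. Even binning is an assumption.
const intensityBins = binExtents(0, 10, 18);
const [firstLo, firstHi] = intensityBins[0];
// A bar can only be painted one color, so some single value (here the midpoint)
// has to stand in for the whole interval.
const firstMid = (firstLo + firstHi) / 2;

console.log(intensityBins.length, firstLo, firstHi.toFixed(3), firstMid.toFixed(3));
```

Under that assumption the first bar covers everything from 0 up to a bit over half a unit, so its color cannot be the color of exactly 0 unless 0 is the value chosen to represent the bar.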

tomay commented 2 months ago

This came up again in today's call. I think my notes above are still valid. There is however an easy way to get a better visual idea of what is going on, and that is to increase the range_slice_count in seattle.json for any single map layer.

Here's what it looks like if we increase range_slice_count from 18 to 100 for "Total Seattle GHG Emissions" (total_ghg_emissions):

image

With that, it's much more obvious that

  1. There are a lot of buildings with very low values (blue and green)
  2. These are not visible when there are fewer bins, because that color variation gets collapsed to a single greenish yellow

Not sure if that's a solution to the issue, but it can at least serve as additional explanation
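A small sketch of why more bins helps, using made-up long-tailed data rather than the real building values. It assumes equal-width bins across a fixed range and that each bar gets its own color by position. The helper binIndex and the 0 to 1000 range are hypothetical.

```javascript
// Index of the equal-width bin that holds `v` when [min, max] is cut into `sliceCount` bins.
function binIndex(v, min, max, sliceCount) {
  const width = (max - min) / sliceCount;
  const clamped = Math.min(Math.max(v, min), max);
  return Math.min(Math.floor((clamped - min) / width), sliceCount - 1);
}

// Made-up long-tailed values between 0 and 1000, ascending by construction
// (illustrative only, not real building data).
const sample = Array.from({ length: 200 }, (_, k) => 1000 * (k / 199) ** 4);
const median = sample[Math.floor(sample.length / 2)];

// How many bars the lower half of the data is spread across.
const barsFor18 = binIndex(median, 0, 1000, 18) + 1;
const barsFor100 = binIndex(median, 0, 1000, 100) + 1;

console.log({ barsFor18, barsFor100 });
```

With few bins, the lower half of a skewed dataset is squeezed into the first couple of bars and so into a couple of colors. With 100 bins the same buildings are spread over more bars, so more of the low end of the ramp becomes visible.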

tomay commented 2 months ago

Another way to mitigate this would be to go back to fewer buckets, e.g. 18, but set the max to a lower number, like 100, so that the upper bucket gets bigger and the smaller buckets are better distributed across the color scheme.

Unfortunately, although this does bring the values further down into the blue/green range, the sheer number of buildings that get lumped in with 100+ makes it more difficult to see any kind of distribution pattern

Here's 18 buckets with a max of 100:

image

Here's 18 buckets with a max of 200:

image

Finally, 18 buckets with 300:

image

That last one is maybe the best. At least there's a decent indication of values that are not strictly yellow-red.
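The trade-off between the three max values can be sketched with made-up long-tailed data, not the real building values. Everything above the max is lumped into the last bucket, as described above. The helper topBucketShare and the data are hypothetical.

```javascript
// Share of values that land in the last bucket when everything above `max` is clamped into it.
function topBucketShare(values, max, sliceCount) {
  const width = max / sliceCount;
  let top = 0;
  for (const v of values) {
    const i = Math.min(Math.floor(Math.max(v, 0) / width), sliceCount - 1);
    if (i === sliceCount - 1) top += 1;
  }
  return top / values.length;
}

// Made-up long-tailed values between 0 and 1000 (illustrative only).
const emissions = Array.from({ length: 200 }, (_, k) => 1000 * (k / 199) ** 4);

// 18 buckets with a max of 100, 200 and 300, as in the screenshots.
const shares = [100, 200, 300].map((max) => ({
  max,
  share: topBucketShare(emissions, max, 18),
}));

console.log(shares);
```

For data shaped like this, a lower max pushes a larger share of buildings into the top bucket, which is the "lumped in with 100+" effect. A higher max shrinks that share at the cost of compressing the low values into fewer buckets.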

tomay commented 2 months ago

Belatedly realized the above snaps had the building type restricted to high rise multifamily.

300 is significantly better with all building types showing:

image

tomay commented 2 months ago

Current plan:

tomay commented 2 weeks ago

I believe this is all set: total_ghg_emissions now has a filter range of 0 to 300.