conveyal / r5

Developed to power Conveyal's web-based interface for scenario planning and land-use/transport accessibility analysis, R5 is our routing engine for multimodal (transit/bike/walk/car) networks, with a particular focus on public transit.
https://conveyal.com/learn
MIT License

Sparse opportunity quantization in high resolution grids #821

Open abyrd opened 2 years ago

abyrd commented 2 years ago

This is closely related to #566 Fractional opportunity quantization error, which was fixed in #689. The simple error diffusion we settled on works well enough for moderately dense opportunities, where the quantization to a whole number is small relative to the true value. However, it produces noticeable artifacts at the very low per-cell opportunity counts that can arise as zoom levels increase. Each zoom level increase halves the cell width, so cell area and per-cell opportunity counts fall by a factor of four per level (the square of the linear factor); an increase from zoom 9 to 12 will decrease per-cell counts by a factor of 4³ = 64. At counts below 1/2 or 1/3 per cell, gaps of zero density will appear between vertical stripes.
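For concreteness, here is a minimal sketch of this kind of directional error diffusion (illustrative only, not the actual code from #689): each cell's rounding error is carried into the next cell along the row, so the row total is preserved but whole opportunities surface only where the accumulated error crosses the rounding threshold.

```java
/**
 * Round fractional per-cell opportunity counts to whole numbers, carrying
 * the rounding error left-to-right along the row so the row total is
 * preserved. A sketch of directional error diffusion, not the #689 code.
 */
public static int[] diffuseRow (double[] densities) {
    int[] counts = new int[densities.length];
    double error = 0;
    for (int i = 0; i < densities.length; i++) {
        double target = densities[i] + error;  // true value plus carried error
        counts[i] = (int) Math.round(target);  // quantize to a whole count
        error = target - counts[i];            // carry the new error rightward
    }
    return counts;
}
```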

At a high zoom level and very low opportunity density, a polygon might be 20 cells across, with 0.1 opportunities per cell. Quantization with directional error diffusion yields a vertical linear artifact halfway across the polygon, a shadow of the polygon's left edge.
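Running that sketch on this example (20 cells at 0.1 opportunities each) shows the mechanism: the first whole opportunity surfaces only once the carried error reaches the rounding threshold, and since every row of a constant-density polygon diffuses identically, the 1s land in the same columns in every row, producing a vertical stripe.

```java
double[] row = new double[20];
java.util.Arrays.fill(row, 0.1);
int[] counts = diffuseRow(row);
System.out.println(java.util.Arrays.toString(counts));
// Prints mostly zeros with isolated 1s at fixed columns (around the 5th
// and 15th cells). The row total of 2 is preserved, but every row of the
// polygon places its 1s in the same columns, hence the vertical banding.
```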

Such artifacts are always present; they are just more visible in this situation because they stand out against an ambient rounded opportunity count of zero. It is debatable whether they add significant bias to results, because they are likely to arise in inadequately detailed zones on the periphery of an analysis. The case in the image above is a realistic one: a denser town surrounded by small towns and isolated houses. When there are fewer opportunities than grid cells and only whole opportunities are represented, opportunities must be placed at arbitrary cell locations. We could put a lot of work into confronting this problem, but it might be mostly cosmetic: is it ever meaningful to perform such a high resolution analysis on such low density data? Perhaps it's a good thing that the insufficiency of the data at a particular level of geographic precision is visible to the user.

One solution is just to call out this situation in the documentation as artificial precision, and say the analysis must be conducted in a different way. Such low density opportunities must either be ignored or removed as inconsequential, or represented as points in their actual locations for such high resolution analysis to even be meaningful.

Another solution, which we have considered several times, is to just store and load fractional densities without rounding, especially considering that the in-memory data structure is already double precision floating point. There are a few problems with this approach.

In particular, there is some research to be done on whether storing floating-point data to handle these few special cases makes all other files significantly larger (with an effect on response times and bandwidth requirements).
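To make that research concrete, a quick experiment along these lines could compare compressed sizes. The encoding below is a hypothetical stand-in (delta-coded 4-byte ints versus raw 8-byte doubles, both gzipped), not the actual grid file format, but it captures why sparse integer grids compress so well: long runs of zero deltas.

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GridSizeExperiment {
    /** Gzip a byte array and return the compressed length in bytes. */
    static int gzippedSize (byte[] bytes) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(bytes);
        }
        return baos.size();
    }

    public static void main (String[] args) throws IOException {
        int n = 512 * 512;
        java.util.Random random = new java.util.Random(42);
        double[] densities = new double[n];
        // Sparse synthetic grid: most cells zero, a few with small counts.
        for (int i = 0; i < n; i++) {
            if (random.nextInt(20) == 0) densities[i] = random.nextInt(10);
        }
        // Integer encoding: delta-code the rounded values.
        ByteArrayOutputStream ints = new ByteArrayOutputStream();
        DataOutputStream intOut = new DataOutputStream(ints);
        int prev = 0;
        for (double d : densities) {
            int v = (int) Math.round(d);
            intOut.writeInt(v - prev);
            prev = v;
        }
        // Floating-point encoding: raw 8-byte doubles, no rounding.
        ByteArrayOutputStream dbls = new ByteArrayOutputStream();
        DataOutputStream dblOut = new DataOutputStream(dbls);
        for (double d : densities) dblOut.writeDouble(d);
        System.out.println("gzipped ints:    " + gzippedSize(ints.toByteArray()));
        System.out.println("gzipped doubles: " + gzippedSize(dbls.toByteArray()));
    }
}
```

If the integer variant compresses dramatically better, the cost of switching every file to floating point just to serve these sparse edge cases would show up directly in these numbers.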

A common way of handling quantization noise is dithering, but standard dithering matches the total number of opportunities only probabilistically, which is quite a compromise just to handle an edge case. We'd need a custom quantity-preserving dithering algorithm.
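One shape such a custom algorithm could take (a sketch, not a settled design) is error diffusion applied in a random visiting order: the carried error still cancels out across the grid, so the total is preserved as in ordinary diffusion, but the placement of leftover whole opportunities is decorrelated from the scan direction, so no stripes form.

```java
/**
 * Quantity-preserving dither (sketch): error diffusion in a random visiting
 * order. Rounding error still cancels across the grid, but leftover whole
 * opportunities land at uncorrelated cells instead of forming stripes.
 */
public static int[] ditherPreservingTotal (double[] densities, java.util.Random random) {
    int n = densities.length;
    // Build a random permutation of cell indices (Fisher-Yates shuffle).
    int[] order = new int[n];
    for (int i = 0; i < n; i++) order[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = random.nextInt(i + 1);
        int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    int[] counts = new int[n];
    double error = 0;
    for (int index : order) {
        double target = densities[index] + error;
        counts[index] = (int) Math.round(target);
        error = target - counts[index];
    }
    return counts;
}
```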

We could also apply the quantization during rasterization of the individual polygons, rather than while saving the grid as a whole. This would give more options for an alternate code path where the average density is less than 1 or 2; a sketch follows below.
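As a rough sketch of what that branch might look like (the 1.5 threshold and the uniform random placement are placeholders, and diffuseRow is the sketch from above, not R5 code):

```java
/**
 * Hypothetical per-polygon quantization path: a polygon whose average cell
 * density is very low gets probabilistic placement of its whole
 * opportunities; denser polygons keep standard error diffusion.
 */
public static int[] quantizePolygonCells (double[] cells, java.util.Random random) {
    if (cells.length == 0) return new int[0];
    double total = 0;
    for (double c : cells) total += c;
    if (total / cells.length < 1.5) {
        // Sparse path: drop each whole opportunity at a randomly chosen cell.
        // Assumes roughly uniform density within one polygon; a weighted
        // choice would be needed to respect varying per-cell fractions.
        int[] counts = new int[cells.length];
        int wholeOpportunities = (int) Math.round(total);
        for (int k = 0; k < wholeOpportunities; k++) {
            counts[random.nextInt(cells.length)] += 1;
        }
        return counts;
    }
    // Dense path: the usual directional error diffusion (sketched earlier).
    return diffuseRow(cells);
}
```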

abyrd commented 2 years ago

A decent ad hoc solution is to treat values that would round to zero differently, setting them to 1 or 0 probabilistically. This trades exact conservation of the total count for reduced spatial bias, and only in very low density cases. It will still produce vertical banding for values in the range [0.5, 1), but as vertical bands of zero-density gaps rather than vertical bands of opportunities. Treating all values less than one probabilistically removes (visible) banding entirely (see the sketch below the images).

[images]
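A sketch of that ad hoc rule (again building on the diffusion sketch above, not the actual R5 code): values below 1 bypass the error carry entirely and become 1 with probability equal to their value, so their expected contribution matches the true density.

```java
/**
 * Hybrid quantization (sketch): cells below 1.0 are rounded to 0 or 1
 * probabilistically, in proportion to their value, and contribute nothing
 * to the carried error; larger values keep directional error diffusion.
 * Totals are therefore only preserved in expectation for the sparse cells.
 */
public static int[] diffuseRowProbabilistic (double[] densities, java.util.Random random) {
    int[] counts = new int[densities.length];
    double error = 0;
    for (int i = 0; i < densities.length; i++) {
        if (densities[i] < 1.0) {
            // Bernoulli rounding: expected value equals the true density.
            counts[i] = (random.nextDouble() < densities[i]) ? 1 : 0;
        } else {
            double target = densities[i] + error;
            counts[i] = (int) Math.round(target);
            error = target - counts[i];
        }
    }
    return counts;
}
```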

The non-deterministic non-conservation of opportunity counts will cause existing tests to fail randomly though, so this is not ready for use.
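If this approach were pursued, one conventional mitigation for the test breakage is to inject the random source rather than constructing it internally, so tests can pin a seed:

```java
double[] densities = { 0.1, 0.4, 2.5, 0.7 };
// A fixed seed makes the probabilistic rounding reproducible in tests,
// so expected output grids can still be asserted exactly.
int[] counts = diffuseRowProbabilistic(densities, new java.util.Random(1234));
```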