conveyal / r5

Developed to power Conveyal's web-based interface for scenario planning and land-use/transport accessibility analysis, R5 is our routing engine for multimodal (transit/bike/walk/car) networks, with a particular focus on public transit.
https://conveyal.com/learn
MIT License

Sparse opportunity quantization in high resolution grids #821

Open abyrd opened 2 years ago

abyrd commented 2 years ago

This is closely related to #566 Fractional opportunity quantization error, which was fixed in #689. The simple error diffusion we settled on works well enough for moderately dense opportunities, where the quantization to a whole number is small relative to the true value. However, it produces noticeable artifacts at the very low per-cell opportunity counts that can arise as zoom levels increase. Each zoom level increase halves the cell width, so cell area and per-cell opportunity counts fall by a factor of four per level (the square of the linear factor); an increase from zoom 9 to 12 will decrease per-cell counts by a factor of 4³ = 64. At counts below 1/2 or 1/3 per cell, gaps of zero density will appear between vertical stripes.
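For concreteness, here is a minimal sketch of this kind of directional error diffusion (illustrative only, not the actual code from #689): each cell's rounding error is carried into the next cell along the row, so the row total is preserved but whole opportunities surface only where the accumulated error crosses the rounding threshold.

```java
/**
 * Round fractional per-cell opportunity counts to whole numbers, carrying
 * the rounding error left-to-right along the row so the row total is
 * preserved. A sketch of directional error diffusion, not the #689 code.
 */
public static int[] diffuseRow (double[] densities) {
    int[] counts = new int[densities.length];
    double error = 0;
    for (int i = 0; i < densities.length; i++) {
        double target = densities[i] + error;  // true value plus carried error
        counts[i] = (int) Math.round(target);  // quantize to a whole count
        error = target - counts[i];            // carry the new error rightward
    }
    return counts;
}
```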

At a high zoom level and very low opportunity density, a polygon might be 20 cells across, with 0.1 opportunities per cell. Quantization with directional error diffusion yields a vertical linear artifact halfway across the polygon, a shadow of the polygon's left edge.
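Running that sketch on this example (20 cells at 0.1 opportunities each) shows the mechanism: the first whole opportunity surfaces only once the carried error reaches the rounding threshold, and since every row of a constant-density polygon diffuses identically, the 1s land in the same columns in every row, producing a vertical stripe.

```java
double[] row = new double[20];
java.util.Arrays.fill(row, 0.1);
int[] counts = diffuseRow(row);
System.out.println(java.util.Arrays.toString(counts));
// Prints mostly zeros with isolated 1s at fixed columns (around the 5th
// and 15th cells). The row total of 2 is preserved, but every row of the
// polygon places its 1s in the same columns, hence the vertical banding.
```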

Such artifacts are always present; they are just more visible in this situation because they stand out against an ambient rounded opportunity count of zero. It is debatable whether they add significant bias to results, because they are likely to arise in inadequately detailed zones on the periphery of an analysis. The case in the image above is a realistic one: a denser town surrounded by small towns and isolated houses. When there are fewer opportunities than grid cells and only whole opportunities are represented, opportunities must be placed at arbitrary cell locations. We could put a lot of work into confronting this problem, but it might be mostly cosmetic: is it ever meaningful to perform such a high resolution analysis on such low density data? Perhaps it's a good thing that the insufficiency of the data at a particular level of geographic precision is visible to the user.

One solution is just to call out this situation in the documentation as artificial precision, and say the analysis must be conducted in a different way. Such low density opportunities must either be ignored or removed as inconsequential, or represented as points in their actual locations for such high resolution analysis to even be meaningful.

Another solution, which we have considered several times, is to just store and load fractional densities without rounding, especially considering that the in-memory data structure is already double precision floating point. There are a few problems with this approach.

In particular, there is some research to be done on whether storing floating-point data to handle these few special cases makes all other files significantly larger (with an effect on response times and bandwidth requirements).
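To make that research concrete, a quick experiment along these lines could compare compressed sizes. The encoding below is a hypothetical stand-in (delta-coded 4-byte ints versus raw 8-byte doubles, both gzipped), not the actual grid file format, but it captures why sparse integer grids compress so well: long runs of zero deltas.

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GridSizeExperiment {
    /** Gzip a byte array and return the compressed length in bytes. */
    static int gzippedSize (byte[] bytes) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(bytes);
        }
        return baos.size();
    }

    public static void main (String[] args) throws IOException {
        int n = 512 * 512;
        java.util.Random random = new java.util.Random(42);
        double[] densities = new double[n];
        // Sparse synthetic grid: most cells zero, a few with small counts.
        for (int i = 0; i < n; i++) {
            if (random.nextInt(20) == 0) densities[i] = random.nextInt(10);
        }
        // Integer encoding: delta-code the rounded values.
        ByteArrayOutputStream ints = new ByteArrayOutputStream();
        DataOutputStream intOut = new DataOutputStream(ints);
        int prev = 0;
        for (double d : densities) {
            int v = (int) Math.round(d);
            intOut.writeInt(v - prev);
            prev = v;
        }
        // Floating-point encoding: raw 8-byte doubles, no rounding.
        ByteArrayOutputStream dbls = new ByteArrayOutputStream();
        DataOutputStream dblOut = new DataOutputStream(dbls);
        for (double d : densities) dblOut.writeDouble(d);
        System.out.println("gzipped ints:    " + gzippedSize(ints.toByteArray()));
        System.out.println("gzipped doubles: " + gzippedSize(dbls.toByteArray()));
    }
}
```

If the integer variant compresses dramatically better, the cost of switching every file to floating point just to serve these sparse edge cases would show up directly in these numbers.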

A common way of handling quantization noise is dithering, but standard dithering matches the total number of opportunities only probabilistically, which is quite a compromise just to handle an edge case. We'd need a custom quantity-preserving dithering algorithm.
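One shape such a custom algorithm could take (a sketch, not a settled design) is error diffusion applied in a random visiting order: the carried error still cancels out across the grid, so the total is preserved as in ordinary diffusion, but the placement of leftover whole opportunities is decorrelated from the scan direction, so no stripes form.

```java
/**
 * Quantity-preserving dither (sketch): error diffusion in a random visiting
 * order. Rounding error still cancels across the grid, but leftover whole
 * opportunities land at uncorrelated cells instead of forming stripes.
 */
public static int[] ditherPreservingTotal (double[] densities, java.util.Random random) {
    int n = densities.length;
    // Build a random permutation of cell indices (Fisher-Yates shuffle).
    int[] order = new int[n];
    for (int i = 0; i < n; i++) order[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = random.nextInt(i + 1);
        int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    int[] counts = new int[n];
    double error = 0;
    for (int index : order) {
        double target = densities[index] + error;
        counts[index] = (int) Math.round(target);
        error = target - counts[index];
    }
    return counts;
}
```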

We could also apply the quantization during rasterization of the individual polygons, rather than while saving the grid as a whole. This would give more options for an alternate code path where the average density is less than 1 or 2; a sketch follows below.
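As a rough sketch of what that branch might look like (the 1.5 threshold and the uniform random placement are placeholders, and diffuseRow is the sketch from above, not R5 code):

```java
/**
 * Hypothetical per-polygon quantization path: a polygon whose average cell
 * density is very low gets probabilistic placement of its whole
 * opportunities; denser polygons keep standard error diffusion.
 */
public static int[] quantizePolygonCells (double[] cells, java.util.Random random) {
    if (cells.length == 0) return new int[0];
    double total = 0;
    for (double c : cells) total += c;
    if (total / cells.length < 1.5) {
        // Sparse path: drop each whole opportunity at a randomly chosen cell.
        // Assumes roughly uniform density within one polygon; a weighted
        // choice would be needed to respect varying per-cell fractions.
        int[] counts = new int[cells.length];
        int wholeOpportunities = (int) Math.round(total);
        for (int k = 0; k < wholeOpportunities; k++) {
            counts[random.nextInt(cells.length)] += 1;
        }
        return counts;
    }
    // Dense path: the usual directional error diffusion (sketched earlier).
    return diffuseRow(cells);
}
```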

abyrd commented 2 years ago

A decent ad hoc solution is to treat values that would round to zero differently, setting them to 1 or 0 probabilistically. This trades exact conservation of the total count for reduced spatial bias, and only in very low density cases. It will still produce vertical banding for values in the range [0.5, 1), but as vertical bands of zero-density gaps rather than vertical bands of opportunities. Treating all values less than one probabilistically removes (visible) banding entirely (see the sketch below the images).

[images]
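A sketch of that ad hoc rule (again building on the diffusion sketch above, not the actual R5 code): values below 1 bypass the error carry entirely and become 1 with probability equal to their value, so their expected contribution matches the true density.

```java
/**
 * Hybrid quantization (sketch): cells below 1.0 are rounded to 0 or 1
 * probabilistically, in proportion to their value, and contribute nothing
 * to the carried error; larger values keep directional error diffusion.
 * Totals are therefore only preserved in expectation for the sparse cells.
 */
public static int[] diffuseRowProbabilistic (double[] densities, java.util.Random random) {
    int[] counts = new int[densities.length];
    double error = 0;
    for (int i = 0; i < densities.length; i++) {
        if (densities[i] < 1.0) {
            // Bernoulli rounding: expected value equals the true density.
            counts[i] = (random.nextDouble() < densities[i]) ? 1 : 0;
        } else {
            double target = densities[i] + error;
            counts[i] = (int) Math.round(target);
            error = target - counts[i];
        }
    }
    return counts;
}
```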

The non-deterministic non-conservation of opportunity counts will cause existing tests to fail randomly though, so this is not ready for use.
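If this approach were pursued, one conventional mitigation for the test breakage is to inject the random source rather than constructing it internally, so tests can pin a seed:

```java
double[] densities = { 0.1, 0.4, 2.5, 0.7 };
// A fixed seed makes the probabilistic rounding reproducible in tests,
// so expected output grids can still be asserted exactly.
int[] counts = diffuseRowProbabilistic(densities, new java.util.Random(1234));
```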