felt / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License
875 stars 76 forks source link

Stabilize feature order in overzoom #210

Closed e-n-f closed 5 months ago

e-n-f commented 5 months ago

Previously, the --preserve-feature-order sort in tippecanoe-overzoom could accidentally get the features out of order by using an unstable sort where all rows have the same sort key. Now it uses std::stable_sort to ensure order stability.

The change to the Makefile is to remove the Debug test that was added in https://github.com/felt/tippecanoe/pull/202 and will be fixed in https://github.com/felt/tippecanoe/pull/209.

e-n-f commented 5 months ago

This particular bug only really needs it for the one sort:

        if (preserve_input_order) {
            std::stable_sort(outlayer.features.begin(), outlayer.features.end(), preservecmp);
        }   

The one in dirtiles.cpp doesn't really matter, since tiles shouldn't be duplicated in directories.

In geometry.cpp I don't think the sorty one matters, but the candidates one should be stable because there might be multiple candidates with the same distance.

The one in main.cpp should be stable because there might be multiple sets of features with the same gap.

The one in mvt.cpp should be stable because there might be duplicate attribute values.

The ones in pmtiles_file.cpp shouldn't matter because there shouldn't be duplicate tile IDs.

I don't think the one in serial.cpp should matter.

In shared_borders.cpp, I don't think the edgecmp_ring one should matter, but the order one should be stable because there might be multiple features with the same interfeature gap.

The one in tile-join.cpp should be stable because there can be multiple source tiles with the same ID (from different source tilesets).

In tile.cpp: The cluster one should be stable, because there can be multiple features with the same drop sequence (because their locations are identical). The one in choose_minextent shouldn't matter, because these are just numbers. The sort of the drop_sequences themselves shouldn't matter, because these are just numbers. The feature_sequences one shouldn't matter because sequence numbers from input are unique. The coalindexcmp one shouldn't matter because the features are distinguished by their ID, attributes, index, and geometry, which should be enough to make them distinct. The preservecmp one shouldn't matter because the input sequence numbers should be distinct. The ordercmp one should be stable because there might be multiple features with the same sort keys.