Closed — javitonino closed this issue 7 years ago
Tested on a local DO Analysis using a buffer on a rivers layer (thank you @AbelVM) in Spain and Europe.
Acceptance steps?
Adding a simplification task:
```python
class RawGeometry(TempTableTask):
    ...

class SimplifiedRawGeometry(TempTableTask):
    def requires(self):
        return RawGeometry()

    def run(self):
        yield SimplifyGeometriesMapshaper(schema=self.input()._schema,
                                          table_input=self.input()._tablename,
                                          table_output=self.output()._tablename,
                                          geomfield='wkb_geometry')
        # The SimplifyGeometriesPostGIS task can be used as well

class Geometry(TableTask):
    def requires(self):
        return {
            'data': SimplifiedRawGeometry()
        }
    ...
```
Simplify an existing table (command line):
```shell
make -- run simplification SimplifyGeometriesMapshaper --schema es.ine --table-input rawgeometry__99914b932b
```
Note that SimplifyGeometriesPostGIS can also be used to simplify, and that table_output and other default parameters can be specified explicitly.
The expected result is a simplified version of the input table in the same schema as the original (not overwriting it), with the same record count, no invalid geometries, and fewer points (~50% by default). Also, if we are simplifying using Mapshaper, the output features are expected to maintain topology.
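A sketch of how these acceptance criteria could be checked in SQL. This is illustrative only: the schema/table names and the wkb_geometry column are hypothetical examples, not the actual tables produced by the tasks.

```sql
-- Hypothetical acceptance check: same record count, no invalid geometries,
-- and a lower average point count in the simplified table.
SELECT
    (SELECT count(*) FROM "es.ine".rawgeometry)           AS rows_in,
    (SELECT count(*) FROM "es.ine".simplifiedrawgeometry) AS rows_out,
    (SELECT count(*) FROM "es.ine".simplifiedrawgeometry
      WHERE NOT ST_IsValid(wkb_geometry))                 AS invalid_out,
    (SELECT avg(ST_NPoints(wkb_geometry))
       FROM "es.ine".simplifiedrawgeometry)               AS avg_points_out;
```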
I've found one problem using the Mapshaper task: ogr2ogr is creating invalid geometries during the import (not in the simplification process) due to the forced conversion to MultiPolygon. From the ogr2ogr documentation:
Some forced geometry conversions may result in invalid geometries, for example when forcing conversion of multi-part multipolygons with -nlt POLYGON, the resulting polygon will break the Simple Features rules.
Here is an example:

```
gis=# select count(*) from observatory.obs_43157a9633ea9b512d00a8416d5faf1d0c8452fd where ST_IsValid(the_geom) is false;
NOTICE:  Ring Self-intersection at or near point -4.1010282450000082 42.791497261000011
 count
-------
     1
(1 row)
```
So, I think we should execute ST_CollectionExtract(ST_MakeValid(the_geom), 3) after the ogr2ogr operation has finished.
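A sketch of that fix-up as a SQL statement, using the table from the example above (in practice the table name would be parameterized by the task):

```sql
-- Repair invalid geometries left by the forced MultiPolygon conversion;
-- ST_CollectionExtract(..., 3) keeps only the polygonal parts.
UPDATE observatory.obs_43157a9633ea9b512d00a8416d5faf1d0c8452fd
SET the_geom = ST_CollectionExtract(ST_MakeValid(the_geom), 3)
WHERE NOT ST_IsValid(the_geom);
```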
I'll take a look at it
I've added simplification tasks for the Spain and Canada Census data:
WITHOUT SIMPLIFICATION

```
  max  | min |         avg
-------+-----+----------------------
 15067 |   4 | 186.3554505005561735
(1 row)
```

WITH SIMPLIFICATION

```
 max  | min |         avg
------+-----+----------------------
 4047 |   4 | 100.0852335928809789
(1 row)
```
There is no visual difference between the two kinds of geometries.
WITHOUT SIMPLIFICATION

```
 max  | min |         avg
------+-----+----------------------
 1452 |  13 | 650.3157894736842105
```

WITH SIMPLIFICATION

```
 max | min |         avg
-----+-----+----------------------
 834 |   6 | 325.4736842105263158
```
Here is an image with the difference between simplified and non-simplified geometries for Balears.
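For reference, the point-count tables above look like the output of a query of this shape (the table name is a hypothetical example, assuming the geometry column is the_geom):

```sql
-- Max, min and average vertex count per feature.
SELECT max(ST_NPoints(the_geom)),
       min(ST_NPoints(the_geom)),
       avg(ST_NPoints(the_geom))
FROM "es.ine".geometry;
```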
Those are the expected results.
Based on this explanation, the correlation between a "worse" simplification with Mapshaper (removing too many significant points) and the PostGIS simplification removing too few points when both use the default parameter values is direct and expected.
Anyway, we need to find the best simplification parameter for each table, to improve performance without losing significant points of the geometry.
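As a language-agnostic illustration of why a distance tolerance and a retain percentage are two parametrizations of the same knob, here is a minimal pure-Python Douglas-Peucker sketch (the algorithm behind PostGIS's ST_Simplify; Mapshaper defaults to Visvalingam, but the tolerance-vs-retained-points tradeoff is analogous). This is a toy, not either library's implementation.

```python
import math

def _perp_dist(pt, a, b):
    # Distance from pt to the segment a-b.
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def simplify(points, tolerance):
    # Classic Douglas-Peucker: keep the endpoints, recurse on the
    # farthest interior vertex while it exceeds the tolerance.
    if len(points) < 3:
        return list(points)
    dists = [_perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:i + 2], tolerance)
    right = simplify(points[i + 1:], tolerance)
    return left[:-1] + right

# A noisy line: larger tolerances retain fewer points (ST_Simplify style);
# a retain-% parametrization instead searches for the tolerance that keeps
# roughly N% of the vertices.
line = [(x / 10.0, math.sin(x / 10.0)) for x in range(101)]
for tol in (0.001, 0.01, 0.1):
    kept = simplify(line, tol)
    print(tol, len(kept), "of", len(line), "points retained")
```

The point of the sketch: the same output can be reached by fixing a tolerance or by fixing a retained fraction, which is why the two parameters correlate but have "almost opposite" meanings.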
Mapshaper also takes a resolution parameter (instead of the retain %). Maybe that is more similar to the Postgres parametrization?
I have tested several input parameters and alternatives and I think that using the percentage of retained points is a good approach.
The results are not bad at all, I just wanted to point out (and document) that the correlation that we see in the tests is the result that we expected and reminds us that we can't rely on the default parameters.
Yeah, results look good. But looking at Mario's number, it seems like Canada could use a more aggressive simplification than Spain (which is already pretty simplified). In that sense, maybe both geometries are OK to be simplified with a parameter of 100m, but for the % approach we would probably need to set separate parameters for each. Not a big deal, since we are probably going to be tweaking those numbers manually anyway, so this is probably a moot point.
Yes, as @AbelVM pointed out with his articles, the simplification factor will depend a lot on the complexity of the geometries and the resolution of the geography (Canada for example has a very complex shore line that we may need to maintain in more detailed resolutions).
Mapshaper has an interval parameter that allows us to specify the simplification factor in distance units, but since the PostGIS simplification will only be used marginally (for larger, very memory-consuming datasets) and the retain percentage seemed more natural and convenient to me, I didn't use it.
Maybe I'm wrong and it'd be better to have a consistent parameter in both simplifications... I'll do a quick test now that I'm more unbiased thanks to your opinions :)
Typically, simplification means reducing the number of vertices while keeping the topology and the same values for RBF and shape factor, which is not a trivial task if you want to be strict.
In the 2nd article of my previous comment, they explain that the geometrical info is not that relevant if the values within the simplified polygon remain ~the same. So, simplification of a polygon in DO should take into account (the distribution of) the data that would be exposed through that geometry. Usually, population distribution should do the trick. The GPW v4 dataset (world population on a 250 m resolution grid) might help to identify the places where higher simplification could be applied.
I've done a quick test and you were right. Having the same meaning for both factors (Mapshaper and PostGIS simplifications) is more intuitive although the value is different due to the different implementations. I was very biased and while testing I realized that it wasn't very natural having two parameters with almost opposite meanings.
I'm changing it :+1:
From https://github.com/CartoDB/observatory-extension/issues/304
Summary of what we have learned:
Next steps: