Add a general tool to generate diffs between two datasets (geodataframes) based on a common key.
For downloaded data, the key is taken from the layer config's primary_key value, but note that many datasets will not have a simple integer/text/uuid primary key.
So, in order of priority, run the comparison based on:
the provided primary key
when the pk is not unique, issue a warning, but continue and add geometry to the pk
when no pk is provided, use geometry as the pk
if the pk is still not unique when geometry is included, bail (complete duplicates will already be filtered by #6, so duplicates here would indicate weird overlaps in the data that should be investigated)
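The priority order above could be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: `resolve_key` is a hypothetical helper, rows are plain dicts standing in for geodataframe records, and geometry is represented as a WKT string so it can be hashed.

```python
import warnings
from collections import Counter


def resolve_key(records, primary_key=None):
    """Return the list of field names to use as the comparison key.

    Hypothetical helper: `records` is a list of dicts standing in for
    geodataframe rows, with geometry held as a hashable WKT string.
    """

    def is_unique(fields):
        # Count how often each key tuple occurs across all records
        counts = Counter(tuple(r[f] for f in fields) for r in records)
        return all(n == 1 for n in counts.values())

    if primary_key:
        fields = list(primary_key)
        if is_unique(fields):
            return fields
        # pk is not unique: warn, but continue with geometry added to the pk
        warnings.warn(f"primary key {fields} is not unique; adding geometry")
        fields = fields + ["geometry"]
    else:
        # no pk provided: use geometry alone as the pk
        fields = ["geometry"]

    if not is_unique(fields):
        # still not unique with geometry included: bail
        raise ValueError(f"key {fields} is not unique even with geometry")
    return fields
```

With a real geodataframe the uniqueness check would more likely be a `duplicated()` call on the key columns; the dict version is used here only to keep the sketch dependency-free.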
The reason for using geom as a pk when comparing datasets is to detect modifications to attributes. Of course, this means that modifications to geoms where no other pk is used will result in additions/deletions (we can't detect modifications to geoms if there is no pk).
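To make the consequence concrete, here is a minimal sketch of a key-based diff. `diff_records` is a hypothetical name, and rows are again plain dicts with WKT strings in place of real geometries. With geometry as the only key, an attribute change shows up as a modification, while a geometry change shows up as one deletion plus one addition.

```python
def diff_records(old, new, key_fields):
    """Classify rows as additions, deletions or modifications by key.

    Hypothetical helper: `old` and `new` are lists of dicts standing in
    for geodataframe rows; keys are assumed unique within each list.
    """

    def index(records):
        # Map each key tuple to its full record
        return {tuple(r[f] for f in key_fields): r for r in records}

    old_idx, new_idx = index(old), index(new)
    additions = [new_idx[k] for k in new_idx if k not in old_idx]
    deletions = [old_idx[k] for k in old_idx if k not in new_idx]
    # Same key on both sides but different content: a modification
    modifications = [
        (old_idx[k], new_idx[k])
        for k in old_idx
        if k in new_idx and old_idx[k] != new_idx[k]
    ]
    return additions, deletions, modifications


old = [
    {"name": "a", "geometry": "POINT (0 0)"},
    {"name": "b", "geometry": "POINT (1 1)"},
]
new = [
    {"name": "a2", "geometry": "POINT (0 0)"},  # attribute changed
    {"name": "b", "geometry": "POINT (1 2)"},   # geometry changed
]
added, deleted, modified = diff_records(old, new, ["geometry"])
```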
When using geoms as the pk, try reducing precision slightly via https://geopandas.org/en/latest/docs/reference/api/geopandas.GeoSeries.set_precision.html#geopandas-geoseries-set-precision