Closed ethervoid closed 7 years ago
I did some experiments with this, adding either ST_IsValid
or ST_MakeValid
to the GetData
function. Some findings over the dataset in https://github.com/CartoDB/bigmetadata/issues/222 (5k polygons, 60 invalid ones, average 40 points/polygon, takes around 6 seconds to process):
WHERE ST_IsValid(geom)
adds around 400ms of runtime. It just returns null for invalid geometries.ST_MakeValid(geom)
is problematic. If the polygon is invalid because it has too few points, it will generate a line, which is pretty useless in the rest of the function (and triggers some errors). Doing some ST_Collection management and filtering geometries, it could work, but it is a mess. It adds about a second of runtime.ST_IsValid
check to camshaft, so the credits are not counted. But adding some check to GetData
is a must (for API users, and because it's nice to enforce function requirements in the same function). On the other hand, adding the check twice is redundant and a bit slow (~10% of time spent on geometry validation).I'm not too sure on how to go with this one. If we are optimizing for speed, we could keep the extension code and only add this check to the camshaft node. Maybe add some code in camshaft so when an exception is thrown by GetData, this error is bubbled up to the user, instead of silently setting the entire column to NULL.
Closing this in favor of https://github.com/CartoDB/bigmetadata/issues/233 so we can better track it
Related #290, #844
If we pass invalid or NULL geometries in the geomvals this could lead strange behavior like think that a points dataset is made of polygons