Closed jgoizueta closed 6 years ago
Whether it's a good idea to implement the metadata request in the map instantiation endpoint(s), or add a new specific endpoint.
Then we can start with the implementation; other details can be decided later/after some experimentation:
`column='value'` is used on such a column, it may not work (in the client) if the particular value is not in the metadata stats (because string values are mapped to floats for execution in WebGL). For simple cases the filter will be executed in the server, but not in general (e.g. if combined with AND with an expression not executable in the db).

Experimental map instantiation metadata is now available in #952
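To illustrate the category filter problem mentioned above, here is a minimal sketch. It assumes category strings get numeric IDs from the metadata; the helper names `buildCategoryIDs` and `encodeCategory` are hypothetical and not the actual Carto VL code. A value that is absent from the metadata has no ID, so a client-side `column='value'` comparison cannot be encoded.

```javascript
// Hypothetical sketch: category names seen in the metadata get numeric IDs,
// because string values must be sent to WebGL as floats.
function buildCategoryIDs(categoryNames) {
  const ids = new Map();
  categoryNames.forEach((name, index) => ids.set(name, index));
  return ids;
}

// Returns the numeric ID for a category, or null when the value was not
// present in the metadata stats and therefore cannot be encoded.
function encodeCategory(categoryIDs, value) {
  return categoryIDs.has(value) ? categoryIDs.get(value) : null;
}

// Illustrative category names, not real data.
const categoryIDs = buildCategoryIDs(['residential', 'commercial']);
console.log(encodeCategory(categoryIDs, 'commercial')); // 1
console.log(encodeCategory(categoryIDs, 'industrial')); // null
```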
But there's a problem with returning metadata at instantiation and how we use it now at the client; Carto VL is using metadata for these two details of the instantiation:
Possible solutions
Note that A and B are modifications of the Maps API. C and D involve only Carto VL changes
Since MVT does not support date/time types, it would be nice to be able to cast those types into something (text strings or epoch numbers) that can be transferred in the MVT.
@Algunenano has mentioned that Mapnik doesn't currently support time/date for styling, and implementing the automatic casting at the plugin level would not only make those types available in MVTs, but would also allow them to be used to style raster tiles.
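One possible form of such a cast, done at the SQL level rather than in the plugin, is sketched below. It assumes the column types are already known. The helper `castDateColumns` and the `'date'` type label are hypothetical, and identifiers are interpolated without quoting for brevity. PostgreSQL's `date_part('epoch', ...)` yields the epoch in seconds.

```javascript
// Hypothetical sketch: build a SELECT list in which date columns are cast
// to epoch numbers so they can be transferred in an MVT.
function castDateColumns(columns) {
  return columns
    .map(({ name, type }) =>
      type === 'date' ? `date_part('epoch', ${name}) AS ${name}` : name
    )
    .join(', ');
}

console.log(castDateColumns([
  { name: 'cartodb_id', type: 'number' },
  { name: 'created_at', type: 'date' }
]));
// cartodb_id, date_part('epoch', created_at) AS created_at
```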
I would like a flavor of D.
Regarding the timestamp management, I think it would be best if Maps API automatically cast it to a usable form. Ideally, it would be compressed in some way (no strings).
Regarding filters, I would move the conditional logic to Maps API. I wouldn't apply filters every time: we saw this is overkill for most maps (small and medium datasets), since they would no longer be able to refilter instantly with just client-side logic, and the MVT sizes would be small even without taking filtering into account.
Basically, I think Maps API should return an instantiated map and a flag saying if filters were applied or not (similar to aggregation). When the filters change in the client, CARTO VL should re-instantiate if the flag indicates that Maps API filtered in the last instantiation.
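A minimal sketch of that client-side decision follows. It assumes the instantiation response carries a boolean flag, called `filtersApplied` here. That name is hypothetical: the actual field is not defined in this thread.

```javascript
// Hypothetical sketch: decide whether a filter change requires a new
// map instantiation. `filtersApplied` is an assumed flag name.
function needsReinstantiation(lastInstantiation, previousFilter, newFilter) {
  const filterChanged = previousFilter !== newFilter;
  // If Maps API did not filter, the tiles hold all features and the client
  // can refilter locally without a new request.
  return filterChanged && lastInstantiation.filtersApplied === true;
}

console.log(needsReinstantiation({ filtersApplied: true }, 'pop > 10', 'pop > 20'));  // true
console.log(needsReinstantiation({ filtersApplied: false }, 'pop > 10', 'pop > 20')); // false
```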
I reopen this to be closed after deployment.
Closing this.
CartoVL is using the SQL API to obtain metadata about the dataset/query used, including a sample of the data. We must avoid using the SQL API from CartoVL and obtain this data from the Maps API to avoid requiring both Maps API and SQL API authorization keys.
At this point we want to quickly implement what CartoVL needs, and eventually refactor it into something more reasonable and efficient.
We could just implement an ad-hoc endpoint now, performing the exact same queries/data processing we currently do in CartoVL.
Or, if we consider the effort will be similar (I'm inclined to think so), we could implement it as optional metadata returned by the map instantiation. This will offer opportunities for optimization (now or later) since some of the metadata may be already computed/used, we save requests, and we could also save queries by combining the requested metadata with e.g. the needs of `getAggregationMetadata`.

We could add a parameter to request metadata, e.g. `"metadata": { sample: true, rowCount: true, columnStats: true }`, and the data could be added to the existing `metadata.stats` in the response. This could be nicely encapsulated in the `setLayerStats` function of the Windshaft-cartodb maps controller.

Details
What CartoVL does now
All the metadata CartoVL requests now is actually needed. The `windshaft` module encapsulates all SQL API requests through `getSQL`, which is used by the following functions, called to prepare the metadata (in `_getMetadata`):

- `getSample(conf, sampling)`
  - `SELECT * TABLESAMPLE BERNOULLI / random() < x`
  - `metadata.sample`
- `getFeatureCount(query, conf)`
  - `SELECT COUNT(*) FROM ${query}`
  - `metadata.featureCount`
- `getColumnTypes(query, conf)`
  - `select * from ${query} limit 0` => `.fields` (name: {type})
  - `metadata.columns`
  - `metadata.categoryIDs, categoryIDsToName`
  - Based on `type`, columns are categorized as numeric, date or category (all strings).
- `getNumericTypes(names, query, conf)` (executed for numeric columns)
  - `SELECT min($name), max($name), ...FROM ${query}`
  - `metadata.columns` (name, type, min, max, avg, sum)
  - `COUNT(*)` computed but not used (?)
- `getDatesTypes(names, query, conf)` (executed for date columns)
  - `SELECT min($name), max($name) FROM ${query}`
  - `metadata.columns` (name, type, min, max)
- `getCategoryTypes(names, query, conf)` (executed for category columns)
  - `SELECT COUNT(*), ${name} FROM ${query} GROUP BY ${name}`
  - `metadata.categoryIDs, categoryIDsToName`
  - `metadata.columns` (name, type, categoryNames, categoryCounts)
- `getGeometryType(query, conf)`
  - `SELECT ST_GeometryType(the_geom) FROM ${query}`
  - `windshaft.geomType`
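The per-type statistics queries listed above can be sketched as one function that picks the query for a column. This is an illustration only, not the actual CartoVL code. The function name `statsQueryFor`, the type labels `'number'`, `'date'` and `'category'`, and the `AS q` subquery alias are assumptions. The numeric case spells out the aggregates listed above (min, max, avg, sum), and identifiers are interpolated without quoting for brevity.

```javascript
// Hypothetical sketch: build the statistics query for one column,
// following the numeric / date / category cases listed above.
function statsQueryFor(column, query) {
  const source = `(${query}) AS q`; // assumes `query` is a plain SELECT
  switch (column.type) {
    case 'number':
      return `SELECT min(${column.name}), max(${column.name}), ` +
             `avg(${column.name}), sum(${column.name}) FROM ${source}`;
    case 'date':
      return `SELECT min(${column.name}), max(${column.name}) FROM ${source}`;
    case 'category':
      return `SELECT COUNT(*), ${column.name} FROM ${source} GROUP BY ${column.name}`;
    default:
      throw new Error(`Unsupported column type: ${column.type}`);
  }
}

console.log(statsQueryFor({ name: 'price', type: 'number' }, 'SELECT * FROM listings'));
```

Keeping the query construction in one place like this would also make it easier to move the same logic behind a Maps API endpoint later.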
In addition to the metadata (`categoryIDs`, `columns`, `featureCount`, `sample`), `geomType` is kept in the windshaft object; it is used to determine if aggregation is possible and to decode MVT.

What the tiler already does at instantiation
The module `query-utils` of Windshaft-cartodb contains some functions to fetch metadata about the query. In particular, the function `getAggregationMetadata`, used to determine if aggregation should be applied, returns:

- `CDB_EstimateRowCount` (as `meta.stats.estimatedFeatureCount`)
- `SELECT ST_GeometryType(${geom}) FROM (${query}) WHERE ${geom} IS NOT NULL LIMIT 1`

When default aggregation is used (sampling), the columns of the original query are obtained with a LIMIT 0 query (in `getLayerAggregationColumns`) to set the `columns` layer parameter.

The map instantiation response contains `layergroup.metadata.layers[0].meta.stats.estimatedFeatureCount` (which could be extended for additional metadata). It also contains `layergroup.metadata.layers[0].meta.stats.aggregation`, which could be used for aggregated stats at some point.
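As a rough sketch of how a client could use this, the snippet below adds the proposed `metadata` option to a map config and reads the stats of the first layer from the response. The helper names are hypothetical, placing `metadata` at the top level of the map config is an assumption, and the response object is a trimmed, made-up example.

```javascript
// Hypothetical sketch: add the proposed `metadata` option to a map config.
// Where the option lives in the config is an assumption.
function withMetadataRequest(mapConfig) {
  return {
    ...mapConfig,
    metadata: { sample: true, rowCount: true, columnStats: true }
  };
}

// Returns the stats object of the first layer, or {} if it is missing.
function firstLayerStats(layergroup) {
  const layers = (layergroup.metadata && layergroup.metadata.layers) || [];
  return (layers[0] && layers[0].meta && layers[0].meta.stats) || {};
}

// Made-up response, trimmed to the fields mentioned above.
const response = {
  metadata: { layers: [{ meta: { stats: { estimatedFeatureCount: 1200 } } }] }
};
console.log(firstLayerStats(response).estimatedFeatureCount); // 1200
```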