dgaubert closed 5 years ago
Improve efficiency of query samples (esp. for FDW's)
Do you have any benchmarks? What's the expected impact in different scenarios (small vs big tables, compact vs sparse ids)?
While performing benchmarks, I found this error:
{
"errors": [
"TABLESAMPLE clause can only be applied to tables and materialized views"
],
"errors_with_context": [
{
"type": "unknown",
"message": "TABLESAMPLE clause can only be applied to tables and materialized views"
}
]
}
So TABLESAMPLE BERNOULLI doesn't work with foreign tables. I'm going to hack the code to avoid using it, just for testing purposes.
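The limitation above can be illustrated with a small sketch that only emits TABLESAMPLE for relation kinds that accept it. This is not the project's code: `supportsTableSample`, `buildSampleQuery` and their parameters are hypothetical names, and the `random()` fallback is only one possible workaround (it still scans the whole foreign table, so it is not a performance fix). The `relkind` letters are the ones PostgreSQL stores in `pg_class.relkind`.

```javascript
// Illustrative sketch only; function names and query shapes are assumptions.

// pg_class.relkind: 'r' = ordinary table, 'm' = materialized view,
// 'f' = foreign table, 'v' = view.
// Conservative check that mirrors the error message in this thread.
function supportsTableSample(relkind) {
  return relkind === 'r' || relkind === 'm';
}

// Builds a query returning roughly numRows random rows.
// NOTE: `relation` is interpolated as-is; real code must quote or validate it.
function buildSampleQuery({ relation, relkind, numRows, totalRows }) {
  if (supportsTableSample(relkind)) {
    // BERNOULLI takes a percentage in the range 0..100.
    const percent = totalRows > 0 ? Math.min(100, (100 * numRows) / totalRows) : 100;
    return `SELECT * FROM ${relation} TABLESAMPLE BERNOULLI (${percent}) LIMIT ${numRows}`;
  }
  // Fallback for foreign tables and views: filter rows with random().
  const fraction = totalRows > 0 ? Math.min(1, numRows / totalRows) : 1;
  return `SELECT * FROM ${relation} WHERE random() < ${fraction} LIMIT ${numRows}`;
}
```

For example, `buildSampleQuery({ relation: 'ow', relkind: 'f', numRows: 1000, totalRows: 800000 })` produces a query without a TABLESAMPLE clause, which is enough to get past the error for testing.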
{
"version": "1.8.0",
"layers": [
{
"type": "mapnik",
"options": {
"sql": "select * from {table|foreign_table}",
"cartocss_version": "2.3.0",
"cartocss": "#layer{marker-placement:point;marker-allow-overlap:true;marker-line-opacity:0.2;marker-line-width:0.5;marker-opacity:1;marker-width:5;marker-fill:red;}",
"metadata": {
"sample": {
"num_rows": 1000
}
}
}
}
]
}
Foreign table (800K points), map instantiation:
$ wrk -s ow.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30 -t 1 -c 1
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 100.00%
6 requests in 30.08s, 1.90MB read
Socket errors: connect 0, read 0, write 0, timeout 6
Requests/sec: 0.20
Transfer/sec: 64.76KB
Note: I used 1 thread / 1 connection, otherwise the benchmark doesn't get any response.
Local table (800K points), map instantiation:
$ wrk -s ow_local.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30 -t 1 -c 1
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 100.12ms 13.09ms 194.29ms 85.00%
Req/Sec 10.09 2.33 20.00 91.96%
300 requests in 30.04s, 95.15MB read
Requests/sec: 9.99
Transfer/sec: 3.17MB
$ wrk -s ow_local.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 668.43ms 181.12ms 1.51s 81.03%
Req/Sec 10.56 8.42 40.00 68.50%
445 requests in 30.10s, 141.13MB read
Requests/sec: 14.78
Transfer/sec: 4.69MB
$ wrk -s ow.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30 -t 1 -c 1
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 70.90ms 18.08ms 284.70ms 91.86%
Req/Sec 14.28 5.10 20.00 52.38%
426 requests in 30.03s, 135.15MB read
Requests/sec: 14.18
Transfer/sec: 4.50MB
$ wrk -s ow.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 509.24ms 200.36ms 1.98s 87.22%
Req/Sec 13.35 9.12 40.00 70.61%
602 requests in 30.09s, 190.99MB read
Requests/sec: 20.00
Transfer/sec: 6.35MB
$ wrk -s ow_local.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30 -t 1 -c 1
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 56.06ms 31.41ms 271.20ms 89.84%
Req/Sec 19.37 7.09 30.00 53.17%
567 requests in 30.05s, 179.93MB read
Requests/sec: 18.87
Transfer/sec: 5.99MB
$ wrk -s ow_local.lua http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d -d 30
Running 30s test @ http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 371.73ms 103.41ms 1.02s 80.42%
Req/Sec 14.39 7.03 40.00 83.97%
806 requests in 30.10s, 255.79MB read
Requests/sec: 26.77
Transfer/sec: 8.50MB
cc/ @Algunenano
I was wondering how this branch behaves with datasets that have gaps in the primary key sequence (cartodb_id). I performed some tests:
{
"version": "1.8.0",
"layers": [
{
"type": "mapnik",
"options": {
"sql": "select * from {ow_odd|ow_gap}",
"cartocss_version": "2.3.0",
"cartocss": "#layer{marker-placement:point;marker-allow-overlap:true;marker-line-opacity:0.2;marker-line-width:0.5;marker-opacity:1;marker-width:5;marker-fill:red;}",
"metadata": {
"sample": {
"num_rows": 1000
}
}
}
}
]
}
cartodb_id sequence (50% sequence is used)
$ curl -X POST -H 'Host: cdb.localhost.lan' -H 'Content-Type: application/json' -d @ow_local_odd.json "http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d" > ow_local_odd_layergroup.json
$ node
> const l = require('./ow_local_gap_layergroup.json')
undefined
> l.metadata.layers[0].meta.stats.sample.length
496
cartodb_id sequence (50% sequence is used)
$ curl -X POST -H 'Host: cdb.localhost.lan' -H 'Content-Type: application/json' -d @ow_local_gap.json "http://0.0.0.0:8181/api/v1/map/?api_key=5d56****473d" > ow_local_gap_layergroup.json
ubuntu@cdb-dev:/vagrant/utils/live-connectors/sample$ node
> const l = require('./ow_local_gap_layergroup.json')
undefined
> l.metadata.layers[0].meta.stats.sample.length
539
As expected, it returns only ~50% of the requested number of rows. Such scenarios are unusual in our infrastructure, but we must check whether returning fewer rows to Carto-VL is enough for it to work with.
I took a look at CartoVL and it doesn't seem to check anything about the sample; it just uses it to create GlobalPercentile, GlobalQuantiles, GlobalStandardDev, and GlobalHistogram.
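To illustrate why a smaller sample can still feed this kind of statistic, here is a minimal sketch of computing quantile breaks from whatever sample size arrives. It is not CartoVL's implementation; `quantileBreaks` is a hypothetical helper, and the point is only that nothing in such a computation depends on the sample having exactly `num_rows` entries.

```javascript
// Illustrative sketch only; not CartoVL code.
// Returns (buckets - 1) break values that split the sample into
// groups of roughly equal size.
function quantileBreaks(values, buckets) {
  if (values.length === 0 || buckets < 2) {
    return [];
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const breaks = [];
  for (let i = 1; i < buckets; i++) {
    const idx = Math.floor((i * sorted.length) / buckets);
    breaks.push(sorted[Math.min(idx, sorted.length - 1)]);
  }
  return breaks;
}
```

Calling it with a full sample and with half of that sample gives breaks that are close to each other, although a smaller sample naturally makes the estimates noisier.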
Fixes #1118
Note: the new sample implementation may be inefficient when the numeric ID column (cartodb_id) has many (or moderately many) gaps.
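The effect of gaps can be reproduced with a small simulation. It assumes the new implementation draws random candidate ids in the [min, max] range of cartodb_id and keeps only those that exist; that is my reading of this thread, not something confirmed here. All names (`mulberry32`, `sampleByRandomIds`) are hypothetical, and the seeded generator is only there to make runs repeatable.

```javascript
// Illustrative simulation only; not the project's sampling code.

// Small seeded PRNG (mulberry32) so results are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draws numRows candidate ids uniformly in [minId, maxId] and returns
// the ones that actually exist. Duplicated candidates count once.
function sampleByRandomIds(existingIds, minId, maxId, numRows, rand) {
  const candidates = new Set();
  for (let i = 0; i < numRows; i++) {
    candidates.add(minId + Math.floor(rand() * (maxId - minId + 1)));
  }
  return [...candidates].filter((id) => existingIds.has(id));
}

// Example: only odd ids exist, so about half of the range is used.
const oddIds = new Set();
for (let id = 1; id <= 20000; id += 2) {
  oddIds.add(id);
}
const sample = sampleByRandomIds(oddIds, 1, 20000, 1000, mulberry32(42));
console.log(`requested 1000 rows, got ${sample.length}`);
```

Under that assumption the number of rows returned tracks the density of the id range, which is consistent with the 496 and 539 rows observed above for tables that use about 50% of the sequence.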