CartoDB / cartodb-postgresql

PostgreSQL extension for CartoDB
BSD 3-Clause "New" or "Revised" License
111 stars 53 forks source link

Improve tile rendering time by clustering during import process #313

Open AbelVM opened 7 years ago

AbelVM commented 7 years ago

While working on https://github.com/CartoDB/QuadGrid, which tries to solve a problem quite related to the query used by CARTO to fetch the data to render tiles. Benchmarking different approaches, I've found that just reshuffling the data to be clustered base on the_geom_webmercator leads to a performance improvement typically between 15% and 25%

The code used is https://github.com/CartoDB/QuadGrid/blob/master/sql/CDB_Quadgrid_recursive_R2.sql#L11-L22

But for import process, could be simplified just as

CLUSTER mytable USING mytable_the_geom_webmercator_idx;

CLUSTER function takes like 9s for a 400K points dataset, and for QuadGrid it lowers the processing time for that specific dataset from 4:15 min to 48s (so, 9s clustering and 39s processing). That's ~ 85% improvement!

cc @javitonino @rochoa

AbelVM commented 7 years ago

cc @jgoizueta because of the mention in Slack :grin:

AbelVM commented 7 years ago

Some simple tests with a sample table (1497711 points ~ 1.5M)

Using simplified tile checking as described here:

select count(*) from (
    SELECT g.the_geom_webmercator, g.col1, g.id
    FROM (
      SELECT the_geom_webmercator, clientid as col1, cartodb_id as id
      FROM sample
    ) g
) wrapped
where CDB_XYZ_Extent(x, y, z) && the_geom_webmercator

Selecting tiles of interest in the map view of the dataset, and different approaches:

  1. as is
  2. clustered by the_geom index
  3. clustered by the_geom_webmercator index
  4. adding a geohash B-Tree index and clustering by it

img x y z count time 1 time 2 time 3 time 4 time 5
image 0 0 0 1497711 0.571s 0.6s 0.711s 0.582s
image 2 1 1 0 0.002s 0.002s 0.002s 0.003s
image 4 2 3 256689 0.608s 0.576s 0.522s 0.539s
image 3 3 3 896729 0.52s 0.617s 0.555s 0.548s
image 31 24 6 632158 0.515s 0.592s 0.543s 0.556s
image 62 48 7 284763 0.67s 0.566s 0.523s 0.533s
image 64 48 7 43591 0.335s 0.033s 0.025s 0.021s
image 4011 3088 13 15209 0.032s 0.015s 0.016s 0.012s

Some conclusions:

AbelVM commented 7 years ago

cc @oriolbx

AbelVM commented 7 years ago

SIDE NOTE: Clusters get degraded (fragmented) over time if the user performs write operations on the dataset, so the databases should be clustered as a maintenance task (ala VACUUM)

AbelVM commented 7 years ago

cc @inigomedina for awareness