CartoDB / observatory-extension


Review time overhead in observatory functions #128

Closed iriberri closed 7 years ago

iriberri commented 8 years ago

Getting a single measure for a point locally (no DS layer, no SQL API) takes more than 200 ms. We should check what is causing the overhead (probably the metadata checks/decisions?), because the lookup itself should be quick: it's a join against a table that has geometric indexes.

obs_2016_06_15_d428377abd=# SELECT cdb_observatory.OBS_GetMeasure('SRID=4326;POINT(-122.4507 45.5430)'::geometry, 'us.census.acs.B23025001');
NOTICE:  geom_table_name obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308, data_table_info {"{\"colname\":\"pop_16_over\",\"tablename\":\"obs_1a098da56badf5f32e336002b0a81708c40d29cd\",\"aggregate\":\"sum\",\"name\":\"Population age 16 and over\",\"type\":\"Numeric\",\"description\":\"The number of people in each geography who are age 16 or over.\",\"boundary_id\":\"us.census.tiger.block_group\"}"}
CONTEXT:  SQL statement "
    SELECT ARRAY_AGG(val) FROM (SELECT (cdb_observatory._OBS_Get($1, $2, $3, $4)->>'value')::NUMERIC val) b
  "
PL/pgSQL function cdb_observatory.obs_getmeasure(geometry,text,text,text,text) line 23 at EXECUTE
NOTICE:  geoid is 410510102002, geometry table is obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308 
CONTEXT:  PL/pgSQL function cdb_observatory._obs_get(geometry,text[],text,text) line 37 at assignment
SQL statement "
    SELECT ARRAY_AGG(val) FROM (SELECT (cdb_observatory._OBS_Get($1, $2, $3, $4)->>'value')::NUMERIC val) b
  "
PL/pgSQL function cdb_observatory.obs_getmeasure(geometry,text,text,text,text) line 23 at EXECUTE
    obs_getmeasure    
----------------------
 921.8174085813772134
(1 row)

Time: 275.178 ms
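
One rough way to see where that time goes inside the function (a sketch, assuming a superuser session and the stock auto_explain module; the settings below are just a starting point):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = true;
SET auto_explain.log_nested_statements = true;  -- also log the queries run inside the PL/pgSQL body

SELECT cdb_observatory.OBS_GetMeasure(
  'SRID=4326;POINT(-122.4507 45.5430)'::geometry,
  'us.census.acs.B23025001');
-- the server log then contains one plan (with timings) per internal statement,
-- which should show whether the metadata checks or the geometry join dominate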

cc @javisantana @rafatower

javisantana commented 8 years ago

To give some context: 275 ms means roughly 3 QPS, and a service like this should be in the range of 100-200 QPS. I'm pretty sure we could make some improvements in the queries and the database configuration, but first we need to understand what's going on behind that function.
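
For reference, a quick single-connection way to turn the timing into a QPS figure (a sketch; generate_series just forces the function to be evaluated repeatedly):

\timing on
SELECT cdb_observatory.OBS_GetMeasure(
         'SRID=4326;POINT(-122.4507 45.5430)'::geometry,
         'us.census.acs.B23025001')
  FROM generate_series(1, 100);
-- single-connection QPS is roughly 100 / (reported "Time:" in seconds);
-- at ~275 ms per call that works out to roughly 3.6 QPS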

iriberri commented 8 years ago

I think there might be some indexes missing; for example, in this table I'd expect the_geom to have one:

obs_2016_06_15_d428377abd=# \d+ observatory.obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308
    Table "observatory.obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308"
  Column  |   Type   | Modifiers | Storage  | Stats target | Description 
----------+----------+-----------+----------+--------------+-------------
 geoid    | text     |           | extended |              | 
 the_geom | geometry |           | main     |              | 
 aland    | numeric  |           | main     |              | 
 awater   | numeric  |           | main     |              | 
Indexes:
    "obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308_geoid_idx" btree (geoid)
talos commented 8 years ago

Good catch on the missing geom indices -- I had originally been running under the assumption that CartoDBfication would add those, but the dump doesn't run through that process. That means we have to add them manually to the dump, which will make ingesting it much slower.

Checklist for me:

rafatower commented 8 years ago

I'd just replace the NOTICEs with DEBUGs, as they can still save us eventually.
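
That's a one-word change in the PL/pgSQL bodies (a sketch; the message and variable names are inferred from the NOTICE output above, not copied from the actual source):

-- before: always sent to the client, adding noise (and a little work) to every call
RAISE NOTICE 'geoid is %, geometry table is %', geoid, geom_table_name;

-- after: only emitted when the session asks for it
RAISE DEBUG 'geoid is %, geometry table is %', geoid, geom_table_name;

-- to see the messages again while debugging:
SET client_min_messages TO debug;  -- or debug5 to see everything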

iriberri commented 8 years ago

I was doing some tests on my own instance with the 6K-row table and my table-level functions:

... wow

javisantana commented 8 years ago

nice catch

iriberri commented 8 years ago

With the added indexes we're now at about 70 QPS (93133.434 ms for 6429 rows).
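
For the record, that figure is just total rows over total time; the kind of per-row test behind it looks roughly like this (the points table is hypothetical):

\timing on
SELECT cdb_observatory.OBS_GetMeasure(the_geom, 'us.census.acs.B23025001')
  FROM test_points;  -- hypothetical table holding the 6429 point geometries
-- 6429 rows / 93.133 s is roughly 69 queries per second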

talos commented 8 years ago

Did extensive tests documented here: https://docs.google.com/spreadsheets/d/1cmiB0GfCwN4rLjEB51QhBr2Amlj6YX1s2BixtSFZfb8/edit#gid=0

In summary, in production, we're looking at:

method                                       QPS low   QPS high
-------------------------------------------  --------  --------
OBS_GetMeasureByID(block_group)                 31       35.5
OBS_GetMeasure(Point)                           26       33
OBS_GetMeasure(Polygon, no interpolation)       26       32
OBS_GetMeasure(Polygon, interpolated)           17       19
OBS_GetCategory(Point)                          29       35.7
OBS_GetCategory(Polygon, no interpolation)      27       33.5
OBS_GetCategory(Polygon, interpolated)          25       26.5

I compared these numbers to the performance before the last release, and they're either dramatically improved or mostly similar. I'm not sure whether the 70 QPS figure was measured through a non-production pathway.

Those numbers also reflect code in observatory-extension that's about as performant as I see us getting (one metadata request, then one request for the data itself, and efficient algorithms for calculating interpolated areas when we need them).

talos commented 8 years ago

@javisantana @iriberri @rafatower Can we close this? I think we've thoroughly reviewed the overhead, and at this stage any further steps to try to reduce it would not be done in cartodb/observatory-extension proper.

iriberri commented 8 years ago

Hi @talos! Nice! :-)

I'm noticing that in your tests you used UPDATE statements, which have a bigger cost than only retrieving the observatory data. My previous test used SELECT statements to check the speed of our functions when just retrieving the data. I ran GetMeasure(point) via the SQL API + PL/Proxy stack and it took 0.021 s, which basically means almost 48 QPS.
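
In other words, the two styles of test look roughly like this (a sketch; test_points and its measure column are hypothetical):

-- UPDATE-style benchmark: pays for the function call *and* for rewriting every row
UPDATE test_points
   SET measure = cdb_observatory.OBS_GetMeasure(the_geom, 'us.census.acs.B23025001');

-- SELECT-style benchmark: measures only the cost of the observatory lookup itself
SELECT cdb_observatory.OBS_GetMeasure(the_geom, 'us.census.acs.B23025001')
  FROM test_points;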

talos commented 8 years ago

Oh! That makes sense. Thanks for the clarification.

talos commented 7 years ago

Closing this out: we've moved over to OBS_GetData and OBS_GetMeta, which have dramatically better performance.