Getting a single measure for a point locally (no DS layer, no SQL API) takes more than 200 ms. We should check where that overhead comes from (probably in the metadata checks/decisions?), because it should be quick: it's a join within a table that has geometry indexes.

cc @javisantana @rafatower
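For reference, the kind of local timing check described above might look like this in psql (the point, the measure ID, and the schema-qualified call are illustrative assumptions, not taken from the original report):

```sql
-- Time a single local call, bypassing the DS layer and the SQL API
\timing on
SELECT cdb_observatory.OBS_GetMeasure(
  ST_SetSRID(ST_MakePoint(-73.9365, 40.7042), 4326),  -- example point
  'us.census.acs.B01003001'                           -- example measure: total population
);
```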
To give some context: 275 ms per call means roughly 3.6 QPS, and a service like this should be in the range of 100-200 QPS. I'm pretty sure we could make some improvements to the queries and the database configuration, but first we need to understand what's going on behind that function.
I think there might be some indexes missing; for example, in this table I'd expect `the_geom` to have one:
```
obs_2016_06_15_d428377abd=# \d+ observatory.obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308
    Table "observatory.obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308"
  Column  |   Type   | Modifiers | Storage  | Stats target | Description
----------+----------+-----------+----------+--------------+-------------
 geoid    | text     |           | extended |              |
 the_geom | geometry |           | main     |              |
 aland    | numeric  |           | main     |              |
 awater   | numeric  |           | main     |              |
Indexes:
    "obs_c6fb99c47d61289fbb8e561ff7773799d3fcc308_geoid_idx" btree (geoid)
```
Good catch on the missing geom indices -- I had originally been running under the assumption that CartoDBfication would add those, but the dump doesn't run through that process. This means we have to add them manually to the dump. This will make ingesting the dump much slower.
Checklist for me:

- Add a GIST index on `the_geom` to the dump for all data tables.
- Bring back the `obs_meta` table, which had most of what we need pre-joined. We're probably at ~6 or 7 SELECTs to metadata tables, each with some JOINs. This could probably be brought down to 1 or 2 SELECTs, with no or very few JOINs.
- Remove `NOTICE`s, which I believe can be a source of slowdown.

I'd just replace the `NOTICE`s with `DEBUG`s, as they can save us eventually.
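To illustrate that swap (a sketch only; the message and variable are made up, not taken from the real function bodies):

```sql
DO $$
DECLARE
  measure_id text := 'us.census.acs.B01003001';  -- made-up example value
BEGIN
  -- before: RAISE NOTICE 'fetching measure %', measure_id;  -- sent to every client
  RAISE DEBUG 'fetching measure %', measure_id;  -- suppressed by default
END
$$;

-- When we do need the messages, they can be re-enabled per session:
SET client_min_messages TO debug5;  -- most verbose setting, shows all DEBUG output
```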
I was doing some tests on my own instance with the 6K-row table and my table-level functions:
... wow
nice catch
With the added indexes we're now at ~70 QPS (93133.434 ms for 6429 rows, i.e. 6429 rows / 93.1 s ≈ 69 queries per second).
Did extensive tests documented here: https://docs.google.com/spreadsheets/d/1cmiB0GfCwN4rLjEB51QhBr2Amlj6YX1s2BixtSFZfb8/edit#gid=0
In summary, in production, we're looking at:
method | QPS low | QPS high
---|---|---
`OBS_GetMeasureByID(block_group)` | 31 | 35.5
`OBS_GetMeasure(Point)` | 26 | 33
`OBS_GetMeasure(Polygon, no interpolation)` | 26 | 32
`OBS_GetMeasure(Polygon, interpolated)` | 17 | 19
`OBS_GetCategory(Point)` | 29 | 35.7
`OBS_GetCategory(Polygon, no interpolation)` | 27 | 33.5
`OBS_GetCategory(Polygon, interpolated)` | 25 | 26.5
I compared these numbers to the performance before the last release, and they're either dramatically improved or mostly similar. I'm not sure whether the earlier 70 QPS figure went through a non-production pathway.

Those numbers also use code in observatory-extension that's about as performant as I see us getting: one metadata request, then one request for the data itself, and efficient algorithms for calculating interpolated areas when we need them.
@javisantana @iriberri @rafatower Can we close this? I think we've thoroughly reviewed the overhead, and at this stage any further steps to try to reduce it would not be done in cartodb/observatory-extension proper.
Hi @talos! Nice! :-)
I'm noticing that in your tests you used UPDATE statements, which have a bigger cost than only retrieving the Observatory data. My previous test used SELECT statements to check the speed of our functions when just retrieving the data. I ran a `GetMeasure(point)` via the SQL API + PL/Proxy stack and it took 0.021 s, which basically means almost 48 QPS.
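In other words, the two tests measured different shapes of work; roughly (table and column names here are invented):

```sql
-- Read-only test: just the Observatory lookup
SELECT cdb_observatory.OBS_GetMeasure(the_geom, 'us.census.acs.B01003001')
  FROM my_table;

-- Write-back test: the same lookup plus the cost of rewriting every row
UPDATE my_table
   SET total_pop = cdb_observatory.OBS_GetMeasure(the_geom, 'us.census.acs.B01003001');
```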
Oh! That makes sense. Thanks for the clarification.
Closing this out: we've moved over to `OBS_GetData` and `OBS_GetMeta`, which have dramatically superior performance.
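For anyone landing here later, the newer two-call pattern looks roughly like this (sketched from the documented usage; `my_table`, the extent, and the measure are placeholders, and the exact signatures may differ):

```sql
-- One metadata call, then one data call for all rows at once
SELECT id, data
  FROM OBS_GetData(
         (SELECT array_agg((the_geom, cartodb_id)::geomval) FROM my_table),
         (SELECT OBS_GetMeta(
                   ST_MakeEnvelope(-74.05, 40.54, -73.7, 40.92, 4326),  -- data extent
                   '[{"numer_id": "us.census.acs.B01003001"}]'::json    -- measures wanted
                 )));
```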