Open achshar opened 6 years ago
Try selecting only one small row that will still force the same set of rows to be retrieved – md5() of the concatenated values of the original result set, for example. That should help determine whether there’s a parsing (or even network?) bottleneck. (pg’s pure JavaScript implementation only has one thread available to it, so it can’t parse messages and results from multiple clients in parallel.)
> Try selecting only one small row that will still force the same set of rows to be retrieved – md5() of the concatenated values of the original result set, for example. That should help determine whether there’s a parsing (or even network?) bottleneck.
Can you elaborate on that a bit? I'm not sure I get it.
I'm now trying the native bindings. Maybe they will help.
If your original query was:
```sql
SELECT a, b, c FROM d
```
then a query that will use the same data but send only a small response is:
```sql
SELECT md5(string_agg(substr(md5(concat(a, b, c)), 1, 1), '')) FROM d
```
There’s probably a much simpler alternative, but I don’t know what Redshift optimizes compared to Postgres.
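One way to act on that suggestion from Node is to time the full query against the digest-only query and compare: if both take roughly as long, the time is going into the scan itself; if the digest query is much faster, the bottleneck is result transfer or parsing on the client. A minimal timing sketch – `timeIt` and `runQuery` are illustrative names, and the commented usage assumes a node-postgres client called `client`:

```javascript
// Hedged sketch: tiny helper for comparing the full query against the
// digest-only query. `runQuery` is a stand-in for e.g. () => client.query(sql).
async function timeIt(label, runQuery) {
  const start = process.hrtime.bigint();
  await runQuery();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return ms;
}

// Hypothetical usage against a real pg client:
// await timeIt('full result', () => client.query('SELECT a, b, c FROM d'));
// await timeIt('digest only', () => client.query(
//   "SELECT md5(string_agg(substr(md5(concat(a, b, c)), 1, 1), '')) FROM d"));
```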
@achshar did you ever figure this out?
Hello, I am using node-pgsql to connect to my Redshift database (node 9.11 and 8.9). When I run my basic query sequentially, each query only takes ~3-4 seconds. But when I run them in parallel, each jumps to ~12 seconds.
Somewhere, somehow, the parallel queries are still being run in sequence. I tried using Redshift queues (currently 3 queues with a concurrency of 20, so more than sufficient). Maybe someone here can help?
Here is the code
(All the extra generating stuff is to force-disable the Redshift cache. We were having some trouble with its session-wise flag.)
And the output I get from my testing is this. Notice how the first 5 requests were sequential and took ~3 seconds each, whereas the later 5 ran in parallel but took ~13 seconds each. We've tried almost everything; this is sort of a Hail Mary.
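The symptom above (parallel submissions that still serialize) is exactly what you would see if all the queries go through a single connection: node-postgres queues queries issued on one `Client` and runs them one at a time, so true parallelism needs a `pg.Pool` with several connections (and enough concurrency on the Redshift side). A minimal sketch of the arithmetic – no real database, and the names and the 3-second query cost are made up for illustration:

```javascript
// Models why N queries sent "in parallel" to one connection each appear
// to take ~N times longer, while a pool keeps per-query latency flat.
const QUERY_COST = 3; // hypothetical seconds per query

function latenciesOnSingleConnection(nQueries) {
  // One connection executes queries back to back, so the k-th caller
  // (all submitted at t = 0) waits for every earlier query plus its own.
  const latencies = [];
  let busyUntil = 0;
  for (let k = 0; k < nQueries; k++) {
    busyUntil += QUERY_COST;
    latencies.push(busyUntil);
  }
  return latencies;
}

function latenciesOnPool(nQueries, poolSize) {
  // Each pooled connection serves its queries independently.
  const busyUntil = new Array(poolSize).fill(0);
  const latencies = [];
  for (let k = 0; k < nQueries; k++) {
    const c = k % poolSize; // round-robin assignment, for the sketch
    busyUntil[c] += QUERY_COST;
    latencies.push(busyUntil[c]);
  }
  return latencies;
}

console.log(latenciesOnSingleConnection(5)); // [3, 6, 9, 12, 15]
console.log(latenciesOnPool(5, 5));          // [3, 3, 3, 3, 3]
```

The "~3s sequential, ~13s parallel" numbers in the report match the single-connection row closely, which is why checking whether all five requests share one client (versus checking out separate pool connections) would be my first step.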