Closed JamesInform closed 3 years ago
Columnar shines when IO is the bottleneck, or when memory or storage is scarce.
For 100M rows, I see table sizes of 3.5GB for simple_row, and 115MB for simple_columnar. Columnar is compressing well, but both of those are easily small enough to fit in memory, so it's not an IO-bound workload. Also keep in mind that local SSDs are fast, so you are more likely to see an IO-bound workload when using, e.g., managed disks.
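As a rough sketch, sizes like the ones above can be checked with PostgreSQL's built-in size functions. The table names `simple_row` and `simple_columnar` are taken from this thread; the exact numbers will depend on the data and the compression settings.

```sql
-- On-disk size of each table, in human-readable form.
SELECT pg_size_pretty(pg_total_relation_size('simple_row'))      AS row_size,
       pg_size_pretty(pg_total_relation_size('simple_columnar')) AS columnar_size;

-- Compression ratio: how many times smaller the columnar table is.
SELECT pg_total_relation_size('simple_row')::numeric
       / pg_total_relation_size('simple_columnar') AS compression_ratio;
```

If both sizes are well below the available RAM, the workload is unlikely to be IO-bound once the data is cached.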
Well, I see.
But why is the query on simple_row more than 2 times faster (4322,531 ms for row versus 11962,963 ms for columnar), even though only 2 parallel workers are involved in the row version?
Is there a rule of thumb, besides IO-bound workloads, for when to use Citus? It seems that using Citus only makes sense when you can't add more RAM to the system.
I expected Citus to return results for simple queries like the one in my example within milliseconds. This is what one expects from in-memory columnar solutions such as SQL Server and others.
Or is the Citus extension not comparable to other columnar solutions?
But why is the query on simple_row more than 2 times faster (4322,531 ms for row versus 11962,963 ms for columnar), even though only 2 parallel workers are involved in the row version?
Most likely due to the cost of decompression.
You could try using SELECT create_distributed_table('simple_columnar', 'i') to get more parallelism.
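A minimal sketch of that suggestion, assuming a single-node Citus 10 or later setup in which shards can be placed on the local node. The shard count of 8 and the `count(*)` query are illustrative assumptions, not taken from the original benchmark script.

```sql
-- Assumption: 8 shards is an arbitrary illustrative value; tune it to the number of cores.
SET citus.shard_count = 8;

-- Distribute the columnar table on column i, as suggested above.
SELECT create_distributed_table('simple_columnar', 'i');

-- Hypothetical test query; substitute the query from the benchmark script.
EXPLAIN ANALYZE SELECT count(*) FROM simple_columnar;
```

With the table split into shards, Citus can run the query on several shards in parallel, so the decompression cost is spread across more cores.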
I expected Citus to return results for simple queries like the one in my example within milliseconds. This is what one expects from in-memory columnar solutions such as SQL Server and others.
The PostgreSQL executor processes tuples row-by-row regardless of the storage format, hence the amount of computational work is similar when using columnar storage except for decompression. Longer term, it may be possible to implement vectorized execution in PostgreSQL, but for now our main goal is to reduce I/O for queries on large tables.
That makes things clear.
Thanks
Hi,
I built the Citus extension from the GitHub master branch on a Mac. The server is PostgreSQL 13.2 on Mac, installed via the "Postgresapp.com" Mac app.
I have done a little performance testing based on the following blog: Citus 10 brings columnar compression to Postgres
I modified the script to create tables with 100 million records. I have pg_prewarm as a preloaded library. On the first run, the columnar table is a bit faster than the row table (but only by about 25%). On the second run, however, Citus is much slower (11 seconds for columnar versus 4 seconds for row).
Please tell me if I missed something.
Here is the script run, including timings: