citusdata / cstore_fdw

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.
Apache License 2.0
1.76k stars 171 forks source link

Setup for analytics data storage #185

Open martinspaeth opened 6 years ago

martinspaeth commented 6 years ago

Hi thank you for your great work at citusdata! We just setup a server with an cstore_fdw installation. The initial copy of our table (around 80 milion data rows) into the foreign table took about 20 minutes. The query performance boost with the given approach is amazing! The data in the origin table changes very rapidly (inserts and updates). While for our day to day analysis (which we plan to use the cstore for) 24 hour old data would be fresh enough i am not sure how the best practice for setting up a analytics only table with cstore should look like.

My current thougts are:

  1. setup a replica while the master has only a cstore installed (with an empty table) the replica shall fill the data in batches every 24 hours. 1.1. cause the initial fill of the table takes more than the max_standby_streaming_delay this seems to be kind of problematic
  2. like 1 but with triggers on the slave that would insert/update/delete records in the cstore 2.2. is not possible since cstore does only support insert

The setup with the trigger looks like the most up to date version and would take much less resources (assuming upserting/deleting would be not that expansive as a setup from scratch.

Is there any input you can give me about best practices here or can you give me an update about the upsert/deletion functionality of cstore_fdw (discussed also in #2 #175 and last but not least in #118)

mtuncer commented 6 years ago

hello @martinspaeth

Once the data gets into cstore_fdw table it becomes immutable. Although there are issues calling to support update/delete, they are not planned to be done anytime soon.

I have been recommending partitioned table approach who need this functionality. It works for use cases when you only need to modify RECENT data.

Notice that, this is strictly for use cases when you only modify recent data, and you move data to cstore_fdw table only if you are sure you won't be modifying it anymore.

This approach also let's you selectively drop some cstore_fdw partition tables to retire old data.

Keep in mind that, due to how postgres handles partition tables (as of 10), you should insert data directly into cstore_fdw partition, not to the master partitioned table. It won't work otherwise.