citusdata / cstore_fdw

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.
Apache License 2.0

Need strategy for backing up cstore_fdw files #24

Open jberkus opened 10 years ago

jberkus commented 10 years ago

$subj pretty much says it. How do we back up the cstore_fdw files? Ideas?

pykello commented 10 years ago
  1. I was thinking that, as an initial method, the documentation could tell users to use pg_dump (to dump the DDL commands) plus COPY (SELECT * FROM ...) TO 'backupfile' for the data; a sketch follows after this list. We can also implement COPY TO to make this a bit easier.
  2. If automatically managed files are inside PGDATA, pg_basebackup should back them up.
  3. For future, should we provide a script which users can use for backing up cstore tables?
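
A minimal sketch of item 1, with the database name, table name, and backup path as placeholders, and assuming cstore_fdw's documented COPY ... FROM support for reloading the data:

# Dump only the DDL; pg_dump does not export cstore table data.
pg_dump --schema-only --table=my_cstore_table mydb > my_cstore_table_ddl.sql

# Export the rows with a regular query (server-side COPY TO writes on the
# server and needs appropriate privileges; \copy in psql writes to the client).
psql mydb -c "COPY (SELECT * FROM my_cstore_table) TO '/backups/my_cstore_table.dat'"

# Restore: replay the DDL, then reload the data with COPY FROM.
psql mydb_restored -f my_cstore_table_ddl.sql
psql mydb_restored -c "COPY my_cstore_table FROM '/backups/my_cstore_table.dat'"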
jberkus commented 10 years ago

Yeah, a script would be nice.

I'm also thinking that scripting backup-by-copying-table-files is a desirable approach for cstore. If you're doing COPY TO, you end up decompressing the data only to recompress it when it's loaded back in.

pykello commented 9 years ago

I added this to the list of features to be added in v1.3. v1.3 should be released sometime in the 1st quarter of 2015.

mtuncer commented 9 years ago

Postponed to a later release.

gunicsba commented 7 years ago

So I assume we can't use pg_dump to create backups of the cstore files?

ghost commented 7 years ago

We have a backup/mirror script for cstore tables. But it assumes that the database doesn't change while the mirror script is running...

Query to translate cstore file names (stored under $PGDATA/cstore_fdw/<dboid>/<ftrelid>, each with a matching .footer file) into readable schema and table names:

SELECT
  (SELECT oid
   FROM pg_database
   WHERE datname = current_database()) AS dboid,
  ftrelid,
  nspname,
  relname
FROM pg_foreign_table
  JOIN pg_class ON pg_class.oid = ftrelid
  JOIN pg_namespace ON pg_namespace.oid = relnamespace;

Using this query, it's basically

[... pg_dump ...]

echo "Copying cstore files to tempdir..."
${ssh} ${production_host} "rm -rvf \"${dest_dir}/cstore_fdw/\""
${ssh} ${production_host} "mkdir -pv \"${dest_dir}/cstore_fdw/\""
${ssh} ${production_host} "cat /bi/mirror/get-cstore-tables.sql | ${psql} --user=dwh --tuples-only --no-align --field-separator=' '  --quiet ${production_database} | awk '{print \"cp -v {{pillar['pg-data-dir']}}/cstore_fdw/\"\$1\"/\"\$2\" ${dest_dir}/cstore_fdw/\"\$3\".\"\$4\";\ncp -v {{pillar['pg-data-dir']}}/cstore_fdw/\"\$1\"/\"\$2\".footer ${dest_dir}/cstore_fdw/\"\$3\".\"\$4\".footer\"}' | bash"
${ssh} ${production_host} "chmod -R 755 \"${dest_dir}/cstore_fdw/\""

[...]

echo "rsync cstore files to destination..."
${ssh} ${dest_host} "rm -rvf \"${dest_dir}/cstore_fdw/\""
${ssh} ${dest_host} "mkdir -pv \"${dest_dir}/cstore_fdw/\""
${ssh} ${dest_host} "rsync -av --delete ${production_host}:${dest_dir}/cstore_fdw/ ${dest_dir}/cstore_fdw/"
${ssh} ${dest_host} "chmod -R 755 ${dest_dir}/cstore_fdw"
[...]

[... pg_restore ...]

echo "Restoring cstore files from tempdir..."
${ssh} ${dest_host} "cat /bi/mirror/get-cstore-tables.sql | ${psql}  --tuples-only --no-align --field-separator=' '  --quiet ${dest_tmp_db} | awk '{print \"cp -v ${dest_dir}/cstore_fdw/\"\$3\".\"\$4\" {{pillar['pg-data-dir']}}/cstore_fdw/\"\$1\"/\"\$2\";\ncp -v ${dest_dir}/cstore_fdw/\"\$3\".\"\$4\".footer {{pillar['pg-data-dir']}}/cstore_fdw/\"\$1\"/\"\$2\".footer\"}' | bash"
jberkus commented 7 years ago

@Dezzsoke correct, you cannot.

jberkus commented 7 years ago

FYI: if the cstore files are in the default location under PGDATA, a pg_basebackup binary backup will include them. It's only pg_dump that will not.
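
For example (host, user, and target directory are placeholders), a plain-format base backup copies $PGDATA/cstore_fdw/ along with the rest of the cluster:

# Binary backup of the whole cluster, including $PGDATA/cstore_fdw/
pg_basebackup -h db.example.com -U replication -D /backups/base -Fp -P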

gunicsba commented 5 years ago

FYI: when working with partitioned tables, you want to use SELECT * FROM ONLY table. Otherwise you'd dump the table's data twice, since copying it out from both the parent and the partitions returns the same rows. (If you've already done this, you can use TRUNCATE ONLY to drop the data you don't need.)
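
A sketch of that with hypothetical table names, assuming inheritance-style partitioning where the parent also exposes its children's rows:

-- Export only the parent's own rows; without ONLY, the partitions' rows
-- would be included as well.
COPY (SELECT * FROM ONLY events) TO '/backups/events_parent.dat';

-- Export each partition separately.
COPY (SELECT * FROM events_2015_01) TO '/backups/events_2015_01.dat';

-- If duplicate rows did end up in a table after a restore, TRUNCATE ONLY
-- clears that one table without touching the rest of the hierarchy.
TRUNCATE ONLY events;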