epermana / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Compression of Redshift CSV files with GZIP #1096

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
1. To which tool/application/daemon will this feature apply?

Redshift applier (redshift.js).

2. Describe the feature in general

Uploading CSV files to Amazon S3 is slow. Compressing them before the upload 
would speed it up substantially.

3. Describe the feature interface

A new option will be added to share/s3-config-<service>.json:

"gzipS3Files" : "true"

It is turned off by default.

4. Give an idea (if applicable) of a possible implementation

CSV files will be compressed with gzip before they are uploaded to S3. A "GZIP" 
option will then be added to the Redshift COPY command so that Redshift 
decompresses them on load.
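
As a rough illustration of the plan above, the sketch below builds a COPY statement with and without the GZIP keyword. The function name `build_copy_sql`, the table and bucket names, the CSV keyword, and the `<credentials>` placeholder are assumptions made for this example; it is not the code in redshift.js.

```shell
#!/bin/sh
# Hypothetical helper: print a Redshift COPY statement for one staged CSV file.
# $1 = target table, $2 = S3 URI of the uploaded file, $3 = "true" if gzipped.
build_copy_sql() {
    table=$1
    s3_uri=$2
    gzipped=$3

    # The credentials string is deliberately left as a placeholder.
    sql="COPY $table FROM '$s3_uri' CREDENTIALS '<credentials>' CSV"

    # GZIP tells COPY that the source file is gzip-compressed.
    if [ "$gzipped" = "true" ]; then
        sql="$sql GZIP"
    fi

    printf '%s;\n' "$sql"
}

# Illustrative call; the table and bucket names are made up.
build_copy_sql stage_orders s3://example-bucket/orders-9.csv.gz true
```

Keeping the keyword conditional means that a configuration with "gzipS3Files" turned off would produce the same COPY statement as before.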

5. Describe pros and cons of this feature.

5a. Why the world will be a better place with this feature.

Significant reduction in upload time for multi-row transactions.

5b. What hardship will the human race have to endure if this feature is
implemented.

Another option to learn.

Original issue reported on code.google.com by linas.vi...@continuent.com on 18 Feb 2015 at 12:38

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 18 Feb 2015 at 12:39

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2796.

Compression support ("gzipS3Files" option).

Original comment by linas.vi...@continuent.com on 18 Feb 2015 at 12:39

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2797.

Example of how to use compression support ("gzipS3Files" option).

Original comment by linas.vi...@continuent.com on 18 Feb 2015 at 12:40

GoogleCodeExporter commented 9 years ago
I got this error while testing.

INFO   | jvm 1    | 2015/02/18 18:21:05 | Caused by: 
org.mozilla.javascript.WrappedException: Wrapped 
com.continuent.tungsten.replicator.ReplicatorException: OS command failed: 
command=gzip --keep /tmp/staging/alpha/staging0/redtest-msg-9.csv rc=1 stdout= 
stderr=gzip: unrecognized option '--keep'

Original comment by jeffm...@gmail.com on 18 Feb 2015 at 6:23

GoogleCodeExporter commented 9 years ago
It looks like you should run `gzip -c <filename> > <filename>.gz` instead, in 
order to stay compatible with older versions of gzip that do not recognize 
`--keep`.
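
A minimal sketch of that approach, wrapped in a small function so the original CSV stays in place next to the compressed copy. The name `gzip_keep_original` and the temporary file are hypothetical and only serve the example.

```shell
#!/bin/sh
# Hypothetical helper: gzip a file while keeping the original, without --keep.
# $1 = path to the CSV file; writes "$1.gz" next to it.
gzip_keep_original() {
    src=$1
    # -c writes the compressed stream to stdout and leaves the input untouched.
    gzip -c "$src" > "$src.gz"
}

# Usage with a throwaway file (the path comes from mktemp, not from Tungsten).
tmp_csv=$(mktemp)
printf '1,alpha\n2,beta\n' > "$tmp_csv"
gzip_keep_original "$tmp_csv"
ls "$tmp_csv" "$tmp_csv.gz"
```

Because only the `-c` option and shell redirection are used, this should behave the same on gzip versions with and without `--keep`.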

Original comment by jeffm...@gmail.com on 18 Feb 2015 at 6:57

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2798.

Support for older versions of gzip.

Original comment by linas.vi...@continuent.com on 19 Feb 2015 at 8:52