eeshugerman / postgres-backup-s3

A handy Docker container to periodically backup PostgreSQL to S3
MIT License
485 stars, 167 forks

gzip backup to reduce file size #7

Closed stvhanna closed 3 years ago

stvhanna commented 3 years ago

Hi Elliott, thanks for creating a useful tool! What are your thoughts on adding an option (it could even be the default) to gzip the database dump to reduce the file size?

This feature is beneficial because it reduces the required storage and thus saves money. Thanks again @eeshugerman for your hard work on this!

eeshugerman commented 3 years ago

Howdy! We use pg_dump's custom format, which is already compressed by default (docs), so gzip on top of that would only add compute cost.
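To illustrate the point about compressing twice, here is a minimal sketch that does not involve pg_dump at all. It assumes synthetic tab-separated rows as a stand-in for dump contents and uses Python's `zlib` as a stand-in for the dump's built-in compression; the row layout and word list are made up for the example.

```python
import gzip
import random
import zlib

# Synthetic stand-in for table data (an assumption, not real pg_dump output).
rng = random.Random(0)
words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
raw = "".join(
    f"{i}\t{rng.choice(words)}\t{rng.randint(0, 10**6)}\n" for i in range(50_000)
).encode()

# First pass: DEFLATE via zlib, standing in for the already-compressed dump.
once = zlib.compress(raw)

# Second pass: gzip applied on top of the already-compressed bytes.
twice = gzip.compress(once)

print(f"raw:              {len(raw):>10,} bytes")
print(f"compressed once:  {len(once):>10,} bytes")
print(f"compressed twice: {len(twice):>10,} bytes")
```

The second pass has very little redundancy left to remove, so the output stays roughly the same size while still costing CPU time.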

eeshugerman commented 3 years ago

~It might make sense to expose the compression level setting, but in my experience that's usually best left untouched.~ (Never mind, one could use PGDUMP_EXTRA_OPTS for this.)
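As a rough sketch of how an extra-options string could carry a compression level: the helper below is hypothetical and is not the tool's actual backup script. It assumes the value of PGDUMP_EXTRA_OPTS is split shell-style and appended to a `pg_dump --format=custom` command line; `--compress` is pg_dump's own option for the compression level.

```python
import shlex


def build_pg_dump_argv(database, extra_opts=""):
    """Assemble a pg_dump command line (illustrative only; hypothetical helper)."""
    # Split the extra options the way a shell would, so quoted values survive.
    extra = shlex.split(extra_opts)
    return ["pg_dump", "--format=custom", *extra, database]


# Hypothetical setting: PGDUMP_EXTRA_OPTS="--compress=9", database name "mydb".
argv = build_pg_dump_argv("mydb", "--compress=9")
print(argv)
```

The command is only built here, not executed, so the sketch runs without a PostgreSQL installation.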

stvhanna commented 3 years ago

@eeshugerman You're right, the PG docs clearly state that pg_dump's custom format is "compressed by default". I'm comparing the backup size of the same small database using your S3 remote backup tool and this local backup tool (https://github.com/prodrigestivill/docker-postgres-backup-local), which uses gzip.

The backup from your tool is 245KB, compared to 20KB from the other tool. I'm surprised by the results, since I expected PG's default compression to be as good as gzip, if not better. If you have time, can you run that test to confirm?

eeshugerman commented 3 years ago

Dang, that's a big difference! Yep, I'll look into it.

eeshugerman commented 3 years ago

@stvhanna Would you mind testing with a larger DB, say a few hundred MB? I'm wondering if this is just a sort of overhead that becomes insignificant at scale.

stvhanna commented 3 years ago

@eeshugerman Good point, let me try a larger DB and report back.

stvhanna commented 3 years ago

@eeshugerman On a 1GB DB, your tool compressed the backup to 480MB while the other tool produced 457MB, so your guess that it was mostly overhead was right. I don't see a reason to change your implementation, as the difference is not significant. Thanks for the prompt responses and a great tool! You can close this issue. :)
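For reference, the relative gap in the two tests reported in this thread can be worked out from the sizes quoted above. The variable names below are just for illustration.

```python
# Backup sizes reported in this thread.
small_custom_kb, small_gzip_kb = 245, 20    # small database
large_custom_mb, large_gzip_mb = 480, 457   # 1GB database

# How many times larger the custom-format backup is than the gzip one.
small_ratio = small_custom_kb / small_gzip_kb
large_ratio = large_custom_mb / large_gzip_mb

print(f"small DB: {small_ratio:.2f}x the gzip size")
print(f"large DB: {large_ratio:.2f}x the gzip size "
      f"({(large_ratio - 1) * 100:.1f}% larger)")
```

The custom-format backup goes from about 12 times the gzip size on the small database to about 5% larger on the 1GB one.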

eeshugerman commented 3 years ago

Got it, thanks for looking into this!