Dumper is a backup management system that offers a whole new way to perform automated backups of your databases.
Add the following line to your Rails project's Gemfile:

```ruby
gem 'dumper'
```
If your database is larger than 300MB, it is recommended to also include the aws-sdk-v1 gem for maximum stability:

```ruby
gem 'aws-sdk-v1'
gem 'dumper'
```
Note: If you're using v2 of aws-sdk, that's fine; both versions can be used at the same time.
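For example, a Gemfile carrying both SDK versions might look like the following sketch; the two releases ship as separate gems with separate top-level namespaces (`Aws` for v2, `AWS` for v1), so they can coexist:

```ruby
gem 'aws-sdk', '~> 2'  # v2, uses the Aws namespace
gem 'aws-sdk-v1'       # v1, uses the AWS namespace, recommended above for large databases
gem 'dumper'
```

After updating your Gemfile, run `bundle install` to make the new gems available.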
Then run the installer:

```bash
rails g dumper:install [YOUR_APP_KEY]
```
Or manually create `config/initializers/dumper.rb` and add the following line:

```ruby
Dumper::Agent.start(:app_key => 'YOUR_APP_KEY')
```
That's it!
Now, start your server and go to the Dumper site.
You'll find your application is registered and ready to take backups.
As explained in more detail below, the Dumper agent tries to run on every process by default. If you want to start the agent only on a particular host, pass a block that evaluates to true or false to the `start_if` method:
```ruby
Dumper::Agent.start_if(:app_key => 'YOUR_APP_KEY') do
  Rails.env.production? && dumper_enabled_host?
end
```
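Here, `dumper_enabled_host?` is a placeholder for a predicate you define yourself, not part of Dumper's API. A minimal sketch, assuming you flag the designated backup host with an environment variable:

```ruby
# Hypothetical helper, not part of Dumper's API: returns true only on
# hosts where you've set this environment variable.
def dumper_enabled_host?
  ENV['DUMPER_HOST'] == 'true'
end
```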
Currently, Redis support is limited in that you must run the agent on the same host as Redis:
```ruby
Dumper::Agent.start_if(:app_key => 'YOUR_APP_KEY') do
  Socket.gethostname == 'redis.mydomain.com'
end
```
If you are using Resque, it's a good idea to run it on the same host as Redis and start the agent on the Resque instance.
If the agent doesn't seem to work, pass `true` to the `debug` option:
```ruby
Dumper::Agent.start(app_key: 'YOUR_APP_KEY', debug: true)
```
This enables verbose logging that helps us understand the problem.
You can also pass custom dump options, using `custom_options` and `format` under the key for the database type:
```ruby
Dumper::Agent.start(
  app_key: 'YOUR_APP_KEY',
  postgresql: {
    format: 'dump',
    custom_options: '-Fc --no-acl --no-owner'
  }
)
```
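The custom options in this example are standard pg_dump flags: `-Fc` selects the custom archive format, while `--no-acl` and `--no-owner` omit access-privilege and ownership commands from the dump.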
In a Rails app server process, the agent creates a new thread that periodically checks the Dumper API to see if a new backup job is scheduled.
When it finds a job, the agent won't run the job inside its own thread, but instead spawns a new process, then goes back to sleep. That way, web requests won't be affected by the long-running backup task, and the task will continue to run even when the parent process is killed in the middle.
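A minimal sketch of that poll-and-spawn loop, purely illustrative rather than Dumper's actual code (`check_api_for_job`, `perform_backup`, and `poll_interval` are hypothetical names):

```ruby
# Illustrative only: poll the API from a background thread and hand each
# job off to a forked child process so the web worker stays responsive.
Thread.new do
  loop do
    job = check_api_for_job               # hypothetical: ask the API for a scheduled job
    if job
      pid = fork { perform_backup(job) }  # run the backup in a separate process
      Process.detach(pid)                 # reap the child in the background, no zombies
    end
    sleep poll_interval                   # hypothetical: minutes for primary, an hour for secondaries
  end
end
```

Because the backup runs in its own process, killing the parent leaves the forked child running to completion.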
The Dumper agent tries to run on every process by default. This means, for instance, that if you have 10 thin instances in production, the agent will run on all 10 of them. We designate the first agent that hits our API as the primary and the rest as secondaries; in this case, one thin process becomes the primary and the other nine become secondaries. Only the primary is responsible for taking backup jobs, so it is guaranteed that there is no duplicate work on your servers. The primary polls our API at a 1- to 10-minute interval, while the secondaries poll every hour. If you run fork-based servers like unicorn or passenger, however, the agent thread runs only on the master process, not on child processes.
We keep the secondaries around for fault tolerance: if something goes wrong and the primary dies, one of the secondaries takes over as primary and continues to serve.
The bottom line is that the agent is designed to be extremely efficient in CPU, memory and bandwidth usage. It's almost impossible to detect any difference in performance with or without it.