globaleaks / globaleaks-whistleblowing-software

GlobaLeaks is free, open-source whistleblowing software enabling anyone to easily set up and maintain a secure reporting platform.
https://www.globaleaks.org

Automatic Remote Backups #528

Open DrWhax opened 11 years ago

DrWhax commented 11 years ago

Currently, GlobaLeaks only stores the data on one machine, in /var/globaleaks.

If that machine goes down for whatever reason, the data would be gone forever and that would be a waste of the hard work of the whistleblowers and journalists.

I propose to replicate the data over multiple machines so there are backups of the material.

This raises a few questions: what should the setup look like?

fpietrosanti commented 11 years ago

I propose to proceed as follows:

The script must be configurable through the /etc/default/globaleaks file, with this behaviour:

If the backup is enabled, the cronjob will execute the backup script.

The script must log its errors to standard output, so that the cronjob will send them by email to the local "root" user.
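
A minimal sketch of what such a cron-driven script could look like, assuming a shell-style KEY=value /etc/default/globaleaks; the BACKUP_ENABLED and BACKUP_DESTINATION keys below are illustrative, not the actual configuration names:

```python
#!/usr/bin/env python
# Hypothetical cron-driven backup script; configuration keys are illustrative.
import subprocess
import sys

CONFIG = "/etc/default/globaleaks"
DATADIR = "/var/globaleaks"

def read_config(path):
    """Parse a shell-style KEY=value file into a dict."""
    settings = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    settings[key.strip()] = value.strip().strip('"')
    except IOError:
        pass
    return settings

def main():
    cfg = read_config(CONFIG)
    if cfg.get("BACKUP_ENABLED", "false").lower() != "true":
        return 0  # backups disabled: do nothing and stay silent

    destination = cfg.get("BACKUP_DESTINATION")  # e.g. backup@host:/srv/backups/
    if not destination:
        # Errors go to standard output so that cron mails them to local root.
        print("globaleaks-backup: BACKUP_ENABLED is set but BACKUP_DESTINATION is missing")
        return 1

    status = subprocess.call(["rsync", "-a", "--delete", DATADIR + "/", destination])
    if status != 0:
        print("globaleaks-backup: rsync exited with status %d" % status)
    return status

if __name__ == "__main__":
    sys.exit(main())
```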

DrWhax commented 11 years ago

This sounds OK. I would add that the sysadmin should be given the option of rsyncing every hour or every day; I would be in favour of hourly backups rather than a day-to-day rsync. Next to that, I would add a configuration option to back up to one machine or to multiple machines (think different jurisdictions).

Thoughts?
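
For example, the multiple-machine option could just be a space-separated list of rsync destinations in /etc/default/globaleaks that the backup script iterates over (BACKUP_DESTINATIONS is again only an illustrative name):

```python
import subprocess

def sync_all(datadir, destinations):
    """Sync the data directory to every configured destination, so a failure
    in one jurisdiction does not prevent the copy to the others."""
    # Hypothetical setting: BACKUP_DESTINATIONS="host1:/backups/gl host2:/backups/gl"
    failures = 0
    for dest in destinations.split():
        if subprocess.call(["rsync", "-a", datadir + "/", dest]) != 0:
            print("globaleaks-backup: sync to %s failed" % dest)
            failures += 1
    return failures
```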

fpietrosanti commented 11 years ago

@DrWhax Yes, it would make sense to activate the cronjob in /etc/cron.hourly so that the script is executed hourly, and then make the script configurable through /etc/default/globaleaks to decide whether the backup is done hourly or daily?
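
One way to get both behaviours from a single /etc/cron.hourly entry is to let the script itself decide whether this run should do anything; a sketch, with a hypothetical BACKUP_INTERVAL setting and an arbitrarily chosen hour for the daily case:

```python
import time

def should_run(interval):
    """Called from an hourly cron job: always run for 'hourly',
    run only once a day (during the 04:00 hour) for 'daily'."""
    if interval == "hourly":
        return True
    if interval == "daily":
        return time.localtime().tm_hour == 4
    return False
```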

DrWhax commented 11 years ago

Correct

vecna commented 11 years ago

@DrWhax is your goal to provide backups and fault tolerance (losing, for example, at worst the last hour of activity), or do you want constant alignment between two boxes?

DrWhax commented 11 years ago

Only providing backups and fault tolerance. If something is lost, so be it. I don't think it's a wise idea to back up constantly, since observers could figure out whether something has leaked or not?

vecna commented 11 years ago

Well, if the backups run over the onion network, no leaking of information through passive traffic analysis would be possible. Anyway, I was thinking about how much effort it would take to just update the DB (and the files/ directory) from the master to the slave every time an active operation happens (a mail is spooled => remote update, an access is performed => update, a file is deleted => update).

And I believe the answer is "the effort is more than acceptable" :D
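
A rough sketch of that idea, purely to illustrate it (the destination, paths and rsync transport are assumptions; this is the approach that was later judged too expensive to build properly):

```python
# Hypothetical event-driven replication hook: after every write operation
# (mail spooled, access performed, file deleted) push the changes to the slave.
import functools
import subprocess

SLAVE = "gluser@slave.example.onion:/var/globaleaks/"  # illustrative destination

def replicate_after(operation):
    """Decorator: run the operation, then sync the DB and files/ to the slave."""
    @functools.wraps(operation)
    def wrapper(*args, **kwargs):
        result = operation(*args, **kwargs)
        subprocess.call(["rsync", "-a", "/var/globaleaks/db/", SLAVE + "db/"])
        subprocess.call(["rsync", "-a", "/var/globaleaks/files/", SLAVE + "files/"])
        return result
    return wrapper
```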

fpietrosanti commented 11 years ago

@vecna even a basic master-slave synchronization managed at the application level requires more than 60 working days, excluding testing. That's a solution that only enterprise applications can afford.

fpietrosanti commented 11 years ago

@DrWhax is this ticket proceeding towards implementation?

DrWhax commented 11 years ago

Hm?

hellais commented 11 years ago

Has somebody started working on this ticket? I don't think this is something that the GlobaLeaks team can develop by the production due date, as we have coding tasks to do and this is a sysadmin/deployment-specific feature.

I would say somebody should take on this ticket within the next 24 hours, or I am going to move it to the backlog.

hellais commented 11 years ago

I am moving this to the backlog.

hellais commented 11 years ago

I am moving this to the wishlist.

DrWhax commented 10 years ago

So, I have been playing around with tahoe-lafs and its SFTP option. I'm creating a script which does a full backup of /var/globaleaks, encrypts it with GPG and sends it to a tahoe-lafs grid. I'll share the script soon.

Edit: I'll add support for Shamir's secret sharing scheme as well.
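
Roughly along these lines: tar up /var/globaleaks, encrypt the archive with GPG, and push it over SFTP to the grid. The paths, key id and SFTP endpoint below are placeholders, and paramiko is used only as an example SFTP client:

```python
import datetime
import subprocess
import tarfile

import paramiko  # example SFTP client; tahoe-lafs exposes an SFTP frontend

DATADIR = "/var/globaleaks"
GPG_RECIPIENT = "backup@example.org"               # placeholder key id
SFTP_HOST, SFTP_PORT = "tahoe.example.org", 8022   # placeholder grid endpoint

def backup():
    stamp = datetime.date.today().isoformat()
    archive = "/tmp/globaleaks-%s.tar.gz" % stamp
    encrypted = archive + ".gpg"

    # 1. Full archive of the data directory.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(DATADIR)

    # 2. Encrypt to the backup key before anything leaves the box.
    subprocess.check_call(["gpg", "--batch", "--yes",
                           "--recipient", GPG_RECIPIENT,
                           "--output", encrypted, "--encrypt", archive])

    # 3. Upload the encrypted archive via the SFTP frontend.
    transport = paramiko.Transport((SFTP_HOST, SFTP_PORT))
    transport.connect(username="backup", password="secret")  # placeholder credentials
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.put(encrypted, "backups/" + encrypted.split("/")[-1])
    sftp.close()
    transport.close()

if __name__ == "__main__":
    backup()
```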

fpietrosanti commented 10 years ago

@DrWhax wonderful!

fpietrosanti commented 10 years ago

@DrWhax Any news on that backup script, to be committed back? :-) It would be a nice addition to publish for the HOPE talk.

evilaliv3 commented 7 years ago

Given the python/twisted facilities for implementing an SFTP scheduler, I'm thinking that it won't be that difficult, and it would actually be good to design a small feature implementing this.

From the configuration point of view, the user just needs to configure:

With this in place, the system could implement a simple scheduled job that archives the data and pushes it to a directory on a remote server, where the credentials allow writing but not reading.

Implementing the server-side component that handles file rotation is a different matter, but this simple protocol keeps that part easy for the implementer.
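
A minimal sketch of such a scheduled job using Twisted's LoopingCall, with the actual transfer delegated to a command-line scp run against the write-only account (the hostnames, paths and daily interval are assumptions, not the final design):

```python
import datetime
import subprocess
import tarfile

from twisted.internet import reactor
from twisted.internet.task import LoopingCall

DATADIR = "/var/globaleaks"
REMOTE = "backup@backup.example.org:incoming/"  # write-only drop directory, illustrative

def push_backup():
    """Archive the data directory and push it to the remote drop directory."""
    stamp = datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S")
    archive = "/tmp/globaleaks-backup-%s.tar.gz" % stamp
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(DATADIR)
    # The remote credentials allow writing new files but not reading or deleting
    # existing ones; rotation is handled entirely on the server side.
    subprocess.call(["scp", archive, REMOTE])

if __name__ == "__main__":
    LoopingCall(push_backup).start(86400, now=False)  # once a day
    reactor.run()
```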

fpietrosanti commented 7 years ago

I don't want to create a monster, but I'd suggest splitting the feature in two, each with its own configurable scheduling and retention: a) Backup b) Remote upload

The data directory for the backups should be $DATADIR/backups (so that the apparmor profile is not impacted). The parameters for the backups should be:

The local backup feature would just create a simple compressed archive of the entire $DATADIR and /etc/globaleaks, with a unique pre-defined naming format including the date (e.g. globaleaks-backup-$hostname-22-02-2017.tar.gz).

The parameters for the Remote upload should be:

The configuration and UI logic could be a set of "parameters" that are passed to external commands executed by the globaleaks process scheduler from /usr/share/globaleaks/backup/, in order to give the system administrator the flexibility to entirely customize or replace them?
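
The local part in particular is small; a sketch of the archive creation following the naming format above (the date layout and paths come from the example, the rest is illustrative, not a committed design):

```python
import datetime
import socket
import tarfile

DATADIR = "/var/globaleaks"
BACKUP_DIR = DATADIR + "/backups"  # stays under $DATADIR so the apparmor profile is unaffected

def _skip_backups(tarinfo):
    # Do not archive previously created backups (or the archive being written).
    return None if "/backups" in tarinfo.name else tarinfo

def create_local_backup():
    # e.g. globaleaks-backup-myhost-22-02-2017.tar.gz
    name = "globaleaks-backup-%s-%s.tar.gz" % (
        socket.gethostname(),
        datetime.date.today().strftime("%d-%m-%Y"),
    )
    path = BACKUP_DIR + "/" + name
    with tarfile.open(path, "w:gz") as tar:
        tar.add(DATADIR, arcname="var/globaleaks", filter=_skip_backups)
        tar.add("/etc/globaleaks", arcname="etc/globaleaks")
    return path
```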

evilaliv3 commented 5 years ago

I've implemented the first part of this ticket (local backups).

The feature enables users to configure the number of daily, weekly and monthly backups to be performed.

The interface already enables the user to configure remote backups using SCP, but the SCP component is still under development.

The backup filename is defined as: date-timestamp-version.tar.gz

Example: 2018_12_26_1545860368_3.5.8.tar.gz

We have considered adding a uuid4 (as an identifier of the node) to make it possible to use the same write-only SCP account for storing the backups of multiple instances, sharing the storage while making it practically impossible for each of them to delete the others' backups. This would require configuring a specific write-only SCP setup.

I prefer the simplified solution currently implemented as cleaner and more secure; in fact, leaving the node the ability to read the list of remote backups would make it possible for the node to implement the full data-retention logic, but also, in case of failures, to delete the leaked files.
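
For reference, the daily/weekly/monthly retention can be reduced to a small pruning step run after each backup; a deliberately simplified sketch that assumes one archive per day and chronologically sortable filenames (which the YYYY_MM_DD prefix above provides), not the actual GlobaLeaks code:

```python
import os

def prune_backups(backup_dir, keep_daily, keep_weekly, keep_monthly):
    """Keep the newest N daily, weekly and monthly archives and delete the rest."""
    archives = sorted(f for f in os.listdir(backup_dir) if f.endswith(".tar.gz"))
    daily = archives[-keep_daily:] if keep_daily else []
    weekly = archives[::-7][:keep_weekly]     # roughly one archive per week, newest first
    monthly = archives[::-30][:keep_monthly]  # roughly one archive per month, newest first
    keep = set(daily) | set(weekly) | set(monthly)
    for name in archives:
        if name not in keep:
            os.remove(os.path.join(backup_dir, name))
```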