casmlab / stack

The BITS Lab STACK tool for social media collection and analysis.
http://bits.ischool.syr.edu/
MIT License

Automate backups to S3 #24

Open libbyh opened 7 years ago

libbyh commented 7 years ago

Following @potus fills up the server, so we need to automate backups and deletions of processed tweets. This should be a shell script that runs from cron.

Here's the process:

1. Go to the archive dir

cd "/stack/data/$COLLECTION_NAME/twitter/archive"

2. Remove the *_out_processed.json files

find . -name "*_out_processed.json" -print0 | xargs -0 sudo rm

OR

find . -name "*_out_processed.json" -delete

3. Tar and remove the *_out.json files

find . -type f -name "*_out.json" -print0 | sudo tar -czf "$COLLECTION_NAME-backup-$(date +%Y-%m-%d).tar" --null --files-from - --remove-files

4. Copy tar file to S3 (as root)

export PATH=~/.local/bin:$PATH
source activate aws
aws s3 mv "$TAR_FILENAME.tar" s3://beckett-stack-backup/
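Steps 1 through 4 could be combined into one script that cron calls with a collection name. This is only a rough sketch, not a tested deployment. The function name `backup_collection` and the variables `STACK_DATA_ROOT`, `S3_BUCKET`, and `AWS_CMD` are made up for illustration. It assumes GNU find and GNU tar, that the script runs as root (so no sudo), and that `aws` is reachable once `~/.local/bin` is on the PATH. Because `tar -z` produces a gzip-compressed archive, the sketch names the file `.tar.gz` rather than `.tar`.

```shell
#!/usr/bin/env bash
# Sketch: back up one STACK collection's raw tweets to S3, then free the disk.
# All names below are illustrative assumptions, not part of STACK itself.

STACK_DATA_ROOT="${STACK_DATA_ROOT:-/stack/data}"      # data root from step 1
S3_BUCKET="${S3_BUCKET:-s3://beckett-stack-backup/}"   # bucket from step 4
AWS_CMD="${AWS_CMD:-aws}"                              # override to dry-run without AWS

# The body is a subshell, so the cd and PATH changes do not leak to the caller.
backup_collection() (
    collection="$1"
    archive_dir="$STACK_DATA_ROOT/$collection/twitter/archive"
    tarball="$collection-backup-$(date +%Y-%m-%d).tar.gz"

    # cron starts with a minimal PATH; mirror the export from step 4.
    export PATH="$HOME/.local/bin:$PATH"

    # Step 1: go to the archive dir, and stop if it is missing.
    cd "$archive_dir" || exit 1

    # Step 2: remove the already-processed files.
    find . -type f -name "*_out_processed.json" -delete || exit 1

    # Skip the tar and upload when there is nothing left to back up.
    if [ -z "$(find . -type f -name '*_out.json' -print -quit)" ]; then
        echo "no *_out.json files in $archive_dir"
        exit 0
    fi

    # Step 3: tar the raw files; tar deletes each one after archiving it.
    find . -type f -name "*_out.json" -print0 |
        tar -czf "$tarball" --null --files-from - --remove-files || exit 1

    # Step 4: move the archive to S3; the local copy goes away on success.
    "$AWS_CMD" s3 mv "$tarball" "$S3_BUCKET" || exit 1
)

# Run only when a collection name is passed, so the file can also be sourced.
if [ "$#" -ge 1 ]; then
    backup_collection "$1"
fi
```

If this were saved as, say, /usr/local/bin/stack-backup.sh (a hypothetical path), a root crontab entry along the lines of `0 3 * * * /usr/local/bin/stack-backup.sh COLLECTION >> /var/log/stack-backup.log 2>&1` would run it nightly. The schedule and log path are placeholders too. Each step stops the run on failure, so a failed tar or upload leaves the remaining files in place for the next run.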

libbyh commented 6 years ago

Same problem on Waverly.

libbyh commented 6 years ago

Didn't back up to S3 from Waverly because I'm waiting to hear about old vs new AWS account billing to UM.