cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services

Automated Removal of Old Backup Files from Server #133

Closed atdservicebot closed 5 years ago

atdservicebot commented 5 years ago

We have a few processes that write data to our scripting server (atd-data01), but we do not have any automated cleanup to delete or rotate the files they produce.

The locations of concern are:

/home/publisher/atd-data-publishing/transportation-data-publishing/xml/sent

This directory contains the XML messages that were sent to the ESB. We have never had a situation where we needed to restore these records, so there is no need to keep anything more than one week old.

/home/publisher/atd-data-publishing/transportation-data-publishing/data

This directory contains copies of our Data Tracker records as CSVs. Knack provides a restore service, so these serve as an added precaution. It is very rare that we've needed them, but it does happen. We can remove records that are more than one week old.
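
The retention rule in both cases reduces to deleting files more than one week old, e.g. (a rough sketch, assuming GNU find is available on atd-data01):

# List regular files last modified more than 7 days ago; appending
# -delete to either command would remove them instead of listing them.
find /home/publisher/atd-data-publishing/transportation-data-publishing/xml/sent -type f -mtime +7
find /home/publisher/atd-data-publishing/transportation-data-publishing/data -type f -mtime +7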

johnclary commented 5 years ago

Server ran out of space again today. I cleared sent XMLs and old backup CSVs. Issue resolved.

johnclary commented 5 years ago

@sergiogcx lots of easy fixes for this, i know. would you take a look as your time allows?

sergiogcx commented 5 years ago

Definitely. It would be great to find out how you are monitoring the server status.

sergiogcx commented 5 years ago

In the meantime, we could set up a cron job to upload the old XML/CSV files to S3 and then clear them from disk.
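
Roughly what I have in mind (a sketch only; assumes the AWS CLI is configured, and the bucket name is a placeholder):

#!/bin/bash
# Archive week-old sent XMLs to S3, then delete the local copies.
SRC=/home/publisher/atd-data-publishing/transportation-data-publishing/xml/sent
find "$SRC" -type f -mtime +7 -print0 | while IFS= read -r -d '' f; do
  aws s3 cp "$f" "s3://example-atd-backups/xml/sent/" && rm -f "$f"
done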

sergiogcx commented 5 years ago

I am thinking we could pair the same cron job with CloudWatch or SNS alarms, or set up something more sophisticated such as Zabbix or Zenoss (there are ready-to-go containers for both).
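
For the alerting piece, even a simple cron-able disk check could publish to SNS (sketch; the topic ARN and the 90% threshold are placeholders):

#!/bin/bash
# Publish an SNS alert when root filesystem usage crosses a threshold.
THRESHOLD=90
USAGE=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
  aws sns publish \
    --topic-arn "arn:aws:sns:us-east-1:123456789012:disk-alerts" \
    --message "atd-data01 root filesystem at ${USAGE}% capacity"
fi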

johnclary commented 5 years ago

@sergiogcx longer term, I know we want to use a monitoring service, but I've changed the scope of this issue to address the near-term concern. Can you find some time to either (1) set up log rotation to remove old files from these locations, or (2) set up a simple bash script that does this on a cron schedule?
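
For option (1), a logrotate drop-in along these lines might work (untested sketch; per the logrotate man page, minage skips files newer than the given age, and rotate 0 discards files on rotation rather than keeping copies):

# Hypothetical /etc/logrotate.d/atd-data-publishing
/home/publisher/atd-data-publishing/transportation-data-publishing/xml/sent/* {
    daily
    minage 7
    rotate 0
    missingok
    nocompress
}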

sergiogcx commented 5 years ago

@johnclary This is done, just needs your review:

For the data folder: ~/maintenance-scripts/transportation-data-publishing-datacsv.sh

For the xml/sent folder: ~/maintenance-scripts/transportation-data-publishing-dataxml.sh

For testing, just run the scripts directly (they will only print the files that would be deleted):

bash ~/maintenance-scripts/transportation-data-publishing-datacsv.sh
bash ~/maintenance-scripts/transportation-data-publishing-dataxml.sh

Crontab (every day at midnight, and five minutes after midnight; note that cron's hour field uses 0 for midnight):

sudo crontab -l
0 0 * * * bash (the script path for datacsv will show here)
5 0 * * * bash (the script path for dataxml will show here)

Review, and then please edit line 44 of each script:

nano +44 ~/maintenance-scripts/transportation-data-publishing-datacsv.sh
nano +44 ~/maintenance-scripts/transportation-data-publishing-dataxml.sh

Uncomment the rm line to enable actual deletion:

  # rm -f "$DATA_FILE";
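
For context, the deletion section of each script presumably looks something like this (a hypothetical reconstruction; the two scripts should differ only in the target directory):

#!/bin/bash
# Hypothetical sketch of the cleanup loop around line 44.
TARGET_DIR=/home/publisher/atd-data-publishing/transportation-data-publishing/data
find "$TARGET_DIR" -type f -mtime +7 | while read -r DATA_FILE; do
  echo "Would delete: $DATA_FILE";
  # rm -f "$DATA_FILE";
done
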
sergiogcx commented 5 years ago

My understanding is that root's cron jobs run from root's home directory. I tested the script execution from there, and it lists all of the files to be deleted.
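
If the scripts ever need to be location-independent, one common pattern (just a suggestion) is to resolve paths relative to the script itself:

# Change into the directory containing this script, regardless of
# where cron (or a user) invokes it from.
cd "$(dirname "$0")" || exit 1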

Also: do we care to log the files that have been deleted?

johnclary commented 5 years ago

@sergiogcx beautiful. the test ran successfully and i uncommented the rm statement.

i do not care to log the CSV files, but logging the XMLs makes sense to me. would you just keep a log in ~/maintenance-scripts?
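
A minimal way to do that (sketch; the log filename is only a suggestion) is to add a line beside the rm in the dataxml script:

# Record each deleted XML with a timestamp before removing it.
echo "$(date -Iseconds) deleted $DATA_FILE" >> ~/maintenance-scripts/xml-cleanup.log
rm -f "$DATA_FILE";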