Metro-Records / la-metro-dashboard

An Airflow-based dashboard for LA Metro

Add DAG to clean up dangling images every day at noon UTC #74

Closed by hancush 3 years ago

hancush commented 3 years ago

Overview

See title. This change is needed to keep old versions of images from piling up and eating large amounts of disk space; see: https://github.com/datamade/la-metro-dashboard/issues/36#issuecomment-669477831
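The DAG definition itself isn't reproduced in this description, but a schedule of "every day at noon UTC" would correspond to the cron expression `0 12 * * *`. As a rough, plain-Python illustration (not Airflow's scheduler; `next_run` and `data_interval` are hypothetical helpers), this computes the next noon-UTC run and the interval a run covers under Airflow's convention that a scheduled run is labeled with the *start* of its interval and fires once that interval has ended. That convention is consistent with the log further down, where the run labeled 2021-04-18T12:00 started on 2021-04-19.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Noon UTC, per the PR title; cron equivalent (assumed): "0 12 * * *"
RUN_HOUR_UTC = 12


def next_run(after: datetime) -> datetime:
    """Return the first noon-UTC instant strictly after `after` (pass an aware datetime)."""
    after = after.astimezone(timezone.utc)
    candidate = after.replace(hour=RUN_HOUR_UTC, minute=0, second=0, microsecond=0)
    if candidate <= after:
        # Today's noon has already passed (or is now), so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def data_interval(execution_date: datetime) -> Tuple[datetime, datetime]:
    """For a daily schedule, a run labeled `execution_date` covers one day starting there."""
    return execution_date, execution_date + timedelta(days=1)


label = datetime(2021, 4, 18, 12, tzinfo=timezone.utc)
start, end = data_interval(label)
print(start.isoformat(), "->", end.isoformat())
```

A cron string keeps the schedule readable at a glance; the helpers above only exist to make the timing arithmetic explicit.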

Notes

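This is a good explanation of dangling vs. unused images: https://stackoverflow.com/questions/45142528/what-is-a-dangling-image-and-what-is-an-unused-image. We prune dangling rather than unused images because pruning unused images (`docker image prune -a`) would also remove the current tagged images whenever no container happens to be using them, and we'd then have to re-download the most recent version of each image on the next scraper or hourly processing run.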
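To make the distinction concrete, here is a minimal sketch that picks out the dangling rows from the default `docker image ls` table, treating any row whose TAG column is `<none>` as dangling. That default listing hides intermediate layers, so this approximates what the `dangling=true` filter reports; it is an illustration only, and `dangling_image_ids` is a hypothetical helper, not part of this PR or of Docker.

```python
from typing import List


def dangling_image_ids(listing: str) -> List[str]:
    """Return IMAGE IDs of rows whose TAG is "<none>" in `docker image ls` output.

    Approximation of Docker's dangling=true filter: the default listing omits
    intermediate layers, so untagged rows there are the dangling images.
    """
    ids = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # CREATED contains spaces ("3 days ago"), so only the first three
        # whitespace-separated fields (REPOSITORY, TAG, IMAGE ID) are reliable.
        _repository, tag, image_id = line.split()[:3]
        if tag == "<none>":
            ids.append(image_id)
    return ids
```

Fed the staging listing below, it should return the same two IDs that `docker images -f dangling=true` reports. The tagged `staging` and `production` images are kept even though nothing may be running them, which is exactly why dangling (not unused) is the right target.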

Before adding the DAG, these were the images on the staging server:

ubuntu@ip-10-0-0-244:~$ sudo docker image ls
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
datamade/la-metro-councilmatic   staging             8dcf790a6135        25 minutes ago      1.06GB
datamade/la-metro-councilmatic   <none>              083e955b8744        3 days ago          1.06GB
datamade/scrapers-us-municipal   staging             3d6d7a59f6c2        11 days ago         504MB
datamade/la-metro-councilmatic   <none>              e0378aaa35c1        11 days ago         1.06GB
datamade/la-metro-councilmatic   production          cf265444e45f        8 months ago        1.04GB

I expect these images to be removed:

ubuntu@ip-10-0-0-244:~$ sudo docker images -f dangling=true
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
datamade/la-metro-councilmatic   <none>              083e955b8744        3 days ago          1.06GB
datamade/la-metro-councilmatic   <none>              e0378aaa35c1        11 days ago         1.06GB

Deployed to staging and triggered the DAG. Log output:

*** Reading local file: /var/log/la-metro-dashboard/airflow/image_cleanup/prune_images/2021-04-18T12:00:00+00:00/1.log
[2021-04-19 19:45:28,757] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: image_cleanup.prune_images 2021-04-18T12:00:00+00:00 [queued]>
[2021-04-19 19:45:28,774] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: image_cleanup.prune_images 2021-04-18T12:00:00+00:00 [queued]>
[2021-04-19 19:45:28,774] {taskinstance.py:879} INFO - 
--------------------------------------------------------------------------------
[2021-04-19 19:45:28,774] {taskinstance.py:880} INFO - Starting attempt 1 of 1
[2021-04-19 19:45:28,774] {taskinstance.py:881} INFO - 
--------------------------------------------------------------------------------
[2021-04-19 19:45:28,788] {taskinstance.py:900} INFO - Executing <Task(BashOperator): prune_images> on 2021-04-18T12:00:00+00:00
[2021-04-19 19:45:28,792] {standard_task_runner.py:53} INFO - Started process 9617 to run task
[2021-04-19 19:45:28,911] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: image_cleanup.prune_images 2021-04-18T12:00:00+00:00 [running]> ip-10-0-0-244.ec2.internal
[2021-04-19 19:45:28,948] {bash_operator.py:82} INFO - Tmp dir root location: 
 /tmp
[2021-04-19 19:45:28,952] {bash_operator.py:105} INFO - Temporary script location: /tmp/airflowtmpvqe3bg9r/prune_imagesatnpozrs
[2021-04-19 19:45:28,953] {bash_operator.py:115} INFO - Running command: docker image prune -f
[2021-04-19 19:45:28,960] {bash_operator.py:122} INFO - Output:
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - Deleted Images:
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - untagged: datamade/la-metro-councilmatic@sha256:854b8f44421b183af3bfe1ca3886c855da27e7a2f505c7af1e5c0c06e8f1e33a
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - deleted: sha256:083e955b87442035373082a197c041124160fc1270e4b59dd2bd0548fa783719
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - deleted: sha256:f9689f0727a00ed09b4ad1f7a3686f9516521455031eb05d22d770b9b2956680
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - deleted: sha256:0fa4bbed1fda548ae932dab48a31ffb2b1d426c0fe63f904fdb663f8a32bd4d5
[2021-04-19 19:45:31,963] {bash_operator.py:126} INFO - deleted: sha256:355cfd2c0be594d577d3bba868ce83ce5dc45236f3a4d8a214d0bb8dc31eff7d
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:727b485868f0f0411a35e4fd87f97d6c6a33cca04da61b128db9aad7d2f6f9a3
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:288d1d8e8c88359812116321c0ea99f4234dc62cbc29d67f35313e462fa01b1c
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:11dd880a72dfcd41c8eaf07a486baaf646766220cefde0396606382e77e5c344
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - untagged: datamade/la-metro-councilmatic@sha256:fd928e42d35126dbf2c761bf7cd620fe7c2a97a996e141b49fa4ac3c36ebcf4b
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:e0378aaa35c1afb187338cdfd15acd5a96879ec3906c12422401ca3aa9053299
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:25a8f97b503bf8c9138335f8c8f2ac736594fcc067face366d1e622c48339002
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:2d9a49e7c67feeaefbe2898bc3ce7f069ddc04d8cd10e3ad3694165769f69f86
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:289750e1fcf2bb260d830cdca52eb62c7e2d6b64af0a939f22bb36771ca1c5d2
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:8f9b6062066c46a052abe346b09a9f6b219f7453796e2274bdbfc491724e5744
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:17ed43f5c791deaf8da12635434b0f9cdc257128c0a9c5adc7ac49d088edcb79
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:c4fbf84b69f89354d34f67f3a54f7e758de1959aea67f8a8ed483a04591190ba
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:e386d761f571993d6e7d95476901020da61eb608f664f38139be9294164e577d
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:960e2cd776d4e9e2f080ce8615953b2e7dfecf84ec618c0b262dc9d683dff61b
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:2a4c7e4bfa00d2d811101b7e12c2f44375d6493cd4258cc7e9afc79a8dae4325
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - deleted: sha256:eadfafdb67ff61bb6ff990c537a6948a0b5d88931c8628bc525f00a10d46e046
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - 
[2021-04-19 19:45:31,964] {bash_operator.py:126} INFO - Total reclaimed space: 1.117GB
[2021-04-19 19:45:31,968] {bash_operator.py:130} INFO - Command exited with return code 0
[2021-04-19 19:45:31,999] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=image_cleanup, task_id=prune_images, execution_date=20210418T120000, start_date=20210419T194528, end_date=20210419T194531
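One way to double-check a log like this against the expected list: the short IMAGE IDs shown by `docker image ls` are the first 12 hex characters of the full `sha256:` image ID, so each expected ID should prefix one of the `deleted:` lines. A small sketch, assuming the log lines are available as strings (`deleted_digests` and `all_pruned` are hypothetical helpers):

```python
import re
from typing import Iterable, List, Set

# Matches the "deleted: sha256:<hex>" lines printed by `docker image prune`.
DELETED_RE = re.compile(r"deleted: sha256:([0-9a-f]+)")


def deleted_digests(log_lines: Iterable[str]) -> Set[str]:
    """Collect every digest reported as deleted (images and their layers)."""
    digests = set()
    for line in log_lines:
        match = DELETED_RE.search(line)
        if match:
            digests.add(match.group(1))
    return digests


def all_pruned(short_ids: List[str], log_lines: Iterable[str]) -> bool:
    """True if every 12-character short image ID prefixes some deleted digest."""
    digests = deleted_digests(log_lines)
    return all(any(d.startswith(s) for d in digests) for s in short_ids)
```

Here both `083e955b8744` and `e0378aaa35c1` appear as `deleted:` entries; the remaining `deleted:` lines are the layers those images owned.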

Et voilà: only the tagged images remain, and the dangling filter returns nothing:

ubuntu@ip-10-0-0-244:~$ sudo docker image ls
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
datamade/la-metro-councilmatic   staging             8dcf790a6135        43 minutes ago      1.06GB
datamade/scrapers-us-municipal   staging             3d6d7a59f6c2        11 days ago         504MB
datamade/la-metro-councilmatic   production          cf265444e45f        8 months ago        1.04GB
ubuntu@ip-10-0-0-244:~$ sudo docker images -f dangling=true
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

Handles #49