Closed: antidipyramid closed this issue 9 months ago
i defer to hannah, but i would be interested in seeing what it could look like to use airflow’s secret facilities. https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/index.html
On Tue, Dec 19, 2023 at 3:30 PM, M J wrote:
With Councilmatic's switch to Heroku, we no longer use encrypted settings, so we modified the DAGs that spin up a Councilmatic container (https://github.com/Metro-Records/la-metro-dashboard/commit/872cfd50fdea458a978bd59b614a111ca3e91a89) to use a new TaggedDockerOperator (https://github.com/Metro-Records/la-metro-dashboard/blob/cfd01209169beeaa04f52d3cdc386d21e37eb037/plugins/operators/blackbox_docker_operator.py) instead of the old BlackboxDockerOperator.
However, the analytics DAG still needs to decrypt Councilmatic's configs/lametro_service_acct_key.json.gpg in order to upload analytics to Google Drive.
I see a couple ways to address this:
1. The scraper DAGs still need us to concatenate configs/connection_settings.py (https://github.com/Metro-Records/la-metro-dashboard/blob/cfd01209169beeaa04f52d3cdc386d21e37eb037/configs/connection_settings.py) onto pupa_settings.py (https://github.com/Metro-Records/scrapers-lametro/blob/9c0ddd1b1b59a9714811d29cf0691ea3181a283a/pupa_settings.py). We could create separate Blackbox Docker operators for scraper and Councilmatic jobs. The former would update the scrapers' pupa_settings.py as usual, and the latter would just run blackbox_postdeploy.
2. Update pupa_settings.py in the scrapers repo with the required DB configs from configs/connection_settings.py. That way, we don't add an extra operator and can get rid of configs/connection_settings.py and scripts/concat_settings.sh (https://github.com/Metro-Records/la-metro-dashboard/blob/cfd01209169beeaa04f52d3cdc386d21e37eb037/scripts/concat_settings.sh). BlackboxDockerOperator would only be responsible for running blackbox_postdeploy. (A rough sketch of this follows below.)
Option two ultimately seems simpler to me. Any reason we shouldn't try that first? @hancush @fgregg
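Here's a hypothetical sketch of what option two could look like, assuming configs/connection_settings.py mainly supplies the scraper database URL (the names and default value are illustrative):

# pupa_settings.py (scrapers repo) -- read the DB config from the
# environment instead of having the dashboard concatenate
# configs/connection_settings.py onto this file at deploy time
import os

# illustrative default for local development
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgis://postgres:postgres@localhost/opencivicdata"
)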
@fgregg @hancush
It's possible to set variables in the Airflow UI (similar to what we currently do with Heroku env vars). These variables are encrypted at rest and can be masked after assignment. We could store the service account key as an Airflow Variable and pass it to the DockerOperator as an env var.
That'd look like this:
# illustrative imports: the actual module paths in the dashboard repo may
# differ, and START_DATE plus the LA_METRO_* constants are assumed to be
# defined alongside the repo's other DAG configuration
from datetime import timedelta

from airflow import DAG

from operators.blackbox_docker_operator import TaggedDockerOperator

default_args = {
    "start_date": START_DATE,
    "execution_timeout": timedelta(minutes=10),
    "image": LA_METRO_IMAGE_URL,
    "environment": {
        **LA_METRO_CONFIGS,
        "DATABASE_URL": LA_METRO_DATABASE_URL,
        "SEARCH_URL": LA_METRO_SEARCH_URL,
        "SENTRY_ENVIRONMENT": ENVIRONMENT,
        # pulled from Airflow Variables via Jinja templating at runtime
        "GOOGLE_SERVICE_ACCT_API_KEY": "{{ var.value.google_service_acct_api_key }}",
    },
}

with DAG(
    "tag_analytics",
    catchup=False,
    default_args=default_args,
    schedule_interval="0 0 1 * *",  # monthly, at midnight on the 1st
    description="Generates analytics for Metro agenda tags and "
    "uploads a CSV file to Google Drive",
) as dag:
    t1 = TaggedDockerOperator(
        task_id="generate_tag_analytics",
        command="python manage.py generate_tag_analytics",
    )
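The variable itself could be set once in the UI under Admin > Variables, or scripted (a rough sketch, assuming we have the decrypted key file on hand; the filename is illustrative):

from airflow.models import Variable

# store the decrypted service account JSON as a Variable; Airflow encrypts
# Variable values in its metadata database when a Fernet key is configured
with open("lametro_service_acct_key.json") as f:
    Variable.set("google_service_acct_api_key", f.read())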
We'd just have to amend the Councilmatic app to authenticate to Google Drive with a JSON object parsed from the GOOGLE_SERVICE_ACCT_API_KEY env var instead of the current file. The upside is that we'd no longer need to use Blackbox in the Councilmatic app at all!
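On the Councilmatic side, the swap could look something like this (a minimal sketch assuming the google-auth library; the Drive scope and surrounding code are illustrative, not the app's actual implementation):

import json
import os

from google.oauth2 import service_account

# build credentials from the JSON in the environment instead of decrypting
# configs/lametro_service_acct_key.json.gpg from disk
key_info = json.loads(os.environ["GOOGLE_SERVICE_ACCT_API_KEY"])

credentials = service_account.Credentials.from_service_account_info(
    key_info,
    scopes=["https://www.googleapis.com/auth/drive"],  # assumed Drive scope
)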
(Note that the docs also recommend using a custom or external secrets manager (e.g., AWS Secrets Manager) for sensitive variables, but I'm not sure we think that's necessary.)
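For reference, the Airflow secrets backend docs suggest that routing lookups to an external store is mostly a configuration change; something like this in airflow.cfg would send Variable lookups to AWS Secrets Manager (the prefixes are illustrative):

[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables"}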
i think that approach makes sense. if we do it, we should probably also move other variables out of environment-specific settings.py files so we have a consistent way of doing things.
however, this is a big departure from our current pattern, so @hancush should have the last word.
Seems fine to me, so long as whatever we're doing is documented in the README.
I also abdicate having the final word on decision making until I return from leave. Thanks, guys!
okay, @antidipyramid, let's do a PR where you just handle this secret.
then later, we can do a code quality PR that migrates existing secrets into this pattern.
Connected to Metro-Records/la-metro-councilmatic#1043