GSA / data.gov

Main repository for the data.gov service
https://data.gov

Configure Production Image Setup using LocalExecutor #4511

Closed btylerburton closed 12 months ago

btylerburton commented 1 year ago

User Story

Datagovteam would like to implement a production-quality Airflow installation using the LocalExecutor. This is needed to establish a baseline for Airflow performance on Cloud.gov, against which we can compare other executors quantitatively.
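To make the comparison concrete, each executor's runs could be reduced to the same few numbers. What follows is a minimal sketch. It assumes run durations in seconds have already been collected per executor. `summarize_durations` and its input shape are hypothetical and are not part of datagov-harvester.

```python
import statistics
from typing import Dict, List


def summarize_durations(durations_by_executor: Dict[str, List[float]]) -> Dict[str, dict]:
    """Reduce per-executor run durations (seconds) to comparable numbers.

    The input shape is hypothetical: executor name -> list of measured
    durations, e.g. DAG run end time minus start time.
    """
    summary: Dict[str, dict] = {}
    for executor, durations in durations_by_executor.items():
        if len(durations) < 2:
            # Fewer than two samples give no meaningful spread to compare.
            raise ValueError(f"need at least two samples for {executor}")
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = statistics.quantiles(durations, n=20)[-1]
        summary[executor] = {
            "runs": len(durations),
            "median_s": statistics.median(durations),
            "p95_s": p95,
        }
    return summary
```

The sketch reports medians and 95th percentiles instead of means, so a single slow run does not dominate the comparison between executors.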

Related:

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

Background

Given the overwhelming number of configuration options, a performance baseline will help us benchmark other solutions more effectively. The baseline will use a LocalExecutor configured for production and backed by an external RDS database.

Cloud.gov also supports deploying applications from container images. We already work with containers locally. We have traditionally maintained two separate workflows: an image-to-container workflow for local development and a buildpack-to-container workflow for production. This POC will attempt to unify those two environments.
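One way to let the same startup code run in both environments is to derive Airflow's settings from whatever the environment provides. The following sketch is an illustration under assumptions. It is not the configuration used in the linked PRs. It assumes Cloud.gov exposes the bound RDS instance in `VCAP_SERVICES` under an `aws-rds` label with a `uri` credential. It also uses a hypothetical `LOCAL_AIRFLOW_DB_URI` variable for the local container. Airflow itself reads settings from environment variables named `AIRFLOW__{SECTION}__{KEY}`.

```python
import json
import shlex
from typing import Dict, Mapping

# Hypothetical variable name for the local (container) database URI.
LOCAL_DB_VAR = "LOCAL_AIRFLOW_DB_URI"


def resolve_db_uri(env: Mapping[str, str]) -> str:
    """Return a SQLAlchemy URI for Airflow's metadata database.

    Assumes Cloud.gov lists the bound RDS instance in VCAP_SERVICES under the
    "aws-rds" label, with the connection string in credentials["uri"].
    """
    vcap = env.get("VCAP_SERVICES")
    if vcap:
        bindings = json.loads(vcap).get("aws-rds", [])
        if not bindings:
            raise RuntimeError("VCAP_SERVICES has no aws-rds binding")
        uri = bindings[0]["credentials"]["uri"]
    else:
        uri = env.get(LOCAL_DB_VAR, "")
        if not uri:
            raise RuntimeError(f"set {LOCAL_DB_VAR} when running outside Cloud.gov")
    # SQLAlchemy 1.4+ rejects the legacy "postgres://" scheme.
    if uri.startswith("postgres://"):
        uri = "postgresql://" + uri[len("postgres://"):]
    return uri


def airflow_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Map the environment to Airflow's AIRFLOW__{SECTION}__{KEY} variables."""
    return {
        "AIRFLOW__CORE__EXECUTOR": "LocalExecutor",
        # [database] sql_alchemy_conn is the Airflow 2.3+ location; older
        # releases read the same key from [core].
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": resolve_db_uri(env),
    }


def to_shell_exports(settings: Mapping[str, str]) -> str:
    """Render settings as export lines, quoted for a POSIX shell."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in settings.items())
```

A start command could then evaluate the printed exports before launching `airflow scheduler`, both in the local container and on Cloud.gov. That wiring is likewise hypothetical.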

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

btylerburton commented 1 year ago

Successfully deployed the prod image using airflow standalone.


btylerburton commented 1 year ago

Parked at https://github.com/GSA/datagov-harvester/tree/configure-local-executor

btylerburton commented 1 year ago

Airflow is working, and we are seeing logs!


btylerburton commented 1 year ago

Draft PR is parked at https://github.com/GSA/datagov-harvester/pull/2

README and other cleanup TBD.

btylerburton commented 1 year ago

After much struggle with the Docker image, we pivoted to using the Python buildpack and were happily greeted with a healthy scheduler.


btylerburton commented 1 year ago

PR is here: https://github.com/GSA/datagov-harvester/pull/3