Open evansiroky opened 3 months ago
We're currently publishing to GitHub, not Google Cloud Container Registry.
I found one gcr.io image referenced within the repository: gcr.io/cal-itp-data-infra/dbt-spark:2023.3.28
It's not clear to me whether this is actually being used by anything currently though.
There are 4 images listed in the Container Registry UI right now:
The two gtfs-archive images haven't had new versions pushed since early 2022, and we definitely build and host current versions of the archiver on GitHub's container registry.
The dbt-spark one, though, as referenced above, was last updated in March 2023. I'm guessing that's for executing dbt Python models, which I recall Andrew/Laurie doing a demo of, but do we have any Python models actually operationalized, @charlie-costanzo? I haven't been able to track down in the docs or the data-infra repo where that dbt-spark image might have been built from.
We could certainly just pull the image down, push it back up to ghcr.io somewhere, and update the references to be safe, but ideally we can locate the source for this image and set up a build workflow for it.
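As a rough sketch of that pull-and-push stopgap, the snippet below only builds and prints the docker commands so they can be reviewed before anyone runs them. The target repository ghcr.io/cal-itp/data-infra/dbt-spark is a placeholder I made up for illustration (the thread only says "somewhere"), and the helper name retag_commands is likewise hypothetical.

```python
import shlex


def retag_commands(source: str, target_repo: str) -> list[list[str]]:
    """Build the docker commands that copy `source` to `target_repo`, keeping its tag."""
    _, sep, tag = source.rpartition(":")
    # Reject references without an explicit tag (a ":" followed by "/" is a registry port).
    if not sep or "/" in tag:
        raise ValueError(f"image reference has no tag: {source}")
    target = f"{target_repo}:{tag}"
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


# Image reference found in the repository.
SOURCE = "gcr.io/cal-itp-data-infra/dbt-spark:2023.3.28"
# ASSUMPTION: hypothetical destination; the real ghcr.io path is undecided.
TARGET_REPO = "ghcr.io/cal-itp/data-infra/dbt-spark"

if __name__ == "__main__":
    # Print the commands instead of executing them, so this is safe to run anywhere.
    for cmd in retag_commands(SOURCE, TARGET_REPO):
        print(shlex.join(cmd))
```

Running the printed commands would still require being logged in to both registries, and any references to the old gcr.io path would need updating separately.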
That dbt-spark image is built from https://github.com/cal-itp/data-infra/blob/main/warehouse/Dockerfile.spark, which was added in #2346.
The example Python model wasn't merged into the repository in that PR and only lives on in that PR's description. There are no active Python models in the warehouse currently.
@evansiroky do we want to put work into maintaining the ability to deploy Python models? A quick and dirty hack would be to just pull the existing image down from gcr.io, push it up under ghcr.io somewhere, and update the references. Some additional work we might want to do, though, if we care about maintaining this capability:
User story / feature request
Determine what may be needed given Google Cloud's transition to Artifact Registry.
Acceptance Criteria
We're still able to publish containers.