cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Investigate Google Cloud deprecation of Container Registry in favor of Artifact Registry #3369

Open evansiroky opened 3 months ago

evansiroky commented 3 months ago

User story / feature request

Determine what may be needed given Google Cloud's transition from Container Registry to Artifact Registry.

  1. What's the end of life date?
  2. What's the impact?
  3. What's the level of effort?

Acceptance Criteria

We're still able to publish containers.

HaroldBooker commented 5 days ago

We're currently publishing to GitHub, not Google Cloud Container Registry.

themightychris commented 5 days ago

I found one gcr.io image referenced within the repository: gcr.io/cal-itp-data-infra/dbt-spark:2023.3.28

It's not clear to me whether this is actually being used by anything currently though.
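For anyone who wants to repeat that search, here is a minimal sketch of scanning a checkout for Container Registry references. The regex and the `find_gcr_refs` / `scan_repo` helper names are illustrative assumptions, not anything that exists in data-infra:

```python
import re
from pathlib import Path

# Matches references such as gcr.io/project/image:tag, including regional
# hosts like us.gcr.io. It does not match ghcr.io.
GCR_REF = re.compile(r"\b(?:[a-z0-9-]+\.)?gcr\.io/[\w./-]+(?::[\w.-]+)?")


def find_gcr_refs(text: str) -> list[str]:
    """Return every Container Registry image reference found in text."""
    return GCR_REF.findall(text)


def scan_repo(root: str) -> dict[str, list[str]]:
    """Map each file under root to the gcr.io references it contains."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            refs = find_gcr_refs(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if refs:
            hits[str(path)] = refs
    return hits
```

Running `scan_repo(".")` from the repository root would list each file alongside the references found in it.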

There are 4 images listed in the Container Registry UI right now:

[Screenshot of the Container Registry UI image list, 2024-09-24, 5:28 PM]

The two gtfs-archive images haven't had new versions pushed since early 2022, and we definitely build and host current versions of the archiver on GitHub's container registry.

The dbt-spark one, though, as referenced above, was last updated in March 2023. I'm guessing that's for executing dbt Python models, which I recall Andrew/Laurie doing a demo of, but do we have any Python models actually operationalized, @charlie-costanzo? I haven't been able to track down in the docs or the data-infra repo where that dbt-spark image might have been built from.

We could certainly just pull the image down, push it back up to ghcr.io somewhere, and update the references just to be safe, but ideally we can locate the source for this image and set up a build workflow for it.
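As a rough sketch of that pull-and-push fallback, the script below builds the standard `docker pull` / `docker tag` / `docker push` sequence. The destination path under ghcr.io is a hypothetical placeholder (nothing in this thread has settled on one), the helper names are made up for illustration, and a real run assumes you are already logged in to both registries:

```python
import subprocess


def mirror_commands(source: str, target: str) -> list[list[str]]:
    """Build the docker pull/tag/push sequence that copies source to target."""
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


def mirror_image(source: str, target: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in mirror_commands(source, target):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure


SOURCE = "gcr.io/cal-itp-data-infra/dbt-spark:2023.3.28"
# Hypothetical destination: the actual ghcr.io path has not been decided.
TARGET = "ghcr.io/cal-itp/data-infra/dbt-spark:2023.3.28"

if __name__ == "__main__":
    mirror_image(SOURCE, TARGET)  # dry run by default: prints, does not execute
```

Defaulting to a dry run means the commands can be reviewed before anything is pushed; passing `dry_run=False` would actually run them.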

themightychris commented 5 days ago

That dbt-spark image is built from https://github.com/cal-itp/data-infra/blob/main/warehouse/Dockerfile.spark , which was added in #2346.

themightychris commented 5 days ago

The example Python model wasn't merged into the repository in that PR and only lives on in that PR's description. There are no active Python models in the warehouse currently.

@evansiroky, do we want to put work into maintaining the ability to deploy Python models? A quick-and-dirty hack would be to just pull the existing image down from gcr.io, push it up under ghcr.io somewhere, and update the references. If we care about maintaining this capability, though, there is some additional work we might want to do: