bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0

Spike - Establish required infra for data migration workloads #24373

Closed argush3 closed 1 week ago

argush3 commented 2 weeks ago

The data migration process will need temporary Prefect-related infrastructure to support batch data loading as well as on-demand data loading for corps data.

Besides the infrastructure required for Prefect, we will also need to host an instance of the COLIN extract Postgres db as well as a test target LEAR db. Both of these will be temporary as well.

The work in this ticket is to establish the required infra and the general setup for each of the different environments.

TODOs

argush3 commented 1 week ago

Previously, for the firms data migration, we ran Prefect workflows locally, pointing them at the environment of interest.

For corps, we are looking at standing up the required data migration infrastructure in GCP for the following reasons:

Some work was done as part of this ticket to figure out what is involved in getting the data migration pipelines running in GCP. The work focused mainly on the Prefect infra, as the other infra is probably simpler. Note that this assumes all BE LEAR services will be running in GCP by the time we start migrating corps data in Prod for the MVP launch.

I was able to stand up all the infra required to run a hello-world-style Prefect workflow in GCP. A detailed doc of the steps taken to set up the required infra can be found here.
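For reference, a hello-world-style workflow of the kind mentioned above might look roughly like the sketch below. This is not the workflow that was actually deployed. It assumes Prefect 2.x or later (the `flow` and `task` decorators), and the names `hello_flow` and `build_greeting` are hypothetical. The fallback decorators are there only so the sketch still runs on a machine without Prefect installed.

```python
# Minimal hello-world-style Prefect flow.
# Assumption: Prefect 2.x or later; all function names here are hypothetical.
try:
    from prefect import flow, task
except ImportError:
    # Fallback for machines without Prefect: no-op decorators that
    # return the decorated function unchanged.
    def _passthrough(fn=None, **_kwargs):
        if fn is None:
            return lambda f: f
        return fn

    flow = task = _passthrough


@task
def build_greeting(name: str) -> str:
    """Single unit of work: build the greeting string."""
    return f"Hello, {name}!"


@flow
def hello_flow(name: str = "world") -> str:
    """Entry point: call the task, print its result, and return it."""
    greeting = build_greeting(name)
    print(greeting)
    return greeting


if __name__ == "__main__":
    hello_flow()
```

Running the same flow on the GCP infra instead of locally would additionally need a deployment and a worker, which are not shown here.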

Below is a summary of the GCP resources required for dev/test/prod.

Prefect Infra

Other Infra

Proposed Setup

The dev/test/prod environments will each need the temporary infra listed in the infra summary sections (Prefect + other).

The dev & test environments can have the Prefect-specific infra recreated or scaled down on an as-needed basis if cost is an issue. I have tested this, and it is fairly quick to recreate the environment from scratch via scripts. The main resources that may need to keep running are the COLIN extract Cloud SQL db and the practice target LEAR db (dev only).

Prod related infra can be stood up when we get closer to MVP launch.

argush3 commented 1 week ago

Moving to done as analysis has been completed.

Whether the proposed temporary infra is acceptable, along with the details of the setup, will be discussed with Thor and Patrick outside of this ticket.