bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0

[SPIKE] MVP soft launch option - technical evaluation #23589

Open OlgaPotiagalova opened 1 week ago

OlgaPotiagalova commented 1 week ago

The purpose is to evaluate the feasibility of an Entities MVP soft launch from a technical perspective.

A potential plan could look something like this:

Questions:
• Can we start enabling features in Prod for new users before migration to GCP, or will it increase risks?
• How do we separate users? We don't want any existing users to access the new system until we tell them. We don't want any existing users to access COLIN after their account has been migrated. Logic needs to be built in COLIN, and possibly in the Modernized system.
• What will be the timeline?
• What are the risks associated with this plan, and how can we mitigate them?

argush3 commented 2 days ago

Some thoughts regarding the points in this ticket.

Complete migration of all the accounts and corps by mid-March 2025

If this is required, then I think the GCP and Flask upgrades need to be dropped, or at least GCP, to make the mid-March deadline.

I will also say that we can try to compress the timeline for data migration by throwing more devs at it, but it probably isn't as simple as that. Data migration is not straightforward work; it requires a lot of decision-making at the business, data mapping, technical implementation and data verification levels. It will likely be a struggle for many devs picking up this work. Also, if we do not have enough expert knowledge to help with questions around data mapping, data nuances and data verification, the data migration work will be slow going.

I will reiterate that I think this is a high-risk piece for the mid-March deadline. Things are still coming up around data migration, and there could be big pieces of work we need to address.

A side note on data migration: there is another large chunk of work around the legacy outputs. In the meeting the other day, it was mentioned that the legacy output POC work was done. This POC established that we could crawl COLIN via its UI and download legacy outputs, and it included some partial work on how to tie those outputs to the ledger while corp data migration was still ongoing. But this work is just a POC; a final implementation that grabs all legacy outputs, stores them in GCP, and implements logic to store/retrieve legacy outputs in the modernized application still needs to happen. This is a decent-sized chunk of work, though it can be done in parallel with the core corp data migration work.

When the new infrastructure is ready and stable (GCP) and all the features planned for MVP are enabled, start to gradually migrate accounts and businesses from COLIN. For example, ULC and CCC first, or a selected group of users – TBD. Access for migrated accounts should be closed in COLIN -> logic is needed.

Leaving aside whether we will do the GCP upgrade or not, we may not necessarily be able to start migrating corps as soon as the planned MVP features are enabled in Prod. It will depend on how complete the corps pipeline implementation is and whether the data verification results give us confidence. For SPs/GPs, we did not migrate until very late, as it was very time-consuming to verify that all the Prod SPs/GPs could be migrated successfully via the firms migration pipeline. If we migrate small enough subsets of corps, and the corps data migration pipeline can handle all the filings involved, we could probably make a more focused effort to bring those over first.

Partial data migration also has me wondering how it affects the involuntary dissolution process. I haven't thought about this enough, but could this be an issue?

Continue to migrate any historical and less critical data – TBD (after MVP)

So the other bullet point (“Complete migration of all the accounts and corps by mid-March 2025”) is just for migrating active corps in COLIN?

OlgaPotiagalova commented 1 day ago

Thank you for your thoughts, @argush3.

The March deadline is not negotiable. All the corporations will need to be moved over, and clients' access to COLIN for filings will be closed. So we have to be creative by shrinking the scope, prioritizing, and coming up with the optimal plan.

How much time and effort is it going to save us if we de-prioritize migration of the:

Thank you

argush3 commented 2 hours ago

@OlgaPotiagalova I think de-prioritizing the 2 things you mentioned will help a bit.

The corp filing volume (18M+) is quite large, so cutting around 50% (9M+) of the filings we need to process is significant, I think. We will still need to implement data migration logic for all the filings, so no time will be saved there. Time will be saved in how long the pipeline takes to run, and also likely in dealing with one-off data fixes for filings.

The active corp filing volume (9M+), even after removing historical corp filings, is still much larger than what we dealt with for SPs/GPs: firm filings were around 1M.
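A quick back-of-the-envelope check of the volumes above, using the rounded lower bounds quoted in this comment (18M+ corp filings, about 50% historical, about 1M firm filings) rather than exact counts:

```python
# Rounded lower-bound figures quoted in this thread; not exact counts.
TOTAL_CORP_FILINGS = 18_000_000  # "18M+" corp filings in COLIN
HISTORICAL_SHARE = 0.50          # "around 50%" belong to historical corps
FIRM_FILINGS = 1_000_000         # "around 1M" SP/GP filings migrated previously


def remaining_filings(total: int, excluded_share: float) -> int:
    """Filings left to run through the pipeline after excluding a share."""
    return round(total * (1 - excluded_share))


active = remaining_filings(TOTAL_CORP_FILINGS, HISTORICAL_SHARE)
ratio = active / FIRM_FILINGS
print(f"Active corp filings to migrate: ~{active:,}")  # → ~9,000,000
print(f"Roughly {ratio:.0f}x the firm filing volume")  # → Roughly 9x
```

Even with historical corps excluded, the active corps alone are roughly nine times the volume of the firms migration.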

Even if we are able to exclude historical corps from migration, I think we may still need to migrate some of them, since amalgamations likely depend on some of their data being in the modernized database.

Not having to bring over output files would help, as that would be a large chunk of work. I'm assuming we would just mark these filings as paper-only in the modernized app (or some other value) so that no outputs are downloadable.

We can try to reduce scope as you have suggested around data migration, but I'm still not that confident in the data migration. I understand that data migration is non-negotiable. For now, my opinion on whether data migration can succeed by the March deadline remains unchanged. We'll need to start working on the migration to see what kinds of problems come up and whether my concerns are unwarranted.

OlgaPotiagalova commented 1 hour ago

Thank you @argush3 for your analysis and clarifications!

Is the plan to start on data migration in the upcoming sprint? Vysakh will be able to join and help.

For the purpose of this ticket, could you please specifically evaluate the MVP soft launch approach:
• Phase 1 (mid-Jan): no data migration, only new users/filings
• Phase 2 (between mid-Jan and March): some data migration (limited to some users, corp types only)
• Phase 3 (mid-March): the rest of it
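One way to make the three phases concrete is as a migration-eligibility rule. This is a minimal sketch only: `Phase`, `eligible_for_migration` and the phase 2 corp-type set are hypothetical names, with ULC/CCC borrowed from the "ULC and CCC first" example earlier in the thread; the actual phase 2 selection (by users or by corp types) is still TBD, and the type labels here are illustrative.

```python
from enum import Enum


class Phase(Enum):
    NEW_ONLY = 1  # mid-Jan: no data migration, only new users/filings
    PARTIAL = 2   # mid-Jan to March: limited data migration
    REST = 3      # mid-March: the rest of it


# Hypothetical phase 2 selection; final choice is TBD.
PHASE_2_CORP_TYPES = frozenset({"ULC", "CCC"})


def eligible_for_migration(phase: Phase, corp_type: str) -> bool:
    """Return True if a COLIN corp of this type may be migrated in the given phase."""
    if phase is Phase.NEW_ONLY:
        return False  # phase 1 only serves new users/filings
    if phase is Phase.PARTIAL:
        return corp_type in PHASE_2_CORP_TYPES
    return True  # phase 3 migrates everything remaining


for phase in Phase:
    print(phase.name, eligible_for_migration(phase, "ULC"), eligible_for_migration(phase, "BC"))
```

Keeping the phase 2 selection as explicit data makes it easy to change as the "specific set of corps" is decided.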

See the questions in the ticket description above.

We need to make a decision within a week on whether we are going with a phased release or a "big-bang" approach, because it will determine our priorities and next steps.

Thank you.

@JazzarKarim

argush3 commented 29 minutes ago

@OlgaPotiagalova Yes, the plan is to start data migration next sprint. I will be working on getting the corps migration pipeline working again, but Vysakh can pick up some analysis tickets or helper scripts that we will need.

I think the soft launch approach (phases 1-3) as laid out works to an extent.

I'm unsure when we'll be able to do the limited data migration in phase 2, but knowing the specific set of corps we need to load early on will increase the chances of success.

To address the questions in the description specifically:

Can we start enabling features in Prod for new users before migration to GCP, or will it increase risks?

My opinion is we shouldn't migrate to GCP as the workload is probably too much.

But to answer the question, I think we will need to enable the functionality before we go to GCP so we can start seeing what kind of issues there are earlier on. Yes, there will be a bit of risk.

How do we separate users? We don’t want any existing users to access the new system until we tell them. We don’t want any existing users to access COLIN after their account has been migrated. Logic needs to be built in COLIN, and possibly in the Modernized system.

Even if a corp is loaded into the modernized system, they will not be able to access their corp unless they've been affiliated. Are we saying that we will be pre-affiliating migrated corps in PRD?

As for COLIN, I don't know exactly what the home team did for SP/GPs. We will need to do some digging there.
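To illustrate the two gates being discussed (COLIN refusing filings for migrated corps, and the modernized system only exposing a corp to affiliated accounts), here is a minimal sketch. Everything in it is hypothetical: `AccessRegistry`, its methods, and the corp/account identifiers are illustrative names, not actual COLIN or Entities code, and it assumes both systems can consult a shared record of which corps have been migrated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessRegistry:
    """Hypothetical access model; not actual COLIN or Entities code."""

    # Corp numbers whose data has been loaded from COLIN into the modernized system.
    migrated_from_colin: set[str] = field(default_factory=set)
    # Account id -> corp numbers that account is affiliated with in the modernized system.
    affiliations: dict[str, set[str]] = field(default_factory=dict)

    def mark_migrated(self, corp_num: str, pre_affiliate_account: Optional[str] = None) -> None:
        """Record a migrated corp; optionally pre-affiliate it to an account."""
        self.migrated_from_colin.add(corp_num)
        if pre_affiliate_account is not None:
            self.affiliations.setdefault(pre_affiliate_account, set()).add(corp_num)

    def colin_filing_allowed(self, corp_num: str) -> bool:
        # After migration, COLIN should refuse new filings for the corp.
        return corp_num not in self.migrated_from_colin

    def modernized_access_allowed(self, account_id: str, corp_num: str) -> bool:
        # Loading a corp is not enough: the account must also be affiliated with it.
        return corp_num in self.affiliations.get(account_id, set())


registry = AccessRegistry()
registry.mark_migrated("BC0000001")                                  # loaded, not affiliated
registry.mark_migrated("BC0000002", pre_affiliate_account="acct-1")  # loaded and pre-affiliated

print(registry.colin_filing_allowed("BC0000001"))                 # False
print(registry.modernized_access_allowed("acct-1", "BC0000001"))  # False
print(registry.modernized_access_allowed("acct-1", "BC0000002"))  # True
```

Calling `mark_migrated` without `pre_affiliate_account` leaves the corp invisible in the modernized system, which is exactly the open question about pre-affiliating migrated corps in PRD.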

What will be the timeline?

I think the general phase 1-3 timeline is something we can go with. If GCP is in play, we will need to get the Flask upgrades in place first, so I'm hoping we get started on the GCP upgrades by late November or the start of December.

What are the risks associated with this plan, and how can we mitigate them?

I think I've provided a lot of opinions already so won't add anything here.