edgexr / edge-cloud-platform

Apache License 2.0

allow crm code to run from ccrm #355

Closed: gainsley closed this 2 months ago

gainsley commented 3 months ago

Previously, our architecture required a CRM service running on the edge site to convert platform-independent APIs into platform-specific API calls that deploy VMs, Clusters, AppInsts, etc. To support platforms where it is not feasible to run the CRM service on the edge site (typically because another layer of software already manages the infrastructure), we want to be able to run the platform-specific code from the CCRM service, which runs off the edge site alongside the Controller.
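To make the placement decision concrete, here is a minimal Go sketch. Platform, Cloudlet, Runner, PlatformRunner and loggingPlatform are hypothetical names for illustration, not the actual edge-cloud-platform API. The only idea taken from the PR is that the same platform-specific code can execute either in an on-site CRM or in the off-site CCRM; here that choice is assumed to be driven by a per-cloudlet boolean CrmOnEdge flag.

```go
package main

import (
	"context"
	"fmt"
)

// Platform is a hypothetical, heavily simplified stand-in for the
// platform-specific interface that turns platform-independent requests
// into infrastructure API calls.
type Platform interface {
	CreateClusterInst(ctx context.Context, name string) error
}

// Cloudlet is a hypothetical, trimmed-down cloudlet record.
type Cloudlet struct {
	Name      string
	CrmOnEdge bool // assumed: true = platform code runs in the on-site CRM, false = in the CCRM
}

// Runner identifies which service executes the platform-specific code.
type Runner string

const (
	RunnerCRM  Runner = "crm"  // single instance on the edge site
	RunnerCCRM Runner = "ccrm" // off-site, alongside the Controller
)

// PlatformRunner decides where platform code for a cloudlet executes.
func PlatformRunner(c Cloudlet) Runner {
	if c.CrmOnEdge {
		return RunnerCRM
	}
	return RunnerCCRM
}

// loggingPlatform is a toy Platform that just reports which runner called it.
type loggingPlatform struct{ runner Runner }

func (p loggingPlatform) CreateClusterInst(ctx context.Context, name string) error {
	fmt.Printf("%s: creating cluster %s\n", p.runner, name)
	return nil
}

func main() {
	ctx := context.Background()
	cloudlets := []Cloudlet{{Name: "edge-a", CrmOnEdge: true}, {Name: "site-b", CrmOnEdge: false}}
	for _, c := range cloudlets {
		// The same Platform code is used; only the hosting service differs.
		var p Platform = loggingPlatform{runner: PlatformRunner(c)}
		_ = p.CreateClusterInst(ctx, c.Name+"-cluster")
	}
}
```

The point of the sketch is that the Platform implementation itself does not change between the two modes; only the process that hosts it does.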

This PR refactors the common CRM code (from pkg/crmutil/controller-data.go), moving the parts that are specific to a single on-edge CRM into pkg/crm. The remaining code is modified so it can be included in either the CRM, a single-instance process that uses notify to communicate with the Controller, or the CCRM, a horizontally scaled set of processes that use direct etcd access and gRPC to communicate with the Controller. Code in pkg/crmutil is meant to be shared by both the CRM and the CCRM.
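One way to let the same shared code run in both processes is to hide the transport behind an interface. The sketch below assumes hypothetical StateSender, notifySender, directSender and ControllerData types; a buffered channel stands in for notify and a mutex-guarded map stands in for direct etcd/gRPC access. It illustrates the general shape of such a refactor, not the real pkg/crmutil code.

```go
package main

import (
	"fmt"
	"sync"
)

// StateSender abstracts how shared code reports object state back to the
// Controller. All names here are hypothetical.
type StateSender interface {
	SendState(key, state string) error
}

// notifySender models the on-edge CRM: a single process pushing updates
// over one notify connection (simulated here by a channel).
type notifySender struct {
	ch chan string
}

func (n *notifySender) SendState(key, state string) error {
	n.ch <- key + "=" + state
	return nil
}

// directSender models the CCRM: any replica may write results directly
// (a map standing in for etcd/gRPC), so access is synchronized.
type directSender struct {
	mu    sync.Mutex
	store map[string]string
}

func (d *directSender) SendState(key, state string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.store[key] = state
	return nil
}

// ControllerData holds only logic shared by CRM and CCRM; it does not know
// which transport is in use.
type ControllerData struct {
	sender StateSender
}

// ClusterCreated reports a finished cluster creation through whichever
// sender was wired in.
func (cd *ControllerData) ClusterCreated(name string) error {
	return cd.sender.SendState("cluster/"+name, "Ready")
}

func main() {
	// CRM-style wiring: single process, notify-like channel.
	ns := &notifySender{ch: make(chan string, 1)}
	crm := &ControllerData{sender: ns}
	_ = crm.ClusterCreated("c1")
	fmt.Println(<-ns.ch) // cluster/c1=Ready

	// CCRM-style wiring: direct write to a shared store.
	ds := &directSender{store: map[string]string{}}
	ccrm := &ControllerData{sender: ds}
	_ = ccrm.ClusterCreated("c2")
	fmt.Println(ds.store["cluster/c2"]) // Ready
}
```

ControllerData never branches on which process it is running in; the hosting service chooses the sender at startup. The mutex in directSender reflects that a horizontally scaled service may handle requests concurrently.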

In the common pkg/crmutil/controller-data.go code, there was a lot of functionality that was specific to a single-instance process running over notify. The following changes were made:

The platform code also needs changes before it can run from the CCRM; this work is only partially complete.

Other changes:

I still need to run tests against real infrastructure like Openstack, and possibly address the AppInstRuntime TODO.

gainsley commented 2 months ago

Hi Lev, I found a bunch of issues after more testing. In particular, I didn't realize the proxyCerts object was used by the infra-specific code, so I needed to split it into a stateless, cloudlet-specific part and a persistent, cloudlet-independent part (the cache).
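The split described here resembles a common pattern: a long-lived, process-wide cache plus a cheap per-cloudlet wrapper. The Go sketch below is illustrative only; Cert, CertCache, NewCertCache, ProxyCerts, NewProxyCerts and the fetch callback are hypothetical names, and fetch stands in for whatever actually obtains certificates.

```go
package main

import (
	"fmt"
	"sync"
)

// Cert is a hypothetical placeholder for a TLS certificate/key pair.
type Cert struct {
	Domain string
	PEM    string
}

// CertCache is the persistent, cloudlet-independent part: one instance per
// process, shared by every cloudlet that process handles.
type CertCache struct {
	mu    sync.Mutex
	certs map[string]Cert
	fetch func(domain string) (Cert, error) // assumed hook to a cert issuer
}

func NewCertCache(fetch func(string) (Cert, error)) *CertCache {
	return &CertCache{certs: map[string]Cert{}, fetch: fetch}
}

// Get returns a cached cert, calling fetch only on a cache miss.
func (c *CertCache) Get(domain string) (Cert, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cert, ok := c.certs[domain]; ok {
		return cert, nil
	}
	cert, err := c.fetch(domain)
	if err != nil {
		return Cert{}, err
	}
	c.certs[domain] = cert
	return cert, nil
}

// ProxyCerts is the stateless, cloudlet-specific part: cheap to construct
// per operation, it only records which cloudlet and domain it serves and
// points at the shared cache.
type ProxyCerts struct {
	cloudlet string
	domain   string
	cache    *CertCache
}

func NewProxyCerts(cloudlet, domain string, cache *CertCache) *ProxyCerts {
	return &ProxyCerts{cloudlet: cloudlet, domain: domain, cache: cache}
}

// CertForProxy looks up the wildcard cert for this cloudlet's domain.
func (p *ProxyCerts) CertForProxy() (Cert, error) {
	return p.cache.Get("*." + p.domain)
}

func main() {
	fetches := 0
	cache := NewCertCache(func(d string) (Cert, error) {
		fetches++
		return Cert{Domain: d, PEM: "..."}, nil
	})
	// Two cloudlets under the same domain reuse one cached cert.
	a := NewProxyCerts("cloudlet-a", "example.net", cache)
	b := NewProxyCerts("cloudlet-b", "example.net", cache)
	ca, _ := a.CertForProxy()
	cb, _ := b.CertForProxy()
	fmt.Println(ca.Domain, cb.Domain, fetches) // *.example.net *.example.net 1
}
```

Because ProxyCerts carries no state of its own beyond a few identifiers and a cache pointer, infra-specific code can create one per operation in either the CRM or the CCRM, while the cache survives across cloudlets. Holding the lock during fetch keeps the sketch simple but serializes cache misses; a production version might lock per domain instead.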

levshvarts commented 2 months ago


Sounds good. I'll still skim through the changes to build a better picture in my head, but won't focus on specifics. Will do a more in-depth review after your changes.

gainsley commented 2 months ago

Hey Lev, I already pushed the fixes, so the PR is complete.

gainsley commented 2 months ago

Thanks also, Lev, for taking on that code review! I have done a wide range of tests. Besides the usual unit and e2e tests, I also set things up (you can see this in the director changes) to run make test-start-dns, which let me create a cloudlet on Openstack from a CRM started via the e2e tests. I also tested a k3s deployment with the operator changes on Openstack (both of those Openstack tests used the acceptance tests). However, I only tested Openstack with CrmOnEdge=true; all of the CrmOnEdge=false testing is via the fake platform in the unit/e2e tests.

I think that when we implement the OSM platform with CrmOnEdge=false, we'll have a chance to find any other bugs lurking there.