OpenCHAMI / roadmap

Public Roadmap Project for Ochami
MIT License

[FEATURE] Create an API contract definition for inventory-based failure domain delegation #6

Open alexlovelltroy opened 7 months ago

alexlovelltroy commented 7 months ago

Placeholder issue for defining an API spec between the on-system controller and an orchestration layer that can operate as a cloud control plane for one or more controllers.

Sites define the scope of their failure domains differently depending on the kinds of jobs they typically run. Some treat an entire system as a single failure domain because their jobs typically span all nodes. Others split their systems by cooling group, rack, or chassis on the premise that shared environmental conditions should define the scope of a failure domain. Ochami shouldn't impose any one of these positions on all sites.
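One way to keep Ochami neutral is to make the failure domain scope a per-site parameter applied to the same inventory. The sketch below assumes a hypothetical `FailureDomainScope` enum and a simplified `Node` record; the field names are illustrative only and are not the SMD schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class FailureDomainScope(Enum):
    """Granularity a site chooses for its failure domains (hypothetical)."""
    SYSTEM = "system"
    COOLING_GROUP = "cooling_group"
    RACK = "rack"
    CHASSIS = "chassis"


@dataclass(frozen=True)
class Node:
    """Minimal inventory record; field names are illustrative, not the SMD schema."""
    xname: str
    system: str
    cooling_group: str
    rack: str
    chassis: str


def domain_key(node: Node, scope: FailureDomainScope) -> Tuple[str, ...]:
    """Return the key identifying the failure domain that contains `node`."""
    if scope is FailureDomainScope.SYSTEM:
        return (node.system,)
    if scope is FailureDomainScope.COOLING_GROUP:
        return (node.system, node.cooling_group)
    if scope is FailureDomainScope.RACK:
        return (node.system, node.rack)
    # CHASSIS: assume a chassis label is only unique within its rack.
    return (node.system, node.rack, node.chassis)


def group_failure_domains(
    nodes: Iterable[Node], scope: FailureDomainScope
) -> Dict[Tuple[str, ...], List[str]]:
    """Partition the inventory into failure domains at the site-selected scope."""
    domains: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for node in nodes:
        domains[domain_key(node, scope)].append(node.xname)
    return dict(domains)
```

With identical inventory, a site using `FailureDomainScope.SYSTEM` gets one domain, while a site using `FailureDomainScope.RACK` gets one domain per rack. Only the site's chosen scope changes, not the inventory itself.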

In the current Ochami demo codebase, we have roughly divided our services into system-local services that run within the failure domain and cloud services that can run anywhere. SMD, BSS, and S3 have been demonstrated as cloud services. DHCP, DNS, and iPXE have been demonstrated as system-local services.
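The split above can be written down as a placement table from which a deployment plan follows: cloud services run once, and each failure domain gets its own copy of every system-local service. This is a minimal sketch; the `Placement` enum, the lowercase service keys, and `deployment_plan` are hypothetical names, and only the service assignments come from the demo codebase described above.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Placement(Enum):
    SYSTEM_LOCAL = "system-local"  # runs inside each failure domain
    CLOUD = "cloud"                # may run anywhere, shared across domains


# Service placement as demonstrated in the current demo codebase.
DEMO_PLACEMENT: Dict[str, Placement] = {
    "smd": Placement.CLOUD,
    "bss": Placement.CLOUD,
    "s3": Placement.CLOUD,
    "dhcp": Placement.SYSTEM_LOCAL,
    "dns": Placement.SYSTEM_LOCAL,
    "ipxe": Placement.SYSTEM_LOCAL,
}


def deployment_plan(
    placement: Dict[str, Placement], failure_domains: Iterable[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (cloud services, per-domain system-local services)."""
    cloud = sorted(s for s, p in placement.items() if p is Placement.CLOUD)
    local = sorted(s for s, p in placement.items() if p is Placement.SYSTEM_LOCAL)
    # Every failure domain gets its own copy of each system-local service.
    per_domain = {domain: list(local) for domain in failure_domains}
    return cloud, per_domain
```

Keeping cloud services and per-domain services in separate return values avoids any collision between a failure domain name and the cloud target.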

We want to define an OpenAPI spec for the interaction between system-local services and cloud services, along with a contract that describes the behavior each controller is expected to exhibit for the operations in that spec.
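To make "spec plus contract" concrete, the sketch below pairs a draft OpenAPI 3 document with a table mapping each `operationId` to the controller behavior it implies, plus a check that no operation lacks a contract entry. Everything here is hypothetical: the `/failure-domains/{domainId}` paths, the operation names, and the behavior text are placeholders for discussion, not agreed Ochami endpoints.

```python
from typing import Dict, List, Set

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

_DOMAIN_ID = {"name": "domainId", "in": "path", "required": True,
              "schema": {"type": "string"}}

# Hypothetical draft: paths and operationIds are placeholders, not agreed endpoints.
SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Failure domain delegation (draft)", "version": "0.0.1"},
    "paths": {
        "/failure-domains/{domainId}": {
            "parameters": [_DOMAIN_ID],
            "put": {
                "operationId": "delegateFailureDomain",
                "requestBody": {"content": {"application/json": {"schema": {
                    "type": "array", "items": {"type": "string"}}}}},  # node xnames
                "responses": {"202": {"description": "Inventory accepted"}},
            },
            "delete": {
                "operationId": "releaseFailureDomain",
                "responses": {"202": {"description": "Release accepted"}},
            },
        },
        "/failure-domains/{domainId}/status": {
            "parameters": [_DOMAIN_ID],
            "get": {
                "operationId": "getFailureDomainStatus",
                "responses": {"200": {"description": "Current controller state"}},
            },
        },
    },
}

# The contract: what a controller is expected to do for each operation (hypothetical).
CONTRACT: Dict[str, str] = {
    "delegateFailureDomain": "Serve DHCP, DNS, and iPXE for the delegated nodes.",
    "releaseFailureDomain": "Stop serving the domain's nodes locally.",
    "getFailureDomainStatus": "Report which nodes the controller currently serves.",
}


def operation_ids(spec: dict) -> Set[str]:
    """Collect operationIds, skipping path-level keys such as 'parameters'."""
    ids = set()
    for path_item in spec["paths"].values():
        for method, operation in path_item.items():
            if method in HTTP_METHODS:
                ids.add(operation["operationId"])
    return ids


def uncovered_operations(spec: dict, contract: Dict[str, str]) -> List[str]:
    """Return operations in the spec that have no contract entry."""
    return sorted(operation_ids(spec) - set(contract))
```

Running a check like `uncovered_operations` in CI would keep the behavioral contract from drifting behind the OpenAPI spec as endpoints are added.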