OpenCHAMI / roadmap

Public Roadmap Project for Ochami
MIT License

[RFD] Managing Unique Names in ochami microservices #10

Open alexlovelltroy opened 7 months ago

alexlovelltroy commented 7 months ago

Managing Unique Names in ochami Microservices

Context

In the Ochami system (Open Composable Heterogeneous Adaptive Management Infrastructure), there's a need for a flexible and extensible system to identify and manage various components. Components in an Ochami system represent hardware pieces and can range from Nodes to replaceable server parts, doors, and cooling units. This identification system needs to accommodate the system's extensibility and diverse component types.

Decision

We propose the following structure for managing component identities in Ochami:

SMD as the Central Authority: The State Management Database (SMD) will be the authoritative inventory and state management database for all components. Internally, it will use a private UUID to identify each component.
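To make the idea concrete, here is a minimal sketch of a private UUID sitting underneath several external naming schemes. It is not SMD's actual data model or API: `ComponentRegistry`, `bind`, `rebind`, `lookup`, and the scheme names used in the usage note are all hypothetical, and an in-memory dictionary stands in for the real datastore.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hardware component; the class and its fields are illustrative only."""
    kind: str
    # Private identifier: generated once at registration and never reassigned.
    internal_id: uuid.UUID = field(default_factory=uuid.uuid4)


class ComponentRegistry:
    """Hypothetical in-memory stand-in for an inventory keyed by private UUIDs."""

    def __init__(self):
        self._components = {}  # internal UUID -> Component
        self._aliases = {}     # (scheme, value) -> internal UUID

    def register(self, kind):
        component = Component(kind=kind)
        self._components[component.internal_id] = component
        return component

    def bind(self, component, scheme, value):
        """Attach an external identifier; each one is unique within its scheme."""
        key = (scheme, value)
        owner = self._aliases.get(key)
        if owner is not None and owner != component.internal_id:
            raise ValueError(f"{scheme}={value} is already bound to another component")
        self._aliases[key] = component.internal_id

    def rebind(self, component, scheme, old_value, new_value):
        """Swap a mutable identifier (for example a MAC) without touching the UUID."""
        if self._aliases.get((scheme, old_value)) != component.internal_id:
            raise KeyError(f"{scheme}={old_value} is not bound to this component")
        if old_value == new_value:
            return
        self.bind(component, scheme, new_value)
        del self._aliases[(scheme, old_value)]

    def lookup(self, scheme, value):
        """Resolve an external identifier to its component, or None if unknown."""
        internal_id = self._aliases.get((scheme, value))
        return self._components.get(internal_id)
```

With this shape, a node could be registered once, bound to something like `("xname", "x1000c0s0b0n0")` and `("mac", "aa:bb:cc:dd:ee:01")`, and later have its MAC rebound while every record keyed on the private UUID stays valid. Whether the UUID itself should be visible to clients is a separate question that the sketch does not settle.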

Multiple Identifier Systems:

Consequences

Open Questions

Security Considerations

External References

weallcock commented 6 months ago

I am not familiar with SMD and I am new to Ochami. Can someone point me to documentation on SMD? I ask because, generally, a centralized database != scalability, so I am curious how that statement is justified.

weallcock commented 6 months ago

What assumptions are being made about IP and MAC addresses? For instance, will the Slingshot Algorithmic MAC Addresses (AMA) cause any problems, since they can change?

weallcock commented 6 months ago

As to the question about exposing the UUID, I would argue yes unless there is a good reason (security, scalability) not to. You can't envision every possible scenario, so just make it available if people want it. If it is EVER the only identifier listed, say in an SMD log message, then this becomes pretty much required. That's my take anyway.

alexlovelltroy commented 6 months ago

Thanks for the comments. SMD is a microservice that was built and released as part of HPE's system management suite for large HPC systems. It was open-sourced and now is a part of OpenCHAMI. At the moment, OpenCHAMI's version is a fork of the HPE version, but we hope to reconcile the changes ASAP.

You raise a good point about a centralized database. Generally speaking, a central database will become a single point of failure for a data-driven system. In CSM, all databases are clustered and deployed to be resilient against the failure of any physical node. I think we'll head that way with LANL's ochami deployment recipe(s) as well. Having said that, the SMD data doesn't change very frequently, and even the state of thousands of nodes can be cached locally for most use cases. Rather than making the datastore more performant and accessible, I'd like to see us gain reliability through good contracts and development practices that encourage the use of robust client-side caches.
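As a rough illustration of what a robust client-side cache could mean here, the sketch below keeps a local snapshot, refreshes it after a time-to-live, and falls back to the stale copy when the central store cannot be reached. None of this is an existing OpenCHAMI or SMD client: `CachedInventory`, the `fetch` callable, and the 300-second default are assumptions made for the example.

```python
import time


class CachedInventory:
    """Hypothetical client-side cache in front of a slow-changing inventory."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # callable that returns a full inventory snapshot
        self._ttl = ttl_seconds    # how long a snapshot counts as fresh
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._snapshot = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        fresh = self._fetched_at is not None and now - self._fetched_at < self._ttl
        if fresh:
            return self._snapshot
        try:
            self._snapshot = self._fetch()
            self._fetched_at = now
        except Exception:
            # The central store is unreachable: keep serving the last good
            # snapshot if there is one, otherwise surface the failure.
            if self._fetched_at is None:
                raise
        return self._snapshot
```

The trade-off is staleness: a client may act on data up to `ttl_seconds` old, or older during an outage, which is only acceptable because the inventory changes infrequently. A caller would pass whatever function retrieves the inventory in its environment as `fetch`.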

I'm not worried about the IP and MAC changes at the moment. Those are mutable in the system and unlikely to change frequently. With alternate pseudo-unique IDs to work with that are appropriate for each use case, we shouldn't need to worry. There are edge cases around MAC changes on the management network that introduce "ghost" nodes, but those shouldn't be too difficult to work around if/when they arise.

Thanks for the feedback on UUIDs in logs as an argument for exposing them. That's an excellent point!