Open alexlovelltroy opened 7 months ago
I am not familiar with SMD and I am new to OpenCHAMI. Can someone point me to documentation on SMD? I ask because a centralized database generally does not scale well, so I am curious how that claim is justified.
What assumptions are being made about IP and MAC addresses? For instance, will the Slingshot Algorithmic MAC Addresses (AMA) cause any problems, since they can change?
As to the question about exposing the UUID, I would argue yes unless there is a good reason (security, scalability) not to. You can't envision every possible scenario, so just make it available if people want it. If it is EVER the only identifier listed, say in an SMD log message, then this becomes pretty much required. That's my take anyway.
Thanks for the comments. SMD is a microservice that was built and released as part of HPE's system management suite for large HPC systems. It was open-sourced and is now part of OpenCHAMI. At the moment, OpenCHAMI's version is a fork of the HPE version, but we hope to reconcile the changes ASAP.
You raise a good point about a centralized database. Generally speaking, a central database will become a single point of failure for a data-driven system. In CSM, all databases are clustered and deployed to be resilient against the failure of any physical node. I think we'll head that way with LANL's OpenCHAMI deployment recipe(s) as well. Having said that, the SMD data doesn't change very frequently, and even the state of thousands of nodes can be cached locally for most use cases. Rather than making the datastore more performant and accessible, I'd like to see us gain reliability through good contracts and development practices that encourage the use of robust client-side caches.
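To make the client-side caching idea concrete, here is a minimal sketch of a cache that refreshes after a time-to-live and falls back to the last good copy when the datastore is unreachable. Everything named here is hypothetical: `CachedInventory`, the `fetch` callable standing in for a query against SMD, and the shape of the returned dictionary are assumptions for illustration, not part of any existing client.

```python
import time
from typing import Callable, Dict, Optional


class CachedInventory:
    """Client-side cache that serves the last good copy if a refresh fails."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, dict]],  # hypothetical call into SMD
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Optional[Dict[str, dict]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, dict]:
        now = self._clock()
        if self._data is not None and (now - self._fetched_at) < self._ttl:
            return self._data  # still fresh, no network call needed
        try:
            self._data = self._fetch()
            self._fetched_at = now
        except Exception:
            # Datastore unavailable: keep serving stale data if we have any.
            # The timestamp is left unchanged so the next call retries.
            if self._data is None:
                raise
        return self._data
```

A consumer would construct this once with its own fetch function and call `get()` wherever it needs inventory data. Whether stale data is acceptable, and for how long, depends on the use case, so the TTL is left to the caller.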
I'm not worried about the IP and MAC changes at the moment. Those are mutable in the system and unlikely to change frequently. With alternate pseudo-unique IDs to work with that are appropriate for each use case, we shouldn't need to worry. There are edge cases around MAC changes on the management network that introduce "ghost" nodes, but those shouldn't be too difficult to work around if/when they arise.
Thanks for the feedback on UUIDs in logs as an argument for exposing them. That's an excellent point!
Managing Unique Names in OpenCHAMI Microservices
Context
In OpenCHAMI (Open Composable Heterogeneous Adaptive Management Infrastructure), there is a need for a flexible and extensible way to identify and manage components. Components in an OpenCHAMI system represent pieces of hardware and can range from nodes to replaceable server parts, doors, and cooling units. The identification scheme needs to accommodate both the system's extensibility and its diverse component types.
Decision
We propose the following structure for managing component identities in OpenCHAMI:
SMD as the Central Authority: The State Management Database (SMD) will be the authoritative inventory and state management database for all components. It will use a private UUID identifier for each component internally.
Multiple Identifier Systems:
XNAME: A globally unique string encoding component type and physical location, used across the system for consistent identification.
IP Address and MAC Address: For network-related components, IP addresses and MAC addresses will be used. SMD will maintain the mapping of these addresses to components.
Flexible Component Registration: The system must support the addition of a component for which not all identifiers (XNAME, MAC, IP address) are available initially. Such components can be registered in the system by an inventory discovery service and will be made available for use by other microservices once the necessary identifiers are added.
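The decision above can be sketched as a small in-memory registry. This is an illustration only, not SMD's actual data model or API: the `Component` and `Registry` names, the method names, and the rule that a component becomes available once all three identifiers are present are assumptions made for the example. In practice the set of necessary identifiers would likely vary by component type, since a door or cooling unit may never have an IP address.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Identifier systems named in the decision (hypothetical field names).
IDENTIFIER_FIELDS = ("xname", "mac", "ip")


@dataclass
class Component:
    xname: Optional[str] = None
    mac: Optional[str] = None
    ip: Optional[str] = None
    # Private identifier generated internally, never supplied by callers.
    uid: uuid.UUID = field(default_factory=uuid.uuid4)

    def is_available(self) -> bool:
        # Assumption: all three identifiers are required before other
        # services may use the component.
        return all(getattr(self, name) is not None for name in IDENTIFIER_FIELDS)


class Registry:
    def __init__(self) -> None:
        self._components: Dict[uuid.UUID, Component] = {}

    def register(
        self,
        xname: Optional[str] = None,
        mac: Optional[str] = None,
        ip: Optional[str] = None,
    ) -> uuid.UUID:
        """Accept a component even when some identifiers are still unknown."""
        comp = Component(xname=xname, mac=mac, ip=ip)
        self._components[comp.uid] = comp
        return comp.uid

    def update(self, uid: uuid.UUID, **identifiers: str) -> None:
        """Fill in or change identifiers after initial registration."""
        comp = self._components[uid]
        for name, value in identifiers.items():
            if name not in IDENTIFIER_FIELDS:
                raise ValueError(f"unknown identifier: {name}")
            setattr(comp, name, value)

    def find_by(self, name: str, value: str) -> Optional[Component]:
        """Map an external identifier back to its component."""
        if name not in IDENTIFIER_FIELDS:
            raise ValueError(f"unknown identifier: {name}")
        for comp in self._components.values():
            if getattr(comp, name) == value:
                return comp
        return None

    def available(self) -> List[Component]:
        return [c for c in self._components.values() if c.is_available()]
```

For example, a discovery service might call `register(mac="aa:bb:cc:dd:ee:ff")` when it first sees a device, then `update(uid, xname="x1000c0s0b0n0", ip="10.0.0.5")` once its location and address are known, at which point the component shows up in `available()`. The identifier values here are made up. Because the UUID is the only key that never changes, MAC or IP changes become simple updates rather than new records.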
Consequences
Open Questions
Security Considerations
External References