OpenCHAMI / roadmap

Public Roadmap Project for Ochami
MIT License

[RFD] Which Services Should Ochami Include for SI? #21

Closed synackd closed 1 week ago

synackd commented 5 months ago

Discussion Topic

For our upcoming release which will support the 2024 Supercomputer Institute, we need to choose a set of services that work together in order to provide the students with the services necessary to manage their clusters. There are some choices that we've already committed to for this iteration. For example, our inventory system will be SMD. However, there are other services for which we have choices. Let's use this Issue to discuss the tradeoffs inherent in these choices and move forward together.

Parameters for our decisions

  1. We're only solving the problem at hand: We're scoping the decision for the SI clusters today. We shouldn't close the door to future considerations like scaling to 60,000 nodes or more, but we also shouldn't make our decisions based on future needs. Let's solve the problems of today and try to avoid one-way doors.
  2. Strong opinions, lightly held: Be clear and forceful in describing your opinion -and- be ready to change your opinion quickly when the need arises. Without lots of hard numbers to back up every decision, we're going to have to make judgement calls. Some of them will feel less than perfect over time. As long as we maintain good documentation of our reasons, we can adapt as we learn more and even reverse previous decisions.
  3. Remember the Students: Each choice we make that adds a new technology for the students to learn also adds complexity to the solution. We may need to trade away some perfection in order to keep the overall solution easy to use for students who are doing all of this for the very first time.

DHCP

DHCP is a core protocol for bringing up nodes. We'll have to provide it somehow. https://github.com/OpenCHAMI/deployment-recipes/issues/3 has a good ongoing discussion of the options and tradeoffs.

DNS

[ ] Create a fresh issue for DNS choices

OIDC Provider

Ochami authentication and authorization are based on JWT bearer tokens and SPIRE workload attestation tokens. We need to include a facility for generating and refreshing tokens as needed to allow access to the API services.
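
To make the "refreshing tokens as needed" requirement concrete, here is a minimal sketch of how a client could decide when a JWT bearer token is due for refresh by reading its `exp` claim. Everything named here (`needs_refresh`, `leeway_seconds`, `make_demo_token`) is hypothetical and not part of any Ochami service. The sketch reads claims without verifying the signature, which is only acceptable for a client-side freshness check; the API services must still verify tokens properly.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def needs_refresh(token: str, now: float, leeway_seconds: int = 60) -> bool:
    """True once the token is within leeway_seconds of its 'exp' claim."""
    return now >= jwt_claims(token)["exp"] - leeway_seconds


def bearer_header(token: str) -> dict:
    """Build the HTTP header used to present a bearer token."""
    return {"Authorization": f"Bearer {token}"}


def make_demo_token(claims: dict) -> str:
    """Build an UNSIGNED token for illustration only (hypothetical helper)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none', 'typ': 'JWT'})}.{enc(claims)}."
```

A client would call `needs_refresh(token, time.time())` before each request and ask the OIDC provider for a new token when it returns true. How that request is made depends on which provider is chosen.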

[ ] Create a fresh issue discussing the OIDC/SPIRE options

API Gateway

Our inherited microservices all have existing URL structures and known routes. They are confusing to interact with from a user perspective. We think that exposing them directly to students will be a challenge. In addition, we need a centralized way of presenting multiple APIs through one endpoint. There are several credible pieces of API Gateway software we could standardize on for SI.
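
As a rough illustration of "presenting multiple APIs through one endpoint", the sketch below maps student-facing path prefixes to backend services using longest-prefix matching. The route table, hostnames, and paths are all made up for the example; a real gateway product would express the same idea in its own configuration format.

```python
from typing import Optional

# Hypothetical route table: student-facing prefix -> upstream base URL.
# None of these hostnames or paths come from an actual Ochami deployment.
ROUTES = {
    "/inventory": "http://smd.example.internal/api",
    "/boot": "http://boot.example.internal/api",
}


def resolve(path: str) -> Optional[str]:
    """Map a gateway path to an upstream URL, or None if nothing matches."""
    # Try longer prefixes first so the most specific route wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match whole path segments only, so "/inventoryx" is not "/inventory".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix] + path[len(prefix):]
    return None
```

With this table, a request for `/inventory/nodes` would be forwarded to the inventory backend, while an unknown prefix yields `None`, which a gateway would typically turn into a 404.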

[ ] Create a fresh issue for discussing the API Gateway options.

alexlovelltroy commented 5 months ago

Can you explain what you mean by using Ansible for DHCP configuration? Are you still expecting there to be a discrete DHCP server that is running somewhere? What are you envisioning that Ansible will configure? This is likely related to https://github.com/OpenCHAMI/deployment-recipes/issues/3

synackd commented 5 months ago

I suppose the general topic for this RFD is which services should be considered core to Ochami (i.e. handled/managed/maintained as Ochami projects) and which can be left for users to choose their own solution/implementation?

Are you still expecting there to be a discrete dhcp server that is running somewhere?

For the demo, we had a separate DHCP server with a separate configuration. After the demo, I was considering what the role of DHCP should be in Ochami. Should it be integrated as an Ochami microservice, or should it be left as a separate solution where system administrators choose their own DHCP implementation? Since then, my thinking has evolved as it has become apparent that DHCP is important to network booting and should be considered a core service.

As to how to integrate DHCP, see https://github.com/OpenCHAMI/deployment-recipes/issues/3 and my comment here.

Can you explain what you mean by using Ansible for DHCP configuration?

This was meant to be an example of how to manage DHCP if it wasn't part of Ochami, but the shift in thinking described above makes it irrelevant.

alexlovelltroy commented 4 months ago

Much of the discussion in this issue has moved to discussions within the deployment-recipes repo. We should be able to close this one once #25 has been converted to individual issues.