OpenShift nice-to-have feature list

esune commented 3 years ago

The following is a list of nice-to-have features for OpenShift, based on personal experience (and taste) and on the patterns established for the projects hosted on the BC DevOps instance of OCP. They are not all-new features, but rather optimizations and tweaks of current workflows and setups.

[ ] Pre-configured role bindings: when deploying a new set of services, role bindings need to be configured in order to follow the recommended pattern of storing builds/images in the tools namespace, and deployments in dev/test/prod.
[ ] Pre-configured "general" NSPs: there are a couple of NSPs that could be configured by default when generating a name set, as not having them in place will cause things to not work at all. In particular, the NSP allowing pods to talk to the K8S api should be pre-configured in all namespaces (tools/dev/test/prod), while the NSP allowing builds to pull from the internet (Git, NPM, PyPy, NuGet, etc.) should be pre-configured for the tools namespace.
[ ] Pre-configured pull secrets to make using Artifactory (or other Docker image registries) transparent.
[ ] The OCP3 way of copying the login command, without having additional pop-up windows and links to click seemes more efficient/user friendly.

This list could be extended with suggestions from other teams/developers as considered appropriate.

WadeBarnes commented 3 years ago

If the pull secrets are created and properly linked to the correct service accounts in the given environments the use of the Artifactory proxy to pull images becomes completely transparent to the project teams; i.e. there is NO need to explicitly define them in the BCs and DCs. Here are some recent updates we did to the BCDevOps/openshift-developer-tools; https://github.com/BCDevOps/openshift-developer-tools/pull/128/files.

Those functions are part of the initOSProjects.sh, which automates the initial setup of the roles and secrets on a given project set. It automatically detects the artifacts-default-* creds in the tools project and automatically sets up the pull credentials and links them to the service accounts.

Example run (on an environment that already been setup, but it gives you the gist):

Found secret artifacts-default-ugnrgl, would you like to use this as a pull secret? (y/n)
y

Pull secret, artifactory-creds already exists in 583dbf-tools ...
Linking pull secret, artifactory-creds, to the default service account in 583dbf-tools ...
Linking pull secret, artifactory-creds, to the builder service account in 583dbf-tools ...

Pull secret, artifactory-creds already exists in 583dbf-dev ...
Linking pull secret, artifactory-creds, to the default service account in 583dbf-dev ...
Linking pull secret, artifactory-creds, to the builder service account in 583dbf-dev ...

Pull secret, artifactory-creds already exists in 583dbf-test ...
Linking pull secret, artifactory-creds, to the default service account in 583dbf-test ...
Linking pull secret, artifactory-creds, to the builder service account in 583dbf-test ...

Pull secret, artifactory-creds already exists in 583dbf-prod ...
Linking pull secret, artifactory-creds, to the default service account in 583dbf-prod ...
Linking pull secret, artifactory-creds, to the builder service account in 583dbf-prod ...

Something like this could be done automatically when the projects sets are provisioned.

mitovskaol commented 3 years ago

@esune I created a ticket in our backlog to look into automating your suggestions with the Registry's provisioner bot. @WadeBarnes Your suggestion is in another ticket. We will tackle them in the next 2-3 Sprints.

IanKWatts commented 3 years ago

@esune @WadeBarnes Comments on these suggestions are inline below.

1) Preconfigured role bindings Not all teams use the tools/dev/test/prod model and the Platform Services Team prefers that role bindings be managed by project teams.

2) Preconfigured network security policies Now that the OpenShift platform is using Kubernetes network policies instead of Aporeto, this is no longer an issue. Egress traffic is now allowed by default and the Aporeto zero-trust policy is not in effect.

3) Preconfigured pull secrets My understanding is that there are some problems with this. If we use cluster-wide pull credentials so that project teams do not have to manage this on their own, then the platform team would be responsible for the management of the credentials, which the team would prefer to avoid. Each project team should have full control over their pull secrets. Alternatively, if pull secrets were automatically set up and managed by ArgoCD for project teams, that would interfere with teams managing secrets on their own. The Platform Services Team will consider options for this in the future, but for now will not manage pull secrets for teams.

4) Copying the login command Unfortunately, this is a feature of OpenShift 4 and is not something that we can change. It was implemented as a security feature by Red Hat. In OpenShift 3 the token could be copied with a single click, even if the user's session was already expired (the token might still be valid after session expiration). To prevent this, in OpenShift 4 you must open a new page in order to ensure that your session is still valid, although this is slightly less convenient. Changing this behaviour would require a change request to Red Hat.

If project teams are struggling with items 1, 3, and 4, then perhaps an update to the onboarding process or documentation is required.

esune commented 3 years ago

Thanks for the reply @IanKWatts .

A couple of additional notes:

1) Preconfigured role bindings Not all teams use the tools/dev/test/prod model and the Platform Services Team prefers that role bindings be managed by project teams.

I wasn't aware this was (still) possible, as I was under the impression that the tools/dev/test/prod pattern was strongly recommended, if not even enforced in some way. I would play devil's advocate and say that if the majority of teams is following this pattern, it might be beneficial to have the role bindings pre-populated based on the recommended behaviour. I might however be missing something that could cause grief in the long term.

3) Preconfigured pull secrets My understanding is that there are some problems with this. If we use cluster-wide pull credentials so that project teams do not have to manage this on their own, then the platform team would be responsible for the management of the credentials, which the team would prefer to avoid. Each project team should have full control over their pull secrets. Alternatively, if pull secrets were automatically set up and managed by ArgoCD for project teams, that would interfere with teams managing secrets on their own. The Platform Services Team will consider options for this in the future, but for now will not manage pull secrets for teams.

What I meant with pre-configured pull-secret is what @WadeBarnes described above, which is linking it to the default service account for the namespace. That I all I believe is needed for image pulls to succeed with Artifactory, and it looks like a similar situation to 1 to me.

Updates to the docs are definitely always welcome, I admit though that the above points/ideas came up (at least for me) after several years of working on the platform when we switched to OCP4 and things didn't "just work" because I was missing steps that, while they could be documented in the onboarding process, would have been skipped anyway as technically I was not really onboarding again 😅 (trying to provide some additional context in case it can be helpful to tackle the items and/or update docs more effectively).

mitovskaol commented 3 years ago

@j-pye What is your opinion about whether pull secrets should or should be managed by ArgoCD?

WadeBarnes commented 3 years ago

There is an automated process for provisioning a set of credentials for Artifactory in the tools environments, I don't think it would be too much effort to enhance that process to do what my scripts do; create the associated pull secrets and register them with the appropriate service accounts.

caggles commented 3 years ago

I agree that making the Artifactory pull secret process transparent to teams is a good idea, but I do not think this is the best way to do it. ArgoCD would not be involved in the creation of these additional pull secrets and service account bindings - ArgoCD only makes a ArtifactoryServiceAccount object. If we were the implement something like this, it would more likely work so that any new ArtifactoryServiceAccount object (default or otherwise) could (perhaps optionally?) create all of these links between the various namespaces and service accounts through the Archeobot operator.

The problem with this method is that this means that the operator manages a significantly increased number of total objects, with the objects existing in namespaces that where the CR does not exist, and assumes that the CR must exist in a tools namespace. This means increasing the load on the Platform Team, because suddenly a large number of secrets and service accounts are our responsibility to maintain. It means the load on the operator increases. It's not a good practice to have the operator manage objects related to a CR that doesn't live in the same namespace as the CR (because a CR in tools would now be linked to a service account in prod, for example). We could avoid this by creating different ArtifactoryServiceAccounts in each of the 4 namespaces as part of standard provisioning, but that would increase the load on the operator even more.

And, on a more philosophical note, I don't know that something like this should be transparent. I don't think we want to take ownership of things in a team's namespace, but that's what it means to make something transparent - "this isn't your problem anymore, we'll take care of it." I like the idea of transparency for things like this, don't get me wrong, I just don't think we should be implementing said transparency by taking on responsibility for objects that live in namespaces. If we create transparency, we should work hard to make sure that we're doing it through cluster settings, instead of fudging with the namespaces and then telling the teams that they don't need to understand what we've done. The namespace is theirs - they should understand everything going on within it. That's part of the reason why the default ArtifactoryServiceAccount is so barebones - it's not supposed to do a bunch of stuff for the teams, because it's an object in the namespace and therefore should be owned primarily by the team. If we make that default ArtifactoryServiceAccount in tools completely transparent, then it doesn't matter if we say the object is owned by the team - the fact of the matter is that it's on us to make sure it does what the team needs it to do.

That's why, instead, we are currently investigating the possibility of creating a cluster-wide default account for Artifactory which would grant access to the caching repos. This way, teams wanting to use that account would be able to do so (but only for caching repos - ones local private repos become available, they would need to make their own ArtifactoryServiceAccount to make use of them). This would reduce the amount of load on the operator (since provisioning a separate account in each tools namespace is not longer necessary) and would reduce the load on the Platform Team (since now there's just the one default account that we're responsible for) and any team that wants to use their own separate accounts for greater security is still absolutely free to do so. And it avoids the pitfall of us taking on ownership and responsibility for objects in a team's namespace, because this is a cluster setting and therefore should be our responsibility.

esune commented 3 years ago

Thanks for the detailed explanation @caggles, what you're bringing up makes sense to me. And just to clarify: my point was just to facilitate a pattern that is recommended, if not somewhat enforced, for BCGov projects - not trying to shift responsibility between patform and every team.

j-pye commented 3 years ago

@j-pye What is your opinion about whether pull secrets should or should be managed by ArgoCD?

TL;DR: No. Archeobot could link the secret to any service accounts for building but shouldn't touch any other namespace aside from tools. A cluster wide pull secret for the cache would be useful and reduce the issues we have with DockerHub limit spam in the event logs. (I've done this with other registries that require credentials). We could create something to automate the linking of service accounts for pulling from tools to the other environments but this should be outside of Archeobot and ArgoCD so that teams can opt-in and out. (Thinking out loud: A stand alone operator might be an option but further discussion is required and for now I'd consider Wade's script a good team driven option)

@mitovskaol While there are things we can do with ArgoCD to link the artifactory secret to the default service account, this is a feature for the Archeobot (Artifactory Operator). This is something that could be added to the ArtifactoryServiceAccount object. This is because we don't know the secret name at provisioning, when an ArtifactoryServiceAccount object is created in the tools namespace Archeobot picks it up and generates the credentials (secret) in that namespace with a unique name. This means ArgoCDs provisioning rate (sync) would be tied to the speed of Archeobot. To add to that, it's also not simple to patch core OpenShift objects with ArgoCD such as the default service account.

EDIT: Just read @caggles response, I agree that the linking between namespaces should not be on Archeobot, I wouldn't be against linking the ArtifactoryServiceAccount to the builder service account though for any build related tasks. As Cailey said, we're working on performance improvements for that specific operator though so this is not the time to add this feature.

1) Preconfigured role bindings Not all teams use the tools/dev/test/prod model and the Platform Services Team prefers that role bindings be managed by project teams.

I wasn't aware this was (still) possible, as I was under the impression that the tools/dev/test/prod pattern was strongly recommended, if not even enforced in some way. I would play devil's advocate and say that if the majority of teams is following this pattern, it might be beneficial to have the role bindings pre-populated based on the recommended behaviour. I might however be missing something that could cause grief in the long term.

@esune @mitovskaol This is a good point. We should probably create a document noting what is and isn't enforced for project sets. The tools/dev/test/prod namespaces are the static project set structure, along with the default deny Network Policies. How those namespaces are used is up to the team using them though. We try to enforce as little as possible. Unless you're causing issues for others, I doubt we'll reach out to enforce specific usage of the namespaces.

Our training materials are often geared towards that default 4 namespace structure and use though so there's benefit in sticking with the default structure/usage so you don't have to come up with workaround to fit custom use.

To your last point in the issue body, I also don't like the extra clicks to get my token but you can bookmark the token page and go straight to it if that helps.

BCDevOps / OpenShift4-Migration

OpenShift nice-to-have feature list #73