[Feature Request] Define a list of mandatory and of optional (supported) SCS OpenStack services

garloff commented 9 months ago

https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-1852491416 states:

Since there is no defined list of supported and allowed OpenStack services in SCS yet, the standard cannot rely on delivering and maintaining full policy configuration templates - there are just too many services.

From the standardization view, we only require the OpenStack core services needed for the OpenStack Powered Compute trademark certification: CORE=keystone, nova, glance, cinder, neutron. So technically, that's all we require (maybe plus placement, as current nova does not normally work without it). Judging from the desire to run openstack-health-monitor and k8s-cluster-api-provider or cluster stacks, we could construct a requirement to have octavia (LBaaS v2) as well, but we don't currently state this explicitly. Given that the main purpose of the IaaS layer is to be a good platform for the k8s layer on top, this is a strong incentive though. Nor do we explicitly require horizon. Looking at the existing SCS installations and some of the defaults in OSISM, we could argue that designate (DNS) should be part of the list, as well as barbican (secrets) and maybe heat and swift. Some partners also run skyline, senlin, ironic, magnum. For designate and heat, the OpenStack interop group has also defined optional standards that we can reference. Elsewhere we state that the s3 protocol must be supported (without requiring it to be connected to keystone, though we should), but that is neither in a stabilized standard nor referenced by a certification scope yet.

IMVHO, we should define two boundaries:

What is the minimum set of services that we require? This could be, e.g., core(+placement)+octavia+s3 (connected to keystone).
What is the set of services fully supported by our reference implementation, i.e. which services do we commit to testing, fixing in case of issues, and keeping consistent in role handling? This could be minimum+horizon+barbican+designate+swift+heat.
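To make the two boundaries concrete, here is a minimal Python sketch. The service names are the ones proposed above; the grouping is only this proposal, not a decided standard, and the helper `missing_services` is a hypothetical name introduced for illustration.

```python
# Proposed sets from the comment above (a proposal, not a ratified standard).
CORE = frozenset({"keystone", "nova", "glance", "cinder", "neutron"})
MINIMUM = CORE | {"placement", "octavia", "s3"}
SUPPORTED = MINIMUM | {"horizon", "barbican", "designate", "swift", "heat"}


def missing_services(deployed, required=MINIMUM):
    """Return the required services that a deployment does not offer, sorted by name."""
    return sorted(set(required) - set(deployed))


# Hypothetical deployment that ships the core services, placement and s3, but no octavia.
print(missing_services(CORE | {"placement", "s3"}))  # ['octavia']
```

Modelling the supported set as a superset of the minimum set keeps the two boundaries consistent by construction: anything required is automatically also supported.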

I guess there is potential for discussions:

Standards (=minimal set, 1):

yaook does not normally come with octavia (breaking cluster-API in its standard setup on OpenStack at least)
some providers might not be able to connect external S3 services to keystone ec2 credentials

Supported (=full set, 2):

some providers might want to add masakari or magnum or ironic or ?
some providers might want to replace horizon with skyline or have both?
maybe we want to drop heat -- how much incremental effort do we have in continuing to maintain and test it vs. real-world usage here ...
some providers might want us to have a usage/metering solution included here ... (ceilometer/gnocchi/VP13/... )

My guess is that we should look at this in the IaaS team, in the std/cert SIG and have an additional round with providers and ideally one with users. We need to balance the desire to find broad consensus (which tends to result in a lowest common denominator) with the desire to create something that is a meaningful baseline for developers building on top.

josephineSei commented 8 months ago

As a starting point, I am working on a HedgeDoc^1 where we can simply put all recent and active OpenStack services into lists. There are more than I expected, so I looked through various repositories to gauge how "active" they are.

markus-hentsch commented 8 months ago

Small nitpick: we should rename "oss". It is confusing as OSS is widely used for Open Source Software. Classification can stay the same, we just need a better classification category name for this. Simply use "supported" or "optional"?

josephineSei commented 8 months ago

Small nitpick: we should rename "oss". It is confusing as OSS is widely used for Open Source Software. Classification can stay the same, we just need a better classification category name for this. Simply use "supported" or "optional"?

I edited it to say "supported". I also put Murano on the unsupported list, because it is inactive and an OSSN states that it has a severe security issue that will be disclosed in the future. Everyone should stop using it: https://wiki.openstack.org/wiki/OSSN/OSSN-0093

josephineSei commented 8 months ago

We put all the services into different lists as a first draft; we will discuss this in the IaaS meeting.

mbuechse commented 7 months ago

@josephineSei maybe this list from the reference implementation could be of interest? https://github.com/SovereignCloudStack/issues/issues/466#issuecomment-2021448174

markus-hentsch commented 7 months ago

A thought that came to my mind during https://github.com/SovereignCloudStack/standards/issues/541 - what about the Cinder Backup service?

It is not a dedicated service but an optional subcomponent of Cinder^1. It's not default and requires a properly configured storage target^2. Its existence influences whether the volume backup functionality is available for users.

Nils98Ar commented 7 months ago

I think manila should be a supported optional project.

michaelbayr commented 7 months ago

We (artcodix) would be happy to see manila on the supported list as well. Also Masakari (Instances High Availability service) is something that would be of great interest to us. Lastly Senlin is something that we want to offer to our customers, since its functionality is something that most hyperscalers offer. Thanks for giving us the chance to participate!

artificial-intelligence commented 7 months ago

I have a question - sorry for being late - regarding the definition of an active project, the linked hedgedoc states:

All currently active Openstack projects should be found in one of the lists. Activity is reached, when there is a team with a PTL working on it.

I don't know exactly how you track that a team is working on a project: do you look at commits being merged, review statistics, or something else? That information is missing from this document, as far as I can see.

Just in case you just look at a PTL being present I wanted to add the following, maybe surprising, information:

A PTL does not necessarily need to be a technical person or someone who regularly works on the code base itself. The PTL is mostly an organizational role that keeps track of release management tasks and organizational issues like hosting meetings. See this document for an incomplete list of a PTL's tasks:

https://docs.openstack.org/project-team-guide/ptl.html#recurrent-tasks

So while many PTLs do in fact also do a lot of the technical work, this does not need to be the case for all projects.

So my question would be: how was a project deemed active? A PTL being there is a necessary but not sufficient precondition, I would argue.

As said elsewhere, but to reiterate again:

The addition of lifecycle tools and their judgement seems dubious at best and I don't know why they are included and the comments added to them are very unclear to me in their meaning.

Either we certify/standardize the outcome of a cloud deployment (doing end to end tests, if a desired feature is there and works), then the deployment tool doesn't matter, as long as it's working and not actively harming the standard.

Or we think it matters how certain stuff is achieved under the hood, e.g. cryptographic algorithms being used for e.g. TLS endpoint security, which is usually enforced via deployment tooling, be it kolla or k8s.

So I find the current state of the document with regard to deployment tooling unconvincing and, at a minimum, in need of more explanation.

neuroserve commented 7 months ago

As I got the call only yesterday I apologize for commenting so late.

Apart from the "mandatory" and central components, I want to mention others that pluscloudopen is using or planning to use:

Barbican (for encrypted volumes and octavia ssl certs, I guess)
Ceilometer + Gnocchi (probably in the near future)
Designate (pretty much needed for decent DNS service)
Heat (is offered - but I'm not sure whether it is used)
Horizon (obviously)
Octavia (at least as long as ovn loadbalancing is not catching up)
Ironic (might be interesting if demand for dedicated compute arises)

pluscloudopen is interested in using:

Manila (if a highly available solution based on ganesha/cephfs were available - even customers using k8s seem to need rwx-storage every now and then)
Masakari (automated instance restart on another hypervisor is an often-requested feature which seems to be covered by Masakari - if there are other solutions, it would not be needed)
Senlin (seems to be the base of some auto-scaling solutions apart from k8s - which would obviously be interesting for us)

I have seen other projects from the "Unsupported" category being used or mentioned by other cloud providers using OpenStack:

Adjutant (see below)
Mistral (maybe only one of Adjutant or Mistral, or even both, was/is used in Catalyst Cloud for account setup (https://github.com/catalyst-cloud/adjutant-odoo); one of them was also used for setting up 2FA IIRC, but I'm not sure where)
Magnum (with its k8s-as-a-service based on Magnum and cluster-api, Vexxhost has made Magnum worth looking at again)
Zun/Kuryr (is also an interesting combination for easy deployments of applications in docker containers which have no complex orchestration requirements)

Regarding the list itself: What is the lifecycle of the list? As SCS is following upstream projects: What happens if "mandatory" or "optional" projects become "inactive" or "importance" is no longer deemed so "high"?

markus-hentsch commented 7 months ago

Thanks for the feedback! I started adding remarks and shifting services around in the list based on them. I will attempt to finalize the draft with @josephineSei before the next IaaS community call.

markus-hentsch commented 7 months ago

Regarding the list itself: What is the lifecycle of the list? As SCS is following upstream projects: What happens if "mandatory" or "optional" projects become "inactive" or "importance" is no longer deemed so "high"?

Since standards are versioned in SCS, we might deprecate services later on or add new ones in later iterations of the standard.

The main goal of the standard for now is to set a scope for the current standardization of SCS. We have a few things (#396 comes to mind) that are more or less blocked until we know exactly how much we need to address with them, which affects what the standards need to cover.

[...] The addition of lifecycle tools and their judgement seems dubious at best and I don't know why they are included and the comments added to them are very unclear to me in their meaning. [...]

I agree. They were initially part of the main lists because they are usually listed when looking up OpenStack services. Then we deemed them out of scope because deployment and lifecycle management are mostly unrelated to what we standardize in most standards. They are just the means to an end. As you said, it does not matter as long as it does not actively harm the standard.

When we started removing them from the main lists, we put them into a separate category at the bottom just so that they are archived there. They don't serve a purpose in the mandatory/supported/unsupported categorization right now and will most likely not be part of the resulting standard or decision record at all.

markus-hentsch commented 7 months ago

@artificial-intelligence

I have a question - sorry for being late - regarding the definition of an active project [...]

I see your point. The "active" marker was just a helper that let us categorize the large number of possible components/services a bit more quickly at first, and helped us decide when we were on the fence between two categories for a service.

I don't think the resulting decision record regarding the lists of supported services will include this "active" categorization at all.

josephineSei commented 6 months ago

@artificial-intelligence as @markus-hentsch mentioned: we started from one long list of all services and lifecycle tools and wanted to get an overview of all of them, so I looked into the activity of the repositories to see whether a project is active at all. This was just meant as a first coarse filter: maybe these inactive projects should not be included in our list.

We rated the importance of lifecycle tools when they were still part of that huge list. After they were separated out, since they will not be needed in the final lists of mandatory and supported OpenStack services, I nearly forgot about them. To avoid confusing anybody further, I removed the importance rating from the lifecycle tools (so we now only state whether they are active or retired). This reflects the freedom to use those tools much better.

josephineSei commented 6 months ago

I looked into Senlin again: I am a bit hesitant to put a leaderless project on a supported list, especially as @neuroserve and @michaelbayr only indicated that they wanted to use this service. So is there someone who already uses it?

artificial-intelligence commented 6 months ago

Senlin is not only leaderless, it is marked "inactive" upstream, see: https://governance.openstack.org/tc/reference/emerging-technology-and-inactive-projects.html#current-inactive-projects

That means there is no one dealing with security issues etc. Someone needs to step up and maintain it for it to actually be useful; this would also include keeping the upstream CI green, which is not a small task in itself.

I don't think we should advertise projects in any form that are not currently maintained, as useful as they might be. Maybe add them under "use at your own risk"?

We can't fix bugs in those projects and I think it's wrong to create the impression that any software project "just works". You always need maintenance to happen, even if a project is "finished" (bumping dependencies, fixing resulting CI and general bugs, fixing security issues, as a minimum).

josephineSei commented 6 months ago

Yesterday evening there were a few mails on the openstack-discuss mailing list announcing the retirement of certain projects. I updated the list accordingly. Only services on the unsupported list were retired. One of them is Senlin.

So this makes it clear once more: Senlin is retired and thus will not be moved to the supported list.

anjastrunk commented 6 months ago

How do we document mandatory services? Do we write a standard or a decision record? At the very least, we will have a test script at the end... I prefer to write a standard. A change in the list of mandatory services will then result in a new version of the standard.

josephineSei commented 6 months ago

I converted the guide into a standard and will work on tests for the mandatory service list.
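As a rough idea of what such a test could check, the sketch below looks for mandatory service types in a Keystone-style service catalog (a list of entries with `type` and `name` fields). The set of mandatory types and the accepted aliases are assumptions for illustration only, not the content of the standard, and `missing_mandatory` and `SAMPLE_CATALOG` are hypothetical names.

```python
# Assumed mandatory service types with accepted aliases (illustrative only;
# the actual list is whatever the standard ends up defining).
MANDATORY_TYPES = {
    "identity": {"identity"},
    "compute": {"compute"},
    "image": {"image"},
    "block-storage": {"block-storage", "volumev3"},
    "network": {"network"},
}


def missing_mandatory(catalog):
    """Return the mandatory service types that have no matching catalog entry."""
    present = {entry["type"] for entry in catalog}
    return sorted(
        name for name, aliases in MANDATORY_TYPES.items() if not (aliases & present)
    )


# Made-up catalog excerpt that lacks a network service.
SAMPLE_CATALOG = [
    {"type": "identity", "name": "keystone"},
    {"type": "compute", "name": "nova"},
    {"type": "image", "name": "glance"},
    {"type": "volumev3", "name": "cinderv3"},
]

print(missing_mandatory(SAMPLE_CATALOG))  # ['network']
```

Checking the catalog only shows that a service is registered; whether its API behaves as required would still need functional tests.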

josephineSei commented 5 months ago

We discussed this PR today in the IaaS call, especially what "supported service" means.

What should users expect from these supported services?

that standards referring to overall IaaS also include them (== all services can be enabled to fulfill the standards?)

that these services have been tested for integration?

that these services are part of the reference implementation? (imho that might be too much)

"supported" in the standardization sense cannot make statements about the reference implementation

meaning of "supported" in standardization may be limited to ensuring that our standards don't conflict with these

we don't break them with our standards!

renaming "supported" to "recommended"? No: "recommended" has a stronger meaning ...

we originally wanted to separate between

(mandatory) you need these APIs to have an SCS-compliant cloud

(supported) you may use these APIs, they can be integrated with SCS (e.g. work with the SCS role standard)

(unsupported) you may use these APIs at your own risk

hints and recommendations for the implementation may be added to the implementation notes

Generic discussion: We do standardize concrete APIs (the ones that come from OpenStack or K8s or CAPI) to make the standards useful -- alternative implementations for services are possible, but a completely different set of technologies is unlikely to implement the SCS APIs.

We need to use the correct terminology "OpenStack Compute API" as opposed to "Nova"

Complexity: These APIs are huge, with lots of optional and deprecated pieces

Need to work through these, tedious

The OpenStack Interop Guideline tests ("OpenStack powered XXX") are a good starting point for this

This is a fairly small set, we need to add to this to produce something that is really useful ("this is the motivation for SCS standardization")

In conclusion, the PR should focus more on APIs than on OpenStack services. This has the downside that we would also need to clarify for each API endpoint whether it is mandatory, recommended, or optional / not needed.

This is a direction which is worth going for, but this will add a load of complexity and might need specific documents for each service API.
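A per-endpoint classification could be captured in a small data structure like the following sketch. The three levels come from the conclusion above; the example entries, their assigned levels, and the names `Level`, `EndpointRule` and `unmet` are purely hypothetical and not taken from any SCS document.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class EndpointRule:
    api: str      # API name, e.g. "OpenStack Compute API" rather than "Nova"
    method: str   # HTTP method
    path: str     # endpoint path
    level: Level


# Hypothetical entries; the levels are made up for illustration.
RULES = [
    EndpointRule("OpenStack Compute API", "GET", "/servers", Level.MANDATORY),
    EndpointRule("OpenStack Compute API", "GET", "/os-hypervisors", Level.OPTIONAL),
    EndpointRule("OpenStack Image API", "GET", "/v2/images", Level.MANDATORY),
]


def unmet(rules, available, level=Level.MANDATORY):
    """Return the rules at `level` whose (api, method, path) is not in `available`."""
    return [
        rule
        for rule in rules
        if rule.level is level
        and (rule.api, rule.method, rule.path) not in available
    ]
```

Even this toy table hints at the complexity mentioned above: every endpoint of every API needs its own entry and its own decision.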

[Feature Request] Define a list of mandatory and of optional (supported) SCS OpenStack services #528