RamenDR / ocm-ramen-samples

OCM Stateful application samples, including Ramen resources
Apache License 2.0

Support for automated testing for k8s and odr #43

Closed nirs closed 8 months ago

nirs commented 8 months ago

Currently we have one application (busybox) and kustomizations to deploy it manually on regional-dr or metro-dr using rbd or cephfs storage.

For automated testing we need a subscription kustomization for each application variant (rdr-rbd, rdr-cephfs, mdr-rbd).

The layout should make it easy to add more applications (e.g. busybox statefulset, busybox daemonset, kubevirt vms with pvc, data-volume, or data-volume-template).

How it should work

Suggested layout

busybox-deployment/
    odr-rdr-rbd/
        kustomization.yaml
    odr-rdr-cephfs/
        kustomization.yaml
    odr-mdr-rbd/
        kustomization.yaml
    k8s-rdr-rbd/
        kustomization.yaml
    deployment.yaml
    kustomization.yaml
    pvc.yaml
subscription/
    odr/
        busybox-deployment-rdr-rbd/
            kustomization.yaml
        busybox-deployment-rdr-cephfs/
            kustomization.yaml
        busybox-deployment-mdr-rbd/
            kustomization.yaml
    k8s/
        busybox-deployment-rdr-rbd/
            kustomization.yaml
    binding.yaml
    channel.yaml
    kustomization.yaml
    namespace.yaml
    placement.yaml
    subscription.yaml
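
For example, each variant directory would hold only a small kustomization that pulls in the shared base and patches the storage details. A minimal sketch for the rdr-rbd variant, assuming the base pvc.yaml defines the claim (the storage class name is illustrative):

    # busybox-deployment/odr-rdr-rbd/kustomization.yaml (sketch)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../                  # shared deployment.yaml and pvc.yaml
    patches:
      - target:
          kind: PersistentVolumeClaim
        patch: |-
          - op: replace
            path: /spec/storageClassName
            value: ocs-storagecluster-ceph-rbd   # illustrative rbd class name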

Usage in OpenShift console

When selecting the application path, use one of:

busybox-deployment/odr-rdr-rbd
busybox-deployment/odr-rdr-cephfs
busybox-deployment/odr-mdr-rbd

Usage in automated tests

When deploying a subscription in automated tests use one of these:

Subscriptions for odr tests:

subscription/odr/busybox-deployment-rdr-rbd
subscription/odr/busybox-deployment-rdr-cephfs
subscription/odr/busybox-deployment-mdr-rbd

Subscriptions for k8s tests:

subscription/k8s/busybox-deployment-rdr-rbd
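
For example, a test could deploy and remove one of these directly with kubectl against the hub (the kubeconfig context name here is just an assumption):

    kubectl apply -k subscription/odr/busybox-deployment-rdr-rbd --context hub
    kubectl delete -k subscription/odr/busybox-deployment-rdr-rbd --context hub
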
raghavendra-talur commented 8 months ago

The proposal looks good to me. This will make adding applications to the samples repository very easy. Thanks!

ShyamsundarR commented 8 months ago

I broadly agree with the scheme, except for actually creating directories for every combination of subscription and workload, as that would leave a lot of directories and hurt repository readability IMHO (more below).

For the actual workloads themselves the scheme above is fine; I have just made it more fine grained, as below:

│   └── workloads
│       └── busybox-deployment
│           ├── base
│           │   ├── busybox-deployment.yaml
│           │   ├── busybox-pvc.yaml
│           │   └── kustomization.yaml
│           └── odf-regional-rwo
│               └── kustomization.yaml

In the above, odf-regional-rwo is for Ceph-RBD, and we can have two or three more directories for the variations (rwx and metro). These are for use with the ACM console as is; the others would be odf-regional-rwx, odf-metro-rwo, and odf-metro-rwx.

Further, I suggest we do not provide any more ACM-console-ready workload directories for other types, to reduce clutter. Instead we can kustomize the workloads as deployed by Subscriptions or ApplicationSets, as below.

Given a workload, we want to potentially kustomize the following for the workload:

  • PVCs StorageClass name
  • PVCs AccessMode
  • Workload namespace
  • Common workload label
  • Workload resources suffix

All of the above for Subscriptions [1] and ApplicationSets [2] can be achieved using the workload kustomization specification in these resources.
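
As a hedged sketch of what [2] enables (Subscriptions have their own override mechanism described in [1]), an Argo CD Application source (an ApplicationSet template carries the same source spec) could set these without extra overlay directories; every concrete value below (repo path, namespace, label, storage class) is illustrative:

    # Sketch only: kustomize overrides in an Argo CD Application source, per [2].
    # Supported fields depend on the Argo CD version; values are examples.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: busybox-sample
      namespace: argocd
    spec:
      project: default
      destination:
        server: https://kubernetes.default.svc
        namespace: busybox-sample
      source:
        repoURL: https://github.com/RamenDR/ocm-ramen-samples.git
        targetRevision: main
        path: workloads/busybox-deployment/base
        kustomize:
          namespace: busybox-sample        # workload namespace
          nameSuffix: -sample              # workload resources suffix
          commonLabels:
            appname: busybox               # common workload label
          patches:                         # PVC StorageClass / AccessMode
            - target:
                kind: PersistentVolumeClaim
              patch: |-
                - op: replace
                  path: /spec/storageClassName
                  value: rook-ceph-block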

Based on the above, other workload form factors can be kustomized when added from the console; IOW, the Subscription or the ApplicationSet YAML can be edited according to the environment. This reduces clutter in the repository, as otherwise these 4 directories would keep repeating themselves for every workload.

Further, as we move forward with clusters that consume ODF-created storage instances from another cluster, the StorageClass names would change and be non-specific, hence providing these values from the Subscription would be more usable than hard-coding them in the repository.

For the Subscriptions themselves the structure laid out is fine (with the change to add a base):

│   ├── subscriptions
│   │   ├── base
│   │   │   ├── binding.yaml
│   │   │   ├── kustomization.yaml
│   │   │   ├── subscription.yaml
│   │   │   └── placement.yaml
│   │   └── busybox-deployment
│   │       └── kustomization.yaml
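
A minimal sketch of such a per-workload overlay, assuming the base holds the generic hub resources; the namespace, prefix, and label are illustrative:

    # subscriptions/busybox-deployment/kustomization.yaml (sketch)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../base
    namespace: busybox-sample          # workload namespace on the hub
    namePrefix: busybox-deployment-    # keeps per-workload hub resources unique
    commonLabels:
      appname: busybox                 # common workload label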

Again here I suggest we do not provide overlays for every combination that includes partial hard-coded paths, and instead provide a base kustomization, in subscriptions/busybox-workloads for example, that contains rules to kustomize the resources deployed to the hub, and to kustomize the workload resources as above.

For hub resources we would want:

  • Workload namespace
  • Common workload label
  • Workload resources suffix

Now, using this from automated tools could be as follows:

  • e2e or basic-test
    • Instead of creating config files for all combinations, let the tools provide options to choose:
      • --workload
        • The values for these are already known (i.e. workloads are all the workloads in the samples, and so on)
      • --storageclassname --PVCModes
    • Assume or default the namespaces, labels and suffix, or provide options
    • IOW, let there be known configs in code rather than as files
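
For illustration only, such an invocation might look like this (the tool name and flag spellings come from the list above; everything else is assumed):

    # Hypothetical invocation; defaults and values are illustrative only
    basic-test --workload busybox-deployment \
        --storageclassname rook-ceph-block --PVCModes ReadWriteOnce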

The DRPC itself needs:

  • Policy: make this a user input
  • PVC label selector: Can be formed based on earlier inputs from labels

The DRPC being a part of the repository is useful, as it serves as an example to keep the hub resources declarative as well.

I think we should discuss this a little more and close on it.

[1] Subscription workload kustomization: https://github.com/open-cluster-management-io/multicloud-operators-subscription/blob/main/docs/gitrepo_subscription.md#kustomize

[2] ApplicationSets workload kustomization: https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/

nirs commented 8 months ago

I broadly agree with the scheme, except for actually creating directories for every combination of subscription and workload, as that would leave a lot of directories and hurt repository readability IMHO (more below).

But this is the goal of this work - making it easy to test.

Enabling DR for an application is different: the drpc requires too much customization to adapt to the application, subscription or applicationset, cluster names, etc. I don't plan to provide a working drpc for every sample. This is best done by a tool, and it is currently implemented in the drenv.test module: https://github.com/nirs/ramen/blob/7aae6e1d7af362efbc6a1f28a6340444b7594d6a/test/drenv/test.py#L167

We need to agree on this goal - if you want to keep this repository clean, then this is not the right place to keep the testing resources, and we need another repo.

For the actual workloads themselves the scheme above is fine; I have just made it more fine grained, as below:

│   └── workloads
│       └── busybox-deployment
│           ├── base
│           │   ├── busybox-deployment.yaml
│           │   ├── busybox-pvc.yaml
│           │   └── kustomization.yaml
│           └── odf-regional-rwo
│               └── kustomization.yaml

Looks nicer this way.

In the above, odf-regional-rwo is for Ceph-RBD, and we can have two or three more directories for the variations (rwx and metro). These are for use with the ACM console as is; the others would be odf-regional-rwx, odf-metro-rwo, and odf-metro-rwx.

Why use the pvc access mode instead of the storage class name? This makes it harder to use for testing. We know that we have rbd and cephfs on ocp, and rbd on drenv. It is easy to pick the right configuration when you want to run a test. With the access mode, I don't know which variant can be used on which cluster.

Maybe this is again something that is better for the samples use case and not for managing a set of testing configurations?

Further, I suggest we do not provide any more ACM-console-ready workload directories for other types, to reduce clutter. Instead we can kustomize the workloads as deployed by Subscriptions or ApplicationSets, as below.

But this means we don't have a way to test deployment without OCM. I think this is the wrong trade-off, optimizing for a cleaner repository instead of for ease of use for developers.

Given a workload, we want to potentially kustomize the following for the workload:

  • PVCs StorageClass name
  • PVCs AccessMode
  • Workload namespace
  • Common workload label
  • Workload resources suffix

Also the pvc selector (and later the volume snapshot selector and imperative app selectors). This is the current configuration for a workload: https://github.com/nirs/ramen/blob/test-path/test/basic-test/config.yaml#L5

All of the above for Subscriptions [1] and ApplicationSets [2] can be achieved using the workload kustomization specification in these resources.

Based on the above, other workload form factors can be kustomized when added from the console; IOW, the Subscription or the ApplicationSet YAML can be edited according to the environment. This reduces clutter in the repository, as otherwise these 4 directories would keep repeating themselves for every workload.

If we need to edit yamls manually at deploy time, we have failed to provide a good way to test. My goal is to eliminate these manual steps, so it is easy to reproduce the same workload using a shared configuration.

Further, as we move forward with clusters that consume ODF-created storage instances from another cluster, the StorageClass names would change and be non-specific, hence providing these values from the Subscription would be more usable than hard-coding them in the repository.

This is a big usability issue if you cannot have working workloads and need to customize them manually for every deployment. If this is only about the storage class name, it can be solved by forking the repo and creating a version with the right storage class for your specific setup. If this is something we test regularly, I expect to keep a ready configuration for testing this variant.

For the Subscriptions themselves the structure laid out is fine (with the change to add a base):

│   ├── subscriptions
│   │   ├── base
│   │   │   ├── binding.yaml
│   │   │   ├── kustomization.yaml
│   │   │   ├── subscription.yaml
│   │   │   └── placement.yaml
│   │   └── busybox-deployment
│   │       └── kustomization.yaml

Looks better like this.

Again here I suggest we do not provide overlays for every combination that includes partial hard-coded paths, and instead provide a base kustomization, in subscriptions/busybox-workloads for example, that contains rules to kustomize the resources deployed to the hub, and to kustomize the workload resources as above.

For hub resources we would want:

  • Workload namespace
  • Common workload label
  • Workload resources suffix

Now, using this from automated tools could be as follows:

  • e2e or basic-test
    • Instead of creating config files for all combinations, let the tools provide options to choose:
      • --workload
        • The values for these are already known (i.e. workloads are all the workloads in the samples, and so on)
      • --storageclassname --PVCModes
    • Assume or default the namespaces, labels and suffix, or provide options
    • IOW, let there be known configs in code rather than as files

Having to customize using a tool means we cannot test resource using kustomize build and we cannot apply the resources using kubectl apply -k. The only way to use them will be via the tool.
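
For reference, these are the plain workflows that stop working if a tool is the only entry point (the paths follow the layout above, and the context name is illustrative):

    # Render locally to review the resources
    kustomize build workloads/busybox-deployment/odf-regional-rwo

    # Apply directly to a cluster without any extra tooling
    kubectl apply -k workloads/busybox-deployment/odf-regional-rwo --context dr1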

The DRPC itself needs:

  • Policy: make this a user input
  • PVC label selector: Can be formed based on earlier inputs from labels

The drpc needs more - this is the current implementation: https://github.com/nirs/ramen/blob/7aae6e1d7af362efbc6a1f28a6340444b7594d6a/test/drenv/test.py#L190

This is the reason I don't want to depend on a static drpc resource in the repo, and instead generate it for every deploy.
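
For context, this is roughly the shape of the drpc such a tool would generate; the field names follow the DRPlacementControl spec, but every value below is deployment specific and purely illustrative:

    # Sketch of a generated DRPlacementControl (all values deployment specific)
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPlacementControl
    metadata:
      name: busybox-drpc
      namespace: busybox-sample
    spec:
      drPolicyRef:
        name: dr-policy                # user input
      placementRef:
        kind: PlacementRule            # must match the subscription's placement
        name: busybox-placement
      preferredCluster: dr1            # depends on the clusters in the policy
      pvcSelector:
        matchLabels:
          appname: busybox             # must match the workload's PVC labels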

The DRPC being a part of the repository is useful, as it serves as an example to keep the hub resources declarative as well.

Agree, but maybe one example is good enough, one that needs to be modified to match the application and cluster.

We can also add drpolicy and drcluster samples to match the sample drpc. They will also have to be adjusted to the actual clusters (e.g. managed cluster names).

nirs commented 8 months ago

Notes from discussion with Shyam and Talur: