NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Allow TridentBackendConfig to be incorporated into helm chart as CRD #704

Open dc232 opened 2 years ago

dc232 commented 2 years ago

Describe the solution you'd like
The Helm chart should have the ability to incorporate the CRD of TridentBackendConfig. This definition seems to be missing from the Helm chart, making it difficult to incorporate into CI/CD systems.

On top of this, it looks like there is no obvious means to install the existing CRDs from the Helm chart.

Describe alternatives you've considered
So essentially there should be a field called

extraDeploy

which is an array of extra objects to deploy with the release; an example can be taken from https://github.com/bitnami/charts/tree/master/bitnami/grafana/#installing-the-chart and would allow the backend to be rendered.

Alternatively, it can be included in the Helm chart as a template to be populated via the values.yaml file, as in https://github.com/bitnami/charts/tree/master/bitnami/cert-manager where the key-value pair installCRDs: true is used.

In terms of the CRDs, whether to install them should be settable in the values.yaml; the cert-manager chart linked above is an example of this.
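To illustrate the ask, here is a sketch of how a bitnami-style extraDeploy value might be fed to the chart from Terraform. Note the extraDeploy key is hypothetical: the trident-operator chart does not support it today, and the backend spec shown is abbreviated.

resource "helm_release" "trident" {
  name       = "trident-operator"
  repository = "https://netapp.github.io/trident-helm-chart"
  chart      = "trident-operator"

  # Hypothetical bitnami-style extraDeploy value; not supported by the chart today.
  values = [<<-EOT
    extraDeploy:
      - apiVersion: trident.netapp.io/v1
        kind: TridentBackendConfig
        metadata:
          name: backend-anf
        spec:
          version: 1
          storageDriverName: azure-netapp-files
  EOT
  ]
}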

Additional context
In addition to the above, I feel that the Trident installer needs to be bundled with the Helm chart in the form of a Kubernetes manifest. In this case, there does not seem to be much in terms of documentation around getting the TridentBackendConfig CRD functioning, as it seems to be omitted from the Helm chart, making documentation like https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/managing-backends/tbc.html#step-2-create-the-tridentbackendconfig-cr effectively useless unless tridentctl is used as per https://docs.netapp.com/us-en/trident/pdfs/sidebar/Configure_backends.pdf

On top of this, and to help with CI/CD systems, a Helm repo would also be beneficial, which can be set up via https://helm.sh/docs/topics/chart_repository/

and avoid errors such as

kubectl : error: unable to recognize "testbackend.yaml": no matches for kind "TridentBackendConfig" in version "trident.netapp.io/v1"

so that, instead of having to extract the Helm charts from the release on GitHub, the changes are passed via CI/CD into the chart repo and downloaded in the form of a URL such as

https://charts.bitnami.com/bitnami

Also, a tutorial on how to deploy TridentBackendConfig in YAML would be useful, in the form of a YouTube video; most tutorials seem to focus on tridentctl.

balaramesh commented 2 years ago

@dc232 thank you for posting this. Today, backends can be created by 2 means: tridentctl, or kubectl (using TridentBackendConfigs). I understand that your ask is to create backends as part of a Helm install. Is that right? The best way to do it right now would be to: a. install Trident with Helm; b. create backends with kubectl or tridentctl post-install.

  1. Have you taken a look at the instructions for creating backends using TridentBackendConfigs? I noticed you referenced our docs from their previous home. Did you see it linked somewhere?
  2. Trident's Helm chart is hosted on a repository. Instead of downloading the installer from our GitHub repo, you can pull the Helm chart and install it.

dc232 commented 2 years ago

Hey @balaramesh, so I'm trying to create the backend through Azure Pipelines and a Terraform template

example code

data "template_file" "netapp_backend_config"{
    depends_on = [kubernetes_secret_v1.netapp]
  template = file("${path.module}/BackendCRD/backend-anf.yaml")
  vars = {
      # "clientID" = data.azurerm_client_config.current.client_id
      # "clientSecret" = var.netappclientsecret
      "metadataname" =  "nett-app-backend"
      "namespace" = kubernetes_namespace.netapp.metadata[0].name                     
      "storagedrivername" = "azure-netapp-files"   
      "subscriptionid" = data.azurerm_client_config.current.subscription_id
      "tenantid" = data.azurerm_client_config.current.tenant_id
      "location" = var.location
      "servicelevel" = var.servicelevel
      "secretcredentails" = kubernetes_secret_v1.netapp.metadata[0].name
      "subnet" = var.subnet
      "virtualNetwork" = var.virtualNetwork
      "nfsMountOptions" = "nfsvers=4"
      "backendName" = var.backendName
    }
}
resource "kubectl_manifest" "netapp_production_backend_config" {
    yaml_body = data.template_file.netapp_backend_config.rendered
    depends_on =[kubernetes_secret_v1.netapp]
}
and the templated backend-anf.yaml:

apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: ${metadataname}
  namespace: ${namespace}
spec:
  version: 1
  storageDriverName: ${storagedrivername}
  subscriptionID: ${subscriptionid}
  tenantID: ${tenantid}
  location: ${location}
  serviceLevel: ${servicelevel}
  virtualNetwork: ${virtualNetwork}
  subnet: ${subnet}
  backendName: ${backendName}
  nfsMountOptions: ${nfsMountOptions}
  credentials:
    name: ${secretcredentails}

however this approach results in errors such as

kubectl : error: unable to recognize "testbackend.yaml": no matches for kind "TridentBackendConfig" in version "trident.netapp.io/v1"

when the pipeline is run. For some reason, if the templated YAML is applied outside of the pipeline, the backend then gets applied once the Helm chart is applied, as expected. This is slightly perplexing, as I have templated the cert-manager CRDs in the same manner and had them apply in the cluster without error.

The documents that I have found most useful in terms of the backend configuration have been https://docs.microsoft.com/en-us/azure/aks/azure-netapp-files https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/backends/anf.html

Thank you for the instructions link on the backend; I will be sure to take a look.

in terms of the

helm repo add netapp-trident https://netapp.github.io/trident-helm-chart

I have tried this with Terraform

resource "helm_release" "netapphelm" {
    depends_on = [
      kubernetes_namespace.netapp,
      kubectl_manifest.deploy_trident_orchestrator_crd
    ]
    name  = "netapp-trident-operator"
    namespace = kubernetes_namespace.netapp.metadata[0].name
    repository = "https://netapp.github.io/trident-helm-chart"
    chart = "trident-operator"
}

however this didn't seem to work. I will double check this tomorrow; in the meantime, can you confirm if the params are correct for the pulling of the chart?

also when specifying a backend I noticed I got the error

Warning  Failed  64s                trident-crd-controller  Failed to create backend: problem initializing storage driver 'azure-netapp-files': error initializing azure-netapp-files SDK client. azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '{omitted}'.\r\nTrace ID: {omitted}\r\nCorrelation ID: {omitted}\r\nTimestamp: 2022-02-14 09:42:35Z","error_codes":[7000215],"timestamp":"2022-02-14 09:42:35Z","trace_id":"{omitted}","correlation_id":"{omitted}","error_uri":"https://login.microsoftonline.com/error?code=7000215"} Endpoint https://login.microsoftonline.com/{omitted}/oauth2/token?api-version=1.0; Error(s) after resource discovery: no capacity pools found for storage pool netappbackendcsi_pool

Does the storage pool get created via the CSI, or does it need to exist prior? Would this be the name of the capacity pool in Azure NetApp Files?

balaramesh commented 2 years ago

The error that you don't have the TridentBackendConfig spec is most likely caused by the fact that the custom resource is defined by the operator, and this happens only after a successful deploy using the Helm chart. Maybe you need to delay the creation of a backend by a few seconds/minutes?

The parameters of the chart look OK to me. Your second error is probably caused by interchanging the client ID and secret from Azure, as it states.

dc232 commented 2 years ago

@balaramesh yeah, that makes sense. I did try this technique via

resource "time_sleep" "wait_30_seconds" {
  depends_on = [helm_release.netapphelm]
  create_duration = "240s"
}

unfortunately I received the same error :(
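One thing worth checking: a standalone time_sleep only delays resources that declare a dependency on it, so the backend manifest itself needs the dependency. A minimal sketch, reusing the resource names from earlier in the thread:

resource "kubectl_manifest" "netapp_production_backend_config" {
  yaml_body = data.template_file.netapp_backend_config.rendered

  # Gate the backend on the sleep, which itself waits for the Helm release.
  depends_on = [time_sleep.wait_30_seconds]
}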

balaramesh commented 2 years ago

I think you should give a larger wait-time a try, or try to create the backend post-deploy (you could check readiness by running a command to get the version of Trident)

dc232 commented 2 years ago

@balaramesh I can try 10 minutes maybe; I tried 6 minutes but got the same error. Is the command for the version of Trident

kubectl describe torc trident

balaramesh commented 2 years ago

Yes. You should look at the status of the torc object. When it reports itself as "Installed", you are ready to create a backend.
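If the pipeline should block on that status rather than a fixed sleep, one option is a local-exec gate. A sketch, assuming kubectl v1.23+ is available on the build agent and the orchestrator uses the default name trident (the resource name wait_for_trident is made up here):

resource "null_resource" "wait_for_trident" {
  depends_on = [helm_release.netapphelm]

  # Poll the TridentOrchestrator until the operator reports "Installed";
  # jsonpath support in `kubectl wait` requires kubectl v1.23 or newer.
  provisioner "local-exec" {
    command = "kubectl wait --for=jsonpath='{.status.status}'=Installed torc/trident --timeout=10m"
  }
}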

dc232 commented 2 years ago

@balaramesh tried a 10 minute timer this time; sadly I got the same result :(

╷
│ Error: netapp/nett-app-backend failed to create kubernetes rest client for update of resource: resource [trident.netapp.io/v1/TridentBackendConfig] isn't valid for cluster, check the APIVersion and Kind fields are valid
│ 
│   with module.Netapp.kubectl_manifest.netapp_production_backend_config,
│   on Netapp_module\helm.tf line 92, in resource "kubectl_manifest" "netapp_production_backend_config":
│   92: resource "kubectl_manifest" "netapp_production_backend_config" {
│ 
╵

The Helm chart link seems to work fine, thank you for that.

ryangrush commented 2 years ago

@dc232 were you ever able to find a solution for this?

dc232 commented 2 years ago

Hi @ryangrush, I did in the end. It was a bit of a hack, as I needed to create a service account user. I also found that, because of a bug in Linux, NFSv4 doesn't mount natively, so I managed to get it working with NFSv3. I'll post the Terraform module for this on my GitHub for people to use in their CI/CD pipelines.

dc232 commented 2 years ago

@ryangrush have a look at https://github.com/dc232/Terraform-Trident-Config. I haven't run terraform validate against it yet, as it's a little tricky, but it should get the job done with a few tweaks.

ryangrush commented 2 years ago

@dc232 thank you for uploading the repo! There are scarce few resources online for this.

I was able to manually install Trident last week, but I'm still having problems getting it to work with Terraform. This part doesn't seem to pass validation (at least not in TF v1.1.6); do you remember if that was needed?

dc232 commented 2 years ago

Hi @ryangrush, so the idea in that line is to render out the YAML in memory; I was trying to get the raw output from the GET request, as that was the file that was needed. When you perform the validate, what's the error that you get? I have updated the repo slightly to remove the empty depends_on.

dc232 commented 2 years ago

Hi @ryangrush, added a fix to it, as it was using the wrong resource type; try now.

ryangrush commented 2 years ago

@dc232 I was able to use data.http.crds.response_body now, thanks. I also noticed a small syntax error here btw.

I saw you had run into something similar back in February, but I kept running into this error -

 Error: tridentorchestrators.trident.netapp.io failed to create kubernetes rest client for update of resource: Get "http://localhost/api?timeout=32s": dial tcp [::1]:80: connect: connection refused
│ 
│   with module.netapp.kubectl_manifest.deploy_trident_orchestrator_crd,
│   on modules/netapp/main.tf line 72, in resource "kubectl_manifest" "deploy_trident_orchestrator_crd":
│   72: resource "kubectl_manifest" "deploy_trident_orchestrator_crd" {

I did manage to get it working yesterday, but while in the process of adding the changes to a PR and applying it to our "sandbox" AKS cluster, it got stuck on that error again. The key to getting it to work yesterday was adding a kubernetes_persistent_volume_claim resource to my proof-of-concept.

This is the TF code currently, with some of the other resources removed until I get past this problem. The Azure App for var.azure_client_id has the API permissions listed in the screenshot.

Do you remember how you resolved that error message? The only thing I can think of is a permissions issue, but I'm also quite mentally fatigued at this point lol.

dc232 commented 2 years ago

Hey @ryangrush, thanks for pointing out the error. I have updated it to make it compatible with the new Trident version via https://github.com/NetApp/trident/issues/716

I don't remember exactly how I got it working in the past; however, I suspect it was via the kubectl provider by Gavin Bunney. It has a bit of funny config, so when I created this project I did it this way:

provider "kubectl" {
  host                   = coalesce(module.MianModule.kown_kube_host, module.MianModule.kube_host)
  client_certificate     = base64decode(coalesce(module.MianModule.client_certificate_file, module.MianModule.kube_client_certificate))
  client_key             = base64decode(coalesce(module.MianModule.client_key_file, module.MianModule.kube_client_key))
  cluster_ca_certificate = base64decode(coalesce(module.MianModule.cluster_ca_certificate_file, module.MianModule.kube_cluster_ca_cert))
  load_config_file       = false #this is true by deafult see #https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs
}

The docs for the provider also suggest you can load the kubeconfig directly, so something like:

provider "kubectl" {
  load_config_file       = true
  config_path    = "~/.kube/config"
  config_context = "my-context"
}

I think how I got round the error in the end was by setting load_config_file = false in my case.

These docs should be able to help with your use case: https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs

The kubernetes_persistent_volume_claim shouldn't be required if you're using the CSI; all that is required is to set the name of the storage class, and it should create the persistent volume and the claim automatically for you when using something like Helm charts. It's only when you're creating a single standalone deployment that a kubernetes_persistent_volume_claim would be required.

Hope this helps. I feel your pain; if you want any more assistance, let me know.
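As a reference point for "set the name of the storage class", here is a minimal sketch of a Trident-backed class in Terraform (the class name is an assumption; csi.trident.netapp.io is Trident's CSI provisioner):

resource "kubernetes_storage_class" "netapp" {
  metadata {
    name = "azure-netapp-files" # assumed class name
  }

  # Trident's CSI driver handles dynamic provisioning against the ANF backend.
  storage_provisioner = "csi.trident.netapp.io"

  parameters = {
    backendType = "azure-netapp-files"
  }
}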

ryangrush commented 2 years ago

@dc232 ok thanks, I'll look into the kubectl provider angle.

One of the few differences between the PoC I stood up yesterday and integrating it into our main Terraform repo is that some providers are defined elsewhere, so that could be in line with the kubectl theory. Also, it's using a dedicated TF Azure App for its identity; I was using my personal Azure user as the identity for the PoC.

Thanks for the code snippets and link.

ryangrush commented 2 years ago

@dc232 it looks like load_config_file = false and defining the provider "kubectl" block were key in getting it to play nice with our other Terraform code. I think I've managed to finally get it working, thanks again for everything!

It does look like it's still dependent on the kubernetes_persistent_volume_claim resource being defined, however. I tried removing that resource and just referencing the storage class, but it says 0/1 nodes are available: 1 persistentvolumeclaim "storage-class-netapp-pr" not found.

Here is my TF code for what it's worth -

volume {
  name = "netapp-pr"

  persistent_volume_claim {
    claim_name = "storage-class-netapp-pr"
  }
}

volume_mount {
  name       = "netapp-pr"
  read_only  = false
  mount_path = "/data-netapp-pr/"
}