IBM-Cloud / hpc-cluster-slurm

Apache License 2.0

Errors with Planning in IBM Console #8

Open CN734539 opened 2 months ago

CN734539 commented 2 months ago

Using the following:

{
    "name": "hpcc-slurm-scale-cluster-marvel",
    "type": [
      "terraform_v1.5"
    ],
    "location": "us-east",
    "resource_group": "Default",
    "description": "",
    "tags": ["hpcc", "slurm"],
    "template_repo": {
      "url": "https://github.com/UAlbany/suny-ibm-hpc-slurm"
    },
    "template_data": [
      {
        "folder": ".",
        "type": "terraform_v1.5",
        "env_values":[
          { 
            "TF_CLI_ARGS_apply": "-parallelism=250"
          },
          { 
            "TF_CLI_ARGS_plan": "-parallelism=250"
          },
          {
            "TF_CLI_ARGS_destroy": "-parallelism=100"
          }
        ],
        "variablestore": [
          {
            "name": "ssh_key_name",
            "value": "ibm-hpc",
            "type": "string",
            "secure": false,
            "description":"Comma-separated list of names of the SSH key configured in your IBM Cloud account that is used to establish a connection to the Slurm management node. Ensure the SSH key is present in the same resource group and region where the cluster is being provisioned. If you do not have an SSH key in your IBM Cloud account, create one by using the instructions given here. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-ssh-keys)."
          },
          {
            "name": "api_key",
            "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
            "type": "string",
            "secure": true,
            "description": "This is the API key for IBM Cloud account in which the Slurm cluster needs to be deployed. [Learn more](https://cloud.ibm.com/docs/account?topic=account-userapikey)."
          },
          {
            "name": "vpc_name",
            "value": "",
            "type": "string",
            "secure": false,
            "description": "Name of an existing VPC in which the cluster resources will be deployed. If no value is given, then a new VPC will be provisioned for the cluster. [Learn more](https://cloud.ibm.com/docs/vpc)"
          },
          {
            "name": "vpc_cidr_block",
            "value": ["10.92.0.0/16"],
            "type": "list(string)",
            "secure": false,
            "description": "Creates the address prefix for the new VPC, when the vpc_name variable is empty. Only a single address prefix is allowed. For more information, see [Setting IP ranges](https://cloud.ibm.com/docs/vpc?topic=vpc-vpc-addressing-plan-design)."
          },
          {
            "name": "vpc_cluster_private_subnets_cidr_blocks",
            "value": ["10.92.10.0/24"],   
            "type": "list(string)",
            "secure": false,
            "description": "The CIDR block that's required for the creation of the cluster private subnet. Modify the CIDR block if it has already been reserved or used for other applications within the VPC or conflicts with any on-premises CIDR blocks when using a hybrid environment. Provide only one CIDR block for the creation of the subnet. Make sure to select a CIDR block size that will accommodate the maximum number of management, storage, and both static worker nodes that you expect to have in your cluster.  For more information on CIDR block size selection, see [Choosing IP ranges for your VPC](https://cloud.ibm.com/docs/vpc?topic=vpc-choosing-ip-ranges-for-your-vpc)."
          },
          {
            "name": "vpc_cluster_login_private_subnets_cidr_blocks",
            "value": ["10.92.20.0/24"],    
            "type": "list(string)",
            "secure": false,
            "description": "The CIDR block that's required for the creation of the login cluster private subnet. Modify the CIDR block if it has already been reserved or used for other applications within the VPC or conflicts with any on-premises CIDR blocks when using a hybrid environment. Provide only one CIDR block for the creation of the login subnet. Since login subnet is used only for the creation of login virtual server instance provide a CIDR range of /28."
          },
          {
            "name": "resource_group",
            "value": "Default",          
            "type": "string",
            "secure": false,
            "description":"Resource group name from your IBM Cloud account where the VPC resources should be deployed. [Learn more](https://cloud.ibm.com/docs/account?topic=account-rgs)."
          },
          {
            "name": "cluster_prefix",
            "value": "hpcc-slurm",
            "type": "string",
            "secure": false,
            "description": "Prefix that is used to name the Slurm cluster and IBM Cloud resources that are provisioned to build the Slurm cluster instance. You cannot create more than one instance of the Slurm cluster with the same name. Make sure that the name is unique. Enter a prefix name, such as my-hpcc."
          },
          {
            "name": "cluster_id",
            "value": "SlurmCluster",
            "type": "string",
            "secure": false,
            "description": "ID of the cluster used by Slurm for configuration of resources. This must be up to 39 alphanumeric characters including the underscore (_), the hyphen (-), and the period (.). Other special characters and spaces are not allowed. Do not use the name of any host or user as the name of your cluster. You cannot change it after installation."
          },
          {
            "name": "zone",
            "value": "us-east-3",  
            "type": "string",
            "secure": false,
            "description": "IBM Cloud zone name within the selected region where the Slurm cluster should be deployed. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-creating-a-vpc-in-a-different-region#get-zones-using-the-cli)."
          },
          {
            "name": "management_node_image_name",
            "value": "ibm-ubuntu-20-04-minimal-amd64-2",
            "type": "string",
            "secure": false,
            "description":"Name of the image that you want to use to create virtual server instances in your IBM Cloud account to deploy as worker nodes in the Slurm cluster. By default, the automation uses a stock operating system image. If you would like to include your application-specific binary files, follow the instructions in [Planning for custom images](https://cloud.ibm.com/docs/vpc?topic=vpc-planning-custom-images) to create your own custom image and use that to build the Slurm cluster through this offering. Note that use of your own custom image may require changes to the cloud-init scripts, and potentially other files, in the Terraform code repository if different post-provisioning actions or variables need to be implemented."
          },
          {
            "name": "management_node_instance_type",
            "value": "bx2-2x8",          
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the management nodes for the Slurm cluster. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "login_node_instance_type",
            "value": "bx2-2x8",     
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the login node for the Slurm cluster. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "storage_node_instance_type",
            "value": "bx2-2x8",    
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type to be used to create the storage nodes for the Slurm cluster. The storage nodes are the ones that are used to create an NFS instance to manage the data for HPC workloads. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "worker_node_type",
            "value": "vsi",            
            "type": "string",
            "secure": false,
            "description": "The type of server that's used for the worker nodes: virtual server instance or bare metal server. If you choose vsi, the worker nodes are deployed on virtual server instances, or if you choose baremetal, the worker nodes are deployed on bare metal servers."
          },          
          {
            "name": "worker_node_instance_type",
            "value": "bx2-2x8",       
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the worker nodes for the Slurm cluster. The worker nodes are the ones where the workload execution takes place and the choice should be made according to the characteristic of workloads. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "worker_node_count",
            "value": "2",              
            "type": "number",
            "secure": false,
            "description": "This is the number of worker nodes that will be provisioned at the time the cluster is created. Enter a value in the range 1 - 500."
          },

          {
            "name": "volume_capacity",
            "value": "1000",           
            "type": "number",
            "secure": false,
            "description": "Size in GB for the block storage that would be used to build the NFS instance and would be available as a mount on Slurm management node. Enter a value in the range 10 - 16000."
          },
          {
            "name": "volume_profile",
            "value": "10iops-tier",
            "type": "string",
            "secure": false,
            "description": "Name of the block storage volume type to be used for NFS instance. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-block-storage-profiles)."
          },       
          {
            "name": "remote_allowed_ips",
            "value": ["169.226.58.175", "169.226.58.176", "169.226.58.177"],           
            "type": "list(string)",
            "secure": false,
            "description": "Comma-separated list of IP addresses that can access the Slurm instance through an SSH. For security purposes, provide the public IP addresses assigned to the devices that are authorized to establish SSH connections (for example, [\"169.45.117.34\"]). To fetch the IP address of the device, use [https://ipv4.icanhazip.com/](https://ipv4.icanhazip.com/)."
          },       
          {
            "name": "spectrum_scale_enabled",
            "value": "false",        
            "type": "bool",
            "secure": false,
            "description": "Setting this to true will enables Spectrum Scale integration with the cluster. Otherwise, Spectrum Scale integration will be disabled (default). By entering 'true' for the property, you have also agreed to one of the two conditions: (1) You are using the software in production and confirm you have sufficient licenses to cover your use under the International Program License Agreement (IPLA). (2) You are evaluating the software and agree to abide by the International License Agreement for Evaluation of Programs (ILAE). Note: Failure to comply with licenses for production use of software is a violation of [IBM International Program License Agreement](https://www.ibm.com/software/passportadvantage/programlicense.html)."
          },
          {
            "name": "TF_WAIT_DURATION",
            "value": "true",
            "type": "bool",
            "secure": false,
            "description": "wait duration time set for the storage and worker node to complete the entire setup"
          },
          {
            "name": "storage_cluster_gui_username",
            "value": "hpc-manager",
            "type": "string",
            "secure": false,
            "description": "GUI user to perform system management and monitoring tasks on storage cluster."
          },
          {
            "name": "storage_cluster_gui_password",
            "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
            "type": "string",
            "secure": false,
            "description": "Password for storage cluster GUI"
          },
          {
            "name": "scale_storage_cluster_filesystem_mountpoint",
            "value": "/gpfs/fs1",     
            "type": "string",
            "secure": false,
            "description": "Spectrum Scale storage cluster (owningCluster) file system mount point. The owningCluster is the cluster that owns and serves the file system to be mounted.  For more information, see [Mounting a remote GPFS file system](https://www.ibm.com/docs/en/spectrum-scale/5.1.5?topic=system-mounting-remote-gpfs-file)."
          },
          {
            "name": "scale_filesystem_block_size",
            "value": "4M",       
            "type": "string",
            "secure": false,
            "description": "Specified block size must be a valid IBM Spectrum Scale supported block sizes (256K, 512K, 1M, 2M, 4M, 8M, 16M)."
          },
          {
            "name": "vpn_enabled",
            "value": "true",        
            "type": "bool",
            "secure": false,
            "description": "Set to true to deploy a VPN gateway for VPC in the cluster (default: false)."
          },
          {
            "name": "vpn_peer_cidrs",
            "value": "",     
            "type": "string",
            "secure": false,
            "description": "Comma separated list of peer CIDRs (e.g., 192.168.0.0/24) to which the VPN will be connected."
          },
          {
            "name": "vpn_peer_address",
            "value": "169.226.0.92/32",      
            "type": "string",
            "secure": false,
            "description": "The peer public IP address to which the VPN will be connected."
          },
          {
            "name": "vpn_preshared_key",
            "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",     
            "type": "string",
            "secure": true,
            "description": "The pre-shared key for the VPN."
          }
        ]
      }
    ]
  }

We get the following errors:

2024/04/30 15:19:53 Terraform plan | Error: zone not found
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan |   with data.ibm_is_zone.zone,
2024/04/30 15:19:53 Terraform plan |   on datasources.tf line 17, in data "ibm_is_zone" "zone":
2024/04/30 15:19:53 Terraform plan |   17: data "ibm_is_zone" "zone" {
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan | Plan: 17 to add, 0 to change, 0 to destroy.
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan | Changes to Outputs:
2024/04/30 15:19:53 Terraform plan |   + region_name        = "us-east"
2024/04/30 15:19:53 Terraform plan |   + var_zone           = "wdc03"
2024/04/30 15:19:53 Terraform plan |   + vpc                = (known after apply)
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan | Error: Invalid value for variable
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan |   on variables.tf line 185:
2024/04/30 15:19:53 Terraform plan |  185: variable "remote_allowed_ips" {
2024/04/30 15:19:53 Terraform plan |     ├────────────────
2024/04/30 15:19:53 Terraform plan |     │ var.remote_allowed_ips is list of string with 3 elements
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan | Provided IP address format is not valid. Check if Ip address format has comma
2024/04/30 15:19:53 Terraform plan | instead of dot and there should be double quotes between each IP address
2024/04/30 15:19:53 Terraform plan | range if using multiple ip ranges. For multiple IP address use format
2024/04/30 15:19:53 Terraform plan | ["169.45.117.34","128.122.144.145"].
2024/04/30 15:19:53 Terraform plan | 
2024/04/30 15:19:53 Terraform plan | This was checked by the validation rule at variables.tf:194,3-13.

We also explicitly set region_name to "us-east" in locals.tf.

For the first error, Error: zone not found, I am not sure where the zone value is coming from. Our config.json sets zone to us-east-3, which is WDC07, but var_zone in the plan output is wdc03.
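As a quick sanity check on the two values involved: IBM Cloud VPC zones are named after their region with a numeric suffix (us-east-1, us-east-2, us-east-3), and wdc03 does not follow that pattern. Below is a minimal sketch, using a hypothetical helper name zone_matches_region, that flags a zone value not of the form region-number. It checks naming only and does not confirm that the zone actually exists.

```python
import re

def zone_matches_region(zone: str, region: str) -> bool:
    """Return True if zone looks like '<region>-<number>', e.g. 'us-east-3'."""
    return re.fullmatch(re.escape(region) + r"-\d+", zone) is not None

# Value set in config.json
print(zone_matches_region("us-east-3", "us-east"))
# Value reported as var_zone in the plan output
print(zone_matches_region("wdc03", "us-east"))
```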

For the second error, Error: Invalid value for variable, it says our remote_allowed_ips needs to be a string of 3 elements. Our remote allowed IPs are defined in config.json as ["169.226.58.175", "169.226.58.176", "169.226.58.177"]. So I don’t know where Schematics is getting the values for this.

Anand-Reddy7 commented 1 month ago

Hello @CN734539, we've fixed the problem. Can you try it out, please?

I used the below config.json file to create the cluster.

{
    "name": "anand-hpcc-slurm",
    "type": [
      "terraform_v1.5"
    ],
    "location": "us-east",
    "resource_group": "Default",
    "description": "",
    "tags": ["hpcc", "slurm"],
    "template_repo": {
      "url": "https://github.com/IBM-Cloud/hpc-cluster-slurm"
    },
    "template_data": [
      {
        "folder": ".",
        "type": "terraform_v1.5",
        "env_values":[
          { 
            "TF_CLI_ARGS_apply": "-parallelism=250"
          },
          { 
            "TF_CLI_ARGS_plan": "-parallelism=250"
          },
          {
            "TF_CLI_ARGS_destroy": "-parallelism=100"
          },
          { 
            "VAR1":"<val1>"
          },
          {
            "VAR2":"<val2>"
          } 
        ],
        "variablestore": [
          {
            "name": "ssh_key_name",
            "value": "anand-mac-key",
            "type": "string",
            "secure": false,
            "description":"Comma-separated list of names of the SSH key configured in your IBM Cloud account that is used to establish a connection to the Slurm management node. Ensure the SSH key is present in the same resource group and region where the cluster is being provisioned. If you do not have an SSH key in your IBM Cloud account, create one by using the instructions given here. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-ssh-keys)."
          },
          {
            "name": "api_key",
            "value": "xxxxxxxxxxxxxxxxxxxx",
            "type": "string",
            "secure": true,
            "description": "This is the API key for IBM Cloud account in which the Slurm cluster needs to be deployed. [Learn more](https://cloud.ibm.com/docs/account?topic=account-userapikey)."
          },
          {
            "name": "vpc_name",
            "value": "",
            "type": "string",
            "secure": false,
            "description": "Name of an existing VPC in which the cluster resources will be deployed. If no value is given, then a new VPC will be provisioned for the cluster. [Learn more](https://cloud.ibm.com/docs/vpc)"
          },
          {
            "name": "vpc_cidr_block",
            "value": "[\"10.241.0.0/18\"]",
            "type": "list(string)",
            "secure": false,
            "description": "Creates the address prefix for the new VPC, when the vpc_name variable is empty. Only a single address prefix is allowed. For more information, see [Setting IP ranges](https://cloud.ibm.com/docs/vpc?topic=vpc-vpc-addressing-plan-design)."
          },
          {
            "name": "vpc_cluster_private_subnets_cidr_blocks",
            "value": "[\"10.241.0.0/22\"]",
            "type": "list(string)",
            "secure": false,
            "description": "The CIDR block that's required for the creation of the cluster private subnet. Modify the CIDR block if it has already been reserved or used for other applications within the VPC or conflicts with any on-premises CIDR blocks when using a hybrid environment. Provide only one CIDR block for the creation of the subnet. Make sure to select a CIDR block size that will accommodate the maximum number of management, storage, and both static worker nodes that you expect to have in your cluster.  For more information on CIDR block size selection, see [Choosing IP ranges for your VPC](https://cloud.ibm.com/docs/vpc?topic=vpc-choosing-ip-ranges-for-your-vpc)."
          },
          {
            "name": "vpc_cluster_login_private_subnets_cidr_blocks",
            "value": "[\"10.241.4.0/28\"]",
            "type": "list(string)",
            "secure": false,
            "description": "The CIDR block that's required for the creation of the login cluster private subnet. Modify the CIDR block if it has already been reserved or used for other applications within the VPC or conflicts with any on-premises CIDR blocks when using a hybrid environment. Provide only one CIDR block for the creation of the login subnet. Since login subnet is used only for the creation of login virtual server instance provide a CIDR range of /28."
          },
          {
            "name": "resource_group",
            "value": "HPCC",
            "type": "string",
            "secure": false,
            "description":"Resource group name from your IBM Cloud account where the VPC resources should be deployed. [Learn more](https://cloud.ibm.com/docs/account?topic=account-rgs)."
          },
          {
            "name": "cluster_prefix",
            "value": "hpcc-slurm",
            "type": "string",
            "secure": false,
            "description": "Prefix that is used to name the Slurm cluster and IBM Cloud resources that are provisioned to build the Slurm cluster instance. You cannot create more than one instance of the Slurm cluster with the same name. Make sure that the name is unique. Enter a prefix name, such as my-hpcc."
          },
          {
            "name": "cluster_id",
            "value": "SlurmCluster",
            "type": "string",
            "secure": false,
            "description": "ID of the cluster used by Slurm for configuration of resources. This must be up to 39 alphanumeric characters including the underscore (_), the hyphen (-), and the period (.). Other special characters and spaces are not allowed. Do not use the name of any host or user as the name of your cluster. You cannot change it after installation."
          },
          {
            "name": "zone",
            "value": "us-east-3",
            "type": "string",
            "secure": false,
            "description": "IBM Cloud zone name within the selected region where the Slurm cluster should be deployed. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-creating-a-vpc-in-a-different-region#get-zones-using-the-cli)."
          },
          {
            "name": "management_node_image_name",
            "value": "hpcc-slurm-management-v1-03may23",
            "type": "string",
            "secure": false,
            "description":"Name of the image that you want to use to create virtual server instances in your IBM Cloud account to deploy as worker nodes in the Slurm cluster. By default, the automation uses a stock operating system image. If you would like to include your application-specific binary files, follow the instructions in [Planning for custom images](https://cloud.ibm.com/docs/vpc?topic=vpc-planning-custom-images) to create your own custom image and use that to build the Slurm cluster through this offering. Note that use of your own custom image may require changes to the cloud-init scripts, and potentially other files, in the Terraform code repository if different post-provisioning actions or variables need to be implemented."
          },
          {
            "name": "management_node_instance_type",
            "value": "cx2-16x32",
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the management nodes for the Slurm cluster. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "login_node_instance_type",
            "value": "cx2-16x32",
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the login node for the Slurm cluster. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "storage_node_instance_type",
            "value": "bx2-2x8",
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type to be used to create the storage nodes for the Slurm cluster. The storage nodes are the ones that are used to create an NFS instance to manage the data for HPC workloads. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "worker_node_type",
            "value": "vsi",
            "type": "string",
            "secure": false,
            "description": "The type of server that's used for the worker nodes: virtual server instance or bare metal server. If you choose vsi, the worker nodes are deployed on virtual server instances, or if you choose baremetal, the worker nodes are deployed on bare metal servers."
          },          
          {
            "name": "worker_node_instance_type",
            "value": "bx2-4x16",
            "type": "string",
            "secure": false,
            "description": "Specify the virtual server instance profile type name to be used to create the worker nodes for the Slurm cluster. The worker nodes are the ones where the workload execution takes place and the choice should be made according to the characteristic of workloads. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-profiles)."
          },
          {
            "name": "worker_node_count",
            "value": "3",
            "type": "number",
            "secure": false,
            "description": "This is the number of worker nodes that will be provisioned at the time the cluster is created. Enter a value in the range 1 - 500."
          },

          {
            "name": "volume_capacity",
            "value": "100",
            "type": "number",
            "secure": false,
            "description": "Size in GB for the block storage that would be used to build the NFS instance and would be available as a mount on Slurm management node. Enter a value in the range 10 - 16000."
          },
          {
            "name": "volume_profile",
            "value": "general-purpose",
            "type": "string",
            "secure": false,
            "description": "Name of the block storage volume type to be used for NFS instance. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-block-storage-profiles)."
          },
          {
            "name": "volume_iops",
            "value": "300",
            "type": "number",
            "secure": false,
            "description": "Number to represent the IOPS(Input Output Per Second) configuration for block storage to be used for NFS instance (valid only for volume_profile=custom, dependent on volume_capacity). Enter a value in the range 100 - 48000. [Learn more](https://cloud.ibm.com/docs/vpc?topic=vpc-block-storage-profiles#custom)."
          },       
          {
            "name": "remote_allowed_ips",
            "value": "[\"49.37.163.132\", \"49.37.163.132\"]",
            "type": "list(string)",
            "secure": false,
            "description": "Comma-separated list of IP addresses that can access the Slurm instance through an SSH. For security purposes, provide the public IP addresses assigned to the devices that are authorized to establish SSH connections (for example, [\"169.45.117.34\"]). To fetch the IP address of the device, use [https://ipv4.icanhazip.com/](https://ipv4.icanhazip.com/)."
          },       
          {
            "name": "spectrum_scale_enabled",
            "value": "false",
            "type": "bool",
            "secure": false,
            "description": "Setting this to true will enables Spectrum Scale integration with the cluster. Otherwise, Spectrum Scale integration will be disabled (default). By entering 'true' for the property, you have also agreed to one of the two conditions: (1) You are using the software in production and confirm you have sufficient licenses to cover your use under the International Program License Agreement (IPLA). (2) You are evaluating the software and agree to abide by the International License Agreement for Evaluation of Programs (ILAE). Note: Failure to comply with licenses for production use of software is a violation of [IBM International Program License Agreement](https://www.ibm.com/software/passportadvantage/programlicense.html)."
          },
          {
            "name": "TF_WAIT_DURATION",
            "value": "600s",
            "type": "string",
            "secure": false,
            "description": "wait duration time set for the storage and worker node to complete the entire setup"
          },
          {
            "name": "storage_cluster_gui_username",
            "value": "Please fill here",
            "type": "string",
            "secure": false,
            "description": "GUI user to perform system management and monitoring tasks on storage cluster."
          },
          {
            "name": "storage_cluster_gui_password",
            "value": "Please fill here",
            "type": "string",
            "secure": true,
            "description": "Password for storage cluster GUI"
          },
          {
            "name": "scale_storage_cluster_filesystem_mountpoint",
            "value": "/gpfs/fs1",
            "type": "string",
            "secure": false,
            "description": "Spectrum Scale storage cluster (owningCluster) file system mount point. The owningCluster is the cluster that owns and serves the file system to be mounted.  For more information, see [Mounting a remote GPFS file system](https://www.ibm.com/docs/en/spectrum-scale/5.1.5?topic=system-mounting-remote-gpfs-file)."
          },
          {
            "name": "scale_filesystem_block_size",
            "value": "4M",
            "type": "string",
            "secure": false,
            "description": "Specified block size must be a valid IBM Spectrum Scale supported block sizes (256K, 512K, 1M, 2M, 4M, 8M, 16M)."
          },
          {
            "name": "vpn_enabled",
            "value": "false",
            "type": "bool",
            "secure": false,
            "description": "Set to true to deploy a VPN gateway for VPC in the cluster (default: false)."
          },
          {
            "name": "vpn_peer_cidrs",
            "value": "",
            "type": "string",
            "secure": false,
            "description": "Comma separated list of peer CIDRs (e.g., 192.168.0.0/24) to which the VPN will be connected."
          },
          {
            "name": "vpn_peer_address",
            "value": "",
            "type": "string",
            "secure": false,
            "description": "The peer public IP address to which the VPN will be connected."
          },
          {
            "name": "vpn_preshared_key",
            "value": "",
            "type": "string",
            "secure": true,
            "description": "The pre-shared key for the VPN."
          }
        ]
      }
    ]
  }

RH593591 commented 1 month ago

The Generate Plan action was successful with the value of remote_allowed_ips being a list of IPs. Thank you, Anand. This gets us further.
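One visible difference between the two config.json files in this thread is that the working one passes list(string) values as JSON-encoded strings (for example "[\"10.241.0.0/18\"]"), while the original passes raw JSON arrays. Whether that difference alone explains the earlier validation error is not confirmed here, since the template code was also updated. Below is a minimal sketch, assuming the string-encoded form is what Schematics expects and using a hypothetical helper name stringify_variablestore, that rewrites non-string values in that form.

```python
import json

def stringify_variablestore(config: dict) -> dict:
    """Return a copy of config with each non-string variablestore value JSON-encoded as a string."""
    result = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for template in result.get("template_data", []):
        for var in template.get("variablestore", []):
            if "value" in var and not isinstance(var["value"], str):
                var["value"] = json.dumps(var["value"])
    return result

# Trimmed-down example in the shape of the config.json files above
sample = {
    "template_data": [{
        "variablestore": [
            {"name": "remote_allowed_ips",
             "value": ["169.226.58.175", "169.226.58.176"],
             "type": "list(string)"},
            {"name": "zone", "value": "us-east-3", "type": "string"},
        ]
    }]
}

fixed = stringify_variablestore(sample)
print(fixed["template_data"][0]["variablestore"][0]["value"])
```

Values that are already strings, such as zone, are left untouched, and the input dictionary is not modified.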

I would like to mention that when I tried remote_allowed_ips with a list of IP ranges, it failed. In our case, we would eventually need the value to be a list of ranges, since our user base is rather large.
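The distinction between single addresses and ranges can be checked locally before generating a plan. Below is a minimal sketch using Python's standard ipaddress module, with a hypothetical helper name split_ips_and_ranges. It assumes "ranges" means CIDR notation such as 169.226.58.0/24 (an illustrative value, not one from this thread), and it does not reproduce the template's own validation rule.

```python
import ipaddress

def split_ips_and_ranges(entries):
    """Split entries into (single_addresses, cidr_ranges); raise ValueError on anything else."""
    singles, ranges = [], []
    for entry in entries:
        if "/" in entry:
            # strict=False also accepts entries with host bits set, e.g. 10.0.0.5/24
            ipaddress.ip_network(entry, strict=False)
            ranges.append(entry)
        else:
            ipaddress.ip_address(entry)
            singles.append(entry)
    return singles, ranges

singles, ranges = split_ips_and_ranges(
    ["169.226.58.175", "169.226.58.176", "169.226.58.0/24"]
)
print("single addresses:", singles)
print("CIDR ranges:", ranges)
```

Any entry that lands in the second list is a candidate for the failure described above.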

I would also like to bring to your attention an error I am seeing when I try the Apply Plan action. (Should I create a new GitHub issue?) Note: we have been testing with the worker nodes set to deploy to bx2 profiles so that we can test without having the A100s released.

2024/05/22 19:49:26 Terraform apply |
2024/05/22 19:49:26 Terraform apply | Error: Forbidden
2024/05/22 19:49:26 Terraform apply |
2024/05/22 19:49:26 Terraform apply | with module.nfs_storage[0].ibm_is_instance.storage,
2024/05/22 19:49:26 Terraform apply | on resources/ibmcloud/compute/vsi_nfs_storage_server/vsi_nfs_storage_server.tf line 29, in resource "ibm_is_instance" "storage":
2024/05/22 19:49:26 Terraform apply | 29: resource "ibm_is_instance" "storage" {
2024/05/22 19:49:26 Terraform apply |
2024/05/22 19:49:26 Terraform APPLY error: Terraform APPLY errorexit status 1
2024/05/22 19:49:26 Could not execute job: Error : Terraform APPLY errorexit status 1
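For longer job logs, the error message and the failing resource address can be pulled out with a short script. This is a sketch that assumes the log layout seen in the excerpt above (a timestamp, then `Terraform apply |`, then the Terraform message). The function name `summarize_apply_errors` is hypothetical, and the patterns are written to work whether the log is pasted as separate lines or run together on one line.

```python
import re

# Message after "Error:", up to the next timestamp or the end of the line.
ERROR_RE = re.compile(
    r"\| Error: (.*?)\s*(?=\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}|$)",
    re.MULTILINE,
)
# Resource address following "with", e.g. module.x[0].ibm_is_instance.y
RESOURCE_RE = re.compile(r"\|\s+with ([^\s,]+),")


def summarize_apply_errors(log_text):
    """Return (error_messages, resource_addresses) found in the log text."""
    errors = ERROR_RE.findall(log_text)
    resources = RESOURCE_RE.findall(log_text)
    return errors, resources


sample = (
    "2024/05/22 19:49:26 Terraform apply | Error: Forbidden\n"
    "2024/05/22 19:49:26 Terraform apply | "
    "with module.nfs_storage[0].ibm_is_instance.storage,\n"
)
errors, resources = summarize_apply_errors(sample)
print(errors)
print(resources)
```

For the excerpt above, this reports the `Forbidden` error against `module.nfs_storage[0].ibm_is_instance.storage`, so the failure is on the NFS storage instance rather than the worker nodes.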

Anand-Reddy7 commented 1 month ago

Hello @RH593591, Good day!

I spun up a new cluster using the "config.json" file provided above, and it was provisioned without any issues.

Please pull the latest code and give it a try.

For your reference, below are my cluster logs:

 2024/05/23 13:46:01 Terraform refresh | data.template_file.storage_user_data: Reading...
 2024/05/23 13:46:01 Terraform refresh | data.template_file.management_user_data: Reading...
 2024/05/23 13:46:01 Terraform refresh | data.template_file.worker_user_data: Reading...
 2024/05/23 13:46:01 Terraform refresh | data.template_file.storage_user_data: Read complete after 1s [id=014f77f320b859f517635c1d921dde3f452a53ab1c589ca86e8eb0bbc5d4328e]
 2024/05/23 13:46:01 Terraform refresh | data.template_file.management_user_data: Read complete after 1s [id=c448b4d2d8ab2fb1c86f2f3b6f979f82aeb052694ec9560b1ac7a64bb6aa9938]
 2024/05/23 13:46:01 Terraform refresh | data.template_file.worker_user_data: Read complete after 1s [id=7f49a7c27539e7c9dafa76762e5d1d98bd06a5a0ce4bf7669120db0f733e9980]
 2024/05/23 13:46:01 Terraform refresh | module.login_vsi.ibm_is_instance.login: Refreshing state... [id=0777_a1e64cf3-a5f2-4bea-b941-1df67f7add21]
 2024/05/23 13:46:02 Terraform refresh | module.nfs_storage[0].ibm_is_instance.storage: Refreshing state... [id=0777_f779b8c8-055e-4e97-a374-c29b1d0cb5ac]
 2024/05/23 13:46:02 Terraform refresh | module.management[0].ibm_is_instance.management: Refreshing state... [id=0777_dbc03aa4-8cdf-46ef-bec2-ec64995635b1]
 2024/05/23 13:46:03 Terraform refresh | module.login_fip.ibm_is_floating_ip.login_fip: Refreshing state... [id=r014-61c5fd9e-ac0b-4418-97e5-859ca7e639c6]
 2024/05/23 13:46:04 Terraform refresh | module.worker_vsi[0].ibm_is_instance.worker["1"]: Refreshing state... [id=0777_7b8bd148-7d8d-4972-9c23-b17b847498a2]
 2024/05/23 13:46:04 Terraform refresh | module.worker_vsi[0].ibm_is_instance.worker["2"]: Refreshing state... [id=0777_f9989eee-0ec1-4123-a9a3-bed3d0710b4c]
 2024/05/23 13:46:04 Terraform refresh | module.worker_vsi[0].ibm_is_instance.worker["0"]: Refreshing state... [id=0777_e256419b-4216-4717-9b33-116d3b3de636]
 2024/05/23 13:46:07 Terraform refresh | module.worker_nodes_wait[0].time_sleep.waiter: Refreshing state... [id=2024-05-23T13:45:21Z]
 2024/05/23 13:46:07 Terraform refresh | 
 2024/05/23 13:46:07 Terraform refresh | Outputs:
 2024/05/23 13:46:07 Terraform refresh | 
 2024/05/23 13:46:07 Terraform refresh | nfs_ssh_command = "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -J root@52.117.125.23 root@10.241.0.4"
 2024/05/23 13:46:07 Terraform refresh | region_name = "us-east"
 2024/05/23 13:46:07 Terraform refresh | ssh_command = "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -J root@52.117.125.23  ubuntu@10.241.0.5"
 2024/05/23 13:46:07 Terraform refresh | vpc = "Name:- anand-psl-vpc | ID:- r014-92e1109b-9256-4501-ae0e-bf6ac910e201"
 2024/05/23 13:46:07 Command finished successfully.

 2024/05/23 13:46:07 -----  Terraform OUTPUT  -----

 2024/05/23 13:46:07 Starting command: terraform1.5 output -no-color -json
 2024/05/23 13:46:07 Starting command: terraform1.5 output -no-color -json
 2024/05/23 13:46:11 Command finished successfully.
 2024/05/23 13:46:22 Done with the workspace action