aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0

terraform-provider-aws-parallelcluster fails on parallelcluster 3.11.0 with login nodes enabled #6489

Open kondakovm opened 1 month ago

kondakovm commented 1 month ago

The terraform-provider-aws-parallelcluster fails while parsing the cluster status during creation on the 3.11 API with login nodes enabled, resulting in the following error:

 Error: Error while waiting for cluster to finish updating.

   with module.parallelcluster_clusters.aws-parallelcluster_cluster.managed_configs["ParallelCluster"],
   on .terraform/modules/parallelcluster_clusters/modules/clusters/main.tf line 35, in resource "aws-parallelcluster_cluster" "managed_configs":
   35: resource "aws-parallelcluster_cluster" "managed_configs" {

 json: cannot unmarshal array into Go struct field _DescribeClusterResponseContent.loginNodes of type map[string]interface {}

Despite this error, the cluster was created and is fully operational, but Terraform can neither read nor import it: both operations end with the same error. This is most likely connected to the transition from a single login node to multiple login nodes in a pool, which would explain why loginNodes now arrives as an array where the provider expects an object.

Additional info: The deployment with login nodes works on parallelcluster 3.10.1. The deployment works on parallelcluster 3.11.0 without login nodes enabled.


hanwen-pcluste commented 1 month ago

Thank you for reporting the issue. We will work on a fix.

hanwen-pcluste commented 1 month ago

The problem is solved in ParallelCluster 3.11.1. Please use the latest version.

kondakovm commented 1 month ago

Thank you for taking care of the issue. Unfortunately, I got the same error when deploying the config with the 3.11.1 API using the parallelcluster provider:

 Error: Error while waiting for cluster to finish updating.

   with module.parallelcluster_clusters.aws-parallelcluster_cluster.managed_configs["ParallelCluster"],
   on .terraform/modules/parallelcluster_clusters/modules/clusters/main.tf line 35, in resource "aws-parallelcluster_cluster" "managed_configs":
   35: resource "aws-parallelcluster_cluster" "managed_configs" {

 json: cannot unmarshal array into Go struct field _DescribeClusterResponseContent.loginNodes of type map[string]interface {}

The cluster is created and fully functional, and there are no errors in the Lambda API logs, but Terraform cannot read or modify the login nodes status.

gmarciani commented 3 weeks ago

Hi @kondakovm,

We are working on the issue with 3.11.1 and will post an update once we have more information.

Thank you for reporting the issue.

gmarciani commented 2 weeks ago

We are currently working on the fix for the next provider version: https://github.com/aws-tf/terraform-provider-aws-parallelcluster/pull/206