aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
Apache License 2.0

Fsx Lustre with IAM custom policy #1312

Closed fabfacts closed 4 years ago

fabfacts commented 5 years ago



[cluster test]
key_name = mykey
vpc_settings = custom_vpn
base_os = centos7
cwl_region = us-east-1
master_instance_type = t3.small
compute_instance_type = t3.small
scheduler = sge
ec2_iam_role = my_custom_role
fsx_settings = fs
initial_queue_size = 0
max_queue_size = 16
maintain_initial_size = false
cluster_type = spot

[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
imported_file_chunk_size = 1024
export_path = s3://fsxlustre.mytest/export
import_path = s3://fsxlustre.mytest
weekly_maintenance_start_time = 1:00:00

Bug description and how to reproduce:

I guess this is actually more like a simple question than a bug.


2019-09-16 15:12:40.025000+00:00 CREATE_FAILED AWS::IAM::InstanceProfile RootInstanceProfile Resource creation cancelled
2019-09-16 15:12:39.983000+00:00 CREATE_FAILED AWS::EC2::EIPAssociation AssociateEIP Resource creation cancelled
2019-09-16 15:12:39.883000+00:00 CREATE_FAILED AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
2019-09-16 15:12:39.880000+00:00 CREATE_FAILED AWS::DynamoDB::Table DynamoDBTable Resource creation cancelled
2019-09-16 15:12:39.453000+00:00 CREATE_FAILED AWS::CloudFormation::Stack FSXSubstack Embedded stack arn:aws:cloudformation:us-east-1:123456:stack/parallelcluster-mytest-FSXSubstack-1234/1234-1234-1234-1234 was not successfully created: The following resource(s) failed to create: [FileSystem].

I'm almost sure that FSx for Lustre creation needs extra IAM permissions in my custom IAM user policy. Could you suggest which policy to add? I haven't found it in the documentation.

To reproduce:

Create a new cluster with the template above: pcluster create -t test mytest

Palak-15 commented 5 years ago

Hello @extremoburo ,

You need to provide an IAM role with full S3 access (create the role here).

First of all, create your cluster with this command: pcluster create clustername --norollback. Then SSH into your master node and send the contents of the cfn-init.log file.

fabfacts commented 5 years ago

Hello @Palak-15

Thanks for replying. It's not due to S3, because it fails earlier: no master instance is created. From the stack in CloudFormation I could see that the failure happens in the nested substack "FSXSubstack", and the error is clear:

Embedded stack arn:aws:cloudformation:us-east-1:123456:stack/parallelcluster-burotest-FSXSubstack-1234/1234 was not successfully created: The following resource(s) failed to create: [FileSystem].

In more detail:

User: arn:aws:iam::1234:user/cfnclustermanager is not authorized to perform: fsx:CreateFileSystem on resource: arn:aws:fsx:us-east-1:1234:file-system/* (Service: AmazonFSx; Status Code: 400; Error Code: AccessDeniedException; Request ID: 1234)

That's why I think I need to add to or update the user policy that goes with my custom role, at least by adding the "fsx:CreateFileSystem" permission, but I fear it's not the only one missing.
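For illustration, here is a minimal Python sketch of the policy statement the error message points to. The only action confirmed by this thread is fsx:CreateFileSystem; the helper name build_fsx_statement and the Sid value are hypothetical, and the region and account ID are the redacted values from the error above. Other actions may well be needed and would have to be appended once they are known.

```python
import json

# Only fsx:CreateFileSystem is confirmed by the AccessDeniedException above.
# Any further actions are unknown here and must be added by the reader.
CONFIRMED_FSX_ACTIONS = ["fsx:CreateFileSystem"]


def build_fsx_statement(region, account_id, actions=CONFIRMED_FSX_ACTIONS):
    """Return one IAM policy statement allowing the given FSx actions.

    The helper name and the Sid are illustrative, not part of ParallelCluster.
    """
    return {
        "Sid": "FSxLustre",
        "Effect": "Allow",
        "Action": list(actions),
        # Same resource pattern that appears in the error message.
        "Resource": f"arn:aws:fsx:{region}:{account_id}:file-system/*",
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [build_fsx_statement("us-east-1", "1234")],
}

# Print the JSON document so it can be pasted into the IAM console or CLI.
print(json.dumps(policy, indent=2))
```

The printed JSON could be attached to the cfnclustermanager user as an additional inline policy; whether it is sufficient on its own is not established in this thread.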

If there is a way to check for missing permissions without actually creating and deleting a cluster for every change, it would be appreciated, as it would save some money and time. For instance, I could create the cluster without the custom role, using full admin permissions, and see which policies and roles are auto-created. It will cost some dollars, but it may help. Let me know what you think; I'm not in a hurry.
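One possible way to check permissions without launching anything is the IAM policy simulator, which boto3 exposes as simulate_principal_policy. The sketch below is an assumption about how that could be wired up, not something confirmed in this thread: the function name find_denied_actions is hypothetical, the client is passed in so that boto3 (a third-party package) is only needed by the caller, and pagination of long result lists is not handled.

```python
def find_denied_actions(iam_client, principal_arn, actions, resources=("*",)):
    """Return the actions that the IAM policy simulator does not allow.

    iam_client is expected to behave like boto3.client("iam").
    Results are not paginated, so keep the action list short.
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=principal_arn,
        ActionNames=list(actions),
        ResourceArns=list(resources),
    )
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        # Decisions are "allowed", "explicitDeny" or "implicitDeny".
        if result["EvalDecision"] != "allowed"
    ]


# Example call (requires boto3 and credentials that are allowed to call
# iam:SimulatePrincipalPolicy; the ARN is the redacted one from the error):
#
#   import boto3
#   denied = find_denied_actions(
#       boto3.client("iam"),
#       "arn:aws:iam::1234:user/cfnclustermanager",
#       ["fsx:CreateFileSystem"],
#   )
#   print(denied)
```

The simulator only evaluates the policies attached to the given user or role, so it tells you whether a known action is allowed; it does not discover which actions cluster creation will request.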

Concerning the S3 ARN, I've given full permissions, via a bucket policy, to the VPC I'm creating the cluster in. I thought that would be OK, but I may be wrong. In any case, I think the main issue is the one above.

Thanks in advance. F.

sean-smith commented 5 years ago

Thanks @extremoburo for reporting this, I've marked it as a bug and we'll update this when we update the policy.

fabfacts commented 4 years ago

thanks @sean-smith