Closed bkanzki-onica closed 3 years ago
Hi @bkanzki-onica, so if I understand correctly, resources are correctly deployed in the CICD account but not in the child account?
Can you please confirm that you can see two stacks in CREATE_COMPLETE in the CICD account like in this picture:
Then in the child account, for the sdlf-cicd-child-foundations stack which is in ROLLBACK_COMPLETE, could you identify the first error that you see in the Events section of that CloudFormation stack and provide it here?
It's most likely something to do with your role's permissions, and searching in the Events of a failed stack would inform you about that
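If the console is inconvenient, the first failed event can probably be pulled with the AWS CLI as well. This is only a sketch: the function name `first_failed_event` is made up for illustration, and it assumes the AWS CLI is configured for the child account (pass `--profile` or `--region` as extra arguments if needed).

```sh
# Hypothetical helper: print the earliest FAILED event of a CloudFormation stack.
# Usage: first_failed_event <stack-name> [extra aws cli options]
first_failed_event() {
    [ "$#" -ge 1 ] || { echo "usage: first_failed_event <stack-name> [aws options]" >&2; return 2; }
    stack_name="$1"
    shift
    # describe-stack-events lists the newest events first, so sort on the
    # timestamp in the first column and keep only the oldest matching line.
    aws cloudformation describe-stack-events \
        --stack-name "$stack_name" \
        --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp, LogicalResourceId, ResourceStatusReason]" \
        --output text "$@" \
        | LC_ALL=C sort | head -n 1
}

# Example (stack name taken from this thread):
# first_failed_event sdlf-cicd-child-foundations
```

The first failure is usually the root cause; later failures in the same stack tend to be cancellations triggered by it.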
Yes I do see two stacks in CICD account.
In the child account I have only this under events:
Could you try to:
Manually delete the sdlf-cicd-child-foundations stack in the child account
Manually recreate the same stack by uploading this template from the repository into CloudFormation. In the parameters, it will ask for these two inputs:
Outputs section of the sdlf-cicd-shared-foundations-dev stack in the CICD account under oKMSKeyId
As soon as the stack launches, please monitor it for any issues and let us know what you encounter
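The manual deletion in the first step could also be done from the CLI. A rough sketch, assuming the AWS CLI is configured for the child account; `delete_if_rolled_back` is a hypothetical name, and it deliberately refuses to touch a stack in any state other than ROLLBACK_COMPLETE.

```sh
# Hypothetical helper: delete a stack only if it is in ROLLBACK_COMPLETE,
# then block until CloudFormation reports the deletion as finished.
# Usage: delete_if_rolled_back <stack-name> [extra aws cli options]
delete_if_rolled_back() {
    [ "$#" -ge 1 ] || { echo "usage: delete_if_rolled_back <stack-name> [aws options]" >&2; return 2; }
    stack_name="$1"
    shift
    status="$(aws cloudformation describe-stacks --stack-name "$stack_name" \
        --query 'Stacks[0].StackStatus' --output text "$@")" || return 1
    if [ "$status" = "ROLLBACK_COMPLETE" ]; then
        aws cloudformation delete-stack --stack-name "$stack_name" "$@" &&
        aws cloudformation wait stack-delete-complete --stack-name "$stack_name" "$@"
    else
        echo "Stack $stack_name is in state $status; leaving it untouched." >&2
        return 1
    fi
}

# Example (stack name taken from this thread):
# delete_if_rolled_back sdlf-cicd-child-foundations
```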
Hi
I followed your instructions and this is what I got:
Invalid Principal sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx Arn in sdlf-cicd-child-foundations stack
I can't find that role in my list of roles
@bkanzki-onica something strange is that the sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx is defined in the CICD AWS account, not in the child account. The role is not defined in the sdlf-cicd-child-foundations template, so I don't understand how the child account can even detect this role... Could it be that this child account was previously used for an initial deployment of SDLF (where CICD and Child accounts were combined into one)?
In the Lake Formation console of the child account, could you also check the Admins and database creators section for any reference to this role?
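The same check might be scripted roughly as below. It is a sketch under assumptions: `lf_settings_mentioning` is a hypothetical name, and it only searches the output of `aws lakeformation get-data-lake-settings`. Database creators are granted as permissions, so `aws lakeformation list-permissions` may need to be searched the same way.

```sh
# Hypothetical helper: print the lines of the Lake Formation data lake
# settings that mention a given principal name. Returns non-zero if none do.
# Usage: lf_settings_mentioning <name-fragment> [extra aws cli options]
lf_settings_mentioning() {
    [ "$#" -ge 1 ] || { echo "usage: lf_settings_mentioning <name-fragment> [aws options]" >&2; return 2; }
    pattern="$1"
    shift
    # -F treats the pattern as a fixed string, so characters in role names
    # are never interpreted as a regular expression.
    aws lakeformation get-data-lake-settings --output json "$@" | grep -F -- "$pattern"
}

# Example (role name fragment taken from this thread):
# lf_settings_mentioning rTeamReposCodeBuildRole
```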
@jaidisido the child account was used for an initial deployment of SDLF, then I deleted everything to deploy the multi environment account. In the Lake Formation console there is no reference to that. I could recreate that role, do you know what permissions came with it?
That would explain it partly. It seems that Lake Formation is still holding on to this role somehow, although it was previously deleted. What remains a mystery is why the role ended up in Lake Formation in the first place. At no point does SDLF add it to Lake Formation, so it must have been added manually.
I am not sure if recreating the role would help, but it cannot hurt to try. The role is defined here, and I am not sure whether you need to fully recreate it or whether just having a role with the same name would be enough.
A more radical solution would be to consider deploying in another (clean) child account. Appreciate it might not be possible however.
I don't understand,
This is the template for the sdlf-cicd-team-repos.
It creates that role in the devops account. It's only in the child account that the role doesn't get created. Why? What is missing to deploy that role there? These accounts were empty before. So if someone deletes that role by accident, your stack doesn't recreate it?
How can that role be deleted from Lake Formation?
As you say, the sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx role is created through the sdlf-cicd-team-repos template. So here is my understanding of the events that led to the issue:
The sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx role was created via the sdlf-cicd-team-repos template
The sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx role also ended up in Lake Formation. This is the part that I don't understand, since at no point should this happen in the framework unless someone adds it manually for some reason
The sdlf-cicd-team-repos-rTeamReposCodeBuildRolexxxxx role was created in the CICD account because that is where the sdlf-cicd-team-repos stack is defined. It should NOT appear in the child account at this point
I was finally able to delete it, and the sdlf-cicd-child-foundations stack has been created completely, which gives me access to CodePipeline and CodeBuild. However, when doing a push after modifying the parameters-dev.json file, I get the following error:
99 | ./deploy.sh
I was able to manually deploy the resources by running this command in the sdlf-foundations folder: ./deploy.sh -n sdlf-cicd-child-foundations -s sdlf-cfn-artifacts-us-east-1-XXXXXXXXXXXXX -p bdlf-dev
But the previously created CodeBuild and CodePipeline resources disappeared and got deleted. Would you know why?
They got deleted because you used the same name (sdlf-cicd-child-foundations) when deploying your sdlf-foundations resources, effectively asking CloudFormation to replace the resources that were previously defined in the existing sdlf-cicd-child-foundations stack. The command should have been:
./deploy.sh -n sdlf-foundations -s sdlf-cfn-artifacts-us-east-1-XXXXXXXXXXXXX -p bdlf-dev
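To avoid reusing an existing stack name by accident, a small guard could be run before a first-time deployment. This is a sketch, not part of SDLF: `stack_exists` and `ensure_new_stack_name` are hypothetical names, and the check relies on `aws cloudformation describe-stacks` failing for a stack that does not exist.

```sh
# Hypothetical helper: succeed if a stack with this name already exists.
# Usage: stack_exists <stack-name> [extra aws cli options]
stack_exists() {
    aws cloudformation describe-stacks --stack-name "$@" >/dev/null 2>&1
}

# Hypothetical guard for a first-time deployment: refuse a name that is
# already taken, since deploying under it would update that stack in place.
ensure_new_stack_name() {
    if stack_exists "$@"; then
        echo "A stack named $1 already exists; pick another name." >&2
        return 1
    fi
}

# Example (names and flags taken from this thread):
# ensure_new_stack_name sdlf-foundations --profile bdlf-dev &&
#     ./deploy.sh -n sdlf-foundations -s sdlf-cfn-artifacts-us-east-1-XXXXXXXXXXXXX -p bdlf-dev
```

The guard only makes sense for the very first deployment of a stack; later updates of sdlf-foundations are expected to reuse its own name.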
@jaidisido That command does deploy the stack but it gets stuck at the and can't create the glue catalog and common policy:
In the terminal I get also: An error occurred (ParameterNotFound) when calling the GetParameter operation: upload failed: scripts/deequ/jar/deequ-1.0.3-RC1.jar to s3:///deequ/jars/deequ-1.0.3-RC1.jar Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.-]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-]{1,63}$" fatal error: Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.-]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-]{1,63}$"
And then it goes to RollBack complete
UPDATE: Tried a few times. it rolls back and deletes everything all the time
What's interesting is that when I deploy it under the sdlf-cicd-child-foundations stack name, CodeBuild and CodePipeline disappear, but all the resources get created properly, which is not the case when I call it sdlf-foundations. There seems to be a permission issue in that stack
Is it possible to setup a meeting to discuss this?
There seem to be a number of different issues here.
I would recommend deploying the infrastructure using the CICD resources (CodePipeline, CodeBuild...) rather than running the deploy script manually. The error you are seeing about the failed GetParameter call is most likely due to your environment missing jq, a utility used to query JSON files. This utility is installed by default in CodeBuild environments, and you would need to run echo y | sudo yum install jq to install it in your environment
It seems that resources are still lingering from your very first SDLF deployment. For instance, DynamoDB tables are retained even when the stack is deleted. So if you try to redeploy the Dynamo stack it will fail because the tables are already there. Given that this account is polluted from the previous deployment, I would strongly recommend testing in a different one and using the CICD instead of your own environment
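If the deploy script does have to be run locally, a quick preflight check for the tools mentioned above might save a failed deployment. A minimal sketch; `require_tools` is a hypothetical helper, not something shipped with SDLF.

```sh
# Hypothetical helper: verify that every named command is on the PATH.
# Prints each missing tool and returns non-zero if any are absent.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: only run the deployment when both tools are available.
# require_tools aws jq && ./deploy.sh -n sdlf-foundations -s sdlf-cfn-artifacts-us-east-1-XXXXXXXXXXXXX -p bdlf-dev
```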
@trejas When trying to deploy the multi-environment SDLF in this workshop, I run into errors.
My config file looks like this:
`[default]
region = us-east-1
output = json
[profile bdlf-dev]
account = TTTTTTTTT
region = us-east-1
output = json
[profile bdlf-qa]
account = ZZZZZZZZZ
region = us-east-1
output = json
[profile bdlf-prod]
account = YYYYYYYY
region = us-east-1
output = json
[profile bdlf-devops-dev]
account = XXXXXXXXXXX
role_arn = arn:aws:iam::XXXXXXXXXXX:role/big-data-labs-data-engineer-dev
source_profile = bdlf-dev
region = us-east-1
output = json
[profile bdlf-devops-qa]
account = XXXXXXXXXXX
role_arn = arn:aws:iam::XXXXXXXXXXX:role/big-data-labs-data-engineer-qa
source_profile = bdlf-qa
region = us-east-1
output = json
[profile bdlf-devops-prod]
account = XXXXXXXXXXX
role_arn = arn:aws:iam::XXXXXXXXXXX:role/big-data-labs-data-engineer-prod
source_profile = bdlf-prod
region = us-east-1
output = json
[profile bdlf-devops-main]
region = us-east-1
output = json`
Step 1 was to run this command:
./deploy.sh -s bdlf-devops-main -r us-east-1 -f
Step 2 is to run this command:
./deploy.sh -s bdlf-devops-main -t bdlf-dev -r us-east-1 -e dev -o -c
But every time I run it, I get:
An error occurred (ValidationError) when calling the DescribeStackEvents operation: Stack [sdlf-cicd-child-foundations] does not exist
and the stack's status goes to ROLLBACK_COMPLETE and cannot be updated afterwards. None of the resources get created in the child account. Could you help with this?