aws-samples / serverless-jenkins-on-aws-fargate

MIT No Attribution

Problems executing Deploy_example.sh #12

Closed edjumacator closed 3 years ago

edjumacator commented 3 years ago

Hey Guys,

I'm having issues deploying the example. I created the required prerequisites (VPC, 2 private & public subnets, a Route 53 domain, with the S3 bucket & DynamoDB bootstrapped) before executing it, and then updated the vars.sh file to use the newly created parameters. The script executed successfully and ended with this output:

module.serverless_jenkins.aws_route53_record.this[0]: Creation complete after 40s [id=xxxxxxxxxxxxxxxxxxxx_jenkins-controller_A]
Releasing state lock. This may take a few moments...

Apply complete! Resources: 34 added, 0 changed, 0 destroyed.

Outputs:

jenkins_fargate_efs = {
  "efs_access_point_id" = "fsap-xxxxxxxxxxxxxxxxxx"
  "efs_aws_backup_plan_name" = [
    "serverless-srh-jenkins-plan",
  ]
  "efs_aws_backup_vault_name" = [
    "serverless-srh-jenkins-vault",
  ]
  "efs_file_system_dns_name" = "fs-xxxxxxxx.efs.us-west-1.amazonaws.com"
  "efs_file_system_id" = "fs-xxxxxxxx"
  "efs_security_group_id" = "sg-xxxxxxxxxxxxxxxxx"
}

However, when I try to access the alias domain I'm getting a 503 error. I'm not exactly sure how to proceed.

cbishop-elsevier commented 3 years ago

@edjumacator

Here are a couple of suggestions on how to begin troubleshooting. If these don't steer you in the direction of a fix, please share the relevant event logs etc. after properly sanitizing them for any sensitive information:

That should give you a sufficient troubleshooting starting point.

Beyond that, I would suggest taking a look at the full output from your terraform plan and apply operations to see if anything obvious jumps out there.

If your ECS Cluster and ECS Services are up and green, you may also consider enabling CloudWatch Container Insights. However, keep in mind that this option incurs additional charges, so use it at your own discretion:

FWIW, in my experience 503 Service Unavailable errors typically mean that your client request has at least been received and acknowledged by your web server, proxy server, or load balancer (whichever endpoint serves as the initial entry point). The server then either cannot currently continue routing your request (because downstream services / endpoints are down or "partitioned" somehow in your network layer), or it does not "understand" how to continue routing your request (the routing / proxy rules do not provide a sufficient routing solution).

If you are still stuck, please share any additional information you can securely share on this ticket, and I will try to assist as best I can.

Good hunting!

Cheers,

@chris-bishop

cbishop-elsevier commented 3 years ago

@edjumacator

One other suggestion (and please forgive me if this one is a "blinding flash of the obvious" 😄 )...

Have you confirmed that the value you supplied for:

https://github.com/aws-samples/serverless-jenkins-on-aws-fargate/blob/main/example/vars.sh.example#L13

...
export TF_VAR_route53_domain_name="exampledomain.com"
...

matches the URL you are trying to navigate to in your browser?

If it matches, what happens if you try appending /login like so:

https://exampledomain.com/login
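
As a rough sketch, the comparison above can also be scripted. The helper name `url_matches_domain` is hypothetical (it is not part of the repo), and it assumes the host in the URL should either equal the value of `TF_VAR_route53_domain_name` or be a subdomain of it:

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): succeeds when the host part of
# the URL in $1 equals the domain in $2 or is a subdomain of it.
url_matches_domain() {
  host="${1#*://}"     # strip the scheme, if any
  host="${host%%/*}"   # strip the path
  host="${host%%:*}"   # strip the port
  case "$host" in
    "$2"|*."$2") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, assuming vars.sh has been sourced so the variable is set:
#   url_matches_domain "https://exampledomain.com/login" "$TF_VAR_route53_domain_name" \
#     && echo "URL matches" || echo "URL does not match"
```

If the check fails, the browser is probably pointed at a different name than the one the Route 53 record was created for.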

Hope my suggestions help in some way!

Cheers,

@chris-bishop

edjumacator commented 3 years ago

Hey Chris,

I'll go ahead and follow the troubleshooting steps and provide information related to each bullet point.

If you log in to your AWS Account's ECS Console for the AWS Region you are provisioning your infrastructure in, do you see the provisioned ECS Cluster, ECS Services, ECS Tasks, etc running there, post terraform apply?

So the first thing I noticed is that there are no tasks running in either cluster.

Then, upon further inspection of the main cluster, I noticed that the jenkins controller service had 4 stopped tasks.

Inspecting one of the stopped tasks showed a CannotPullContainerError.

So I opened ECR and did see the created repository, but I did not see any images inside it.
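
The same two checks (why the tasks stopped, and whether the ECR repository holds any images) can be sketched with the AWS CLI. This is only a sketch: the function names are hypothetical, the cluster and repository names must be supplied by the caller, and it assumes the CLI is already configured for the right account and region:

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repo). Both assume a configured AWS CLI.

# Print the stop reasons for the stopped tasks in the cluster named in $1.
stopped_task_reasons() {
  cluster="$1"
  arns=$(aws ecs list-tasks --cluster "$cluster" --desired-status STOPPED \
    --query 'taskArns[]' --output text)
  if [ -z "$arns" ]; then
    echo "no stopped tasks found"
    return 0
  fi
  # $arns is intentionally unquoted so each ARN becomes its own argument.
  aws ecs describe-tasks --cluster "$cluster" --tasks $arns \
    --query 'tasks[].{task:taskArn,stopped:stoppedReason,containers:containers[].reason}' \
    --output json
}

# Print the number of images in the ECR repository named in $1.
# A count of 0 means nothing was pushed to it.
ecr_image_count() {
  aws ecr list-images --repository-name "$1" \
    --query 'length(imageIds)' --output text
}

# Usage (substitute your own names):
#   stopped_task_reasons <cluster-name>
#   ecr_image_count <repository-name>
```

A CannotPullContainerError together with an image count of 0 would be consistent with the image build and push step never having completed.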

I've reviewed the jenkins_image.tf file and I believe I've located where terraform attempts to push the newly built image to the created ECR repository. However, when I executed the deploy_example.sh script, there was no error or any other output about it.

Additional info: I had to add the --context default flag to the commands because they were failing to execute when I first started trying the deploy_example.sh file. I was receiving an error that build and push could not be executed in the context (jenkins).
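
As a possible alternative to editing each command, the active Docker context can be switched back to the built-in `default` context before running the script. This is a sketch under the assumption that the failing commands were `docker build` and `docker push` and that a non-default context named `jenkins` was active; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): show the available Docker
# contexts, then make the built-in "default" context the active one.
use_default_docker_context() {
  docker context ls            # the active context is marked with an asterisk
  docker context use default   # later docker commands use the local engine
}

# Usage:
#   use_default_docker_context
#   ./deploy_example.sh
```

Setting the `DOCKER_CONTEXT=default` environment variable for a single run is another option that leaves the active context unchanged.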

I'm not sure exactly how to troubleshoot this issue though lol.

apogorielov commented 3 years ago

Closing this issue since the module was successfully tested as part of the last PR.