aws-samples / aws-opensource-mailserver

MIT No Attribution

Failed to receive 1 resource signal(s) within the specified duration #6

Open nmckeown opened 2 months ago

nmckeown commented 2 months ago

All resources are being created, but EC2Instance never reaches CREATE_COMPLETE. I suspect something is amiss in the userdata and it is failing to send cfn-signal.
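For context, the EC2Instance resource stays in CREATE_IN_PROGRESS until a signal arrives, and that signal is normally sent by `cfn-signal` at the end of userdata. Below is a minimal sketch of the call, assuming `cfn-signal` (from the aws-cfn-bootstrap helper scripts) is on the PATH. The wrapper name `send_signal`, the `DRY_RUN` switch, and the stack name in the example are hypothetical and not taken from the repository. The resource name EC2Instance and the region come from this thread.

```shell
# send_signal is a hypothetical wrapper; DRY_RUN is an assumed switch for this sketch.
# With DRY_RUN=1 (the default here) it only prints the command instead of running it.
send_signal() {
  local stack="$1" region="$2" exit_code="${3:-0}"
  # --exit-code 0 reports success to CloudFormation; a non-zero value reports failure.
  set -- cfn-signal --exit-code "$exit_code" --stack "$stack" \
    --resource EC2Instance --region "$region"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example (placeholder stack name, region as seen in the logs):
#   DRY_RUN=0 send_signal my-mailserver-stack eu-west-1 0
```

Printing the command first makes it easy to check the stack and resource names before a real signal is sent.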

Looking at syslogs, here are some notable errors:

Aug 25 17:14:51 ip-172-31-23-99 amazon-ssm-agent.amazon-ssm-agent[1138]: Error occurred fetching the seelog config file path: open /etc/amazon/ssm/seelog.xml: no such file or directory
Aug 25 17:14:51 ip-172-31-23-99 amazon-ssm-agent.amazon-ssm-agent[1138]: 2024-08-25 17:14:51 WARN Error adding the directory '/etc/amazon/ssm' to watcher: no such file or directory
Aug 25 17:14:54 ip-172-31-23-99 amazon-ssm-agent.amazon-ssm-agent[1138]: 2024-08-25 17:14:52 WARN EC2RoleProvider Failed to connect to Systems Manager with instance profile role credentials. Err: retrieved credentials failed to report to ssm. RequestId: 14310a44-fbee-4b1a-916c-bb8f4bbc7f78 Error: AccessDeniedException: User: arn:aws:sts::861979030611:assumed-role/MailInABoxInstanceRole/i-0377f7c10d3bd0dbd is not authorized to perform: ssm:UpdateInstanceInformation on resource: arn:aws:ec2:eu-west-1:861979030611:instance/i-0377f7c10d3bd0dbd because no identity-based policy allows the ssm:UpdateInstanceInformation action

Aug 25 17:14:54 ip-172-31-23-99 amazon-ssm-agent.amazon-ssm-agent[1138]: 2024-08-25 17:14:52 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 861979030611
Aug 25 17:14:54 ip-172-31-23-99 amazon-ssm-agent.amazon-ssm-agent[1138]: 2024-08-25 17:14:52 ERROR [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity. Default Host Management Err: error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 861979030611

Aug 25 17:24:41 box postfix/submission/smtpd[38450]: SSL_accept error from ec2-54-220-97-61.eu-west-1.compute.amazonaws.com[54.220.97.61]: lost connection

Posting in case someone has a better idea of what the issue is. I will try to work it out on my end.
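One way to narrow down where userdata stops before it reaches cfn-signal is to scan the cloud-init output log for failures. This is a minimal sketch, assuming the instance runs Ubuntu and cloud-init writes to its default location, /var/log/cloud-init-output.log. The helper name `scan_log` is hypothetical and not part of the repository.

```shell
# scan_log is a hypothetical helper for this sketch.
# It prints up to the last 20 lines of a log that mention an error, failure, or traceback.
scan_log() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  # -n adds line numbers, -i ignores case, -E enables the alternation pattern.
  grep -n -i -E 'error|fail|traceback' "$log_file" | tail -n 20
}

# Example (assumed default cloud-init log path on Ubuntu):
#   scan_log /var/log/cloud-init-output.log
```

The line numbers in the output make it possible to jump straight to the surrounding context in the full log.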

nmckeown commented 2 months ago

Attaching the AmazonSSMManagedInstanceCore policy to the EC2 role in the CloudFormation instance.yaml, as below, gets rid of the SSM errors, but the EC2 instance still rolls back after 30 minutes! I can log in to the Mail-in-a-Box server while it's up, but I'm unsure why it still fails to fully create.

  InstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: MailInABoxInstanceRole
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
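To confirm that the managed policy actually ended up on the role after the stack update, the attached policies can be listed with the AWS CLI. This is a small sketch, assuming the AWS CLI is installed and the caller has IAM read permissions. `has_ssm_core` is a hypothetical helper that only inspects the text passed to it on stdin.

```shell
# has_ssm_core succeeds if its stdin mentions the AmazonSSMManagedInstanceCore policy ARN.
has_ssm_core() {
  # -q suppresses output, -F treats the ARN as a fixed string rather than a regex.
  grep -q -F 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
}

# Example usage (role name taken from the template above):
#   aws iam list-attached-role-policies --role-name MailInABoxInstanceRole | has_ssm_core \
#     && echo "policy attached" || echo "policy missing"
```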

nmckeown commented 2 months ago

I switched to root, sent the cfn-signal manually to keep the host up, and stepped through the userdata script. The only issue I could find is that the backup script cannot load the duplicity package:

root@box:/opt/mailinabox/management# ./backup.py

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 5, in <module>
    from duplicity.__main__ import dup_run
ModuleNotFoundError: No module named 'duplicity.__main__'
Something is wrong with the backup:

Looks like a known issue: https://discourse.mailinabox.email/t/modulenotfounderror-no-module-named-duplicity-duplicity/11447

I was able to work around it with:

apt remove duplicity
rm -rf /etc/apt/sources.list.d/duplicity-team-ubuntu-duplicity-release-git-jammy.list
snap install duplicity --classic
rm -rf /usr/bin/duplicity
which duplicity
duplicity --version

# change the duplicity path in /opt/mailinabox/management/backup.py to
/usr/local/bin/duplicity

There is still an issue with backup.py connecting to S3: the bucket address appears to be in the wrong format, but that is a Mail-in-a-Box (MIAB) configuration matter.