broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
989 stars 358 forks source link

AWS CloudFormation stack fails with "UnicodeDecodeError" #4674

Open multimeric opened 5 years ago

multimeric commented 5 years ago

I have been attempting to spin up the CloudFormation stack provided here. This intermittently fails and succeeds, I'm not sure why it sometimes does and sometimes doesn't work.

When it fails, it's because the EC2 instance fails to initialize. The most promising section I can find from the EC2 logs shows the following:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 891, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 49: ordinal not in range(128)
Logged from file util.py, line 476
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 891, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 71: ordinal not in range(128)
Logged from file util.py, line 476
[   72.975338] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null)
Error occurred during build: Command 04InstallECSAdditions failed

The line EXT4-fs (dm-3): mounted filesystem with ordered data mode is repeated about 100 times in the real logs (below), I've just abridged it here for clarity. Anyway, the main thing this tells us that it's failing during step 04 of the EC2 startup script, which does the following:

04InstallECSAdditions:
  command:
    Fn::If:
      - UseCromwell
      - !Join [" ", ["sh", "/opt/ecs-additions/ecs-additions-cromwell.sh"]]
      - echo "OK"
  env:
    PATH: "/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin"

My best guess as to what is happening, is this blog post, which suggests:

Apparently cfn-init has a limit on the amount of output it can process from a command, and I was pushing that limit.

I suspect the reason for the UTF8 error is that the output was truncated between two bytes or something, and when the parser underneath cfn-init tried to parse it, it encountered what appeared to be an invalid UTF8 character.

So perhaps the reason this issue is intermittent is because the length of the logs from this command are occasionally too long for the cfn-init script? Or this might be a red herring.

To aid with debugging, here are some useful logs

ruchim commented 5 years ago

@wleepang -- I don't believe I've ever seen this. Any ideas?

wleepang commented 5 years ago

@TMiguelT - Thanks! The logs should help me debug what's going here. I've seen this intermittently in the past, but it was difficult to catch.

wleepang commented 5 years ago

@TMiguelT - This should be resolved in the latest update to the CloudFormation templates at: https://docs.opendata.aws/genomics-workflows

The use of a Custom AMI is now deprecated in favor of EC2 Launch Templates.

multimeric commented 5 years ago

Wow, great work! I look forward to trying them out

ion-girloanta commented 5 years ago

Got the same error in https://github.com/aws-samples/aws-refarch-wordpress

seth-xdam commented 5 years ago

@wleepang Could you share what change you made to fix the problem? I'm getting the same error on my own CloudFormation template.

Mahesh81 commented 4 years ago

Can you please share the fix, i face the same issue in my template