canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/
Other
2.92k stars 871 forks source link

Data source: none, and not able to execute user-script in AWS EC2 #4082

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #2008764

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2023-05-01T04:17:29.061233+00:00
date_created = 2023-02-28T09:07:21.150797+00:00
date_fix_committed = None
date_fix_released = None
id = 2008764
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/2008764
milestone = None
owner = stardust-aryan
owner_name = Aryan Agarwal
private = False
status = expired
submitter = stardust-aryan
submitter_name = Aryan Agarwal
tags = ['aws', 'datasource-none', 'ec2', 'user-script']
duplicates = []

Launchpad user Aryan Agarwal(stardust-aryan) wrote on 2023-02-28T09:07:21.150797+00:00

Details can be found here : https://stackoverflow.com/questions/75507635/not-able-to-run-a-user-script-on-some-ec2-instances-started-from-an-ami-of-ubunt

I am launching multiple ( more than 100 EC2 instances ) from a previously set AMI. This AMI is made from an Ubuntu 20.04 AMI. Everything that is required for the work is already satisfied in AMI. Now I just want execute an user-script while launching EC2 using AMI. But the problem is coming that for 2-3% of the Ec2 instances are not able to execute the userscript , othere 97-98% of launched instances are running perfectly fine. These instances are being launched using spot requests so they are exactly similar in configurations. I am able to ssh into the faulty instances too . Last few lines of /var/log/cloud-init-output.log looks something like this:

Cloud-init v. 22.2-0ubuntu1~22.04.3 running 'modules:config' at Mon, 20 Feb 2023 08:21:36 +0000. Up 41.66 seconds. Cloud-init v. 22.2-0ubuntu1~22.04.3 running 'modules:final' at Mon, 20 Feb 2023 08:21:39 +0000. Up 45.11 seconds. Cloud-init v. 22.2-0ubuntu1~22.04.3 finished at Mon, 20 Feb 2023 08:21:39 +0000. Datasource DataSourceNone. Up 45.51 seconds 2023-02-20 08:21:39,953 - cc_final_message.py[WARNING]: Used fallback datasource

/var/lib/cloud/instance normally points to the a folder in with the name of instance id but in this faulty case , it is pointing at this -> /var/lib/cloud/instances/iid-datasource-none

For Faulty instance:

ubuntu@ip-172-31-21-43:/var/lib/cloud/instances/iid-datasource-none$ cat datasource DataSourceNone: DataSourceNone

For healthy instance:

ubuntu@ip-172-31-20-76:/var/lib/cloud/instances/i-0c964c834bf91ce81$ cat datasource DataSourceEc2Local: DataSourceEc2Local

Now I am not able to figure out that why the userscript is being executed in most of the instances but these few 2-3%.

ubuntu-server-builder commented 1 year ago

Launchpad user Aryan Agarwal(stardust-aryan) wrote on 2023-02-28T09:07:21.150797+00:00

Launchpad attachments: cloud-init tarball

ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2023-03-01T20:34:40.953745+00:00

I see a number of boots in cloud-init.log, so I'm curious which one represents the failure you're referring to.

The first boot I see is from 11/18. The datasource was detected fine (Ec2), but I see a traceback when attempting to run your userdata:

2022-11-18 05:40:13,142 - subp.py[DEBUG]: Unexpected error while running command. Command: ['/var/lib/cloud/instance/scripts/part-001'] Exit code: 32 Reason: - Stdout: - Stderr: - 2022-11-18 05:40:13,142 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts) 2022-11-18 05:40:13,142 - handlers.py[DEBUG]: finish: modules-final/config-scripts-user: FAIL: running config-scripts-user with frequency once-per-instance 2022-11-18 05:40:13,142 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed 2022-11-18 05:40:13,142 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/config/modules.py", line 231, in _run_modules ran, _r = cc.run( File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 67, in run return self._runners.run(name, functor, args, freq, clear_on_fail) File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 185, in run results = functor(*args) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py", line 48, in handle subp.runparts(runparts_path) File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 427, in runparts raise RuntimeError( RuntimeError: Runparts: 1 failures (part-001) in 1 attempted commands

The next set of boots seem fine until we get to the final boot on 2/28. This time we see the same Traceback, but earlier in the logs, we also fail getting some metadata from EC2:

2023-02-28 07:36:36,371 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:36,374 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:37,375 - url_helper.py[DEBUG]: [1/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:37,378 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:38,379 - url_helper.py[DEBUG]: [2/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:38,383 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:39,384 - url_helper.py[DEBUG]: [3/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:39,388 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:40,389 - url_helper.py[DEBUG]: [4/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:40,393 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:41,394 - url_helper.py[DEBUG]: [5/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:41,402 - util.py[WARNING]: Failed fetching meta-data/ from url http://169.254.169.254/2021-03-23/meta-data/ 2023-02-28 07:36:41,416 - util.py[DEBUG]: Failed fetching meta-data/ from url http://169.254.169.254/2021-03-23/meta-data/ Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 213, in _get_instance_metadata md = materializer.materialize() File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 111, in materialize self._md = self._materialize(self._blob, self._base_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 122, in _materialize child_contents[c] = self._materialize(child_blob, child_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 121, in _materialize child_blob = self._caller(child_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 206, in mcaller return caller(url).contents File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 87, in read_file_or_url return readurl(url, **kwargs) File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 350, in readurl raise excps[-1] cloudinit.url_helper.UrlError: 404 Client Error: Not Found for url: http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/

The 404 is strange. I can't think of anything that can cause this to happen other than EC2's metadata service having issues when you tried to contact it. If that's the case, we shouldn't see it on any additional relaunches.

Without access to your user data, I can't dig into exactly what caused the traceback, but if the first boot from 11/18 was used as the source of your AMI, I'm wondering if there's some flakiness in your script that would lead to some transient failures on launch. If so, the datasource warning is probably unrelated.

Can you tell me if the boot on 11/18 was used as the source of the AMI? Also, are you able to provide me with your user data with any sensitive pieces redacted? If not, is there any way part of your script could have failed transiently?

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2023-05-01T04:17:28.855349+00:00

[Expired for cloud-init because there has been no activity for 60 days.]