Closed ubuntu-server-builder closed 1 year ago
Launchpad user Aryan Agarwal(stardust-aryan) wrote on 2023-02-28T09:07:21.150797+00:00
Launchpad attachments: cloud-init tarball
Launchpad user James Falcon(falcojr) wrote on 2023-03-01T20:34:40.953745+00:00
I see a number of boots in cloud-init.log, so I'm curious which one represents the failure you're referring to.
The first boot I see is from 11/18. The datasource was detected fine (Ec2), but I see a traceback when attempting to run your userdata:
2022-11-18 05:40:13,142 - subp.py[DEBUG]: Unexpected error while running command. Command: ['/var/lib/cloud/instance/scripts/part-001'] Exit code: 32 Reason: - Stdout: - Stderr: - 2022-11-18 05:40:13,142 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts) 2022-11-18 05:40:13,142 - handlers.py[DEBUG]: finish: modules-final/config-scripts-user: FAIL: running config-scripts-user with frequency once-per-instance 2022-11-18 05:40:13,142 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed 2022-11-18 05:40:13,142 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/config/modules.py", line 231, in _run_modules ran, _r = cc.run( File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 67, in run return self._runners.run(name, functor, args, freq, clear_on_fail) File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 185, in run results = functor(*args) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py", line 48, in handle subp.runparts(runparts_path) File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 427, in runparts raise RuntimeError( RuntimeError: Runparts: 1 failures (part-001) in 1 attempted commands
The next set of boots seem fine until we get to the final boot on 2/28. This time we see the same Traceback, but earlier in the logs, we also fail getting some metadata from EC2:
2023-02-28 07:36:36,371 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:36,374 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:37,375 - url_helper.py[DEBUG]: [1/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:37,378 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:38,379 - url_helper.py[DEBUG]: [2/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:38,383 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:39,384 - url_helper.py[DEBUG]: [3/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:39,388 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:40,389 - url_helper.py[DEBUG]: [4/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:40,393 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again 2023-02-28 07:36:41,394 - url_helper.py[DEBUG]: [5/6] open 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/' with {'url': 'http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/', 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'X-aws-ec2-metadata-token': 'REDACTED'}} configuration 2023-02-28 07:36:41,402 - util.py[WARNING]: Failed fetching meta-data/ from url http://169.254.169.254/2021-03-23/meta-data/ 2023-02-28 07:36:41,416 - util.py[DEBUG]: Failed fetching meta-data/ from url http://169.254.169.254/2021-03-23/meta-data/ Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 213, in _get_instance_metadata md = materializer.materialize() File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 111, in materialize self._md = self._materialize(self._blob, self._base_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 122, in _materialize child_contents[c] = self._materialize(child_blob, child_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 121, in _materialize child_blob = self._caller(child_url) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/ec2.py", line 206, in mcaller return caller(url).contents File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 87, in read_file_or_url return readurl(url, **kwargs) File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 350, in readurl raise excps[-1] cloudinit.url_helper.UrlError: 404 Client Error: Not Found for url: http://169.254.169.254/2021-03-23/meta-data/identity-credentials/ec2/
The 404 is strange. I can't think of anything that can cause this to happen other than EC2's metadata service having issues when you tried to contact it. If that's the case, we shouldn't see it on any additional relaunches.
Without access to your user data, I can't dig into exactly what caused the traceback, but if the first boot from 11/18 was used as the source of your AMI, I'm wondering if there's some flakiness in your script that would lead to some transient failures on launch. If so, the datasource warning is probably unrelated.
Can you tell me if the boot on 11/18 was used as the source of the AMI? Also, are you able to provide me with your user data with any sensitive pieces redacted? If not, is there any way part of your script could have failed transiently?
Launchpad user Launchpad Janitor(janitor) wrote on 2023-05-01T04:17:28.855349+00:00
[Expired for cloud-init because there has been no activity for 60 days.]
This bug was originally filed in Launchpad as LP: #2008764
Launchpad details
Launchpad user Aryan Agarwal(stardust-aryan) wrote on 2023-02-28T09:07:21.150797+00:00
Details can be found here : https://stackoverflow.com/questions/75507635/not-able-to-run-a-user-script-on-some-ec2-instances-started-from-an-ami-of-ubunt
I am launching multiple ( more than 100 EC2 instances ) from a previously set AMI. This AMI is made from an Ubuntu 20.04 AMI. Everything that is required for the work is already satisfied in AMI. Now I just want execute an user-script while launching EC2 using AMI. But the problem is coming that for 2-3% of the Ec2 instances are not able to execute the userscript , othere 97-98% of launched instances are running perfectly fine. These instances are being launched using spot requests so they are exactly similar in configurations. I am able to ssh into the faulty instances too . Last few lines of /var/log/cloud-init-output.log looks something like this:
Cloud-init v. 22.2-0ubuntu1~22.04.3 running 'modules:config' at Mon, 20 Feb 2023 08:21:36 +0000. Up 41.66 seconds. Cloud-init v. 22.2-0ubuntu1~22.04.3 running 'modules:final' at Mon, 20 Feb 2023 08:21:39 +0000. Up 45.11 seconds. Cloud-init v. 22.2-0ubuntu1~22.04.3 finished at Mon, 20 Feb 2023 08:21:39 +0000. Datasource DataSourceNone. Up 45.51 seconds 2023-02-20 08:21:39,953 - cc_final_message.py[WARNING]: Used fallback datasource
/var/lib/cloud/instance normally points to the a folder in with the name of instance id but in this faulty case , it is pointing at this -> /var/lib/cloud/instances/iid-datasource-none
For Faulty instance:
ubuntu@ip-172-31-21-43:/var/lib/cloud/instances/iid-datasource-none$ cat datasource DataSourceNone: DataSourceNone
For healthy instance:
ubuntu@ip-172-31-20-76:/var/lib/cloud/instances/i-0c964c834bf91ce81$ cat datasource DataSourceEc2Local: DataSourceEc2Local
Now I am not able to figure out that why the userscript is being executed in most of the instances but these few 2-3%.