awslabs / aws-service-catalog-puppet

This is a framework where you list your AWS accounts with tags and your AWS Service Catalog products with tags or target accounts. The framework works through your lists, dedupes and spots collisions and then provisions the products into your AWS accounts for you. It handles the Portfolio sharing, its acceptance and can provision products cross account and cross region.
Apache License 2.0

Intermittent Error: An error occurred (ValidationException) when calling the DescribeProvisioningParameters #136

Open jordan-evans opened 5 years ago

jordan-evans commented 5 years ago

We're seeing the error below with Puppet during deployments:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DescribeProvisioningParameters operation: S3 error: Access Denied 
For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html 

The errors seem to be intermittent: they do not happen on every run, and a retry normally fixes them.

eamonnfaherty commented 5 years ago

Hi Jordan

Sorry you are seeing this issue.

Would you be able to send me the full CodeBuild output? I will take a look for anything suspicious.

beataa commented 5 years ago

Hello,

I am experiencing the same issue. Here is the stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/luigi/worker.py", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.7/site-packages/luigi/worker.py", line 139, in _run_get_new_deps
    task_gen = self.task.run()
  File "/usr/local/lib/python3.7/site-packages/servicecatalog_puppet/luigi_tasks_and_targets.py", line 187, in run
    ProductId=self.product_id, ProvisioningArtifactId=self.version_id, PathId=path_id,
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DescribeProvisioningParameters operation: S3 error: Access Denied
For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

[Container] 2019/09/19 09:51:19 Command did not exit successfully servicecatalog-puppet --info deploy manifest-expanded.yaml exit status 1

Beata

jordan-evans commented 5 years ago

The error is thrown on this line:

                provisioning_artifact_parameters = service_catalog.describe_provisioning_parameters(
                    ProductId=product_id, ProvisioningArtifactId=version_id, PathId=path_id,
                ).get('ProvisioningArtifactParameters', [])

Which looks fine. The service_catalog object is created from the call to

        with betterboto_client.CrossAccountClientContextManager(
                'servicecatalog', role, f'sc-{self.region}-{self.account_id}', region_name=self.region
        ) as service_catalog:

Could we be looking at a betterboto bug here instead? Perhaps some kind of temporary credentials are expiring, which would cause the Access Denied errors?

atoa commented 5 years ago

I am also sporadically hitting the same exact error at the same code path reported above. Version 0.35.0.

I work around it by retrying the pipeline Deploy stage one or two times, and it eventually goes through. However, I am noticing that the error is getting more frequent as I add more launches to the manifest.

I am wondering, given the sporadic nature of this issue, if this may be caused by a threading/concurrency issue of boto3 running under luigi.

Would a threading issue be a plausible explanation?

jordan-evans commented 5 years ago

I captured one of the requests that failed for us, and then hit it with a quick Python script, making the same call over and over again with the same parameters as when it failed.

No issues: it returned OK every time. So I think this may rule out a problem with the actual Service Catalog API?
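
For anyone wanting to repeat that kind of test, a repro loop could look roughly like the minimal sketch below. It is not the script used above. `hammer` and `FakeClient` are hypothetical names, and the fake client only stands in for a boto3 `servicecatalog` client so that the loop can run without AWS credentials.

```python
from collections import Counter


def hammer(client, attempts, **params):
    """Call describe_provisioning_parameters repeatedly and tally outcomes.

    `client` is anything exposing describe_provisioning_parameters, for
    example a boto3 'servicecatalog' client. Exceptions are counted by
    class name rather than raised, so one failure does not stop the loop.
    """
    outcomes = Counter()
    for _ in range(attempts):
        try:
            client.describe_provisioning_parameters(**params)
            outcomes['ok'] += 1
        except Exception as error:  # broad on purpose: we only want a tally
            outcomes[type(error).__name__] += 1
    return outcomes


class FakeClient:
    """Hypothetical stand-in that fails on every third call."""

    def __init__(self):
        self.calls = 0

    def describe_provisioning_parameters(self, **params):
        self.calls += 1
        if self.calls % 3 == 0:
            raise RuntimeError('S3 error: Access Denied')
        return {'ProvisioningArtifactParameters': []}


if __name__ == '__main__':
    # The IDs below are placeholders, not real Service Catalog identifiers.
    print(hammer(FakeClient(), 9, ProductId='prod-example',
                 ProvisioningArtifactId='pa-example', PathId='lpv-example'))
```

Running the real call sequentially like this would not exercise whatever concurrency the deploy run has, so a clean result here does not by itself rule out a threading-related cause.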

eamonnfaherty commented 5 years ago

https://github.com/awslabs/aws-service-catalog-puppet/releases/tag/0.43.0 has an attempt to fix this.

At the moment you would have to bootstrap the spokes yourself with the latest version of servicecatalog-puppet-spoke.template.yaml

Rebootstrapping spokes and bootstrapping an OU is in the pipeline.

Please let me know if this release changes the behaviour you are seeing.

dheeraj-tripathi commented 5 years ago

I hit the same error today. It seems to be intermittent, as it only occurred once; thereafter everything seems to be running fine.

Service Catalog : S3 error: Access Denied For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: 1d66b8e6-93d4-493a-8866-bc46d6e332d5)

eamonnfaherty commented 5 years ago

Would you be able to share which version was used to bootstrap your spoke where you saw this?

It should be a parameter for your servicecatalog-puppet-spoke stack in CloudFormation.

eamonnfaherty commented 5 years ago

I am seeing this in my local testing. When rerunning the workflow the error goes away.

As part of a performance improvement piece I am moving the code that errors into its own luigi task and I am going to be calling it significantly less frequently.

At the moment it is called for each product version in each spoke but after the change I am going to call it once for the master account and have luigi handle the sharing of the value.
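
The "call it once and share the value" idea can be illustrated with a plain-Python memoisation sketch. This is only an illustration of the principle, not how the framework or luigi implements it; `make_cached_lookup` and the `describe` callable are hypothetical names.

```python
import functools


def make_cached_lookup(describe):
    """Wrap describe(product_id, version_id) so each pair is fetched once.

    Later calls with the same arguments reuse the first result instead of
    calling `describe` again, however many spokes ask for it.
    """

    @functools.lru_cache(maxsize=None)
    def lookup(product_id, version_id):
        return describe(product_id, version_id)

    return lookup


if __name__ == '__main__':
    calls = []

    def describe(product_id, version_id):
        # Stand-in for the real API call; records how often it is hit.
        calls.append((product_id, version_id))
        return ('example-parameter',)

    lookup = make_cached_lookup(describe)
    for spoke in range(5):
        lookup('prod-example', 'v1')
    print(len(calls))
```

Fewer calls to the failing operation should mean fewer chances to hit the intermittent error, although the underlying cause would still be there.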

If that does not fix it, I can request that luigi retry that task up to n times. This will not fix the issue but will give a better experience the majority of the time.
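
As a rough illustration of the retry idea, here is a minimal plain-Python sketch. It does not use luigi's own retry mechanism, and `call_with_retries` is a hypothetical helper. It assumes the transient failure can be recognised by the "Access Denied" text seen in the errors reported in this thread.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an 'Access Denied' failure, wait and try again.

    The delay doubles after each failed attempt. Errors that do not mention
    'Access Denied', and the final failure, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as error:
            transient = 'Access Denied' in str(error)
            if not transient or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == '__main__':
    state = {'calls': 0}

    def flaky():
        # Stand-in for the real API call: fails twice, then succeeds.
        state['calls'] += 1
        if state['calls'] < 3:
            raise RuntimeError('S3 error: Access Denied')
        return 'ok'

    print(call_with_retries(flaky, base_delay=0.01))
```

Matching on the message text is fragile, as the changed error wording reported later in this thread shows, so a real implementation would probably want a sturdier check.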

jordan-evans commented 4 years ago

I've noticed that the error message has changed as of today. It's now something like:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DescribeProvisioningParameters operation: S3 url for getting CFN template: https://s3.eu-west-2.amazonaws.com/<auto_generated_bucket_name>

tv-17 commented 4 years ago

Facing this error too. The latest version 0.70.0 does not fix this.

nikolaigauss commented 4 years ago

Version 0.70.1 is experiencing this issue as well.

puddleglum1904 commented 3 years ago

Ran into this same error on version 0.91.0; a re-run of the deploy stage went through without issue.