aws-samples / aws-dataexchange-api-samples

Samples to help you get started with the AWS Data Exchange API.
https://aws.amazon.com/data-exchange
MIT No Attribution
22 stars 13 forks source link

Update tf-auto-export-to-s3 to use new ExportRevisionToS3 feature, combining Boto3 Library Layer #44

Closed gerrardcowburn closed 3 years ago

gerrardcowburn commented 3 years ago

Issue #: N/A

Description of changes: Updated the tf-auto-export-to-s3 Python Subscriber to use the new ExportRevisionToS3 feature. Given this requires the Boto3 v1.17 library which isn't currently supported by Lambda, also included a Boto3 layer.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

dblock commented 3 years ago

subscribers/python/tf-auto-export-to-s3/Boto3LibLayer.zip looks empty?

dblock commented 3 years ago

Btw, no need to do it here, but you could improve the following code:

completed_jobs = set()
    while job_ids != completed_jobs:
        for job_id in job_ids:
            if job_id in completed_jobs:
                continue
            get_job_response = dataexchange.get_job(JobId=job_id)
            if get_job_response['State'] == 'COMPLETED':
                print ("Job {} completed".format(job_id))
                completed_jobs.add(job_id)
            if get_job_response['State'] == 'ERROR':
                job_errors = get_job_response['Errors']
                raise Exception('JobId: {} failed with errors:\n{}'.format(job_id, job_errors))
            # Sleep to ensure we don't get throttled by the GetJob API.
            time.sleep(0.2)

Here errors are raised as soon as one job fails, others are just left running. Probably a better pattern would be to collect results of all the jobs first that complete or error, then raise a single error for all the jobs that have failed. You could externalize the "wait for jobs" part in some generic helper.

You could also move the completed jobs to a separate collection this way you'd be iterating over a smaller and smaller set. And yes it means modifying the thing in which you're iterating while iterating through it, but you could start at the tail (not sure how to accomplish this in python ;).

gerrardcowburn commented 3 years ago

I've worked out how to include just the ADX specific boto3 library components as part of the boto3 layer here so I've swapped from uploading a full replica of boto3 as a zip file on the repo to just sharing the specific dataexchange/ folder and including the zip file creation in the build.sh. Let me know if this is OK.

melchii commented 3 years ago

Has the Python SDK not yet been updated in lambda to contain the ExportRevisionsToS3 code?

gerrardcowburn commented 3 years ago

No it's still on 1.16.31. I think the newer ADX APIs came in 1.17? In any case Lambda teams recommended including the necessary libraries as a layer to mitigate issues with Lambda versions changing underneath the function and causing problems.

On Wed, 7 Apr 2021 at 14:47, Mike Melchione @.***> wrote:

Has the Python SDK not yet been updated in lambda to contain the ExportRevisionsToS3 code?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/aws-samples/aws-dataexchange-api-samples/pull/44#issuecomment-814927985, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALGVSDVVXEOGHKWFDZEGI4TTHRO6TANCNFSM4YE76QLQ .