
Swift Glue MVP #4598

Open shepazon opened 1 year ago

shepazon commented 1 year ago

Implement the following for the AWS SDK for Swift.

Service actions

Service actions can either be pulled out as individual functions or be incorporated into the scenario, but each service action must be included as an excerpt in the SOS output.

Scenario

A scenario runs at a command prompt and prints output to the user reporting the result of each service action. A scenario can run in one of two ways: straight through, printing out progress as it goes, or as an interactive question-and-answer script.

This scenario follows the console steps outlined in these two topics:

Scaffold resources

These resources are scaffolding for the scenario. Create them with the setup.yaml CloudFormation script or with the CDK in resources/cdk/glue_role_bucket. Running the script gives the role name and bucket name as outputs, which you will need to use in your scenario.
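For reference, a minimal sketch of reading those stack outputs with the AWS SDK for Swift could look like the following. The stack name, output key handling, region, and the `CloudFormationClient(region:)` convenience initializer are assumptions here, not part of this issue:

```swift
import AWSCloudFormation

/// Fetch the scaffold stack's outputs (the role name and bucket name) as a
/// dictionary keyed by output name. Stack name and region are placeholders.
func getScaffoldOutputs(stackName: String) async throws -> [String: String] {
    let cfnClient = try CloudFormationClient(region: "us-east-1")  // region is an assumption
    let response = try await cfnClient.describeStacks(
        input: DescribeStacksInput(stackName: stackName)
    )

    var outputs: [String: String] = [:]
    for output in response.stacks?.first?.outputs ?? [] {
        if let key = output.outputKey, let value = output.outputValue {
            outputs[key] = value
        }
    }
    return outputs
}
```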

You will also need to upload the Python ETL script to the S3 bucket, either manually or through SDK calls.
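If you upload it through the SDK, a minimal sketch might look like the following; the local path, object key, region, and the `S3Client(region:)` convenience initializer are assumptions:

```swift
import Foundation
import AWSS3

/// Upload the ETL job script to the scaffold bucket so the Glue job can find it.
func uploadJobScript(bucketName: String, localPath: String) async throws {
    let s3Client = try S3Client(region: "us-east-1")  // region is an assumption
    let scriptData = try Data(contentsOf: URL(fileURLWithPath: localPath))

    _ = try await s3Client.putObject(input: PutObjectInput(
        body: .data(scriptData),           // wrap the file contents in a ByteStream
        bucket: bucketName,
        key: "flight_etl_job_script.py"
    ))
}
```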

Getting started with crawlers and jobs

  1. Create a crawler, passing it the IAM role and the URL to the public S3 bucket that contains the source data: s3://crawler-public-us-east-1/flight/2016/csv.
  2. Start the crawler. This takes a few minutes. Loop and call GetCrawler until it returns the state 'READY' (see the sketch after this list).
  3. Get the database created by the crawler and the tables in the database. Display these to the user.
  4. Create a job, passing it the IAM role and the URL to the Python ETL script you uploaded to the user's S3 bucket, something like: s3://doc-example-bucket-123456/flight_etl_job_script.py.
  5. Start a job run, passing it the following custom Arguments. The ETL script expects these, so the names must match exactly:
    • --input_database: [name of the database created by the crawler]
    • --input_table: [name of the table created by the crawler]
    • --output_bucket_url: [URL to the scaffold bucket you created for the user]
  6. Loop and get the job run until it returns the state 'SUCCEEDED', 'STOPPED', 'FAILED', or 'TIMEOUT' (see the sketch after this list).
  7. Output data is stored in a group of files in the user's bucket. Either direct the user to look at them or download a file and display some of the results.
  8. List jobs for the user's account.
  9. Get job run detail for a job run.
  10. Delete the demo job.
  11. Delete the database and tables.
  12. Delete the crawler.
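For steps 2, 5, and 6, a rough sketch of the polling and job-run calls with the AWS SDK for Swift might look like the following. The crawler name, job name, 30-second poll interval, region, error type, and client initializer are placeholder assumptions, not requirements of this issue:

```swift
import Foundation
import AWSGlue

struct MissingJobRunIdError: Error {}      // simple error type for illustration

/// Step 2: poll GetCrawler until the crawler reaches the READY state.
func waitForCrawler(glueClient: GlueClient, crawlerName: String) async throws {
    while true {
        let response = try await glueClient.getCrawler(
            input: GetCrawlerInput(name: crawlerName)
        )
        if response.crawler?.state == .ready {
            return
        }
        try await Task.sleep(nanoseconds: 30_000_000_000)  // poll every 30 seconds
    }
}

/// Steps 5 and 6: start a job run with the arguments the ETL script expects,
/// then poll GetJobRun until the run reaches a terminal state.
func runJob(glueClient: GlueClient, jobName: String, inputDatabase: String,
            inputTable: String, outputBucketUrl: String) async throws -> String {
    let startResponse = try await glueClient.startJobRun(input: StartJobRunInput(
        arguments: [
            "--input_database": inputDatabase,
            "--input_table": inputTable,
            "--output_bucket_url": outputBucketUrl
        ],
        jobName: jobName
    ))
    guard let runId = startResponse.jobRunId else {
        throw MissingJobRunIdError()
    }

    while true {
        let runResponse = try await glueClient.getJobRun(
            input: GetJobRunInput(jobName: jobName, runId: runId)
        )
        switch runResponse.jobRun?.jobRunState {
        case .succeeded, .stopped, .failed, .timeout:
            return runResponse.jobRun?.jobRunState?.rawValue ?? "UNKNOWN"
        default:
            try await Task.sleep(nanoseconds: 30_000_000_000)  // poll every 30 seconds
        }
    }
}
```

The remaining steps (get database and tables, list jobs, get job run detail, and the deletes) follow the same client pattern with the corresponding Glue operations.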
github-actions[bot] commented 1 year ago

Marked stale by the Shirriff. Notifying @awsdocs/aws-sdk-docs-code-maintainers