
EPIC: Glue MVP #2941

Closed: Laren-AWS closed this issue 8 months ago

Laren-AWS commented 2 years ago

Implement the scenario and service action calls to create examples for each SDK.

Service actions

Service actions can either be pulled out as individual functions or can be incorporated into the scenario, but each service action must be included as an excerpt in the SOS output.

• GetCrawler
• CreateCrawler
• StartCrawler
• GetDatabase
• GetTables
• CreateJob
• StartJobRun
• ListJobs
• GetJobRuns
• GetJobRun
• DeleteJob
• DeleteTable
• DeleteDatabase
• DeleteCrawler

Scenario

A scenario runs at a command prompt and prints the result of each service action to the user. A scenario can run in one of two ways: straight through, printing progress as it goes, or as an interactive question-and-answer script.

This scenario follows the console steps outlined in these two topics:

Scaffold resources

These resources are scaffolding for the scenario. Create them with the setup.yaml CloudFormation script or with the CDK in resources/cdk/glue_role_bucket. Running the script gives the role name and bucket name as outputs, which you will need to use in your scenario.
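If you deploy the scaffold stack through the SDK rather than the console, a minimal boto3 sketch could look like the following. The stack name is hypothetical, and the output keys (RoleName, BucketName) are assumptions; check setup.yaml for the actual output names.

```python
# Sketch: deploy the scaffold stack and read its outputs with boto3.
# The stack name and output keys below are assumptions, not the script's actual names.
import boto3

cloudformation = boto3.client("cloudformation")

with open("setup.yaml") as template_file:
    template_body = template_file.read()

stack_name = "doc-example-glue-scenario"  # hypothetical stack name
cloudformation.create_stack(
    StackName=stack_name,
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates an IAM role
)
cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)

# Collect the role name and bucket name from the stack outputs.
outputs = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]["Outputs"]
scaffold = {output["OutputKey"]: output["OutputValue"] for output in outputs}
print(scaffold)  # expect keys like 'RoleName' and 'BucketName'
```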

You will also need to upload the Python ETL script to the S3 bucket, either manually or through SDK calls.
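A minimal boto3 sketch of the SDK upload, assuming the example bucket name from the scenario steps and the ETL script name used later in the scenario:

```python
# Sketch: upload the Python ETL script to the scaffold bucket.
# Bucket and script names are placeholders taken from the scenario description.
import boto3

s3 = boto3.resource("s3")
bucket_name = "doc-example-bucket-123456"   # scaffold bucket from the stack outputs
script_name = "flight_etl_job_script.py"    # ETL script shipped with the example

s3.Bucket(bucket_name).upload_file(script_name, script_name)
print(f"Uploaded to s3://{bucket_name}/{script_name}")
```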

Getting started with crawlers and jobs

  1. Create a crawler, passing it the IAM role and the URL of the public S3 bucket that contains the source data: s3://crawler-public-us-east-1/flight/2016/csv (see the crawler sketch after this list).
  2. Start the crawler. This takes a few minutes. Loop and call GetCrawler until it returns state 'READY'.
  3. Get the database created by the crawler and the tables in the database. Display these to the user.
  4. Create a job, passing it the IAM role and the URL of the Python ETL script you uploaded to the user's S3 bucket, something like s3://doc-example-bucket-123456/flight_etl_job_script.py (see the job run sketch after this list).
  5. Start a job run, passing it the following custom arguments. The ETL script expects them, so the names must match exactly:
    • --input_database: [name of the database created by the crawler]
    • --input_table: [name of the table created by the crawler]
    • --output_bucket_url: [URL to the scaffold bucket you created for the user]
  6. Loop and get the job run until it returns state 'SUCCEEDED', 'STOPPED', 'FAILED', or 'TIMEOUT'.
  7. Output data is stored in a group of files in the user's bucket. Either direct the user to look at them or download a file and display some of the results.
  8. List jobs for the user's account (see the cleanup sketch after this list).
  9. Get job run detail for a job run.
  10. Delete the demo job.
  11. Delete the database and tables.
  12. Delete the crawler.
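Crawler sketch (steps 1 through 3), a minimal boto3 version assuming the role comes from the scaffold stack; the crawler, database, and table-prefix names are hypothetical:

```python
# Sketch: create and start the crawler, wait for it, then read the catalog it built.
import time
import boto3

glue = boto3.client("glue")
crawler_name = "doc-example-flight-crawler"   # hypothetical
database_name = "doc-example-flight-db"       # hypothetical

glue.create_crawler(
    Name=crawler_name,
    Role="arn:aws:iam::123456789012:role/doc-example-glue-role",  # scaffold role (placeholder)
    DatabaseName=database_name,
    TablePrefix="doc-example-",
    Targets={"S3Targets": [{"Path": "s3://crawler-public-us-east-1/flight/2016/csv"}]},
)
glue.start_crawler(Name=crawler_name)

# Poll until the crawler returns to the READY state; this takes a few minutes.
while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
    time.sleep(10)

# Get the database and tables the crawler created and display them.
database = glue.get_database(Name=database_name)["Database"]
tables = glue.get_tables(DatabaseName=database_name)["TableList"]
print(database["Name"], [table["Name"] for table in tables])
```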
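Job run sketch (steps 4 through 7), assuming the crawler sketch above has already run; the job name, role ARN, bucket, database, table, and output prefix are all hypothetical:

```python
# Sketch: create the ETL job, start a run with the required arguments, wait for a
# terminal state, then list some of the output files.
import time
import boto3

glue = boto3.client("glue")
s3 = boto3.resource("s3")

job_name = "doc-example-flight-etl-job"      # hypothetical
bucket_name = "doc-example-bucket-123456"    # scaffold bucket (placeholder)

glue.create_job(
    Name=job_name,
    Role="arn:aws:iam::123456789012:role/doc-example-glue-role",  # scaffold role (placeholder)
    GlueVersion="3.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": f"s3://{bucket_name}/flight_etl_job_script.py",
        "PythonVersion": "3",
    },
)

# Argument names must match exactly what the ETL script expects.
run_id = glue.start_job_run(
    JobName=job_name,
    Arguments={
        "--input_database": "doc-example-flight-db",
        "--input_table": "doc-example-csv",
        "--output_bucket_url": f"s3://{bucket_name}/",
    },
)["JobRunId"]

# Poll until the run reaches a terminal state.
while True:
    state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "STOPPED", "FAILED", "TIMEOUT"):
        break
    time.sleep(10)
print(f"Job run finished with state {state}.")

# Output lands as a group of files in the user's bucket; the prefix is an
# assumption, so adjust it to match where the ETL script actually writes.
for obj in list(s3.Bucket(bucket_name).objects.filter(Prefix="run-"))[:5]:
    print(obj.key)
```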
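Cleanup sketch (steps 8 through 12): list jobs, inspect a run, then tear everything down. Names mirror the hypothetical ones used in the sketches above.

```python
# Sketch: list jobs, get detail for the latest run, then delete the demo resources.
import boto3

glue = boto3.client("glue")
job_name = "doc-example-flight-etl-job"
database_name = "doc-example-flight-db"
crawler_name = "doc-example-flight-crawler"

print(glue.list_jobs()["JobNames"])

runs = glue.get_job_runs(JobName=job_name)["JobRuns"]
if runs:
    latest = glue.get_job_run(JobName=job_name, RunId=runs[0]["Id"])["JobRun"]
    print(latest["JobRunState"], latest.get("CompletedOn"))

# Delete the job, then the tables and database, then the crawler.
glue.delete_job(JobName=job_name)
for table in glue.get_tables(DatabaseName=database_name)["TableList"]:
    glue.delete_table(DatabaseName=database_name, Name=table["Name"])
glue.delete_database(Name=database_name)
glue.delete_crawler(Name=crawler_name)
```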

SDKs

irenepsmith commented 1 year ago

Closed accidentally.

brmur commented 8 months ago

Reached the minimum quota of SDK languages.