SageMaker Async Inference with scale-to-zero

Issue #, if available: #8

Description of changes:

- Update the pipeline to support both asynchronous and real-time SageMaker endpoints.
- Add an optional notebook extension that deploys the trained model to an async endpoint with autoscaling to zero instances, to reduce running costs for low-traffic workloads.
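
For reviewers unfamiliar with scale-to-zero, the sketch below shows roughly how it can be configured through Application Auto Scaling. It is an illustration only and not necessarily what the notebook does: the policy name, variant name `AllTraffic`, capacity, target value and cooldowns are all assumed values, and `autoscaling_client` is assumed to be a `boto3.client("application-autoscaling")` created by the caller.

```python
from typing import Any, Dict


def build_scale_to_zero_requests(
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumed variant name
    max_capacity: int = 2,  # assumed upper bound
    target_backlog_per_instance: float = 5.0,  # assumed target value
    cooldown_seconds: int = 300,  # assumed cooldown
) -> Dict[str, Dict[str, Any]]:
    """Build the two Application Auto Scaling requests for an async endpoint."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    return {
        # MinCapacity=0 is what allows the endpoint to scale in to zero instances.
        "register_scalable_target": {
            **common,
            "MinCapacity": 0,
            "MaxCapacity": max_capacity,
        },
        # Track the queue backlog per instance, a metric published for async endpoints.
        "put_scaling_policy": {
            **common,
            "PolicyName": f"{endpoint_name}-backlog-scaling",  # hypothetical name
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": {
                "TargetValue": target_backlog_per_instance,
                "CustomizedMetricSpecification": {
                    "MetricName": "ApproximateBacklogSizePerInstance",
                    "Namespace": "AWS/SageMaker",
                    "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                    "Statistic": "Average",
                },
                "ScaleInCooldown": cooldown_seconds,
                "ScaleOutCooldown": cooldown_seconds,
            },
        },
    }


def apply_scale_to_zero(autoscaling_client: Any, requests: Dict[str, Dict[str, Any]]) -> None:
    """Send both requests using a caller-supplied Application Auto Scaling client."""
    autoscaling_client.register_scalable_target(**requests["register_scalable_target"])
    autoscaling_client.put_scaling_policy(**requests["put_scaling_policy"])
```

Building the request dictionaries separately from sending them keeps the configuration easy to inspect or unit-test without AWS credentials.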

Additional updates:

- Normalize inference.py to use 'S3Uri' in place of 'URI', for consistency with other pipeline components.
  - BREAKING: Using the new version of the notebook with an old trained model and inference.py (or vice versa) will not work. However, the pipeline integration should be backward-compatible. If needed, use the "deep dive" model deployment steps in notebook 2 to re-package your existing model.tar.gz with the updated inference.py.
- Rename the preproc custom container ECR repo and broaden DataSci permissions to better accommodate experiments with custom training/inference containers.
  - BREAKING: Because the target pre-processing ECR repo is renamed and the SageMaker ECR permissions are changed to match the new naming convention, data scientists may no longer be able to access the old ECR repo after the updated CDK stack is deployed. If needed, re-run the container build step in notebook 1 and manually delete the old sm-scikit-ocrtools repository from ECR.
- Adjust some model training parameters to better match recent performance testing.
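
To illustrate the 'URI' to 'S3Uri' breaking change, here is a minimal sketch of how a request handler might read the new field and fail clearly on the old one. The function name and the flat request shape are hypothetical; the actual structure used by inference.py may differ (for example, the field may be nested).

```python
from typing import Mapping, Tuple


def parse_s3_uri_field(request: Mapping[str, object]) -> Tuple[str, str]:
    """Return (bucket, key) from a request dict that carries an 'S3Uri' field.

    Hypothetical helper: the real inference.py request schema may differ.
    """
    if "S3Uri" not in request:
        if "URI" in request:
            # An old client talking to a new model (or vice versa) lands here.
            raise ValueError("Found legacy 'URI' field; this version expects 'S3Uri'")
        raise ValueError("Request is missing the 'S3Uri' field")

    uri = str(request["S3Uri"])
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"'S3Uri' must start with {prefix!r}: {uri!r}")

    # Everything up to the first '/' is the bucket; the remainder is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"'S3Uri' must include both bucket and key: {uri!r}")
    return bucket, key
```

For example, `parse_s3_uri_field({"S3Uri": "s3://my-bucket/textract/doc.json"})` yields `("my-bucket", "textract/doc.json")`, while a request that only carries `"URI"` raises a `ValueError` naming the mismatch.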

Targeting this change against support/1.x before porting to main.

Testing done:

Both sync (real-time) and async flows run successfully in the test environment. The async endpoint correctly scales to 0 instances when there is no traffic, and restarts to serve incoming requests when required.
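
A check like the one described can be reproduced along these lines. This is a sketch rather than the test procedure actually used: `sagemaker_client` and `runtime_client` are assumed to be `boto3.client("sagemaker")` and `boto3.client("sagemaker-runtime")` supplied by the caller, and the endpoint name and S3 input location are placeholders.

```python
from typing import Any


def current_instance_count(sagemaker_client: Any, endpoint_name: str) -> int:
    """Sum CurrentInstanceCount across the endpoint's production variants."""
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return sum(
        variant.get("CurrentInstanceCount", 0)
        for variant in description.get("ProductionVariants", [])
    )


def submit_async_request(
    runtime_client: Any,
    endpoint_name: str,
    input_s3_uri: str,
    content_type: str = "application/json",
) -> str:
    """Queue one async inference request and return the S3 output location."""
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,  # the request payload must already be in S3
        ContentType=content_type,
    )
    return response["OutputLocation"]
```

After an idle period, `current_instance_count` should report 0; submitting a request and then polling it again should show the count rising as the endpoint scales back out.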

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
