Although online extraction of standard-sized thumbnails is achieved via a SageMaker endpoint, it's more of a functional aspect of the E2E pipeline than a data science activity. This PR therefore moves the setup of the pre-processing & thumbnailing endpoint (and its container image) from the walkthrough notebooks to the CDK app, to speed up solution setup.
Automate deployment of the pre-processing ECR image and thumbnailer endpoint within the CDK app (enabled by default, but can be disabled).
Simplify/remove affected sections of the walkthrough notebooks to use the pre-created image and endpoint.
Add a new Optional Extras notebook capturing the required steps to configure this manually from the notebook, in case users want to customise the code. Also move discussion of SageMaker endpoint auto-scaling into this optional notebook (previously NB2).
Refactor the pre-processing code in preparation for closer integration with other script-mode features, reducing duplication and paving the way for an eventual automated test suite.
Update lockfile dependency versions.
⚠️ BREAKING CHANGES:
This PR adds CDK app deployment dependencies, so users will need to re-run poetry install.
The lockfile CDK library version has been upgraded, so users may need to re-run cdk bootstrap.
Additional SSM stack parameters have been added, so users will need to re-deploy their CDK app to use the new versions of the notebooks (the new util.project.init(...) will fail against an old deployed pipeline).
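For anyone upgrading an existing deployment, the steps above could be scripted roughly as follows. This is only a sketch: the helper name run_upgrade and the constant UPGRADE_STEPS are hypothetical, the exact re-deploy command (cdk deploy --all here) is an assumption that may differ for your setup, and it presumes the commands are run from the repository root with AWS credentials already configured. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Upgrade steps implied by the breaking changes above.
# ASSUMPTION: "cdk deploy --all" is how the app is re-deployed in your
# environment; adjust it (e.g. to "npx cdk deploy --all") if needed.
UPGRADE_STEPS = [
    "poetry install",    # pick up the new CDK app deployment dependencies
    "cdk bootstrap",     # may be needed after the CDK library upgrade
    "cdk deploy --all",  # re-deploy so the new SSM parameters exist
]


def run_upgrade(dry_run=True):
    """Print each upgrade step and, unless dry_run is set, execute it.

    Returns the list of steps in the order they were processed.
    """
    for step in UPGRADE_STEPS:
        print(f"$ {step}")
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(shlex.split(step), check=True)
    return list(UPGRADE_STEPS)


if __name__ == "__main__":
    # Dry run by default: nothing is installed or deployed.
    run_upgrade(dry_run=True)
```

Calling run_upgrade(dry_run=False) would actually execute the commands in order, stopping at the first failure. The notebooks should only be re-opened once the re-deploy has completed.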
Testing done:
Re-deployed solution in test environment and validated relevant sections of affected notebooks.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available: N/A