Added sample sagemaker pyspark notebook for big data etl

awslabs / mlmax

Example templates for the delivery of custom ML solutions to production so you can get started quickly without having to make too many design choices.

https://mlmax.readthedocs.io/en/latest/

Apache License 2.0

66 stars, 19 forks

Added sample sagemaker pyspark notebook for big data etl #44

Closed (kianho closed this 3 years ago)

kianho commented 3 years ago

Issue #, if available: N/A

Description of changes:

Added a minimal example notebook showing how to use SageMaker Processing to process large, out-of-core datasets with PySpark.
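For readers unfamiliar with the approach, here is a rough sketch of how a PySpark script might be submitted as a SageMaker Processing job. It is not taken from the notebook in this PR. The bucket name, role ARN, script name `preprocess.py`, job name, instance settings, and the helpers `build_job_config` and `submit` are all hypothetical placeholders. It assumes the SageMaker Python SDK's `PySparkProcessor` class is available.

```python
"""Sketch only: submit a PySpark script as a SageMaker Processing job.

All concrete values (bucket, role, script name, instance sizes) are
illustrative assumptions, not settings from the notebook in this PR.
"""


def build_job_config(bucket: str, role_arn: str) -> dict:
    """Collect the processor and run() arguments in one place."""
    return {
        "processor": {
            "base_job_name": "pyspark-etl",   # hypothetical job name prefix
            "framework_version": "3.1",       # assumed Spark container version
            "role": role_arn,
            "instance_count": 2,              # more instances for larger datasets
            "instance_type": "ml.m5.xlarge",
        },
        "run": {
            "submit_app": "preprocess.py",    # hypothetical PySpark script
            "arguments": [
                "--input", f"s3://{bucket}/raw/",
                "--output", f"s3://{bucket}/processed/",
            ],
        },
    }


def submit(config: dict) -> None:
    """Start the processing job; requires AWS credentials and the SDK."""
    # Imported lazily so the config can be inspected without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(**config["processor"])
    processor.run(**config["run"])


if __name__ == "__main__":
    cfg = build_job_config(
        "my-bucket", "arn:aws:iam::123456789012:role/MySageMakerRole"
    )
    # Inspect the arguments; call submit(cfg) to actually launch the job.
    print(cfg["run"]["arguments"])
```

Because the Spark cluster runs on the processing instances rather than in the notebook kernel, the dataset never has to fit in the notebook's memory.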

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

josiahdavis commented 3 years ago

Thanks @kianho! Since we will use this as a base to build out the data management and ETL module, would you mind putting it in the following location while we work out the logic for the cfn and step functions? Thanks so much.

├── notebooks
├── contrib
│   └── data <-- put it here! 
├── modules
│   ├── environment
│   └── pipeline

kianho commented 3 years ago

@josiahdavis done