Closed. sublimotion closed this 4 years ago
/cc @Jeffwan I think this PR is ready to go. Can you take a look?
@PatrickXYS @sublimotion I remember we have a script to prepare the data. Any reason to move all of that into the notebook?
What's the DAG now? Could you give me a screenshot? Originally, we didn't plan to add an HPO job there; one of the reasons is that it takes some time. Have you updated the hyperparameters to make sure it can finish in minutes? I ask because we need to make sure the parameters are well tuned for workshop users.
@Jeffwan Primarily, we just copy the original S3 data into our own S3 bucket.
In my opinion, the data has already been prepared and uploaded into the kubeflow-pipeline-data S3 bucket.
Do you think we should follow the same pattern: prepare the dataset on our side, upload it to the kubeflow-pipeline-data bucket, and then, in the notebook, just cp the data into the user's own S3 bucket?
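To make the "cp the data into the user's own S3 bucket" step concrete, here is a minimal sketch of a bucket-to-bucket copy. It is not the notebook's actual code: the helper name `copy_prefix` and its parameters are hypothetical, and it assumes it is handed a client that behaves like `boto3.client("s3")`.

```python
def copy_prefix(s3_client, src_bucket, dst_bucket, prefix=""):
    """Copy every object under `prefix` from src_bucket to dst_bucket.

    `s3_client` is assumed to behave like boto3.client("s3").
    Returns the list of keys that were copied.
    """
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3_client.copy_object(
                Bucket=dst_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            copied.append(key)
    return copied


# Hypothetical usage (bucket and prefix names are placeholders):
#   import boto3
#   copy_prefix(boto3.client("s3"), "kubeflow-pipeline-data", "my-own-bucket", "some/prefix/")
```

Passing the client in as an argument keeps the helper easy to exercise without AWS credentials; the caller still needs read access to the source bucket and write access to the destination.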
@sublimotion Can you show us the DAG? Btw, we should fine-tune the HPO job and make sure it can finish within minutes; that's very important to us.
I tried to use the existing dataset, but I was getting an error. I did not look for a data script. I found one that worked for this particular problem.
I think it is useful to include HPO so that the pipeline shows an end-to-end example. The pipeline took ~35 min in total. If users kick it off and move on to another example, they can come back and look at the pipeline once it has completed.
Can you collect the time that the HPO step takes? I think the other steps should be fast. If HPO takes too long, we should fine-tune it for a good user experience.
HPO took 8 min in total.
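As a rough way to check whether a tuning configuration stays within a workshop time budget, the wall-clock time can be approximated from the number of training jobs, how many run in parallel, and how long one job takes. The sketch below is only an estimate under those assumptions: the function name and the example numbers are hypothetical, not values from this PR, and it ignores instance startup and scheduling overhead.

```python
import math


def estimate_hpo_minutes(max_jobs, max_parallel_jobs, minutes_per_job):
    """Approximate wall-clock minutes for a tuning job.

    Jobs are assumed to run in rounds of `max_parallel_jobs`, each round
    lasting about `minutes_per_job`. Startup overhead is not modeled.
    """
    if max_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError("max_jobs and max_parallel_jobs must be >= 1")
    rounds = math.ceil(max_jobs / max_parallel_jobs)
    return rounds * minutes_per_job


# Illustrative numbers only: 6 jobs, 3 at a time, about 4 minutes each.
print(estimate_hpo_minutes(6, 3, 4))
```

Lowering the total number of jobs or raising parallelism are the two levers this estimate exposes for keeping the step within minutes.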
Overall it looks good to me. Would you mind squashing your three commits into one?
@PatrickXYS I disabled merge commits and rebase merges for this repo, so we only support squash merges. It's fine for this repo.
Fix for SageMaker pipeline example.
Issue #1 , if available:
Description of changes: Fix for SageMaker KFP compilation error.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.