Closed vikeshpandey closed 3 years ago
I didn't find the info you mentioned. Could you point me to the file and line that has this instruction, please?
Sure, this is the notebook that uploads the data to S3:
https://github.com/awslabs/amazon-sagemaker-mlops-workshop/blob/master/lab/01_CreateAlgorithmContainer/03_Testing%20the%20container%20using%20SageMaker%20Estimator.ipynb
It is part of lab01.
Then, in this notebook:
https://github.com/awslabs/amazon-sagemaker-mlops-workshop/blob/master/lab/02_TrainYourModel/01_Training%20our%20model.ipynb
it says: "The dataset was already uploaded in the Exercise: 01 - Creating a Classifier Container. So, we just need to start a new automated training/deployment job in our MLOps env."
However, this notebook does not have any cell that uploads the data to S3, so if you skip lab01 and jump directly to lab02, the training fails with an S3 error.
Fixed. Thanks for pointing it out.
Hm, are you sure this fixed it? The missing file for the training is s3://sagemaker-us-east-1-ACCTNR/iris-model/input/train/training.csv, but the files added in the Dec 1st commit are under "/mlops/iris/...". The failed training puzzled me for a while until I realized this (the S3 error hinted at a permission problem, so I kept chasing that instead).
I did the same thing as @vikeshpandey: skipped (i) and (ii) and went straight to (iii), and the training then failed in CodePipeline. It worked for a colleague of mine, though, and I eventually realized that is because the "/iris-model/input/train/training.csv" file is uploaded in step (ii), which he had run through. Once I did the optional steps, step (iii) worked as expected.
You are right, Jens. Just pushed the correct fix for this issue. Thanks.
Great, thanks -- I have not tested the fix, but it looks sensible from just viewing the commit.
The README says that lab01 is optional and you can skip it. But if we skip it, the required training data is never uploaded to the required S3 bucket, and lab02 fails. Lab02 needs to be fixed to include the cells for uploading the train and validation datasets. Is this issue known? Let me know and I can drop a PR to fix it.
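For reference, a minimal sketch of the upload cells lab02 would need. The local file names (`training.csv`, `validation.csv`) and the `iris-model/input` prefix are assumptions inferred from the failing S3 key mentioned above, not taken from the workshop code:

```python
# Sketch of the missing upload step (assumption: lab02 expects the data under
# iris-model/input/train and iris-model/input/validation in the default bucket).

def s3_keys(prefix="iris-model/input"):
    """Build the S3 keys the training job appears to expect."""
    return {
        "training.csv": f"{prefix}/train/training.csv",
        "validation.csv": f"{prefix}/validation/validation.csv",
    }

# The actual upload requires AWS credentials; shown for illustration only:
# import boto3
# s3 = boto3.client("s3")
# bucket = "sagemaker-us-east-1-<account-id>"  # SageMaker default bucket
# for local_file, key in s3_keys().items():
#     s3.upload_file(local_file, bucket, key)
```

With this in lab02, the notebook would no longer depend on lab01 having been run first.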