aws / sagemaker-training-toolkit

Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0
496 stars, 118 forks

Enable custom failure logging #118

Closed. satishpasumarthi closed this 2 years ago

satishpasumarthi commented 2 years ago

Issue #, if available: https://github.com/aws/sagemaker-training-toolkit/issues/111

Description of changes: The SageMaker Training Toolkit currently does not let users write their own custom failure messages to the failure file. This PR adds that ability. It also makes capture error default to True.

Testing done: Unit tests and integration tests passed locally.
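For context, a minimal sketch of what writing a custom failure message could look like from a user's training script. This is not the toolkit's API and not the code in this PR: `write_failure_message`, `run_training`, and `train` are hypothetical names used only for illustration. The sketch assumes the documented SageMaker convention that a container reports failure details in `/opt/ml/output/failure`, and that only the first 1024 characters are surfaced as the failure reason.

```python
import os

# Assumption: the failure file location documented for SageMaker training containers.
DEFAULT_FAILURE_PATH = "/opt/ml/output/failure"
# Assumption: only the first 1024 characters are reported back as the failure reason.
MAX_FAILURE_CHARS = 1024


def write_failure_message(message, failure_path=DEFAULT_FAILURE_PATH):
    """Hypothetical helper: write a custom, truncated message to the failure file."""
    parent = os.path.dirname(failure_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(failure_path, "w") as f:
        f.write(message[:MAX_FAILURE_CHARS])
    return failure_path


def train():
    """Stand-in for user training code that fails."""
    raise ValueError("learning rate must be positive")


def run_training(failure_path=DEFAULT_FAILURE_PATH):
    """Run training; on error, record a custom message and return a non-zero code."""
    try:
        train()
    except Exception as exc:
        write_failure_message(
            f"Training failed: {type(exc).__name__}: {exc}", failure_path
        )
        return 1
    return 0
```

In a real entry point, the return value would typically be passed to `sys.exit` so the job is marked as failed. The path is a parameter so the helper can be exercised outside a container.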

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


satishpasumarthi commented 2 years ago

@josephevans Could you please review this PR?
