awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0

Refactored MXNet/PyTorch/TF Exceptions, and End-of-training log directory check #613

Closed: jleeleee closed this pull request 2 years ago

jleeleee commented 2 years ago

Description of changes:

Refactored all exceptions under the smdebug/mxnet, smdebug/pytorch, and smdebug/tensorflow directories so that they raise the SMDebugError exception type.

Also changed the end-of-training log directory existence check in smdebug/core/hook_utils.py so that it returns a boolean to the writer instead of raising an exception that is caught later on. This makes it clear that exceptions are generally meant to propagate up to the customer, not to serve as a control-flow check.
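The two changes described above can be illustrated with a minimal plain-Python sketch. It is not the smdebug code: SMDebugError here is a local stand-in for the real class, and TensorNotFoundError, get_tensor, check_log_dir_or_raise, log_dir_exists, finalize_before, and finalize_after are hypothetical names chosen for illustration. The actual check in smdebug/core/hook_utils.py may be structured differently. The sketch assumes framework-specific errors subclass the common type so callers can catch a single exception class.

```python
import os
import tempfile


# Local stand-in for smdebug's common exception type (the real class may differ).
class SMDebugError(Exception):
    pass


# Hypothetical framework-specific error. Subclassing the common type means
# callers only need to catch SMDebugError.
class TensorNotFoundError(SMDebugError):
    pass


def get_tensor(saved_tensors, name):
    # Hypothetical lookup that raises the common exception family on failure.
    if name not in saved_tensors:
        raise TensorNotFoundError(f"tensor {name!r} was not saved")
    return saved_tensors[name]


# Before: the directory check signals "missing" by raising, and the writer
# catches that exception to decide whether to continue.
def check_log_dir_or_raise(path):
    if not os.path.isdir(path):
        raise FileNotFoundError(path)


def finalize_before(path):
    try:
        check_log_dir_or_raise(path)
    except FileNotFoundError:
        return False  # directory missing: nothing to finalize
    return True  # directory present: finalization would proceed here


# After: the check returns a boolean, leaving exceptions for genuine errors
# that should reach the user.
def log_dir_exists(path):
    return os.path.isdir(path)


def finalize_after(path):
    if not log_dir_exists(path):
        return False  # directory missing: nothing to finalize
    return True  # directory present: finalization would proceed here


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        missing = os.path.join(out_dir, "missing")
        print(finalize_after(out_dir), finalize_after(missing))  # True False
    try:
        get_tensor({}, "loss")
    except SMDebugError as err:
        print(type(err).__name__, err)
```

Both versions of the writer behave the same. The boolean version simply makes the "directory missing" case an ordinary branch rather than an exception that has to be caught and discarded.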

Style and formatting:

I have run pre-commit install && pre-commit run --all-files to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.