awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0

Add error handling for XGBoost #496

Closed ndodda-amazon closed 3 years ago

ndodda-amazon commented 3 years ago

Description of changes:

Adds error handling for XGBoost. Defines an XGBoost-specific has_default_hook_configuration, because XGBoost's default collections differ from the general default collections.
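To illustrate why a framework-specific check is needed, here is a minimal sketch, not the code in this PR. The class names, the constructor signature, and the contents of both collection sets are assumptions made up for the example; only the method name has_default_hook_configuration comes from the description above.

```python
# Hypothetical sketch: class names and collection sets are illustrative
# placeholders, not smdebug's actual defaults or API.
GENERAL_DEFAULT_COLLECTIONS = frozenset({"losses"})
XGBOOST_DEFAULT_COLLECTIONS = frozenset({"metrics", "feature_importance"})


class SketchHook:
    def __init__(self, include_collections):
        self.include_collections = set(include_collections)

    def has_default_hook_configuration(self):
        # General check: compare against the framework-agnostic defaults.
        return self.include_collections == GENERAL_DEFAULT_COLLECTIONS


class SketchXGBoostHook(SketchHook):
    def has_default_hook_configuration(self):
        # XGBoost-specific check: its defaults differ from the general ones,
        # so the inherited comparison would wrongly report "not default".
        return self.include_collections == XGBOOST_DEFAULT_COLLECTIONS
```

With the override in place, a hook built from the XGBoost defaults reports a default configuration, while the inherited check would not.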

Also defines a new set_mode function in KerasHook that is wrapped with the error handler, and removes the error handling from the base hook's set_mode function. This is because, in the default smdebug configuration, set_mode is called only for TF and not for the other frameworks.
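The pattern described here can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's code: the catch_errors decorator, the Sketch* classes, the mode strings, and the choice to log and return None on failure are all hypothetical. Only the idea of wrapping the subclass's set_mode and leaving the base set_mode unwrapped comes from the description.

```python
import functools
import logging

logger = logging.getLogger("sketch")


def catch_errors(func):
    """Hypothetical error handler: log the exception and return None
    instead of letting it propagate into the training script."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("error in %s; continuing", func.__name__)
            return None

    return wrapper


class SketchBaseHook:
    def set_mode(self, mode):
        # Base implementation is left unwrapped, so errors propagate here.
        if mode not in ("train", "eval"):
            raise ValueError(f"invalid mode: {mode!r}")
        self.mode = mode


class SketchKerasHook(SketchBaseHook):
    @catch_errors
    def set_mode(self, mode):
        # Only the framework hook whose default path calls set_mode is wrapped.
        super().set_mode(mode)
```

In this sketch, SketchKerasHook().set_mode("bogus") logs the error and returns None, whereas the same call on SketchBaseHook raises ValueError.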

Style and formatting:

I have run pre-commit install to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter commented 3 years ago

Codecov Report

Merging #496 (3b4d029) into master (8dccd06) will decrease coverage by 0.03%. The diff coverage is 41.66%.


@@            Coverage Diff             @@
##           master     #496      +/-   ##
==========================================
- Coverage   72.59%   72.56%   -0.04%     
==========================================
  Files         115      115              
  Lines       10382    10383       +1     
==========================================
- Hits         7537     7534       -3     
- Misses       2845     2849       +4     
| Impacted Files | Coverage | Δ |
|---|---|---|
| smdebug/xgboost/hook.py | 0.00% <0.00%> | ø |
| smdebug/xgboost/singleton_utils.py | 0.00% <0.00%> | ø |
| smdebug/core/hook.py | 90.28% <100.00%> | +0.33% |
| smdebug/tensorflow/base_hook.py | 73.52% <100.00%> | -0.29% |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8dccd06...3b4d029.