awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0

Extended Debugger reductions and fixed some bugs #523

Open NRauschmayr opened 3 years ago

NRauschmayr commented 3 years ago

Description of changes:

Extended smdebug's reductions to check for NaN and Inf values and to compute quantiles for PyTorch tensors. Tensors are now also written out in TensorBoard format, so users can display all reductions for a specific tensor within the same visualization. Visualizations are grouped by Debugger collection. Here is an example visualization:

[Screenshot: example TensorBoard visualization of the reductions, 2021-11-14]
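
For readers unfamiliar with these reductions, here is a minimal sketch of the idea: count NaN and Inf entries, then compute quartiles over the finite values. It uses plain Python lists and the standard library in place of PyTorch tensors, and the function name `summarize` and the output keys are hypothetical. It is not the code added in this PR and does not reflect smdebug's API.

```python
import math
import statistics


def summarize(values):
    """Illustrative reductions over a flat list of floats (hypothetical helper)."""
    # Count non-finite entries separately so they can be reported as reductions.
    summary = {
        "nan_count": sum(1 for v in values if math.isnan(v)),
        "inf_count": sum(1 for v in values if math.isinf(v)),
    }
    # Quantiles are only meaningful over finite values.
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) >= 2:
        q25, q50, q75 = statistics.quantiles(finite, n=4, method="inclusive")
        summary.update({"q25": q25, "q50": q50, "q75": q75})
    return summary


values = [0.5, float("nan"), 2.0, float("inf"), -1.0, 3.5]
result = summarize(values)
print(result)
```

Filtering out non-finite values before computing quantiles is a design choice of this sketch; a real hook might instead skip the quantile reduction when any NaN is present.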

Style and formatting:

I have run pre-commit install && pre-commit run --all-files to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter commented 3 years ago

Codecov Report

Merging #523 (66999aa) into master (b4dd4c1) will decrease coverage by 6.36%. The diff coverage is 57.44%.


@@            Coverage Diff             @@
##           master     #523      +/-   ##
==========================================
- Coverage   77.60%   71.24%   -6.37%     
==========================================
  Files         127      117      -10     
  Lines       11111    10614     -497     
==========================================
- Hits         8623     7562    -1061     
- Misses       2488     3052     +564     
| Impacted Files | Coverage Δ |
|---|---|
| smdebug/pytorch/utils.py | 48.14% <23.52%> (-32.81%) |
| smdebug/core/locations.py | 85.71% <60.00%> (-5.96%) |
| smdebug/core/reduction_config.py | 86.58% <77.77%> (-9.52%) |
| smdebug/core/hook.py | 86.90% <81.25%> (-0.53%) |
| smdebug/mxnet/__init__.py | 0.00% <0.00%> (-100.00%) |
| smdebug/mxnet/singleton_utils.py | 0.00% <0.00%> (-100.00%) |
| ...debug/profiler/analysis/notebook_utils/__init__.py | 0.00% <0.00%> (-100.00%) |
| smdebug/mxnet/hook.py | 0.00% <0.00%> (-84.85%) |
| smdebug/mxnet/utils.py | 0.00% <0.00%> (-78.13%) |
| smdebug/rules/action/message_action.py | 13.25% <0.00%> (-75.91%) |

... and 61 more
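
The totals in the coverage diff can be sanity-checked by hand: coverage is hits divided by lines, and hits plus misses should equal lines. The sketch below redoes that arithmetic with the numbers from the report. Rounding down to two decimals is an assumption that happens to reproduce the displayed figures; it is not confirmed Codecov behavior, and the helper names are hypothetical.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage."""
    return 100.0 * hits / lines


def floor2(x: float) -> float:
    """Round down (toward negative infinity) to two decimals. Assumed display rule."""
    return math.floor(x * 100) / 100


# Totals copied from the coverage diff: (hits, misses, lines).
master = {"hits": 8623, "misses": 2488, "lines": 11111}
pr = {"hits": 7562, "misses": 3052, "lines": 10614}

# Hits and misses should account for every line.
for totals in (master, pr):
    assert totals["hits"] + totals["misses"] == totals["lines"]

master_cov = coverage_pct(master["hits"], master["lines"])
pr_cov = coverage_pct(pr["hits"], pr["lines"])
delta = pr_cov - master_cov

print(f"master: {floor2(master_cov):.2f}%")
print(f"PR:     {floor2(pr_cov):.2f}%")
print(f"delta:  {floor2(delta):.2f}%")
```

Under that assumption, the same calculation accounts for both the 6.36% decrease quoted in the summary sentence (rounding down the magnitude) and the -6.37% shown in the diff (rounding down the signed value).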


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update b4dd4c1...66999aa.