awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0

Cache the output of is_framework_version_supported #508

Closed by NihalHarish 3 years ago

NihalHarish commented 3 years ago

Description of changes:

Testing

I validated with pyinstrument that this change reduces the performance impact of is_framework_version_supported. The test script calls get_smdebug_hook() 10,000 times:

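# test_smdebug_hook.py
from torch.utils.smdebug import get_smdebug_hook

NUM_ITER = 10000

# Fetch the hook repeatedly so the per-call validation cost dominates the profile.
for i in range(NUM_ITER):
    hook = get_smdebug_hook()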

Without change:

Program: test_smdebug_hook.py

1.300 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      1.300 _run_code  runpy.py:64
      └─ 1.300 <module>  test_smdebug_hook.py:1
         ├─ 0.782 get_smdebug_hook  torch/utils/smdebug.py:28
         │  ├─ 0.351 error_handling_agent  smdebug/core/error_handling_agent.py:99
         │  │  └─ 0.349 get_hook  smdebug/pytorch/singleton_utils.py:17
         │  │     ├─ 0.248 validate_training_job  smdebug/core/config_validator.py:65
         │  │     │  └─ 0.242 _validate_training_environment  smdebug/core/config_validator.py:25
         │  │     │     └─ 0.240 is_framework_version_supported  smdebug/core/utils.py:641
         │  │     │        └─ 0.238 is_current_version_supported  smdebug/pytorch/utils.py:91
         │  │     │           └─ 0.223 parse  packaging/version.py:49
         │  │     │                 [24 frames hidden]  packaging, <built-in>, <string>

With change:

Program: test_smdebug_hook.py

1.021 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      1.021 _run_code  runpy.py:64
      └─ 1.021 <module>  test_smdebug_hook.py:1
         ├─ 0.519 get_smdebug_hook  torch/utils/smdebug.py:28
         │  ├─ 0.261 <module>  smdebug/__init__.py:2
         │  │  └─ 0.261 <module>  smdebug/core/collection.py:2
         │  │     └─ 0.261 <module>  smdebug/core/reduction_config.py:2
         │  │        └─ 0.260 <module>  smdebug/core/utils.py:2
         │  │           ├─ 0.193 <module>  smdistributed/modelparallel/torch/__init__.py:2
         │  │           │     [655 frames hidden]  smdistributed, smexperiments, pkg_res...
         │  │           ├─ 0.041 <module>  requests/__init__.py:8
         │  │           │     [122 frames hidden]  requests, urllib3, re, sre_compile, s...
         │  │           └─ 0.021 <module>  horovod/torch/__init__.py:17
         │  │                 [64 frames hidden]  horovod, psutil, <built-in>, collecti...
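
In the first profile, most of the time under is_framework_version_supported is spent in packaging's parse, and that call path no longer appears in the second profile, which is what you would expect if the result is computed once and reused. A minimal sketch of that kind of caching, assuming functools.lru_cache is used; the function names, the version-parsing stand-in, and the minimum version below are hypothetical and are not taken from the smdebug code:

```python
from functools import lru_cache

# Hypothetical stand-in for the expensive version parse seen in the profile.
# The counter records how often the slow path actually runs.
parse_calls = 0


def _parse_version(version_string):
    global parse_calls
    parse_calls += 1
    return tuple(int(part) for part in version_string.split("."))


@lru_cache(maxsize=None)
def is_version_supported(version_string, minimum="1.5.0"):
    """Return True if version_string >= minimum; cached per distinct arguments."""
    return _parse_version(version_string) >= _parse_version(minimum)


# Mirror the test script: ask the same question 10,000 times.
for _ in range(10000):
    supported = is_version_supported("1.8.1")
```

With the cache in place, only the first call parses the two version strings; the remaining calls return the stored result. This is safe only because the installed framework version does not change during a training process.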

Style and formatting:

I have run pre-commit install to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter commented 3 years ago

Codecov Report

Merging #508 (42117b0) into master (314c091) will decrease coverage by 1.73%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
- Coverage   75.89%   74.16%   -1.74%     
==========================================
  Files         126      116      -10     
  Lines       10872    10527     -345     
==========================================
- Hits         8251     7807     -444     
- Misses       2621     2720      +99     
| Impacted Files | Coverage Δ | |
|---|---|---|
| smdebug/core/utils.py | 80.17% <100.00%> (-1.12%) | :arrow_down: |
| smdebug/rules/action/message_action.py | 81.92% <0.00%> (-15.67%) | :arrow_down: |
| smdebug/xgboost/utils.py | 0.00% <0.00%> (-14.76%) | :arrow_down: |
| smdebug/profiler/tf_profiler_parser.py | 54.54% <0.00%> (-11.58%) | :arrow_down: |
| smdebug/rules/action/stop_training_action.py | 54.68% <0.00%> (-7.82%) | :arrow_down: |
| smdebug/mxnet/collection.py | 73.33% <0.00%> (-6.67%) | :arrow_down: |
| smdebug/mxnet/utils.py | 59.37% <0.00%> (-6.25%) | :arrow_down: |
| smdebug/core/logger.py | 70.83% <0.00%> (-5.56%) | :arrow_down: |
| smdebug/core/access_layer/s3.py | 91.54% <0.00%> (-4.23%) | :arrow_down: |
| smdebug/core/reader.py | 85.18% <0.00%> (-3.71%) | :arrow_down: |

... and 25 more
