awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0
161 stars, 83 forks

Supporting PT 1.8 #455

Closed: leleamol closed this pull request 3 years ago.

leleamol commented 3 years ago

Description of changes:

This PR contains code changes to support PyTorch (PT) 1.8.

The change is needed mainly because PT 1.8 changed the signatures of private methods in the torch autograd profiler.

There are no other functional changes. The integration test suite was run against this change using a DLC that contained the updated torch 1.8 binary wheel.

Here is the link: https://tiny.amazon.com/gwxvxtl8/IsenLink

Testing:

Ran the unit tests in the DLC container with PT 1.8. The unit tests also include the eagleeye zero-code-change tests.

Style and formatting:

I have run pre-commit install to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-io commented 3 years ago

Codecov Report

Merging #455 (6911f3c) into master (7230c4a) will decrease coverage by 9.04%. The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #455      +/-   ##
==========================================
- Coverage   51.05%   42.01%   -9.05%     
==========================================
  Files         162      162              
  Lines       12898    12898              
==========================================
- Hits         6585     5419    -1166     
- Misses       6313     7479    +1166     
Impacted Files Coverage Δ
smdebug/mxnet/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/rules/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/mxnet/singleton_utils.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/rules/action/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/mxnet/hook.py 0.00% <0.00%> (-93.80%) :arrow_down:
smdebug/rules/action/action.py 0.00% <0.00%> (-87.76%) :arrow_down:
smdebug/mxnet/utils.py 0.00% <0.00%> (-87.50%) :arrow_down:
smdebug/rules/action/message_action.py 0.00% <0.00%> (-81.93%) :arrow_down:
smdebug/core/reader.py 0.00% <0.00%> (-77.78%) :arrow_down:
smdebug/core/reductions.py 17.39% <0.00%> (-76.09%) :arrow_down:
... and 41 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7230c4a...6911f3c.

leleamol commented 3 years ago

The CodeBuild project for 1.8 has already been created. I would need help setting up the webhook for PR CI.

ndodda-amazon commented 3 years ago

> The codebuild project for 1.8 is already created. Would need help in setting up the webhook for PR CI

To clarify, I meant a build to run the unit tests for PT 1.8. But as we discussed offline, you've manually run the unit tests in the PT 1.8 container, which is good.

Please retrigger the CI, I'll approve once tests pass.