awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0
158 stars 82 forks source link

Compatibility with TF 2.7 #526

Closed adimux closed 2 years ago

adimux commented 2 years ago

...by fixing broken tests in test_keras_mirrored.py

Motive: Tensorflow 2.7.0 release

smdebug.tensorflow.get_hook() was returning None, which made many tests fail, but this was actually a false alarm. We just have to mark the version of tensorflow we are trying to release (2.7.0) as a supported framework version, so get_hook() doesn't return `None'...

I know what you think. It's not pythonic to silently fail and we should instead throw an exception, but this code preceded me and I don't want to introduce a breaking change (an exception being raised) and I don't want to introduce warnings either in case `get_hook()' gets called in unexpected ways multiple times during the lifecycle of a training job.

Instead, I just created a tests that alarms us of the situation when get_hook() might return None when it's actually just because we forgot to update a constant.

Description of changes:

Style and formatting:

I have run pre-commit install && pre-commit run --all-files to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter commented 2 years ago

Codecov Report

Merging #526 (3927d20) into master (88be531) will decrease coverage by 43.71%. The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #526       +/-   ##
===========================================
- Coverage   74.70%   30.98%   -43.72%     
===========================================
  Files         127      117       -10     
  Lines       11111    10591      -520     
===========================================
- Hits         8300     3282     -5018     
- Misses       2811     7309     +4498     
Impacted Files Coverage Δ
smdebug/core/utils.py 60.98% <0.00%> (-19.84%) :arrow_down:
smdebug/pytorch/utils.py 0.00% <0.00%> (-57.15%) :arrow_down:
smdebug/tensorflow/singleton_utils.py 0.00% <ø> (-94.12%) :arrow_down:
smdebug/tensorflow/utils.py 0.00% <0.00%> (-86.18%) :arrow_down:
smdebug/pytorch/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/tensorflow/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/tensorflow/constants.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/core/config_validator.py 0.00% <0.00%> (-100.00%) :arrow_down:
smdebug/pytorch/singleton_utils.py 0.00% <0.00%> (-100.00%) :arrow_down:
...ug/profiler/analysis/utils/pandas_data_analysis.py 0.00% <0.00%> (-98.43%) :arrow_down:
... and 83 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 88be531...3927d20. Read the comment docs.

adimux commented 2 years ago

Our pipeline runs with an outdated version of Tensorflow (2.4). I will fix the pipeline later, but right now we need to unblock the release of 2.7. Fortunately, the DLC pipelines do run the sagemaker-debugger tests and I also ran them locally and they're green.

hongshanli23 commented 2 years ago

LGTM

abhinavs95 commented 2 years ago

LGTM

adimux commented 2 years ago

created other PR to be able to merge