awslabs / sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors
Apache License 2.0
161 stars 83 forks source link

fix: numpy deprecated aliases #639

Closed ShiboXing closed 1 year ago

ShiboXing commented 1 year ago

Description of changes:

Style and formatting:

I have run pre-commit install && pre-commit run --all-files to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ShiboXing commented 1 year ago

I've tested with the newest numpy that the following checks are all True.

import numpy as np
test = np.array([True, False])

test.dtype == np.bool_
test.dtype == bool

test = np.array([1.0, 2.0])
test.dtype == float
test.dtype == np.float_
np.float_ == np.float64

However, I think using np.float and np.bool instead of native python data types could be safer due to potential future changes by numpy. What do you think?

yl-to commented 1 year ago

I've tested with the newest numpy that the following checks are all True.

import numpy as np
test = np.array([True, False])

test.dtype == np.bool_
test.dtype == bool

test = np.array([1.0, 2.0])
test.dtype == float
test.dtype == np.float_
np.float_ == np.float64

However, I think using np.float and np.bool instead of native python data types could be safer due to potential future changes by numpy. What do you think?

I tested: np.bool_ == bool, it returned False, are they different?

ShiboXing commented 1 year ago

Yeah they are the same when they are compared to the dtype of a np.array separately. But their classes are impelmented differently.

I just realized that dtypes are returned from this mapping: https://github.com/awslabs/sagemaker-debugger/blob/ff3b53607c36008164372b4fa785545f2e3ec2ef/smdebug/core/tfevent/event_file_reader.py#L42 Here the bool is python native. So I think it's better to stick with Python native for if check as well?

yl-to commented 1 year ago

A deprecation warning notice from numpy is like this:

DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. 
To silence this warning, use `float` by itself. 
Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.

So I think we would like to follow numpy's suggestion, use built-in types for safe changes.

codecov-commenter commented 1 year ago

Codecov Report

Base: 69.00% // Head: 69.00% // No change to project coverage :thumbsup:

Coverage data is based on head (ff3b536) compared to base (ff3b536). Patch has no changes to coverable lines.

:exclamation: Current head ff3b536 differs from pull request most recent head c08a155. Consider uploading reports for the commit c08a155 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #639 +/- ## ======================================= Coverage 69.00% 69.00% ======================================= Files 127 127 Lines 11674 11674 ======================================= Hits 8056 8056 Misses 3618 3618 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

ShiboXing commented 1 year ago

np.greater(counts, 0, dtype=np.int32)) returns the result:

TypeError: No loop matching the specified signature and casting was found for ufunc greater