arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

numpy.float and numpy.int deprecated/removed in newer versions of numpy #77

Closed · richard-lane closed this issue 1 year ago

richard-lane commented 1 year ago

These data types were deprecated in numpy 1.20 and have been removed in later versions of numpy:

https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

This means that, for example, gradient boosting doesn't work with newer versions of numpy.

They are identical to the Python builtins float, int, etc., so they could simply be replaced by those; alternatively, if a numpy scalar type is explicitly required, they could be replaced by numpy.float64 etc.
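
For concreteness, a minimal sketch (my addition, not from the issue) of the two suggested replacements; it assumes numpy >= 1.24, where the aliases have already been removed:

import numpy

# numpy.float and numpy.int were deprecated in numpy 1.20 and removed in 1.24;
# on a recent numpy, numpy.zeros(3, dtype=numpy.int) raises
# AttributeError: module 'numpy' has no attribute 'int'.

# Option 1: the Python builtins, which numpy maps to its default dtypes.
a = numpy.zeros(3, dtype=int)    # platform default integer (int64 on most Linux builds)
b = numpy.zeros(3, dtype=float)  # float64

# Option 2: an explicit numpy scalar type, if a fixed width is required.
c = numpy.zeros(3, dtype=numpy.float64)

assert b.dtype == c.dtype == numpy.float64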

richard-lane commented 1 year ago

Here's the diff needed to replace numpy.float and numpy.int with float and int respectively:

diff --git a/hep_ml/commonutils.py b/hep_ml/commonutils.py
index b888c69..bf46e75 100755
--- a/hep_ml/commonutils.py
+++ b/hep_ml/commonutils.py
@@ -222,7 +222,7 @@ def compute_knn_indices_of_same_class(X, y, n_neighbours=50):
     :rtype numpy.array, shape [len(dataframe), knn], each row contains indices of closest signal events
     """
     assert len(X) == len(y), "different size"
-    result = numpy.zeros([len(X), n_neighbours], dtype=numpy.int)
+    result = numpy.zeros([len(X), n_neighbours], dtype=int)
     for label in set(y):
         is_signal = y == label
         label_knn = compute_knn_indices_of_signal(X, is_signal, n_neighbours)
diff --git a/hep_ml/losses.py b/hep_ml/losses.py
index e1e079a..89fc18f 100644
--- a/hep_ml/losses.py
+++ b/hep_ml/losses.py
@@ -727,7 +727,7 @@ class AbstractFlatnessLossFunction(AbstractLossFunction):

     def _compute_fl_derivatives(self, y_pred):
         y_pred = numpy.ravel(y_pred)
-        neg_gradient = numpy.zeros(len(self.y), dtype=numpy.float)
+        neg_gradient = numpy.zeros(len(self.y), dtype=float)

         for label in self.uniform_label:
             label_mask = self.label_masks[label]
diff --git a/hep_ml/metrics_utils.py b/hep_ml/metrics_utils.py
index b705f7a..e00c7e2 100644
--- a/hep_ml/metrics_utils.py
+++ b/hep_ml/metrics_utils.py
@@ -67,7 +67,7 @@ def compute_bin_indices(X_part, bin_limits=None, n_bins=20):
             variable_data = X_part[:, variable_index]
             bin_limits.append(numpy.linspace(numpy.min(variable_data), numpy.max(variable_data), n_bins + 1)[1: -1])

-    bin_indices = numpy.zeros(len(X_part), dtype=numpy.int)
+    bin_indices = numpy.zeros(len(X_part), dtype=int)
     for axis, bin_limits_axis in enumerate(bin_limits):
         bin_indices *= (len(bin_limits_axis) + 1)
         bin_indices += numpy.searchsorted(bin_limits_axis, X_part[:, axis])
diff --git a/tests/test_gradientboosting.py b/tests/test_gradientboosting.py
index fbec64a..c77ac68 100644
--- a/tests/test_gradientboosting.py
+++ b/tests/test_gradientboosting.py
@@ -112,7 +112,7 @@ def test_constant_fitting(n_samples=1000, n_features=5):
     Testing if initial constant fitted properly
     """
     X, y = generate_sample(n_samples=n_samples, n_features=n_features)
-    y = y.astype(numpy.float) + 1000.
+    y = y.astype(float) + 1000.
     for loss in [MSELossFunction(), losses.MAELossFunction()]:
         gb = UGradientBoostingRegressor(loss=loss, n_estimators=10)
         gb.fit(X, y)

(I didn't run the tests because I couldn't get them to work immediately.)
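
As a quick check that no other removed aliases remain in the tree (my addition, not part of the thread), a small Python sketch that scans for them; the alias list is the set removed in numpy 1.24:

import pathlib
import re

# numpy 1.20 deprecated (and 1.24 removed) these aliases of builtin types.
# The trailing \b keeps valid names like numpy.float64 or numpy.int_ from matching.
pattern = re.compile(r"\bnumpy\.(float|int|bool|object|str|complex|long|unicode)\b")

for path in pathlib.Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if pattern.search(line):
            print(f"{path}:{lineno}: {line.strip()}")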

arogozhnikov commented 1 year ago

> (I didn't run the tests because I couldn't get them to work immediately.)

Those are all safe changes; can you open a PR anyway?

arogozhnikov commented 1 year ago

Merged, thanks Richard!