aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
https://pydvl.org
GNU Lesser General Public License v3.0

Inconsistent test behavior #474

Closed schroedk closed 1 month ago

schroedk commented 6 months ago

Running the test test_classwise_scorer_accuracies_manual_derivation locally on a MacBook M2 fails with an AssertionError, whereas the same test passes in CI on Linux.

janosg commented 3 months ago

I get the same problem on a MacBook M3. The error is: assert 0.75 == 0.0

schroedk commented 2 months ago

The inconsistent test behavior comes from a platform-dependent cast of NaN to int:

np.array([np.nan]).astype(int)[0] == -9223372036854775808  # on linux-amd64 and maybe other systems
np.array([np.nan]).astype(int)[0] == 0  # on osx-arm64

See also this NumPy issue.

Given this, I would say the test is flawed because it depends on platform-specific casting behavior. It should be rewritten from scratch or, if possible, simply removed.
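To illustrate one way to make the rounding step independent of the platform, here is a minimal sketch. It assumes that mapping NaN to a fixed class label before the integer cast is acceptable for the test. The helper name safe_round_to_int and the nan_fill parameter are hypothetical and not part of pyDVL.

```python
import numpy as np


def safe_round_to_int(values, nan_fill=0):
    """Hypothetical helper: round to {0, 1} without ever casting NaN to int.

    NaN entries are replaced by `nan_fill` first, so the result no longer
    depends on how the platform converts NaN to an integer.
    """
    values = np.asarray(values, dtype=float)
    # Replace NaN explicitly instead of relying on astype(int).
    cleaned = np.where(np.isnan(values), nan_fill, values)
    return np.clip(np.round(cleaned), 0, 1).astype(int)


print(safe_round_to_int(np.array([np.nan, 0.2, 0.8, 3.0])))  # [0 0 1 1]
```

With this guard the same input gives the same labels on linux-amd64 and osx-arm64. Whether 0 is the right fill value depends on what the test is meant to check.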

schroedk commented 2 months ago

The np.nan comes from the following code:

class ClosedFormLinearClassifier:
    def __init__(self):
        self._beta = None

    def fit(self, x: NDArray, y: NDArray) -> float:
        v = x[:, 0]
        self._beta = np.dot(v, y) / np.dot(v, v)
        return -1

    def predict(self, x: NDArray) -> NDArray:
        if self._beta is None:
            raise AttributeError("Model not fitted")

        x = x[:, 0]
        probs = self._beta * x
        return np.clip(np.round(probs + 1e-10), 0, 1).astype(int)

    def score(self, x: NDArray, y: NDArray) -> float:
        pred_y = self.predict(x)
        return np.sum(pred_y == y) / 4

in tests/value/shapley/test_classwise.py. In the fit step, v is all zeros, so np.dot(v, v) is 0 and the division 0 / 0 makes self._beta equal to np.nan.
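The division can be reproduced in isolation. The sketch below uses made-up data (a single all-zero feature column and four labels) and a hypothetical fit_beta function that rejects the degenerate case explicitly. It is only an illustration, not the fix from the draft PR.

```python
import numpy as np

# Made-up data: the only feature column is all zeros.
x = np.zeros((4, 1))
y = np.array([0, 1, 0, 1])
v = x[:, 0]

# 0.0 / 0.0 on NumPy scalars yields NaN (plus a RuntimeWarning, silenced here).
with np.errstate(invalid="ignore"):
    beta = np.dot(v, y) / np.dot(v, v)
print(np.isnan(beta))  # True


def fit_beta(x, y):
    """Hypothetical variant of the fit step that refuses a zero feature column."""
    v = x[:, 0]
    denom = np.dot(v, v)
    if denom == 0:
        raise ValueError("First feature column is all zeros; beta is undefined.")
    return float(np.dot(v, y) / denom)
```

Raising early makes the failure identical on every platform, instead of letting NaN reach the integer cast in predict.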

schroedk commented 2 months ago

@kosmitive do you have time to help with this? I created a draft PR with a temporary fix.