X-DataInitiative / tick

Module for statistical learning, with a particular emphasis on time-dependent modelling
https://x-datainitiative.github.io/tick/
BSD 3-Clause "New" or "Revised" License

Updated wheel release mechanism #511

Open PhilipDeegan opened 1 year ago

PhilipDeegan commented 1 year ago

The biggest change, really, is the updated wheel release mechanism.

Fixes some things from https://github.com/X-DataInitiative/tick/issues/504

OSX still has some weird issues with random number generation, and I'm guessing this is why some tests are now failing,

for instance

Traceback (most recent call last):
  File "/Users/runner/work/tick/tick/tick/linear_model/tests/logistic_regression_test.py", line 93, in test_LogisticRegression_fit
    self.assertGreater(
AssertionError: 0.6981863527347516 not greater than 0.699 : solver sgd with penalty binarsity and intercept False reached too low AUC

I changed the threshold from .7 to .69, but I'm not sure it really should be changed; maybe it's ok as is? @Mbompr, thoughts?

With this pull request, we should be able to publish new wheels to the test PyPI repo on merge to master, and to regular PyPI on release/tag.

Mbompr commented 1 year ago

Sorry, I'm just discovering this thread :o. Wow, the amount of work here is huge!

Honestly, these tests involving randomness are very badly designed, but I didn't know that back then :).

About changing the AUC threshold from .7 to .69: this is not an issue at all. The only thing we are checking is that the model learns something significantly better than random guessing (which would yield an AUC of 0.5), so 0.65 or even 0.6 would also be fine.
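To illustrate why any threshold comfortably above 0.5 serves this purpose, here is a quick sketch with a hand-rolled AUC (the Mann-Whitney formulation, not tick's implementation): an uninformative scorer lands near 0.5, while any model with real signal clears 0.7 easily.

```python
import random

def auc(y_true, y_score):
    # AUC = probability that a randomly chosen positive example is scored
    # above a randomly chosen negative one (ties count as 0.5).
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
y = [random.randint(0, 1) for _ in range(2000)]
scores_random = [random.random() for _ in y]            # no signal at all
scores_good = [yi + random.gauss(0, 0.5) for yi in y]   # clear signal + noise

print(round(auc(y, scores_random), 2))  # close to 0.5
print(round(auc(y, scores_good), 2))    # well above 0.7
```

So the test threshold only needs to sit far enough above 0.5 to rule out "learned nothing"; its exact value is otherwise arbitrary.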

About the darwin test filtering, this is due to

        # This is temporary
        # At the moment, seeds are not cross platform as distributions in
        # C++ depends on standard library shipped with compiler
        import os
        if os.name == 'posix':
            import platform
            if platform.system() == 'Darwin':
                # We check that we get the same as what was recorded
                np.testing.assert_almost_equal(seeded_sample_1, seeded_sample)

(note the "temporary"...). That is also badly designed. The test's purpose was to check that, with the same seed, we obtain the same random numbers. But nothing really guarantees this across platforms, and I guess an update to Darwin's random number generation could break it easily.
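The underlying problem is that C++ standard-library distributions are implementation-defined, so samples recorded on one platform need not match another even with identical seeds. A check that stays within a single run avoids this entirely; a sketch, using numpy's generator as a stand-in for tick's C++-backed sampler:

```python
import numpy as np

def draw(seed, size=10):
    # Hypothetical stand-in for tick's sampler; the real one goes through
    # C++ <random>, whose distribution output varies by standard library.
    rng = np.random.RandomState(seed)
    return rng.standard_normal(size)

# Same seed, same run, same platform: this is the only equality that is
# actually guaranteed, unlike comparing against recorded samples.
np.testing.assert_array_equal(draw(1234), draw(1234))
```

This kind of within-run reproducibility check tests the seeding logic without depending on how any particular standard library implements its distributions.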

I would thus simply remove this part of the test, as well as all the seeded_sample arguments. I would need help with this though, so tell me; otherwise I'm quite happy to not set up tick, cmake and swig on my machine :)