HealthCatalyst / healthcareai-py

Python tools for healthcare machine learning
http://healthcare.ai
MIT License

Expected failing test_feature_availability_profiler tests on Linux (others?) #449

Open jlitzingerdev opened 6 years ago

jlitzingerdev commented 6 years ago

This is likely more a question than an issue, but an issue seemed more appropriate than StackOverflow for a unit test. With current master (installed in a virtualenv with all dependencies) I get a few errors and failures, one of each in test_feature_availability_profiler. Specifically:

healthcareai.tests.test_feature_availability_profiler.TestFeatureAvailabilityProfiler
healthcareai.tests.test_feature_availability_profiler.TestFeatureAvailabilityProfilerError3

raise exceptions about the fact that the elements are not date types, instead of the expected exception.

I noticed some changes in this area in cb4c162. Are the failing tests expected in master, or is it likely a platform issue?

Debugging follows; feel free to ignore.

Digging in, it looks as though feature_availability_profiler wants to verify that the dtype of the Series is datetime64[ns]. Since the initial type for the only element is int, the dtype becomes object once datetimes are mixed in, whereas if the Series is instantiated only with datetimes the error goes away. My quick hackery:

    from datetime import datetime, timedelta
    from random import randrange

    import numpy as np
    import pandas as pd

    def setUp(self):
        # Start from a float frame; both columns are overwritten below
        self.df = pd.DataFrame(np.random.randn(1000, 2),
                               columns=['AdmitDTS',
                                        'LastLoadDTS'])
        # generate load date
        self.df['LastLoadDTS'] = datetime(2015, 5, 20)
        # generate datetime objects for admit date
        delta = datetime(2015, 5, 20) - datetime(2015, 5, 1)
        int_delta = (delta.days * 24 * 60 * 60) + delta.seconds

        def test_time(random_second):
            return datetime(2015, 5, 1) + timedelta(seconds=random_second)

        admit = [test_time(randrange(int_delta)) for _ in range(1000)]
        # Build the column from datetimes only so pandas infers a datetime dtype
        self.df['AdmitDTS'] = pd.Series(admit)
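
The dtype behaviour described in the debugging notes can be reproduced in isolation. This is a minimal sketch, assuming only pandas and the standard library. The variable names are illustrative and do not come from healthcareai, and the check uses `is_datetime64_any_dtype` rather than whatever comparison the profiler itself performs:

```python
from datetime import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Mixing a non-datetime value with datetimes forces the generic object dtype.
mixed = pd.Series([0, datetime(2015, 5, 1), datetime(2015, 5, 20)])

# Building the Series from datetimes only lets pandas infer a datetime64 dtype.
dates_only = pd.Series([datetime(2015, 5, 1), datetime(2015, 5, 20)])

mixed_is_datetime = is_datetime64_any_dtype(mixed)
dates_only_is_datetime = is_datetime64_any_dtype(dates_only)

print(mixed.dtype, mixed_is_datetime)
print(dates_only.dtype, dates_only_is_datetime)
```

If the first Series reports a non-datetime dtype and the second a datetime one, that matches the observation above: a column that ever holds a non-datetime value alongside datetimes would fail a datetime dtype check.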
jlitzingerdev commented 6 years ago

Proposed: https://github.com/HealthCatalyst/healthcareai-py/pull/467