SlidingWindowTransformer for working with time-series like data

lmcinnes commented 3 years ago

This is essentially just a Takens embedding, but with tools for working with classical time-series data. I tried to provide a reasonable range of options and flexibility in how windows are sampled, and would welcome suggestions for further options. In principle it might be possible to "learn" good window parameters from the input data (right now `fit` does nothing beyond verifying parameters), but I don't yet know what the right way to do that would be.
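For anyone unfamiliar with the idea, a Takens (delay) embedding of a 1-D series can be sketched in a few lines of NumPy. This is only an illustration, not the code in this PR: the function name `delay_embedding` and its parameters `window_width`, `stride`, and `delay` are made up for the example and may not match the transformer's actual options.

```python
import numpy as np


def delay_embedding(series, window_width, stride=1, delay=1):
    """Return an array whose rows are windows sampled from a 1-D series.

    Hypothetical helper for illustration only; the names and defaults are
    assumptions, not the SlidingWindowTransformer API.
    """
    series = np.asarray(series)
    if series.ndim != 1:
        raise ValueError("series must be one-dimensional")
    if window_width < 1 or stride < 1 or delay < 1:
        raise ValueError("window_width, stride and delay must be >= 1")
    # Number of raw samples spanned by a single window.
    span = (window_width - 1) * delay + 1
    if series.shape[0] < span:
        raise ValueError("series is too short for the requested window")
    # Start index of each window, spaced `stride` samples apart.
    starts = np.arange(0, series.shape[0] - span + 1, stride)
    # Offsets within a window, spaced `delay` samples apart.
    offsets = np.arange(window_width) * delay
    # Broadcasting gives an index matrix of shape (n_windows, window_width).
    return series[starts[:, None] + offsets[None, :]]


windows = delay_embedding(np.arange(10), window_width=3, stride=2, delay=2)
print(windows)
```

Each row is one point of the embedding. Here `stride` controls how far apart consecutive windows start and `delay` controls the spacing of samples inside a window, which are the kinds of sampling options meant above.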

jc-healy commented 3 years ago

I'm not sure why Travis is failing on your PR. The only failing test (`test_cooccurrence_vectorizer_coo_mem_limit`) doesn't look like it's in your code path. Colin, do you have any ideas?

    __ test_cooccurrence_vectorizer_coo_mem_limit __

        def test_cooccurrence_vectorizer_coo_mem_limit():
            vectorizer_a = TokenCooccurrenceVectorizer(
                window_functions="fixed",
                n_iter=0,
                coo_max_memory="1k",
                normalize_windows=False,
            )
            vectorizer_b = TokenCooccurrenceVectorizer(
                window_functions="fixed",
                n_iter=0,
                normalize_windows=False,
            )
            data = [[np.random.randint(0, 10) for i in range(100)]]
            mat1 = vectorizer_a.fit_transform(data).toarray()
            mat2 = vectorizer_b.fit_transform(data).toarray()

    >       assert np.allclose(mat1, mat2)
    E       assert False
    E        +  where False = <function allclose at 0x7ffbeeff4710>(array([[ 7., 3., 7., 0., 5., 4., 10., 7., 9., 3., 7., 5., 3.,\n 3., 3., 4., 9., 7., 2., 12.],\n ... 4., 0., 3., 4., 4., 4., 1., 4., 4., 3., 2., 3.,\n 3., 7., 6., 6., 3., 3., 4.]], dtype=float32), array([[ 7., 3., 7., 0., 5., 4., 10., 7., 9., 3., 7., 5., 3.,\n 3., 3., 4., 9., 7., 2., 12.],\n ... 4., 0., 3., 4., 4., 4., 1., 4., 4., 3., 2., 3.,\n 3., 7., 6., 6., 3., 3., 4.]], dtype=float32))
    E        +    where <function allclose at 0x7ffbeeff4710> = np.allclose

    ../../../miniconda/envs/testenv/lib/python3.7/site-packages/vectorizers/tests/test_common.py:557: AssertionError

lmcinnes commented 3 years ago

I was looking into this. There is still an underlying problem, but it only triggers occasionally because it depends on the randomly generated test data. A Travis build therefore fails only when it happens to draw a bad random input.

jc-healy commented 3 years ago

Fair enough. I'm going to accept the PR then, and we'll keep in mind that we should try to track down that random data case at some point.
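One way to pin down the bad case would be to replace the unseeded `np.random.randint` call with a seeded generator and scan seeds until the comparison fails. A rough sketch, assuming a hypothetical `check` callable that wraps the two `fit_transform` calls and the `np.allclose` comparison from the test above. `make_data`, `find_failing_seeds`, and `toy_check` are illustrative names, not part of the vectorizers test suite.

```python
import numpy as np


def make_data(seed, size=100, high=10):
    """Build data shaped like the failing test's input, but reproducibly."""
    rng = np.random.RandomState(seed)
    # A single token sequence of `size` integers drawn from [0, high).
    return [rng.randint(0, high, size=size).tolist()]


def find_failing_seeds(check, seeds):
    """Return the seeds whose data makes `check(data)` return False.

    `check` is a hypothetical callable supplied by the caller; for the real
    test it would fit both vectorizers and compare the resulting matrices.
    """
    return [seed for seed in seeds if not check(make_data(seed))]


# Stand-in check so the sketch runs on its own: it "fails" whenever the
# first token is 0. Swap in the real comparison to hunt the actual bug.
def toy_check(data):
    return data[0][0] != 0


bad_seeds = find_failing_seeds(toy_check, range(50))
print(bad_seeds)
```

Any seed this turns up could then be hard-coded into a regression test, so the failure reproduces on every run instead of only on an unlucky Travis build.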

TutteInstitute / vectorizers

SlidingWindowTransformer for working with time-series like data #76