brentp / skidmarks

find runs (non-randomness) in sequences
MIT License

Which tests are adequate to use with non-binary sequences? #6

Open robertour opened 6 years ago

robertour commented 6 years ago

From the README.md:

This module implements some functions to check a sequence for randomness. in some cases, it is assumed to be a binary sequence (not only 1's and 0's but containing only 2 distinct values)

Could you clarify which of the tests are OK for non-binary (multi-valued) sequences and for continuous (real-valued) sequences?

As discussed in https://github.com/brentp/skidmarks/issues/4, there is no restriction for autocorrelation, but what about the other three tests: gap_test, wald_wolfowitz, and serial_test?

I will probably have to research them and try to complete the documentation, but if you already know something, it would be a start.

robertour commented 6 years ago

The wald_wolfowitz test is, in principle, a two-valued test.

Given more than two distinct values, https://github.com/brentp/skidmarks/blob/master/skidmarks.py#L85 would count the runs for the value/digit/symbol that appears first, and https://github.com/brentp/skidmarks/blob/master/skidmarks.py#L86 would build the second group from all the remaining values/digits/symbols.

It is suggested by @storchi in the comments in this SO question that a generalization should be possible.

This seems correct, as the test depends on the expected number of runs and the variance of the number of runs (https://github.com/brentp/skidmarks/blob/master/skidmarks.py#L88-L92).
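For reference, here is a minimal sketch of the two-valued case, showing how the statistic is built from the observed number of runs, its expected value, and its variance. It is not the skidmarks implementation; the function name `two_valued_runs_test` and its return shape are made up for illustration, and it uses the standard textbook formulas for the Wald-Wolfowitz runs test.

```python
import math
from collections import Counter
from itertools import groupby


def two_valued_runs_test(seq):
    """Hypothetical helper: return (runs, expected, variance, z).

    Assumes `seq` contains exactly two distinct values.
    """
    counts = Counter(seq)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct values")
    n1, n2 = counts.values()
    n = n1 + n2

    # A run is a maximal block of identical consecutive values.
    runs = sum(1 for _ in groupby(seq))

    # Expected number of runs and its variance under randomness.
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))

    # The variance is zero for degenerate input such as "AB";
    # report NaN instead of dividing by zero.
    z = (runs - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return runs, expected, variance, z


runs, expected, variance, z = two_valued_runs_test("AABBA")
print(runs, expected, variance, z)
```

A strongly negative `z` suggests fewer runs than expected (clustering), and a strongly positive `z` suggests more runs than expected (alternation).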

Looking now for a reputable source of how to correctly implement this generalization.

robinsonkwame commented 3 years ago

@psinger You would do something like this: https://github.com/psinger/RunsTest
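To make the idea concrete, here is a minimal sketch of what a generalization to more than two categories might look like. It is not copied from the linked repository or from skidmarks; the function name `multi_valued_runs_test` is hypothetical, and the mean and variance are the formulas commonly given for the k-category extension of the runs test. With exactly two categories they reduce to the usual two-valued expressions. It treats values as categorical labels, so it does not address continuous data directly.

```python
import math
from collections import Counter
from itertools import groupby


def multi_valued_runs_test(seq):
    """Hypothetical helper: return (runs, expected, variance, z).

    Treats every distinct value in `seq` as its own category.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(seq).values()

    # A run is a maximal block of identical consecutive values.
    runs = sum(1 for _ in groupby(seq))

    s2 = sum(c ** 2 for c in counts)  # sum of squared category counts
    s3 = sum(c ** 3 for c in counts)  # sum of cubed category counts

    # Expected number of runs and its variance under randomness
    # (assumed k-category formulas, see the note above).
    expected = (n * (n + 1) - s2) / n
    variance = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))

    # Degenerate input (e.g. all values distinct) has zero variance.
    z = (runs - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return runs, expected, variance, z


print(multi_valued_runs_test("AABBA"))
print(multi_valued_runs_test("ABCABCAABBCC"))
```

Before relying on this, it would be worth checking the formulas against a reputable reference, as noted earlier in the thread.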