Open ericphanson opened 4 years ago
The "Iso" in `IsoZNormalizer` is short for "isometric", which is what it started out as. It now uses a diagonal covariance matrix, i.e., a separate variance for each dimension, so the name is misleading and I should change it.
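To make "a separate variance for each dimension" concrete, here is a minimal sketch of per-dimension z-normalization on a `dims × time` matrix. The helper name `znorm_per_dim` is hypothetical, and the choice of the uncorrected standard deviation is an assumption; neither is taken from the package.

```julia
using Statistics

# Hypothetical helper for illustration, not part of SlidingDistancesBase.
# Z-normalizes each dimension (row) of a dims × time matrix separately.
function znorm_per_dim(X::AbstractMatrix{<:Real})
    μ = mean(X, dims=2)                  # one mean per row
    σ = std(X, dims=2, corrected=false)  # one standard deviation per row (assumed uncorrected)
    return (X .- μ) ./ σ
end

X = [1.0  2.0  3.0  4.0;
     10.0 30.0 50.0 70.0]
Xn = znorm_per_dim(X)
```

Because every row gets its own mean and scale, the two rows of `X` above, which differ only by an offset and a scale factor, end up identical after normalization.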
The `undef` is due to `advance!` not being called. It adds one point at a time so that, if early stopping is employed, it does not do unnecessary work. You can see how it's used in the tests:
https://github.com/baggepinnen/SlidingDistancesBase.jl/blob/a124a374add252d8e7637da805c66b7f92c49826/test/test_normalizers.jl#L76
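As a rough illustration of why uninitialized entries show up, here is a sketch of a lazily filled normalized window. Everything in it (the type `LazyZWindow`, its fields, and `next_point!`) is hypothetical and is not the package's implementation; it only mirrors the idea of normalizing one point at a time so that an early-stopping distance computation never pays for points it does not look at.

```julia
# Hypothetical lazily filled window, for illustration only.
mutable struct LazyZWindow
    x::Vector{Float64}       # the long series
    n::Int                   # window length
    start::Int               # index of the first point of the current window
    filled::Int              # number of window points normalized so far
    buffer::Vector{Float64}  # normalized window, allocated uninitialized
    μ::Float64               # mean of the current window
    σ::Float64               # standard deviation of the current window
end

function LazyZWindow(x::Vector{Float64}, n::Int)
    w = view(x, 1:n)
    μ = sum(w) / n
    σ = sqrt(sum(abs2, w .- μ) / n)
    # The buffer holds arbitrary values until next_point! has written to it.
    return LazyZWindow(x, n, 1, 0, Vector{Float64}(undef, n), μ, σ)
end

# Normalize one more point of the current window and return it.
function next_point!(z::LazyZWindow)
    i = z.filled + 1
    z.buffer[i] = (z.x[z.start + i - 1] - z.μ) / z.σ
    z.filled = i
    return z.buffer[i]
end

z = LazyZWindow([1.0, 2.0, 3.0, 4.0, 5.0], 3)
first_value = next_point!(z)   # only buffer[1] is meaningful at this point
```

Printing `z.buffer` before every point has been requested shows whatever happened to be in memory, which is the same effect as the `undef` entries in a freshly constructed normalizer.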
I can add some docs for the normalizers if you want to add new ones; so far I only have `Z` and the poorly named `IsoZ`.
Ah okay, thanks for explaining. Some docs would be great! But I can't promise that I'll add new normalizers any time soon, so no worries if it's not a priority.
Right now, I am interested in using normalizers with `sparse_distmat`. However, it seems like in this context it makes more sense to prenormalize each signal instead of doing it online (like in the `dtwnn` context). So I'm just doing
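```julia
using StatsBase

# Z-normalize each row (dimension) of X in place; columns are time points.
function z_normalize!(X)
    dt = fit(ZScoreTransform, X, dims=2)
    StatsBase.transform!(dt, X)
end

z_normalize!.(y)  # apply to every signal in y
```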
before passing `y` to `sparse_distmat`.
Yes, if the goal is not to operate on sliding windows of a long sequence, the normalizer types have little benefit and you'd be just as well off normalizing in advance.
The interface for the normalizers got overly complicated, but I couldn't see a straightforward way of improving it so it's left at being complicated :/
Note that `sparse_distmat` is not super smart, and you might be able to improve upon the performance by clever use of some accelerating data structure. It actually computes all O(N^2) distances, but only stores a small number of them. It does make use of some pruning and the like, but something like a ball tree or a VPTree could potentially allow for even earlier termination or skipping some distance computations entirely.
It also doesn't use any threading, which should be quite easy to add.
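A minimal sketch of what "compute all distances, store only a few, and thread the outer loop" could look like. The function `knn_bruteforce`, its signature, and the default Euclidean distance are assumptions for illustration; this is not how `sparse_distmat` is implemented, and it omits the pruning mentioned above.

```julia
using LinearAlgebra  # norm

# Hypothetical brute-force sketch, not the package's sparse_distmat:
# compute every pairwise distance, but keep only the k smallest per signal.
function knn_bruteforce(y::Vector{<:AbstractVector}, k::Int;
                        dist = (a, b) -> norm(a - b))
    N = length(y)
    inds  = Vector{Vector{Int}}(undef, N)
    dists = Vector{Vector{Float64}}(undef, N)
    Threads.@threads for i in 1:N   # each iteration writes only to slot i
        # Distance from signal i to all others; Inf excludes i itself.
        d = [j == i ? Inf : dist(y[i], y[j]) for j in 1:N]
        nearest = collect(partialsortperm(d, 1:k))  # indices of the k smallest
        inds[i]  = nearest
        dists[i] = d[nearest]
    end
    return inds, dists
end

signals = [[0.0], [1.0], [3.0]]
inds, dists = knn_bruteforce(signals, 1)
```

Each iteration touches only its own output slot, so no locking is needed. The sketch computes every symmetric distance twice, which keeps the threaded loop simple at the cost of redundant work.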
Hi again @baggepinnen,
I was wondering what `IsoZNormalizer` does, and if the `undef`'s here are expected:
I was reading through https://www.cs.unm.edu/~mueen/DTW.pdf and they say that z-normalization is essential, and in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5668684/ they mention that in the multivariate case, each dimension should be z-normalized separately. Is that what `IsoZNormalizer` does?