Define, create, and include synthetic datasets for different kinds of anomalies.
This is important for regression testing, as simple data can stress (at different difficulty levels) specific properties of HTM. It will also help to define concrete advantages and weak spots of HTM.
[ ] review papers on (synthetic) anomaly datasets/classes of anomalies
[ ] reach out for reviews/ideas:
  [ ] ResearchGate
  [ ] HTM mailing list
  [ ] other forums (which?)
[ ] collect and describe all "theoretical classes" of anomalies (a 1D generator sketch follows this list):
1D:
  [ ] point anomaly
  [ ] amplitude shift
  [ ] phase shift
  [ ] frequency shift
  [ ] noise (change in noise level)
  [ ] combination of the above
  [ ] change of the generating distribution
nD:
  [ ] de/correlated variables (multimodal input); see the nD sketch after the "generate data" list
by input:
  [ ] data with "holes" (missing values)
  [ ] "tricky" data, designed to look similar (overlapping sequences, ...)
  [ ] auto-tuning on "far apart" data, e.g. every 1000th value is A, but value number 121000 is B instead of A
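Below is a minimal sketch, in Python with NumPy, of generators for some of the 1D classes above. All function names and parameters are illustrative placeholders, not an agreed-on API:

```python
import numpy as np

def base_signal(n=4000, freq=0.01, amp=1.0):
    """Clean sine wave used as the 'normal' regime."""
    t = np.arange(n)
    return amp * np.sin(2 * np.pi * freq * t)

def point_anomaly(sig, at, magnitude=5.0):
    """Single-sample spike at index `at`."""
    out = sig.copy()
    out[at] += magnitude
    return out

def amplitude_shift(sig, start, factor=2.0):
    """Amplitude changes from index `start` onward."""
    out = sig.copy()
    out[start:] *= factor
    return out

def frequency_shift(n=4000, start=2000, f1=0.01, f2=0.03):
    """Frequency changes at `start`; the phase is kept continuous at the switch."""
    t = np.arange(n)
    phase = 2 * np.pi * np.where(t < start, f1 * t, f1 * start + f2 * (t - start))
    return np.sin(phase)

def add_noise(sig, sigma=0.1, seed=42):
    """Additive Gaussian noise; raising `sigma` mid-stream gives a 'noise' anomaly."""
    rng = np.random.default_rng(seed)
    return sig + rng.normal(0.0, sigma, size=sig.shape)

# "combination of the above": stack several anomaly types into one stream
sig = add_noise(amplitude_shift(point_anomaly(base_signal(), at=1500), start=3000))
np.savetxt("synthetic_1d.csv", sig, header="value", comments="")
```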
[ ] generate data:
  [ ] synthetic data for each of the classes (1D sketch above, nD sketch below)
  [ ] a well-known published dataset for each class, for comparison with other algorithms
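And a similar hedged sketch for the nD "de/correlated variables" class: two variables start strongly correlated and decorrelate at a change point, so the anomaly exists only in the joint distribution. Again, names and parameters are illustrative:

```python
import numpy as np

def correlated_then_decorrelated(n=4000, change=2000, rho=0.95, seed=0):
    """Rows are (x, y) samples; the correlation disappears at index `change`."""
    rng = np.random.default_rng(seed)
    cov_normal = np.array([[1.0, rho], [rho, 1.0]])
    cov_anomalous = np.eye(2)
    normal = rng.multivariate_normal([0.0, 0.0], cov_normal, size=change)
    anomalous = rng.multivariate_normal([0.0, 0.0], cov_anomalous, size=n - change)
    return np.vstack([normal, anomalous])

data = correlated_then_decorrelated()
# Neither column alone changes its marginal distribution, so a 1D detector
# per column sees nothing; only a model of the joint input can flag this.
```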
Theoretical challenges
[ ] Different kinds of anomalies
We want to detect all of them as anomalies, but we may also want to differentiate among them. An example is the ECG MIT-BIH data, where there are _V_entricular anomalies (easy to detect) and about 4 more types. This somewhat combines anomaly detection with classification of sequences.
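As a concrete illustration (assuming the `wfdb` Python package and PhysioNet access; the class mapping below is my own simplification, not the full MIT-BIH taxonomy):

```python
import wfdb
from collections import Counter

# Read beat annotations for record 100 directly from PhysioNet's mitdb.
ann = wfdb.rdann('100', 'atr', pn_dir='mitdb')

# 'N' marks normal beats, 'V' marks premature ventricular contractions
# (the "easy" ventricular anomaly); all other symbols are lumped together.
labels = ['normal' if s == 'N'
          else 'ventricular' if s == 'V'
          else 'other'
          for s in ann.symbol]

print(Counter(labels))  # per-type counts; ann.sample holds the beat positions
```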
[ ] Scope!
For example, temperature: measured every morning at 7am, I get a relatively stable, slowly changing pattern; measured every hour, I get a stable pattern with significant changes; measured every 7 hours, it looks like random data.
So the question is: how can HTM "decide" the optimal aggregation, the scale to focus on? Another example: GPS position reported every second, how do you choose the scale?
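A small sketch of the effect, assuming pandas; the temperature model below is synthetic and only meant to show how the choice of sampling scale reshapes the same underlying process:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=24 * 365, freq="h")
hours = np.arange(len(idx))
# Daily cycle + slow seasonal drift + noise as a stand-in for temperature.
temp = (10 * np.sin(2 * np.pi * hours / 24)
        + 8 * np.sin(2 * np.pi * hours / (24 * 365))
        + rng.normal(0, 1, len(idx)))
series = pd.Series(temp, index=idx)

daily_7am = series[series.index.hour == 7]  # slow, stable pattern
hourly = series                             # stable pattern, large swings
every_7h = series.iloc[::7]                 # aliases the 24h cycle; looks random
```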
[ ] Model auto-adaptation
Should all of these be part of one HTM/anomaly model? Or run as an ensemble of specific models?
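One possible shape for the ensemble option, as a design sketch only (the `compute` interface is assumed here, not an existing HTM API):

```python
class AnomalyEnsemble:
    """Runs several specialist detectors side by side on the same stream."""

    def __init__(self, detectors):
        self.detectors = detectors  # e.g. one HTM model per class or scale

    def compute(self, value):
        scores = [d.compute(value) for d in self.detectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Max-score fusion: anomalous if any specialist fires; the index of
        # the firing specialist also hints at *which kind* of anomaly it is.
        return scores[best], best
```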
[ ] Anomaly prediction
Yes, it's an oxymoron, but everybody wants it! :icecream: I think this is a core problem; my ideas include running a combination of HTM models at different scales ...
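A rough sketch of that multi-scale idea; everything here is hypothetical glue code around the same assumed `compute` interface. A rising score at a coarse scale can act as an early warning before the fine scale fires, which is as close to "prediction" as this gets:

```python
import numpy as np

class ScaledModel:
    """Feeds a detector only every `scale`-th value, aggregated by mean."""

    def __init__(self, detector, scale):
        self.detector, self.scale = detector, scale
        self.buffer, self.score = [], 0.0

    def feed(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.scale:
            self.score = self.detector.compute(np.mean(self.buffer))
            self.buffer.clear()
        return self.score  # holds the last score between coarse updates

def combined_score(models, value, weights=None):
    """Weighted fusion of anomaly scores across scales."""
    scores = [m.feed(value) for m in models]
    weights = weights or [1.0 / len(models)] * len(models)
    return sum(w * s for w, s in zip(weights, scores))
```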
Note: not accepted for NAB, so moving it from there: https://github.com/numenta/NAB/issues/217