TimeEval / GutenTAG

GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.
MIT License

How to generate multiple ‘extremum’ outliers? #1

Closed Bigchicken98 closed 1 year ago

Bigchicken98 commented 1 year ago

image

CodeLionX commented 1 year ago

Hey,

You are trying to generate an extremum anomaly of length 15. This is not allowed because an extreme value consists of a single point, so its length must be 1. If you want to generate multiple extremum anomalies, add multiple anomaly definitions:

timeseries:
- name: demo
  length: 1000
  base-oscillations:
  - kind: sine
    frequency: 4.0
    amplitude: 1.0
    variance: 0.05
  anomalies:
  # 1. extremum anomaly
  - position: beginning
    length: 1
    kinds:
    - kind: extremum
      min: true
      local: false
  # 2. extremum anomaly
  - &extremum-anomaly
    position: middle
    length: 1
    kinds:
    - kind: extremum
      min: false
      local: true
      context_window: 50
  # 3. extremum anomaly
  - <<: *extremum-anomaly
  # ...

Figure

Bigchicken98 commented 1 year ago

I adopted your method. I wanted to generate 50 extremum anomalies in a base oscillation of length 1000, but only 32 were generated.

CodeLionX commented 1 year ago

50 extreme values in a time series of length 1000 is a contamination of 5%. Don't you think that this is too much for an anomaly detection use case?

Nevertheless, there is only so much that the automatic positioning algorithm in GutenTAG can do. With so many anomalies, you should specify their locations manually instead of relying on the "beginning", "middle", and "end" positions. You can do this with the exact-position key:

timeseries:
- name: demo
  length: 1000
  base-oscillations:
  - kind: sine
    frequency: 4.0
    amplitude: 1.0
    variance: 0.05
  anomalies:
  - exact-position: 413  # index beginning from 0
    length: 1
    kinds:
    - kind: extremum
      min: true
      local: false
  # ...
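
Writing 50 such entries by hand is tedious, so here is a minimal Python sketch that prints the entries of the anomalies list with evenly spaced, distinct exact-position values. The helper names (extremum_positions, anomalies_yaml) and the margin of 10 points at each end are assumptions made for illustration and are not part of GutenTAG; only the YAML keys are taken from the configuration above.

```python
def extremum_positions(series_length, n_anomalies, margin=10):
    """Return n_anomalies evenly spaced, distinct indices.

    All indices lie in [margin, series_length - margin). The margin is an
    illustrative assumption, not something GutenTAG requires.
    """
    usable = series_length - 2 * margin
    if n_anomalies <= 0 or n_anomalies > usable:
        raise ValueError("cannot place that many single-point anomalies")
    step = usable / n_anomalies  # >= 1, so the floored indices never collide
    return [margin + int(i * step + step / 2) for i in range(n_anomalies)]


def anomalies_yaml(positions, indent="  "):
    """Render one single-point extremum entry per position as YAML text."""
    lines = []
    for pos in positions:
        lines += [
            f"{indent}- exact-position: {pos}",
            f"{indent}  length: 1",
            f"{indent}  kinds:",
            f"{indent}  - kind: extremum",
            f"{indent}    min: true",
            f"{indent}    local: false",
        ]
    return "\n".join(lines)


positions = extremum_positions(1000, 50)
print(anomalies_yaml(positions))
```

The printed lines can be pasted below the anomalies key of the configuration. Random positions would work as well, as long as they are distinct.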

I would like to look into why the anomalies disappear. Can you send me your configuration file?

Bigchicken98 commented 1 year ago

timeseries-declarations.txt Thank you!

CodeLionX commented 1 year ago

There is actually a bug in the positioning algorithm that leads to multiple anomalies getting injected at the same position. We are working on a fix for that.
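
To see how colliding positions reduce the number of visible anomalies, one can count the separate anomalous runs in the generated labels. This is a minimal sketch that assumes the labels are available as a sequence of 0/1 values (for example, read from a label column of the generated dataset); count_anomaly_runs is a hypothetical helper, not a GutenTAG function.

```python
def count_anomaly_runs(labels):
    """Count maximal runs of consecutive anomalous (truthy) labels."""
    runs = 0
    previous = 0
    for label in labels:
        if label and not previous:
            runs += 1  # a new run starts where a 0 is followed by a 1
        previous = label
    return runs


# Five injections, but two pairs share a position, so only three are visible.
labels = [0] * 20
for pos in [3, 3, 10, 15, 15]:
    labels[pos] = 1
print(count_anomaly_runs(labels))
```

Note that single-point anomalies at directly adjacent indices would also merge into one run under this count.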

CodeLionX commented 1 year ago

We fixed the positioning algorithm in commit 801b0a316f2a829f0c2d539006cc6a12bb6bba82. GutenTAG should no longer drop anomalies. You can use the current master to test this.

Bigchicken98 commented 1 year ago

Thank you for your help