TimeEval / GutenTAG

GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.
MIT License

Create new mode - ts_augmentation #35

Open DhavalRepo18 opened 1 year ago

DhavalRepo18 commented 1 year ago

We use this repo to create time series. We would like to introduce a new mode on top of supervised and semi-supervised, called "ts-augmentation", where we produce

We can provide a small code sample.

CodeLionX commented 1 year ago

Thank you for using our library! We are happy to discuss extensions to GutenTAG.

For me, your description of the output sounds exactly like our semi-supervised output mode:

Can you elaborate a bit further? What exactly is the input and output of the newly proposed mode, and why is it currently not supported by GutenTAG?

DhavalRepo18 commented 1 year ago

The variance was an issue when we tested, so we want to avoid touching that part. There is one fix for the variance issue: if we pass the same seed, the output meets our need even with a variance setting.

CodeLionX commented 1 year ago

That is precisely how GutenTAG should behave.

I still don't understand your use case, though.

DhavalRepo18 commented 1 year ago

Our use case is very simple. We wanted to generate

We tried using semi-supervised, but when we added variance cases, it changed the base time series as well as the anomalous time series. We would like to have a separate mode that also works with variance.

https://github.com/HPI-Information-Systems/gutentag/blob/main/gutenTAG/generator/timeseries.py#L52

The code passes a new random seed, and we would like to use the same seed. This way, the base time series remains the same even with variance. We have some code written that can be made available.
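The seeding effect described here can be illustrated with a small sketch. It does not use GutenTAG's code: `noisy_sine` is a hypothetical helper standing in for a base oscillation with variance, and it relies only on Python's standard library. Reusing one seed reproduces the noisy series exactly, while a fresh seed changes it.

```python
import math
import random


def noisy_sine(length, variance, seed):
    """Hypothetical stand-in for a base oscillation with Gaussian noise."""
    # A private generator means the seed alone determines the noise.
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    return [
        math.sin(2 * math.pi * t / 50) + rng.gauss(0.0, sigma)
        for t in range(length)
    ]


train = noisy_sine(200, variance=0.05, seed=42)
test_same_seed = noisy_sine(200, variance=0.05, seed=42)
test_new_seed = noisy_sine(200, variance=0.05, seed=43)

# Same seed: identical noise realisation. New seed: a different one.
same_seed_matches = train == test_same_seed
new_seed_matches = train == test_new_seed
```

With a variance of zero both seeds would give the same series, which is why the problem only shows up once variance is configured.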

CodeLionX commented 1 year ago

OK. I get your point.

Currently, you can achieve this by calling GutenTAG twice:

  1. Generate your base time series (TS) and store to disk.
  2. Load generated TS using custom-input-BO and generate your augmented TS with anomalies etc. (semi-supervised=False and supervised=False)

But if you want to generate many such TS, it's quite tedious.
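The two steps above can be sketched in plain Python. This is only an illustration of the workflow, not GutenTAG's API or file format: every function name, the CSV layout, and the amplitude anomaly are assumptions made for the example.

```python
import csv
import math
import os
import random
import tempfile


def generate_base(length, variance, seed):
    """Hypothetical base time series: a sine wave with Gaussian noise."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    return [
        math.sin(2 * math.pi * t / 50) + rng.gauss(0.0, sigma)
        for t in range(length)
    ]


def save_series(path, values):
    """Store the series as a two-column CSV (assumed layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value"])
        writer.writerows(enumerate(values))


def load_series(path):
    """Read the value column back as floats."""
    with open(path, newline="") as f:
        return [float(row["value"]) for row in csv.DictReader(f)]


def inject_amplitude_anomaly(values, start, length, factor):
    """Return a scaled copy of the series plus 0/1 anomaly labels."""
    augmented = list(values)
    labels = [0] * len(values)
    for i in range(start, min(start + length, len(values))):
        augmented[i] *= factor
        labels[i] = 1
    return augmented, labels


# Step 1: generate the base time series and store it to disk.
base = generate_base(300, variance=0.05, seed=1)
path = os.path.join(tempfile.mkdtemp(), "base.csv")
save_series(path, base)

# Step 2: load the stored series and add anomalies on top of it.
loaded = load_series(path)
augmented, labels = inject_amplitude_anomaly(loaded, start=100, length=20, factor=3.0)
```

Because the second step reads the stored values instead of regenerating them, the augmented series matches the base series everywhere outside the anomaly window, regardless of seeding.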

How do you propose to solve this? It seems to me that the different seeding is the only thing that is preventing you from using the semi-supervised-mode. Is this true? If yes, then adding a third output-mode that just uses the same seed is everything we need to change.

sangy14 commented 1 year ago

@CodeLionX I am also working with @DhavalRepo18. Yes, as you mentioned, the different seeds were causing the issue. So we did the following to generate a time series and the same time series plus anomalies: [image]. As mentioned by @DhavalRepo18, we would like to have one more mode that allows us to do this.

CodeLionX commented 1 year ago

Introducing another mode might work. However, the modes are not mutually exclusive and can be used together. This means that we would need to generate an additional time series with the same contents as the test time series — or just copy it? This implies adding a new TrainingType and a new filename. In addition, we need to change many parts within GutenTAG, we break backwards compatibility, and we lose compatibility with TimeEval (TimeEval would not support the new learning type; I would rather not encourage training on this new data format as well). Seems to be many drawbacks with limited usability, IMHO.

I would propose the following: We add another setting key exact-train-bo: bool = False and ensure that independent of whether semi-supervised or supervised is enabled, the BO of the training time series and testing time series are the same if enabled. Only the anomalies would differ in this case. This setting now applies to all existing learning types (TrainingType) and we can show a warning (training on this data might lead to overfitting / bad generalizability) if it is enabled. We still need to touch some parts in GutenTAG, but only additions in the config and input behavior while maintaining backwards compatibility.
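A minimal sketch of how such a flag could behave, assuming a semi-supervised setup where only the test series carries anomalies. Nothing here is GutenTAG code: `base_oscillation`, `generate_train_test`, the `seed + 1` choice, and the fixed anomaly are hypothetical, and only the `exact_train_bo` name mirrors the proposed `exact-train-bo` key.

```python
import math
import random
import warnings


def base_oscillation(length, variance, seed):
    """Hypothetical BO: a sine wave with seeded Gaussian noise."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    return [
        math.sin(2 * math.pi * t / 50) + rng.gauss(0.0, sigma)
        for t in range(length)
    ]


def generate_train_test(length, variance, seed, exact_train_bo=False):
    """Return (train, test); the flag makes both share one BO."""
    if exact_train_bo:
        warnings.warn(
            "Training on this data might lead to overfitting / bad generalizability."
        )
        train_seed = seed  # reuse the test seed, so the BO is identical
    else:
        train_seed = seed + 1  # assumption: any seed distinct from the test seed
    train = base_oscillation(length, variance, train_seed)
    test = base_oscillation(length, variance, seed)
    # Only the test series receives an anomaly (a simple level shift).
    for i in range(length // 2, length // 2 + 10):
        test[i] += 5.0
    return train, test


train_default, test_default = generate_train_test(200, 0.05, seed=7)
```

With the default `exact_train_bo=False` the existing behaviour is kept, which is what preserves backwards compatibility in this sketch.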


Does this align with your requirements? Do you want to contribute such a feature?

DhavalRepo18 commented 1 year ago

@CodeLionX We agree with your suggestion, but we can only test it once the feature is available. Meanwhile, we are using our internal hack. As you rightly pointed out, you reached the same conclusion we had (it requires a code modification, though).

DhavalRepo18 commented 1 year ago

@CodeLionX Thank you for your help.

DhavalRepo18 commented 7 months ago

@CodeLionX Please feel free to close this issue or update the code. We may use the initial solution of running the generation twice.

CodeLionX commented 7 months ago

Dear @DhavalRepo18,

currently, I don't have the time to work on this feature. However, I'll leave this issue open because I don't see a reason to not implement this as proposed in https://github.com/TimeEval/GutenTAG/issues/35#issuecomment-1665447863.

If somebody wants to try implementing this, they are welcome to do so and I can offer my support.

Thanks.