Open DhavalRepo18 opened 1 year ago
Thank you for using our library! We are happy to discuss extensions to GutenTAG.
For me, your description of the output sounds exactly like our semi-supervised output mode:
`variance`) both timeseries (train and test) are equal despite the anomaly.
Can you elaborate a bit further? What exactly is the input and output of the newly proposed mode, and why is it currently not supported by GutenTAG?
The variance was an issue when we tested. So we want to avoid touching that stuff. There is one fix with variance, i.e., if we pass the same seed, it will meet the need even with a variance setting.
That is precisely how GutenTAG should behave.
I still don't understand your use case, though.
Our use case is very simple. We wanted to generate
We tried using semi-supervised, but when we added variance cases, it changed the base time series as well as the anomalous time series. We would like to have a separate mode that also works with variance.
https://github.com/HPI-Information-Systems/gutentag/blob/main/gutenTAG/generator/timeseries.py#L52
The code passes a new random seed, and we would like to use the same seed; this way, the base time series remains the same even with variance. We have some code written that can be made available.
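To illustrate the seeding point, here is a minimal sketch in plain NumPy. It is not GutenTAG code: `base_with_variance` is a hypothetical stand-in for a base oscillation with a variance (noise) setting, and the sine shape and seed values are assumptions chosen only for the example.

```python
import numpy as np


def base_with_variance(seed: int, variance: float, length: int = 200) -> np.ndarray:
    """Hypothetical stand-in for a noisy base oscillation (not GutenTAG's API)."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    sine = np.sin(2 * np.pi * t / 50)
    # The variance setting is modelled as additive Gaussian noise drawn from rng.
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=length)
    return sine + noise


# Same seed: the noise draws are identical, so the two series match exactly.
train_same = base_with_variance(seed=42, variance=0.1)
test_same = base_with_variance(seed=42, variance=0.1)

# Different seed: the noise differs, so the base series no longer match.
test_other = base_with_variance(seed=43, variance=0.1)

same_seed_equal = np.array_equal(train_same, test_same)
different_seed_equal = np.array_equal(train_same, test_other)
print(same_seed_equal, different_seed_equal)  # True False
```

With the variance set to zero the seed would not matter in this sketch, which matches the observation that the problem only shows up once variance is added.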
OK. I get your point.
Currently, you can achieve this by calling GutenTAG twice:
`custom-input`-BO and generate your augmented TS with anomalies etc. (`semi-supervised=False` and `supervised=False`)
But if you want to generate many such TS, it's quite tedious.
How do you propose to solve this?
It seems to me that the different seeding is the only thing that is preventing you from using the `semi-supervised` mode. Is this true? If yes, then adding a third output mode that just uses the same seed is everything we need to change.
@CodeLionX I am also working with @DhavalRepo18. Yes, as you mentioned, different seeds were causing the issue. So we did the following to generate a time series and the same time series plus anomalies. As @DhavalRepo18 mentioned, we would like to have one more mode that allows us to do so.
Introducing another mode might work. However, the modes are not mutually exclusive and can be used together. This means that we would need to generate an additional time series with the same contents as the test time series, or just copy it? This implies adding a new `TrainingType` and a new filename. In addition, we need to change many parts within GutenTAG, we break backwards compatibility, and we lose compatibility with TimeEval (TimeEval would not support the new learning type; I would rather not encourage training on this new data format as well). Seems to be many drawbacks with limited usability, IMHO.
I would propose the following: We add another setting key `exact-train-bo: bool = False` and ensure that, independent of whether `semi-supervised` or `supervised` is enabled, the BO of the training time series and testing time series are the same if enabled. Only the anomalies would differ in this case.
This setting now applies to all existing learning types (`TrainingType`) and we can show a warning (training on this data might lead to overfitting / bad generalizability) if it is enabled. We still need to touch some parts in GutenTAG, but only additions in the config and input behavior while maintaining backwards compatibility.
Does this align with your requirements? Do you want to contribute such a feature?
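A rough sketch of how the proposed flag could behave, written in plain NumPy and not against GutenTAG's actual code. The function names, the `seed + 1` choice for the non-shared case, the level-shift anomaly, and the warning text are all assumptions for illustration. Only the semi-supervised-style case (anomaly-free training series) is shown.

```python
import warnings

import numpy as np


def base_oscillation(rng: np.random.Generator, variance: float, length: int) -> np.ndarray:
    """Hypothetical noisy sine base oscillation (BO)."""
    t = np.arange(length)
    return np.sin(2 * np.pi * t / 50) + rng.normal(0.0, np.sqrt(variance), length)


def generate_pair(seed: int, variance: float, exact_train_bo: bool = False, length: int = 200):
    """Return (train, test, labels); illustrative only, not GutenTAG's generator."""
    test_rng = np.random.default_rng(seed)
    if exact_train_bo:
        # Proposed behaviour: reuse the seed so train and test share the same BO.
        warnings.warn(
            "exact-train-bo is enabled: training on this data might lead to "
            "overfitting / bad generalizability",
            UserWarning,
        )
        train_rng = np.random.default_rng(seed)
    else:
        # Assumed stand-in for the existing behaviour: a different seed for training.
        train_rng = np.random.default_rng(seed + 1)

    train = base_oscillation(train_rng, variance, length)
    test = base_oscillation(test_rng, variance, length)

    # Inject a simple level-shift anomaly into the test series only.
    labels = np.zeros(length, dtype=bool)
    labels[100:120] = True
    test[labels] += 3.0
    return train, test, labels


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the overfitting warning for the demo
    train, test, labels = generate_pair(seed=7, variance=0.05, exact_train_bo=True)

# Outside the anomaly, train and test are identical when the flag is enabled.
normal_points_match = np.array_equal(train[~labels], test[~labels])
print(normal_points_match)  # True
```

Because the flag defaults to `False` in this sketch, existing behaviour is unchanged unless it is switched on, which is the backwards-compatibility property the proposal asks for.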
@CodeLionX We agree with your suggestion, but we can only test it once the feature is available. Meanwhile, we are using our internal hack; as you rightly pointed out, you reached the same conclusion as we did (code modification is though).
@CodeLionX Thank you for your help.
@CodeLionX Please feel free to close the issue or update the code. We may use the initial solution of running the generation twice.
Dear @DhavalRepo18,
currently, I don't have the time to work on this feature. However, I'll leave this issue open because I don't see a reason to not implement this as proposed in https://github.com/TimeEval/GutenTAG/issues/35#issuecomment-1665447863.
If somebody wants to try implementing this, they are welcome to do so and I can offer my support.
Thanks.
We are users of this repo and use it to create time series. We would like to introduce a new mode on top of supervised and semi-supervised, called "ts-augmentation", where we produce
We can provide a small code sample.