Open rruizdeaustri opened 3 years ago
Hi Roberto,
The dataset example-2_cpc_results.csv does not contain any positive (anomalous) points, hence tp = 0. The model also classifies every point as negative, hence fp = 0, so precision = tp / (tp + fp) becomes 0 / 0.
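The 0/0 failure described above can be avoided by guarding the division. This is only a sketch of such a guard with a hypothetical signature, not the repo's actual `find_scores` implementation:

```python
def find_scores(tp, fp, fn):
    # Guard the 0/0 case: with no positive labels and no positive
    # predictions, precision/recall are undefined; report 0.0 instead
    # of raising ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return precision, recall, f1

print(find_scores(0, 0, 0))  # -> (0.0, 0.0, 0.0)
```

Reporting 0.0 for the undefined case matches the common convention (e.g. scikit-learn's `zero_division=0` behavior), though a dataset with no anomalies still cannot evaluate the model meaningfully.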
The attached dataset is not the right one for evaluating the model (sorry for the unnecessary hurdle) since it does not contain any anomalous points. I need to replace it with some other time series anomaly detection dataset. You can see here how to use the code with another dataset.
Thanks, Arun
Hi Arun,
Ok, then I'll try with another dataset.
Thanks a lot !
Best, Rbt
Hi Arun,
I have labeled the nyc_taxi.csv dataset from NAB and I have a question about the data split used in your code. As it is, 70% of the data is used for training and 30% for testing, but this way the training data contain anomalies for this particular dataset. Since the method is unsupervised, shouldn't anomalies be excluded from the training process? I guess we want to learn the distribution of the, say, normal samples, right?
Thanks a lot !!
All the best, Roberto
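For reference, the chronological 70/30 split described above might look like this (the variable names are hypothetical, not taken from the repo):

```python
import numpy as np

# A toy stand-in for the nyc_taxi signal (hypothetical data).
signal = np.arange(100, dtype=float)

# Chronological 70/30 split: the first 70% trains, the last 30% tests.
split = int(0.7 * len(signal))
train, test = signal[:split], signal[split:]
print(len(train), len(test))  # 70 30
```

Note that with a purely chronological split, any anomalies in the first 70% of the series land in the training set, which is exactly the concern raised here.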
Hi Roberto,
The anomalies are excluded from the training process. The anomaly labels are used only during evaluation, not during training. Training uses only the time series signal, so the generator learns the distribution of the normal samples.
Cheers, Arun.
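The normal-only training described above can be sketched as a label-based filter, assuming a pandas frame with hypothetical `value` and `anomaly` columns (the repo's actual CSV schema may differ):

```python
import pandas as pd

# Toy series with one labeled anomaly (hypothetical data and columns).
df = pd.DataFrame({
    "value": [1.0, 1.1, 9.9, 1.2, 1.0, 1.1],
    "anomaly": [0, 0, 1, 0, 0, 0],
})

# Train the generator only on normal timesteps ...
train_signal = df.loc[df["anomaly"] == 0, "value"].to_numpy()

# ... while evaluation uses the full series plus the labels.
print(len(train_signal))  # 5
```

Dropping anomalous timesteps this way does break the temporal continuity of the signal, so in practice one might instead drop whole windows that contain an anomaly.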
Hi Arun,
Yes, this is what I expected, though in some blog posts about the model in Orion I have seen that they use the whole time series (including anomalous timesteps). That is why I got confused.
I will split the data, pick out just the normal data, and let you know whether the code works with this dataset as it does with the "official" implementation in Orion.
BTW, have you tried with this dataset ? I could send it to you with the right format for your code.
Thanks a lot !!
Best, Rbt
Hi Roberto,
Thanks for your interest. Training GANs is highly unstable and requires substantial computational power. Access to that computational power is currently out of scope for me.
Best, Arun.
Hi Arun,
In fact, I have been training the model and the performance on this dataset is really poor in comparison with what is reported on the Orion webpage for the, say, official version.
I have used the default hyperparameters, which are identical to the ones used in the report by the Orion team:
Accuracy: 0.79, Precision: 1.00, Recall: 0.07, F1 Score: 0.13
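As a sanity check, the reported F1 score is consistent with that precision/recall pair:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 1.00, 0.07
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.13
```

With recall this low, F1 stays near 0.13 no matter how perfect the precision is, so raising recall is what would move the score.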
Any advice to improve this ?
Thanks a lot !!!
Rbt
Hi Rbt,
I observed the same result in my scenario, but the loss value seems to move in the right direction over successive epochs. I don't have any particular advice other than the following:
Best, Arun.
Can one of you send a CSV file that works with this source code? (I get the same error) I can't find any online.
Hi, could you please send me your dataset? I'd like to use it in my diploma work. I have the same problem with datasets (I tried NAB too).
Hi Arun,
Maybe I can send you the data and you can add them to the repo ?
Rbt
Adding your data to the repo would be great. You can make a pull request with the data and I will merge it.
Best, Arun.
Hi Arun,
I have created a branch called rruiz-branch where the file nyc_taxi_new.csv has been added, and I have made a pull request. Could you please merge it?
Best, Rbt
You need to create a pull request. I don't see any pull request currently.
Hi @arunppsg,
Firstly, thank you for this, it's super cool. I am new to this and have a few questions, which I hope are not too stupid, if you can indulge me?
Looking through this, I notice that both this and the Orion examples only use a value and a date column. Is it possible to make this work with additional regressors/columns, so-called Xregs, e.g. temperature, sales price, etc.?
Secondly, is it necessary to have the labelled anomalies? My anomaly labels (in my datasets) were obtained from the deviation between the true value and an RNN prediction, and I am expecting TadGAN to be better, so it does not seem appropriate to measure the GAN's performance against the RNN's results. I was under the impression that TadGAN is unsupervised. All I really want is the anomaly scores. Does that mean I would need to delete the evaluation section of the code, or will it run regardless and output the outlier scores? Where can I get these?
Again, sorry if these are poor questions. I'm not sure I entirely understand the code.
Best August
Hello August,
See the test function in anomaly_detection.py for the anomaly scores. To use it without labels, just create a dummy column called anomaly, or modify the code in main.py and anomaly_detection.py.
Thanks!
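A minimal sketch of the dummy-label workaround described above, using hypothetical column names (the repo's expected CSV schema may differ):

```python
import pandas as pd

# Hypothetical unlabeled series (column names are assumptions).
df = pd.DataFrame({
    "timestamp": range(5),
    "value": [1.0, 1.2, 0.9, 1.1, 1.0],
})

# Add a dummy "anomaly" column of zeros so the evaluation code can run
# even without real labels; the reported metrics are then meaningless,
# but the anomaly scores are still produced.
df["anomaly"] = 0
print(df["anomaly"].sum())  # 0
```

With all-zero labels, tp = fp = 0, so this also reproduces the division-by-zero condition in find_scores unless that part of the code is skipped or guarded.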
Excuse me, can you send me your dataset? Thanks!
Hi,
I have tried to run the code with the current setup (number of epochs is 30) but I get:
File "TadGAN/anomaly_detection.py", line 129, in find_scores
    precision = tp / (tp + fp)
ZeroDivisionError: division by zero
Any ideas about what is going on?
With Kind Regards, Roberto