Reproducibility of GDN - Githubissues

mhkim9714 commented 1 year ago

Hello, I am very impressed by your work, and am trying to start my anomaly detection research based off of your work.

The first thing I am trying to do is to reproduce the results for the SWaT dataset given in Table 2. I followed the exact step that you provided in scripts/readme.md for SWaT preprocessing. After running process_swat.py, I got these statistics for the final data.

train.csv : (47520, 52)
test.csv : (44991, 52)

I noticed that it is slightly different from the data statistics given in Table 1. (5 extra data points exist in my processed data)

After creating train.csv, test.csv, and list.txt, I compared the created files with demo data (swat_train_demo.csv, swat_test_demo.csv) given in https://drive.google.com/drive/folders/1_4TlatKh-f7QhstaaY7YTSCs8D4ywbWc?usp=sharing. However, the first 999 rows of the data didn't match.

Finally, I tried to run your code with the same seed and data multiple times to see if the performance varies between each run. Unfortunately, fixing the seed didn't really work because the performance varied so much between each run. (For your understanding, I used the hyperparameter settings from https://github.com/d-ailin/GDN/issues/4) +) I also tried to run the code under cpu environment, but the results are still non-reproducible.

(1) F1 score: 0.8163308589607635 precision: 0.9778963414634146 recall: 0.7007099945385036 (2) F1 score: 0.7394631639063391 precision: 0.9926402943882244 recall: 0.5892954669579464 (3) F1 score: 0.8220572640509013 precision: 0.9845020325203252 recall: 0.7054432914618606 (4) F1 score: 0.8120639690887624 precision: 0.9895370128171593 recall: 0.6886947023484434

How did you evaluate your model when reporting to the paper? Have you come across this problem before?

My question can be arranged as follows.

Why does the difference in data statistics occur?
Why following the exact preprocessing step results in different data from the given demo data?
Why does fixing the seed not work in GDN? Is it something related to the atomic operations(non-deterministic operations) included in torch_scatter and torch_sparse?

The same thing happened for WADI as well.

The data statistics are different.
- train.csv : (102697,128)
- test.csv : (17280, 128)
The processed data and the demo data do not match.
The code is not reproducible with a fixed seed for WADI dataset as well.
The results are nowhere near the reported results in the paper.

Has anyone been succesful at reproducing the results for SWaT and WADI?

hamiid01 commented 1 year ago

would you be willing to share the code with me? i could not make gdn work with new versions of torch_geometric

DavidDong004 commented 1 year ago

I tried to reproduce the results of the WADI and SWAT datasets on my computer, but the results are much worse than the original and the results you got. If it is convenient for you, could you please send me a copy of the code according to. /scripts/readme.md file to me? Thank you very much. My email address is dzw1059169580@outlook.com.

KeepMovingXX commented 1 year ago

I meet the same question, did you solve it? The results are different with the same setting.

d-ailin / GDN

Reproducibility of GDN #68