Generated data distribution seems very different from the original one

Y-debug-sys / Diffusion-TS

[ICLR 2024] Official Implementation of "Diffusion-TS: Interpretable Diffusion for General Time Series Generation"

MIT License


Generated data distribution seems very different from the original one #46

Closed Awenega closed 3 months ago

Awenega commented 3 months ago

Hi, for my research project I am trying to adapt your model to develop a diffusion model for the NILM problem. Specifically, using the "Tutorial_0.ipynb" notebook, I trained the model, performed unconditional data generation, and then visualized the PCA/t-SNE/kernel plots. I wanted to ask your opinion on the result, which does not seem to be generated correctly. The dataset shape is (60000, 128, 1), and below you can see the model configuration.

I cannot understand the reason why the generated samples seem to have a very different distribution from the original. Would you have any suggestions on how to improve the generation?

Y-debug-sys commented 3 months ago

Hi, please use the normalized ground truth for the visualizations, that is, load 'norm_truth.npy' for ori_data rather than 'ground_truth.npy' (see the commented code in block [4] of the "Tutorial_0.ipynb" notebook). Thanks!

Awenega commented 3 months ago

Hi, thank you. The ori_data is already equal to the 'norm_truth.npy', which is given by "unnormalize_to_zero_to_one(train_data)" in the "real_dataset.py"

Also, if I print ori_data and the fake data, their values are always within [0, 1].

Y-debug-sys commented 3 months ago

My point is not about the generated data. Please uncomment the relevant code in "Tutorial_0.ipynb" notebook block [4] and find the corresponding norm_truth filename to fill in. It's clear that the range of the real data in your graph is not between 0 and 1, indicating that it is non-normalized data.
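A minimal sketch of the range check described above, for confirming that the array passed to the plots as ori_data really lies in [0, 1]. The helper name `check_unit_range` and the stand-in arrays are hypothetical; in practice the arrays would be the ones loaded in the notebook.

```python
import numpy as np

def check_unit_range(name, data, tol=1e-6):
    """Print min/max and return True if all values lie in [0, 1] (within tol)."""
    lo, hi = float(np.min(data)), float(np.max(data))
    ok = lo >= -tol and hi <= 1.0 + tol
    print(f"{name}: min={lo:.4f}, max={hi:.4f}, in [0, 1]: {ok}")
    return ok

# Stand-in arrays for illustration only (not the real NILM data).
rng = np.random.default_rng(0)
ori_data = rng.uniform(0.0, 1.0, size=(100, 128, 1))     # normalized
raw_data = rng.uniform(0.0, 3000.0, size=(100, 128, 1))  # non-normalized

ok_norm = check_unit_range("ori_data", ori_data)
ok_raw = check_unit_range("raw_data", raw_data)
```

Running the same check on the exact array handed to the plotting functions, rather than on an earlier copy, helps rule out loading the wrong file.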

[Image: 354606085-f277e674-1782-4e65-9076-f6a6b28c4cbf]

Awenega commented 3 months ago

Looking at that image, I thought the same thing. Thank you for confirming my hypothesis. I will debug the code more carefully; I probably did something wrong during the generation and preprocessing of the original dataset.

Y-debug-sys commented 3 months ago

See About Data for a description of the saved files.

Awenega commented 3 months ago

Hi, you were right: the original data had been processed with a different scaler, which "ruined" the original distribution.
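As an illustration of why a mismatched scaler distorts the comparison, the sketch below applies two common scalers to the same synthetic data. The thread does not say which scalers were involved, so min-max and standard scaling, as well as the helper names and the gamma-distributed stand-in data, are assumptions chosen only for illustration.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """Map values linearly so that x_min -> 0 and x_max -> 1."""
    return (x - x_min) / (x_max - x_min)

def standard_scale(x, mean, std):
    """Shift to zero mean and scale to unit variance."""
    return (x - mean) / std

# Synthetic stand-in for raw readings (hypothetical, not the real dataset).
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=100.0, size=(1000, 128, 1))

scaled_minmax = minmax_scale(raw, raw.min(), raw.max())
scaled_standard = standard_scale(raw, raw.mean(), raw.std())

print("min-max :", scaled_minmax.min(), scaled_minmax.max())
print("standard:", scaled_standard.min(), scaled_standard.max())
```

The same underlying data ends up on two different ranges, so plotting one version against samples generated on the other scale would show a mismatch even if the model were perfect.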

Despite this, the new results are still not satisfactory. The config used is the same as before. In your experience, what could be causing this?

Y-debug-sys commented 3 months ago

Hi, it seems that the generated data has been normalized twice.

Now: [Image: 354698028-91bbd51a-007f-47bb-a99e-6a5317fc95fe]

Before: [Image: 354606085-f277e674-1782-4e65-9076-f6a6b28c4cbf]

The original processing of fake_data is OK. Do not change it. Thanks!
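A small sketch of what normalizing twice can look like numerically. It assumes that `unnormalize_to_zero_to_one` maps [-1, 1] to [0, 1] via `(x + 1) * 0.5`; that definition is an assumption here and is re-declared locally rather than imported from the repository.

```python
import numpy as np

def unnormalize_to_zero_to_one(x):
    # Assumed definition: linear map from [-1, 1] to [0, 1].
    return (x + 1.0) * 0.5

# Stand-in for model output in [-1, 1] (illustrative values only).
model_out = np.linspace(-1.0, 1.0, 5)

once = unnormalize_to_zero_to_one(model_out)   # spans [0, 1]
twice = unnormalize_to_zero_to_one(once)       # squeezed into [0.5, 1]

print("once :", once)
print("twice:", twice)
```

Under this assumption, the doubly processed values still fall inside [0, 1], so printing the range alone would not reveal the problem; the samples are simply compressed into the upper half of the interval.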

Y-debug-sys commented 3 months ago

My email is yxy5315@gmail.com. If you have any problems with the code, please let me know!