fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Generating time series with negative values #39

Closed jankrans closed 1 year ago

jankrans commented 1 year ago

Hi,

I'm training a model on daily household electricity consumption. Because of solar power, negative values are present in the data set. Is there any experience with how the model behaves on negative values when using normalizing_per_sample? It seems that the model shifts the mean value to a negative value while it should be positive.

So I'm mostly wondering if there has been any experience with negative values in time series, or whether I should instead make everything positive and apply some post-processing to the samples afterwards.

fjxmlzn commented 1 year ago

Negative values are not a problem. Before using DoppelGANger, you will need to normalize the values manually to -1~1 or 0~1, and indicate the normalization method you used in the "normalization" field in the PKL files. The generated data will be in the specified range, and you can manually scale the data back to the original range.
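For reference, a minimal NumPy sketch of the kind of manual min-max normalization meant here (the helper names are made up for illustration; the essential part is keeping the original min/max so generated samples can be mapped back):

```python
import numpy as np

# Hypothetical helpers, for illustration only; not DoppelGANger's API.
def minmax_normalize(x, low=-1.0, high=1.0):
    """Scale raw values (possibly negative, e.g. solar export) to [low, high]."""
    x_min, x_max = x.min(), x.max()
    scaled = (x - x_min) / (x_max - x_min)      # now in [0, 1]
    return scaled * (high - low) + low, (x_min, x_max)

def minmax_denormalize(y, x_min, x_max, low=-1.0, high=1.0):
    """Map generated samples in [low, high] back to the original range."""
    scaled = (y - low) / (high - low)
    return scaled * (x_max - x_min) + x_min

raw = np.array([-2.3, 0.0, 5.1, 1.7])           # kWh, can be negative
normed, (x_min, x_max) = minmax_normalize(raw)  # feed `normed` to the GAN
restored = minmax_denormalize(normed, x_min, x_max)
assert np.allclose(raw, restored)
```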

normalizing_per_sample (controlled by the self_norm flag) can be helpful when your data has very diverse ranges across samples (see Section 4.2 of the paper). I would suggest starting without this feature enabled. If that does not work well, you can then try enabling it.
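A hypothetical sketch of what per-sample normalization does, in the spirit of Section 4.2 of the paper: each sample is scaled by its own min/max, and those per-sample values are kept as extra attributes describing the sample's original range:

```python
import numpy as np

# Illustrative only; not the repository's implementation.
def per_sample_normalize(samples, eps=1e-8):
    normed, range_attrs = [], []
    for s in samples:
        s_min, s_max = s.min(), s.max()
        span = max(s_max - s_min, eps)                 # guard constant series
        normed.append(2.0 * (s - s_min) / span - 1.0)  # each sample -> [-1, 1]
        range_attrs.append((s_min, s_max))             # kept as sample attributes
    return normed, range_attrs

# Samples with very different ranges end up on a common scale
samples = [np.array([-3.0, 0.5, 2.0]), np.array([100.0, 250.0, 180.0])]
normed, range_attrs = per_sample_normalize(samples)
```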

fjxmlzn commented 1 year ago

Feel free to re-open the issue if you still have issues with it.

jankrans commented 1 year ago

Thank you for the quick response! I was wondering if you could maybe provide some more info about the metrics being tracked during training; I'm still trying to learn everything about GANs. I'm trying to interpret them with TensorBoard, but getting a good grip on which run is good or bad is somewhat difficult for me...

Here are two screenshots of the first two training runs (so improvements still have to be made):

[TensorBoard screenshots of the training loss curves]

Should some of the curves converge to 0, or should everything become as low as possible, even the graphs with negative values? These are some of the questions I was wondering about...

If you would rather answer by mail, which might be more appropriate, you can reach me at jan.kranzen@gmail.com

fjxmlzn commented 1 year ago

In GANs, it is generally hard to read sample quality from the losses. The most useful signals here are d/gp and attr_d/gp: we want them to be stable and close to 0 throughout training. If they suddenly jump to very large values (e.g., > 10), that usually means training has entered an unstable stage, and sample quality during that stage will usually be bad. It may or may not recover by itself afterwards. You can find more information about the losses we use in this paper: https://arxiv.org/pdf/1704.00028.pdf
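For intuition, here is a PyTorch sketch (not the repository's TensorFlow code) of the WGAN-GP gradient penalty from that paper, i.e., the kind of quantity that the d/gp and attr_d/gp scalars track:

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    """WGAN-GP gradient penalty (Gulrajani et al., 2017, the linked paper).

    Penalizes the critic when the gradient norm at points interpolated
    between real and fake samples deviates from 1; healthy training keeps
    this value close to 0.
    """
    batch = real.size(0)
    # One random interpolation coefficient per sample, broadcast over time/features
    alpha = torch.rand(batch, *([1] * (real.dim() - 1)), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

    d_out = critic(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True, retain_graph=True,
    )[0]
    grads = grads.reshape(batch, -1)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Toy usage with a stand-in critic on (batch, time, features) series
critic = nn.Sequential(nn.Flatten(), nn.Linear(24 * 1, 1))
real = torch.randn(8, 24, 1)
fake = torch.randn(8, 24, 1)
print(gradient_penalty(critic, real, fake).item())  # the scalar logged as d/gp
```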