fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Inference from attributes #47

Closed DanielDimanov closed 4 months ago

DanielDimanov commented 4 months ago

Dear @fjxmlzn,

Thank you for your great work! I was wondering if I can use the feature generator to generate new features from given attributes. Based on what I've read in the paper it should be possible, and I have trained on my data, but I don't fully understand the generate-data function. Could you please tell me how to use it for inference? I want to pass the attributes (in the same format as for training) and have it output a feature vector conditioned on that set of attributes.

Please help.

fjxmlzn commented 4 months ago

Yes, you can do that! sample_from has a given_attribute parameter: https://github.com/fjxmlzn/DoppelGANger/blob/05f36ec6c3850863751d4f3f88d180e9b12cb3eb/gan/doppelganger.py#L670

Adding that argument when calling sample_from (e.g., https://github.com/fjxmlzn/DoppelGANger/blob/05f36ec6c3850863751d4f3f88d180e9b12cb3eb/example_generating_data/gan_generate_data_task.py#L171) should work.
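A minimal sketch of preparing the noise inputs and passing given_attribute (the sample_from signature is from the file linked above; the batch size, sample_time value, and variable names below are illustrative assumptions):

```python
import numpy as np

# Illustrative setup: 100 samples. The *_latent_dim values are the GAN's
# input-noise dimensions (default 5 in doppelganger.py), NOT data dimensions.
num_samples = 100
attribute_latent_dim = 5
feature_latent_dim = 5
sample_time = 62  # assumed: max sequence length divided by sample_len

real_attribute_input_noise = np.random.normal(
    size=(num_samples, attribute_latent_dim))
addi_attribute_input_noise = np.random.normal(
    size=(num_samples, attribute_latent_dim))
feature_input_noise = np.random.normal(
    size=(num_samples, sample_time, feature_latent_dim))

# The actual call (gan, feature_input_data, and the normalized attribute
# array all come from your own training setup):
# features, attributes, gen_flags, lengths = gan.sample_from(
#     real_attribute_input_noise,
#     addi_attribute_input_noise,
#     feature_input_noise,
#     feature_input_data,
#     given_attribute=my_normalized_attributes)  # (num_samples, attr_dim)
```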

Feel free to reopen the issue if you run into any problems with it.

DanielDimanov commented 4 months ago

Hi, thanks for the quick reply. I did manage to get a sample, and it looks substantially different from my features, which might be because I didn't train for long enough. What is stranger is the shape of my generated features, which is my sample_len squared, and my array is almost entirely empty.

I'm slightly confused about the feature_latent_dim and attribute_latent_dim. I have 128 different attributes (after normalisation the shape becomes (n_samples, 130)), and I have 1488 values for my features (31 days of half-hourly data). So for generating 100 samples I use the following:

```python
features, attributes, gen_flags, lengths = gan.sample_from(
    real_attribute_input_noise=real_attribute_input_noise,  # shape (100, 130)
    addi_attribute_input_noise=addi_attribute_input_noise,  # shape (100, 130)
    feature_input_noise=feature_input_noise,  # shape (100, 1488, 1)
    feature_input_data=feature_input_data,
    # given_attribute=data_attribute_raw[0:100, :128]  # your normalized and actual attributes
)
```

The problem is that the default in https://github.com/fjxmlzn/DoppelGANger/blob/05f36ec6c3850863751d4f3f88d180e9b12cb3eb/gan/doppelganger.py#L30 is set to 5 for both, so when I train with main, it asks for given_attribute to have shape (?, 5), and the time series also has a weird dimension. I tried changing attribute_latent_dim to 130 (to account for the extra min and max values added to my 128-dimensional tensor) and feature_latent_dim to 1488, but I was getting time series of length 1488^2, and my attribute vector was also strange. In the example you provided there is no given_attribute.

Sorry for the long comment, and sorry to bother you. Your help is much appreciated, and I apologise if I have missed something silly.

fjxmlzn commented 4 months ago

feature_latent_dim and attribute_latent_dim are the dimensions of the GAN's input noise; they are not related to any shapes in the data: https://github.com/fjxmlzn/DoppelGANger/blob/05f36ec6c3850863751d4f3f88d180e9b12cb3eb/gan/doppelganger.py#L128-L131

About "when I train with main, it asks for given_attribute to have shape (?, 5), and the time series also has a weird dimension": would you mind sharing the error messages you got and the shape of the generated time series?

Thanks, Zinan