fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

generated_samples #21

Closed fxctydfty closed 3 years ago

fxctydfty commented 3 years ago

I am able to run the training algorithm. But when I run the generating_data, it never creates the output in the "generated_samples" folder. I attached the worker log here. Could you please help me on that? Thanks in advance.

worker_generate_data.log

fjxmlzn commented 3 years ago

The logs look normal. How long has it been stuck without creating the generated_samples folder?

fxctydfty commented 3 years ago

After it prints out "Finish Building", nothing happens. I tried several times; same thing.

fxctydfty commented 3 years ago

I am running Python 3.7.10 and TensorFlow 1.14.0.

fjxmlzn commented 3 years ago

Could you please share example_generating_data/config_generate_data.py and example_training/config.py you are using?

fxctydfty commented 3 years ago

**config_generate_data.py**

config = {
    "scheduler_config": {
        "gpu": ["0"],
        "config_string_value_maxlen": 1000,
        "result_root_folder": "../results/",
        "scheduler_log_file_path": "scheduler_generate_data.log",
        "log_file": "worker_generate_data.log",
        "force_rerun": True
    },

"global_config": {
    "batch_size": 100,
    "vis_freq": 200,
    "vis_num_sample": 5,
    "d_rounds": 1,
    "g_rounds": 1,
    "num_packing": 1,
    "noise": True,
    "feed_back": False,
    "g_lr": 0.001,
    "d_lr": 0.001,
    "d_gp_coe": 10.0,
    "gen_feature_num_layers": 1,
    "gen_feature_num_units": 100,
    "gen_attribute_num_layers": 3,
    "gen_attribute_num_units": 100,
    "disc_num_layers": 5,
    "disc_num_units": 200,
    "initial_state": "random",

    "attr_d_lr": 0.001,
    "attr_d_gp_coe": 10.0,
    "g_attr_d_coe": 1.0,
    "attr_disc_num_layers": 5,
    "attr_disc_num_units": 200,

    "generate_num_train_sample": 50000,
    "generate_num_test_sample": 50000
},

"test_config": [
    {
        "dataset": ["web"],
        "epoch": [2],
        "run": [0, 1, 2],
        "sample_len": [1, 5],
        "extra_checkpoint_freq": [5],
        "epoch_checkpoint_freq": [1],
        "aux_disc": [False],
        "self_norm": [False]
    }
]

}

fxctydfty commented 3 years ago

**config.py**

config = {
    "scheduler_config": {
        "gpu": ["0", "1"],
        "config_string_value_maxlen": 1000,
        "result_root_folder": "../results/"
    },

"global_config": {
    "batch_size": 100,
    "vis_freq": 200,
    "vis_num_sample": 5,
    "d_rounds": 1,
    "g_rounds": 1,
    "num_packing": 1,
    "noise": True,
    "feed_back": False,
    "g_lr": 0.001,
    "d_lr": 0.001,
    "d_gp_coe": 10.0,
    "gen_feature_num_layers": 1,
    "gen_feature_num_units": 100,
    "gen_attribute_num_layers": 3,
    "gen_attribute_num_units": 100,
    "disc_num_layers": 5,
    "disc_num_units": 200,
    "initial_state": "random",

    "attr_d_lr": 0.001,
    "attr_d_gp_coe": 10.0,
    "g_attr_d_coe": 1.0,
    "attr_disc_num_layers": 5,
    "attr_disc_num_units": 200,
},

"test_config": [
    {
        "dataset": ["web"],
        "epoch": [1],
        "run": [0, 1, 2],
        "sample_len": [1, 5],
        "extra_checkpoint_freq": [5],
        "epoch_checkpoint_freq": [1],
        "aux_disc": [False],
        "self_norm": [False]
    }
]

}

fjxmlzn commented 3 years ago

I see where the problem comes from. example_generating_data/gan_generate_data_task.py generates data only for the mid-checkpoints. In your config.py, you train the model for only 1 epoch ("epoch": [1]), while the frequency for saving mid-checkpoints is 5 ("extra_checkpoint_freq": [5]), so the code never saved any mid-checkpoints and therefore never generated samples.
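Concretely (an illustrative sketch, not code from the repo): mid-checkpoints only exist at epochs that are multiples of extra_checkpoint_freq, so with 1 training epoch and a frequency of 5 there is nothing for the generation script to load:

```python
# Illustration only (not taken from the repo): which epochs get a mid-checkpoint?
epoch = 1                  # "epoch": [1] in config.py
extra_checkpoint_freq = 5  # "extra_checkpoint_freq": [5]

mid_checkpoint_epochs = [e for e in range(1, epoch + 1)
                         if e % extra_checkpoint_freq == 0]
print(mid_checkpoint_epochs)  # [] -> no mid-checkpoints, so no generated_samples
```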

If you want to generate data from the last checkpoint instead, you can delete these lines https://github.com/fjxmlzn/DoppelGANger/blob/e732a4d077ba1504e6e401df9c2d1048c8efb2a9/example_generating_data/gan_generate_data_task.py#L138-L151, un-indent the rest of the code by 4 spaces, and set mid_checkpoint_dir = checkpoint_dir and save_path = checkpoint_dir.
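For anyone following along, a rough sketch of the shape of that edit (the paths and names here are hypothetical, assumed from the maintainer's description; the actual file differs):

```python
# Sketch only -- not the actual gan_generate_data_task.py. It illustrates the
# suggested change: instead of iterating over per-epoch mid-checkpoint
# directories, generation runs once against the final checkpoint directory.
import os

checkpoint_dir = os.path.join("..", "results", "checkpoint")  # hypothetical path

# After deleting the mid-checkpoint loop and un-indenting its body:
mid_checkpoint_dir = checkpoint_dir
save_path = checkpoint_dir

print("generating from:", mid_checkpoint_dir)
print("saving generated_samples under:", save_path)
```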

fxctydfty commented 3 years ago

I increased the number of epochs to 20. Now I get a different error while training. Could you please take a look at the log file?

worker.log

fjxmlzn commented 3 years ago

It seems like you are running on a Windows system. Could you change https://github.com/fjxmlzn/DoppelGANger/blob/e732a4d077ba1504e6e401df9c2d1048c8efb2a9/example_training/config.py#L5 and https://github.com/fjxmlzn/DoppelGANger/blob/e732a4d077ba1504e6e401df9c2d1048c8efb2a9/example_generating_data/config_generate_data.py#L5 to "result_root_folder": "..\\results\\" and try again?
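(A side note, not part of the maintainer's suggestion: building the path with os.path.join would sidestep the separator difference entirely, assuming the config accepts any valid relative path string.)

```python
import os

# Portable alternative to hard-coding "/" or "\\" in the config value,
# assuming result_root_folder may be any valid relative path:
result_root_folder = os.path.join("..", "results") + os.sep
print(result_root_folder)  # "../results/" on Linux, "..\results\" on Windows
```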

fxctydfty commented 3 years ago

Hey, it's working now. Thanks for your help!