fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

TF2 #3

Open firmai opened 4 years ago

firmai commented 4 years ago

Hi, I just want to know whether you are perhaps planning on releasing a version for TensorFlow 2. It would probably be around for the next few years, and I think this is an interesting repository that could be used more in the near future. Thanks for your work!

Baukebrenninkmeijer commented 3 years ago

This, or a PyTorch version, would be super great to have. TF 1.4 is kind of a bummer :(.

shaanchandra commented 3 years ago

Is there a PyTorch implementation available? TensorFlow 1 is really hard to work with now. If anyone has worked on, or wants to collaborate on, open-sourcing a PyTorch version of this, let me know! I would be interested :)

fjxmlzn commented 3 years ago

Thank you all for the suggestions, and I agree that a TF2 or PyTorch version of DoppelGANger would be very useful. Unfortunately, we do not have that so far. If/when you have a TF2 or PyTorch implementation, please let me know and I'll add a link to it. Thank you!

yzion commented 2 years ago

Did someone manage to update it to TF2?

chameleonzz commented 2 years ago

Hi, when I installed TensorFlow 1.4.0, PyCharm warned that Python 3.5 has reached its end-of-life date and is no longer supported in PyCharm. DoppelGANger did not seem to work normally. Is there any solution?

fjxmlzn commented 2 years ago

@chameleonzz Could you please post error messages or screenshots of the errors?

chameleonzz commented 2 years ago

Thank you for your answer. At first, I tried to install TF 1.4.0 with Python 3.5. However, PyCharm showed the warning below.

Then, I tried to install TF 1.4.0 with Python 3.6 and ran example_training, but gan_task.py warned as follows. I also looked for ways to solve the "Unresolved reference '*'" problem, but none of them worked.

At last, I tried installing tf-cpu 2.5.0 with Python 3.8.12, and it had the same problem. I wonder if I did something wrong? Or should the TF and Python versions be updated to run DoppelGANger?


yzion commented 2 years ago

Hi @chameleonzz, try running it with the TF2 branch. That branch supports TensorFlow 2, so you can use later versions of Python and CUDA. Please update here if it solves your problem.

chameleonzz commented 2 years ago

It is feasible to run DG with TF 2.1.0 and Python 3.7. However, when I tried to run gan_task.py in 'example_training', there was a warning "unresolved reference 'gan'". I tried to pip install gan, but there seems to be no corresponding package named gan. How can I solve this problem? Thank you for your help.

yzion commented 2 years ago

@chameleonzz Can you share more information? What was the Python command that you ran? Can you share the full warning? Does it run despite this warning? The folder gan is part of the project, so maybe there is an import issue that you need to solve. (screenshot of the gan folder in the project)
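For what it's worth, here is one possible workaround (a hypothetical sketch, not code from this repository) in case Python genuinely cannot import the gan package when running the example scripts directly:

```python
# Hypothetical workaround (not part of the repository): make the repo root
# importable so that `from gan import ...` resolves when running a script
# inside example_training/ directly. The cleaner fix, as described later in
# this thread, is to install the package with `pip install -e`.
import os
import sys

# Assumes the running script lives one level below the repository root,
# e.g. in example_training/.
REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)
```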

chameleonzz commented 2 years ago

Thank you for your answer. I finally figured out how to solve the problem. To use DoppelGANger, there are three main steps. First, build a virtual environment, such as TF 2.1.0 + Python 3.7. Second, pip install some packages, such as gan, GPUTaskScheduler, and tensorflow-privacy; the gan package can be downloaded from the DoppelGANger GitHub repository, and these packages can be installed with the command 'pip install -e path/package_file_name'. Third, open the entire DoppelGANger (DG) project with PyCharm.

But I am confused by another problem. In DG there are some examples, and the data files include '.pkl' and '.npz' types. If I want to create similar data files from my own data, how do I decide on the attributes and features of my data? If there were raw data for google, web, and FCC_MBA, together with an explanation of how the attributes and features of those datasets were decided, maybe more people could understand the work more easily. Besides, which output files will be generated after running each example project? At last, thank you very much for your enthusiastic answers each time. I hope the DG project will be used in more data-driven research. It is really significant work.

yzion commented 2 years ago

Any time :) You can look at the examples in the README file of the project. There are examples for the pkl files and also for the npz files. If you need them, there are also links to download the datasets that were used in this project, so you can download them and read them with Python to look at the structure. Moreover, there are links to a number of blog posts, so you can try using those as well. If you are still struggling, let me know and I will try to help. Good luck!
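For instance, a minimal sketch of inspecting one of the downloaded .npz files with Python (the path below is a placeholder, and the array names follow the README; double-check them against the actual files):

```python
# Minimal sketch: peek at the structure of a downloaded dataset.
# The path is a placeholder; point it at wherever you saved the data.
import numpy as np

data = np.load("data/web/data_train.npz")
print(data.files)                    # expected: data_feature, data_attribute, data_gen_flag
print(data["data_feature"].shape)    # [num_samples, max_length, total_feature_dims]
print(data["data_attribute"].shape)  # [num_samples, total_attribute_dims]
print(data["data_gen_flag"].shape)   # [num_samples, max_length]
```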

fjxmlzn commented 2 years ago

Thanks @yzion for the help and the answers!

@chameleonzz Re: how to decide the attributes and features for your own data.

The definition of features and attributes can be very flexible, depending on the aspects of the data you want DoppelGANger to capture. More specifically, let's take a simple example. Let's say your original data is a table in the following format.

ColumnA ColumnB ColumnC
1 2 3
1 2 4
2 2 3
2 2 5
2 2 6

You can treat any column (or even several columns) as attributes (or metadata), group the rows according to those attributes, and treat the rest of the columns as features (or time series).

For example, you can choose to treat ColumnA and ColumnB as attributes, and ColumnC as the feature. You will get 2 samples: {attributes=(1,2), features=(3,4)}, {attributes=(2,2), features=(3,5,6)}. DoppelGANger (ideally) will be able to learn the temporal correlations of features that are associated with the same attribute (i.e., (3,4) in the first sample, and (3,5,6) in the second sample). But you can also choose to treat only ColumnA as the attributes, or any other combinations of the columns you want. In short, how to choose features/attributes depends on the context of your application, and which part you want DoppelGANger to model as temporal correlations.
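For concreteness, here is a minimal sketch of that grouping using pandas (an illustration only, not code from this repository):

```python
# Illustration of the grouping described above: each unique combination of the
# attribute columns becomes one sample, and the remaining column becomes that
# sample's feature time series.
import pandas as pd

df = pd.DataFrame({
    "ColumnA": [1, 1, 2, 2, 2],
    "ColumnB": [2, 2, 2, 2, 2],
    "ColumnC": [3, 4, 3, 5, 6],
})

samples = [
    {"attributes": attrs, "features": group["ColumnC"].tolist()}
    for attrs, group in df.groupby(["ColumnA", "ColumnB"])
]
print(samples)
# -> two samples: attributes (1, 2) with features [3, 4],
#    and attributes (2, 2) with features [3, 5, 6]
```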

Hope this clarification helps!

fjxmlzn commented 2 years ago

By the way, for future readers of this thread:

If you are looking for a TF2 implementation of DoppelGANger, you can look at https://github.com/fjxmlzn/DoppelGANger/tree/TF2 by @yzion.

If you are looking for a PyTorch implementation of DoppelGANger, you can look at https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan by Gretel AI.

chameleonzz commented 2 years ago

> By the way, for future readers of this thread:
>
> If you are looking for a TF2 implementation of DoppelGANger, you can look at https://github.com/fjxmlzn/DoppelGANger/tree/TF2 by @yzion.
>
> If you are looking for a PyTorch implementation of DoppelGANger, you can look at https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan by Gretel AI.

Recently, I met with another problem. I tried to run main.py in the example_training folder and main_generate_data.py in the example_generating_data folder. However, the result was that only a folder named results was created, and in the sub-folders of 'results' there was only a worker*.log.txt. Q1: Why were no synthetic datasets of [web/google/FCC_MBA] generated? (screenshot of the results folder) I looked for a place in the code to specify the dataset path, but I found nothing.

Q2: Once I know the attributes and features of my dataset, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz, and data_train.npz? Does other code need to be written to achieve this?

Lastly, thank you for your continued, patient answers.

fjxmlzn commented 2 years ago

Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?

Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.
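In the meantime, here is a rough sketch (an illustration only, not the repository's preprocessing code) of how such files could be assembled with numpy, assuming the array names data_feature, data_attribute, and data_gen_flag described in the README; please double-check against the README and the provided example datasets:

```python
# Rough sketch only: package toy data into the .npz layout the examples load.
# Array names (data_feature, data_attribute, data_gen_flag) follow the README.
import numpy as np

num_samples, max_len = 100, 50

# Features: [num_samples, max_time_steps, total_feature_dims]
data_feature = np.random.rand(num_samples, max_len, 1).astype(np.float32)
# Attributes (e.g. one-hot encoded metadata): [num_samples, total_attribute_dims]
data_attribute = np.eye(3)[np.random.randint(3, size=num_samples)].astype(np.float32)
# Generation flags: 1 while a sample's time series is active, 0 after it ends
data_gen_flag = np.ones((num_samples, max_len), dtype=np.float32)

np.savez(
    "data_train.npz",
    data_feature=data_feature,
    data_attribute=data_attribute,
    data_gen_flag=data_gen_flag,
)

# data_feature_output.pkl and data_attribute_output.pkl are pickled lists of
# output.Output objects describing each feature/attribute dimension; see the
# README section "Let's look at a concrete example" for their exact fields.
```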

chameleonzz commented 2 years ago

> Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?
>
> Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.

(screenshot of the results folder)

After running example_training/main.py, the content of worker.log was as follows (the 'aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,' folder was taken as an example). (screenshot of worker.log)

After running example_generating_data/main_generate_data.py, the content of worker_generate_data.log was as follows. (screenshot of worker_generate_data.log)

I wonder whether example_training/main.py and example_generating_data/main_generate_data.py produce only those output files? If I want to generate synthetic data corresponding to the real datasets (web/google/FCC_MBA), what should I do?

fjxmlzn commented 2 years ago

@chameleonzz No, there should be other files, and the content of worker.log or worker_generate_data.log should be more than this line.

Could you please delete the results folder completely, run example_training/main.py again, and paste the console output plus the content of worker.log here?

chameleonzz commented 2 years ago

> @chameleonzz No, there should be other files, and the content of worker.log or worker_generate_data.log should be more than this line.
>
> Could you please delete the results folder completely, run example_training/main.py again, and paste the console output plus the content of worker.log here?

Thanks for your answer. I have tried several times to delete the results folder completely and run example_training/main.py again, but the output does not change. It is the same as in the previous pictures. Should I change something in example_training/main.py and run it again?

fjxmlzn commented 2 years ago

Could you please paste here the console (i.e., terminal) output?

fjxmlzn commented 2 years ago

@chameleonzz Also, we can move our future discussion of this question to #30, since the problem you see is likely not due to TF2.

JimmyZhan1213 commented 2 years ago

Hello, have you solved the problem of incomplete training and generation output now? I had a similar problem recently, and I only had a worker.log under the folder that was generated. (screenshots attached)

fjxmlzn commented 2 years ago

For the previous problem, please refer to #30. For this issue, would you mind creating a new issue? We can discuss it there. This is a different problem.

chameleonzz commented 2 years ago

> Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?
>
> Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.

Recently, example_training/main.py was re-run on a computer with an Intel i7-11800H CPU @ 2.30 GHz and 64 GB of memory. It took 5740 minutes to generate the results folder named 'dataset-google,epoch-400,run-0,sample-len-1', and the results folder named 'dataset-google,epoch-400,run-0,sample-len-5' is being generated now. I have three questions now.

  1. When I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz, and data_train.npz?
  2. How do I decide the parameters and hyperparameters, such as epoch, extra_checkpoint_freq, and so on?
  3. I have not yet run example_generating_data/main_generate_data.py; will a result file in CSV or XLS format be generated?