fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

module 'zmq.backend.cython.socket' has no attribute 'get' #6

Closed. dstan11 closed this issue 4 years ago

dstan11 commented 4 years ago

I ran into problems when running scheduler.start(). It reports three errors:

  1. module 'zmq.backend.cython.socket' has no attribute 'get'
  2. Can't get attribute 'get' on <module 'zmq.backend.cython.socket' from 'E:\\Users\\shand\\anaconda3\\envs\\DoppelGANger\\lib\\site-packages\\zmq\\backend\\cython\\socket.cp35-win_amd64.pyd'>
  3. Can't pickle <cyfunction Socket.get at 0x000001FCDFCC71B8>: it's not found as zmq.backend.cython.socket.get

fjxmlzn commented 4 years ago

I am not sure why you see these errors.

Could you please post here:

  1. The complete error log
  2. How you install the Python environment and the packages
  3. The list of the installed Python packages and versions

So that I can reproduce these errors and debug it?

Thanks!

dstan11 commented 4 years ago

I created a notebook with the same content as main.py in the DoppelGANger/DoppelGANger/example_training folder:

if __name__ == "__main__":
    from gan_task import GANTask
    from config import config
    from gpu_task_scheduler.gpu_task_scheduler import GPUTaskScheduler
    scheduler = GPUTaskScheduler(config=config, gpu_task_class=GANTask)
    scheduler.start() 

  1. Error log: error.txt
  2. Python version: 3.5.2; installed packages: packages.txt

fjxmlzn commented 4 years ago

Thanks. Can you try executing it directly instead of from a Jupyter notebook?

dstan11 commented 4 years ago

Yes. I ran python main.py in the DoppelGANger/DoppelGANger/example_training folder from a terminal, and no error came up. However, the program is still running after 3 hours, and I have no idea how long it is supposed to take. By the way, GPU usage didn't change after I started the program.

Thanks.

fjxmlzn commented 4 years ago

You can follow the training progress in worker.log, which is in the subfolders of the results folder.

If the code isn't using the GPU:

  1. Make sure that you installed tensorflow-gpu instead of tensorflow
  2. You can check worker.log and see if there are any error messages about loading Cuda library.
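
The second check can be scripted. This is a minimal sketch, assuming the worker.log files sit somewhere under the results folder and that CUDA loading problems appear as log lines mentioning "cuda" or "cudnn". The keyword list and the helper name find_cuda_messages are assumptions made for illustration and are not part of DoppelGANger.

```python
import os

# Assumed keywords; adjust them to whatever your TensorFlow build prints.
KEYWORDS = ("cuda", "cudnn")


def find_cuda_messages(results_root):
    """Return (log_path, line_number, line) for each worker.log line that mentions a keyword."""
    hits = []
    for folder, _subfolders, files in os.walk(results_root):
        if "worker.log" not in files:
            continue
        log_path = os.path.join(folder, "worker.log")
        with open(log_path, "r", errors="replace") as log_file:
            for line_number, line in enumerate(log_file, start=1):
                if any(keyword in line.lower() for keyword in KEYWORDS):
                    hits.append((log_path, line_number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # "../results/" is the result_root_folder value used in this thread.
    for log_path, line_number, line in find_cuda_messages("../results/"):
        print("{}:{}: {}".format(log_path, line_number, line))
```

A match is not necessarily an error, since informational lines can mention CUDA too; the script only narrows down which lines to read.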

dstan11 commented 4 years ago

Sorry to disturb you again. I couldn't find the results folder. Can you tell me where it is?

Thanks!

fjxmlzn commented 4 years ago

It should be on the same level as example_training folder. It is configured in config.py: "result_root_folder": "../results/"
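
To see where that relative path ends up, the following sketch resolves it against the directory the script is started from. It assumes main.py is launched from inside example_training; resolve_result_root is a hypothetical helper written for this illustration.

```python
import os


def resolve_result_root(result_root_folder, working_dir):
    """Resolve a relative result_root_folder against the directory the script runs from."""
    return os.path.normpath(os.path.join(working_dir, result_root_folder))


if __name__ == "__main__":
    # Assumes the current directory is example_training.
    print(resolve_result_root("../results/", os.getcwd()))
```

If the script is started from a different directory, the results folder is created relative to that directory instead.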

dstan11 commented 4 years ago

Thank you for the reply! I updated Python to 3.7 and tensorflow-gpu to 1.1.4. Now the program works.

fjxmlzn commented 4 years ago

Great!!

dstan11 commented 4 years ago

I now get a new error message:

Traceback (most recent call last):
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\Scripts\start_gpu_task-script.py", line 33, in <module>
    sys.exit(load_entry_point('GPUTaskScheduler', 'console_scripts', 'start_gpu_task')())
  File "f:\github clone folder\gputask\gputaskscheduler\gpu_task_scheduler\start_gpu_task.py", line 23, in main
    worker.main()
  File "F:\Github clone folder\DoppelGANger\DoppelGANger\example_training\gan_task.py", line 124, in main
    gan.train(restore=restore)
  File "..\gan\doppelganger.py", line 918, in train
    self.visualize(epoch_id, batch_id, global_id)
  File "..\gan\doppelganger.py", line 801, in visualize
    sub1(features, attributes, lengths, None, None, None, "free")
  File "..\gan\doppelganger.py", line 749, in sub1
    ground_truth_lengths=ground_truth_lengths)
  File "<__array_function__ internals>", line 6, in savez
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 645, in savez
    _savez(file, args, kwds, False)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 743, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 119, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '../results/aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,\\sample\\epoch_id-0,batch_id-199,global_id-199,type-free,samples.npz'
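
For reference, opening a file for writing raises FileNotFoundError when a directory in the path does not exist. Whether that is the cause here is not established in this thread. The sketch below shows one way to rule it out before saving; ensure_parent_dir is a hypothetical helper, not a function from DoppelGANger or NumPy.

```python
import os


def ensure_parent_dir(file_path):
    """Create the directory that will contain file_path if it is missing, then return file_path."""
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return file_path


# Possible usage with the call from the traceback (np is numpy):
#     np.savez(ensure_parent_dir(path), ...)
```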

fjxmlzn commented 4 years ago

Could you please try changing "result_root_folder": "../results/" in config.py to "result_root_folder": "..\\results\\"? You are on Windows, where the directory separator is \. Then delete the results folder and run again.

Let me know if it doesn't work.
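
As an alternative to editing the separators by hand, the standard library can normalize them. This is only a sketch: ntpath is the Windows flavour of os.path and can be imported on any platform, and "run-0" is a placeholder folder name. It shows how the path strings change, not that this resolves the error above.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

result_root_folder = "../results/"

# normpath rewrites forward slashes as backslashes and drops the trailing separator.
windows_root = ntpath.normpath(result_root_folder)

# join then uses backslashes consistently for the sub-folders.
sample_dir = ntpath.join(windows_root, "run-0", "sample")  # "run-0" is a placeholder name

print(windows_root)
print(sample_dir)
```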

dstan11 commented 4 years ago

It doesn't work. It has the same error message.

fjxmlzn commented 4 years ago

I think another potential problem is that Windows may not handle , in file names well. You can replace the , separator by adding test_config_string_separator="-" (or another character) to the scheduler_config section of config.py. See https://github.com/fjxmlzn/GPUTaskScheduler for a detailed explanation.

But I just want to double-check if there are other issues: could you please show me the directory structure of F:\Github clone folder\DoppelGANger\DoppelGANger\ after this error happens?
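
The separator setting mentioned above might look like this in config.py. This is a sketch showing only the keys named in this thread. Placing result_root_folder in the same section is an assumption, and the comment about "," being the current separator is inferred from the folder name in the error message.

```python
# Sketch of config.py; only the keys named in this thread are shown.
config = {
    "scheduler_config": {
        # Assumed to live in this section alongside the separator setting.
        "result_root_folder": "../results/",
        # Replaces the "," that currently separates settings in the results folder name.
        "test_config_string_separator": "-",
        # ... other scheduler settings unchanged ...
    },
    # ... remaining sections unchanged ...
}
```

Since the folder names already use - between each setting and its value, a character such as _ may be easier to read.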

dstan11 commented 4 years ago

F:\Github clone folder\DoppelGANger\DoppelGANger\ folder
F:\Github clone folder\DoppelGANger\DoppelGANger\results results
F:\Github clone folder\DoppelGANger\DoppelGANger\results\aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False, 3

fjxmlzn commented 4 years ago

Thanks. Could you please email me the current code and worker.log so that I can check them? zinanl AT andrew.cmu.edu