AntixK / PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.
Apache License 2.0

Socket conflict bug when running vanilla_vae with celeba dataset #65

Open BWN133 opened 2 years ago

BWN133 commented 2 years ago

Hi, I am trying to run the vanilla_vae model with the celeba dataset on my personal device, but I am getting a weird error telling me that there is a socket conflict. The system just hangs after the error. Does anyone have any idea how to solve this? For more detail, please refer to: https://stackoverflow.com/questions/73215732/socket-conflict-while-running-vaes

Thanks!

AntixK commented 2 years ago

What does your config file look like?

If you are training on a single CPU, then simply set the gpus field in the config file to empty or none.
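For instance, a CPU-only trainer_params section might look like this (a sketch mirroring the config format below; null corresponds to PyTorch Lightning's gpus=None, which trains on CPU):

```yaml
trainer_params:
  gpus: null       # no GPUs -> train on CPU
  max_epochs: 100
```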

BWN133 commented 2 years ago

Thanks a lot for replying! My original config looks like the following:

model_params:
  name: 'VanillaVAE'
  in_channels: 3
  latent_dim: 128

data_params:
  data_path: "Data/"
  train_batch_size: 64
  val_batch_size:  64
  patch_size: 64
  num_workers: 4

exp_params:
  LR: 0.005
  weight_decay: 0.0
  scheduler_gamma: 0.95
  kld_weight: 0.00025
  manual_seed: 1265

trainer_params:
  gpus: [0]
  max_epochs: 100

logging_params:
  save_dir: "logs/"
  name: "VanillaVAE"

I tried changing gpus to null (or deleting it entirely), but then a new error appears:

Traceback (most recent call last):
  File "C:\Users\huklab\Desktop\odin\PyTorch-VAE\run.py", line 46, in <module>
    data = VAEDataset(**config["data_params"], pin_memory=len(config['trainer_params']['gpus']) != 0)
TypeError: object of type 'NoneType' has no len()
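The traceback above comes from calling len() on a gpus value that is null/None after parsing the YAML. One possible workaround in run.py would be to fall back to an empty list before taking the length (a sketch under that assumption; the dict below only mimics the parsed config from the traceback, it is not the repo's actual fix):

```python
# Sketch of a defensive fix for the "object of type 'NoneType' has no len()"
# crash. The dict below stands in for the YAML config parsed by run.py.
config = {
    "trainer_params": {"gpus": None, "max_epochs": 100},  # gpus set to null
}

# Fall back to an empty list when 'gpus' is missing or None,
# so len() is always well-defined:
gpus = config["trainer_params"].get("gpus") or []
pin_memory = len(gpus) != 0

print(pin_memory)  # prints False when no GPUs are configured
```

With this guard, gpus: null, gpus: [], a missing gpus key, and gpus: [0] all produce a sensible pin_memory value.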

I also tried deleting the ['gpus'] lookup in run.py, and it gives me the exact same error I was having before. (I am training on a machine with one GPU, not CPU-only, so I believe the config shouldn't be the problem.)