facebookresearch / pytorch_GAN_zoo

A mix of GAN implementations including progressive growing
BSD 3-Clause "New" or "Revised" License

Connection refused & 'GNet' object has no attribute 'module' #103

Closed by ziqizh 4 years ago

ziqizh commented 4 years ago

I am using Python 3.8 and torch 1.3.1

python train.py PGAN -c config_celeba_cropped.json --restart -n celeba_cropped
Setting up a new session...
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connection.py", line 156, in _new_conn
    conn = connection.create_connection(
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connection.py", line 184, in connect
    conn = self._new_conn()
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connection.py", line 168, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7efe258c68e0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efe258c68e0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/visdom/__init__.py", line 708, in _send
    return self._handle_post(
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/visdom/__init__.py", line 677, in _handle_post
    r = self.session.post(url, data=data)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/requests/sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efe258c68e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[Errno 111] Connection refused
Running PGAN
size 10
202599 images found
202599 images detected
size (4, 4)
202599 images found
Changing alpha to 0.000
Traceback (most recent call last):
  File "train.py", line 137, in <module>
    GANTrainer.train()
  File "/home//code/pytorch_GAN_zoo/models/trainer/progressive_gan_trainer.py", line 235, in train
    status = self.trainOnEpoch(dbLoader, scale,
  File "/home//code/pytorch_GAN_zoo/models/trainer/gan_trainer.py", line 479, in trainOnEpoch
    inputs_real = self.inScaleUpdate(i, scale, inputs_real)
  File "/home//code/pytorch_GAN_zoo/models/trainer/progressive_gan_trainer.py", line 166, in inScaleUpdate
    self.model.updateAlpha(alpha)
  File "/home//code/pytorch_GAN_zoo/models/progressive_gan.py", line 134, in updateAlpha
    self.avgG.module.setNewAlpha(newAlpha)
  File "/home//anaconda3/envs/dlenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 575, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GNet' object has no attribute 'module'

amoaxell commented 4 years ago

I had the same issue. Part of the problem was solved by loading the nvidia and nvidia-uvm drivers. They had been disabled by a blacklist configuration in /lib/modprobe.d/blacklist-nvidia.conf, so I commented out all the lines in that file.

Also, the connection-refused error (localhost, port 8097) goes away once you launch the visdom server in a separate shell before training:

python3 -m visdom.server
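
To confirm the server is actually listening before starting a long training run, a quick check along these lines may help. It is only a sketch: the host and port are taken from the traceback above (localhost, 8097), and `visdom_reachable` is a hypothetical helper name, not part of visdom or this repo.

```python
import socket


def visdom_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only tests that something is listening on the
    port, not that the listener is really a visdom server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError ([Errno 111]) and timeouts.
        return False


if __name__ == "__main__":
    if visdom_reachable():
        print("visdom server is reachable")
    else:
        print("visdom server not reachable; start it with: python3 -m visdom.server")
```

If this reports the server as unreachable, the Connection refused traceback is expected, since the visdom client has nothing to post to.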

But part of the problem remains:

Setting up a new session...
Running PGAN
size 10
202599 images found
202599 images detected
size (4, 4)
202599 images found
Changing alpha to 0.000
Traceback (most recent call last):
  File "./train.py", line 151, in <module>
    GANTrainer.train()
  File "/home/user/pytorch-gan-zoo/pytorch_GAN_zoo/models/trainer/progressive_gan_trainer.py", line 237, in train
    maxIter=self.modelConfig.maxIterAtScale[scale])
  File "/home/user/pytorch-gan-zoo/pytorch_GAN_zoo/models/trainer/gan_trainer.py", line 479, in trainOnEpoch
    inputs_real = self.inScaleUpdate(i, scale, inputs_real)
  File "/home/user/pytorch-gan-zoo/pytorch_GAN_zoo/models/trainer/progressive_gan_trainer.py", line 166, in inScaleUpdate
    self.model.updateAlpha(alpha)
  File "/home/user/pytorch-gan-zoo/pytorch_GAN_zoo/models/progressive_gan.py", line 134, in updateAlpha
    self.avgG.module.setNewAlpha(newAlpha)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'GNet' object has no attribute 'module'

amoaxell commented 4 years ago

I solved the remaining part of the problem by updating my CUDA drivers, as described in https://github.com/pytorch/pytorch/issues/28321. The 'GNet' object has no attribute 'module' error is triggered by torch.cuda.is_available() returning False.
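
For anyone who has to run without a GPU: torch's nn.DataParallel exposes the wrapped network as `.module`, while a bare network has no such attribute. That would fit the traceback if the generator is only wrapped when CUDA is available, which is an assumption about the repo code and not verified here. A defensive accessor could look like the sketch below. `unwrap`, `FakeGNet` and `FakeDataParallel` are hypothetical names, and the stand-in classes exist only so the example runs without torch.

```python
def unwrap(net):
    """Return net.module for a DataParallel-style wrapper, else net itself."""
    return getattr(net, "module", net)


# Hypothetical stand-ins (not the repo's classes) so the sketch runs
# without torch or a GPU.
class FakeGNet:
    def __init__(self):
        self.alpha = None

    def setNewAlpha(self, alpha):
        self.alpha = alpha


class FakeDataParallel:
    """Mimics only the one property used here: the wrapped net lives in .module."""

    def __init__(self, module):
        self.module = module


bare = FakeGNet()                       # what you get when no wrapping happened
wrapped = FakeDataParallel(FakeGNet())  # what the failing line expects

# Both calls succeed, whereas bare.module.setNewAlpha(0.0) would raise
# AttributeError, as in the traceback above.
unwrap(bare).setNewAlpha(0.0)
unwrap(wrapped).setNewAlpha(0.0)
```

This only sidesteps the attribute error. If the GPU is supposed to be used, fixing the drivers so that torch.cuda.is_available() returns True is still the real fix.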

ziqizh commented 4 years ago

It turned out to be a display issue: my Linux environment doesn't have a display, and I solved this by commenting out the related code.

jyu-theartofml commented 3 years ago

python3 -m visdom.server worked for debugging my error.