I tried the IWSLT'14 machine translation example on the PyTorch Docker image pytorch/pytorch:0.4.1-cuda9-cudnn7-devel
and got the following error at the end of training:
Exception ignored in: <bound method tqdm.__del__ of | epoch 002: 33%|▎| 368/1131 [01:03<02:02, 6.22it/s, loss=7.721, nll_loss=6.908, ppl=120.10, wps=21642, ups=5.8, wpb=3492, bsz=147, num_updates=1500, lr=0.000187563, gnorm=0.960, clip=0%, oom=0, wall=248, train_wall=225]>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tqdm/_tqdm.py", line 931, in __del__
    self.close()
  File "/opt/conda/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "/opt/conda/lib/python3.6/site-packages/tqdm/_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "/opt/conda/lib/python3.6/site-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/opt/conda/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
I used --max-update 1500 in the above example, but I got the same error when I ran the script with --max-update 50000.
The error does not occur when I use --log-format json.
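For reference, the last frame of the traceback is Python's own check in threading.Thread.join(): a thread is not allowed to join itself. The sketch below reproduces only that check with the standard library. It does not involve tqdm or fairseq, and the helper name join_self is made up for illustration. My guess, based on the traceback, is that tqdm's monitor thread ends up calling exit() on itself, but I have not verified this.

```python
import threading


def join_self():
    """Try to join the calling thread; return the error message, if any.

    join_self is a hypothetical helper for illustration only.
    """
    try:
        threading.current_thread().join()
    except RuntimeError as exc:
        return str(exc)
    return None


result = []
worker = threading.Thread(target=lambda: result.append(join_self()))
worker.start()
worker.join()  # joining from a different thread is allowed
print(result[0])
```

The message printed here matches the last line of the traceback above, so the error seems to come from the join() call itself rather than from the training loop.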
I think these issues/PRs are related:
In particular, the second PR may resolve this.
Thanks,