Hello, I'm trying to run this on a single GPU and I'm getting a threading error. Do you know how to address it? I've changed gpu_id in tps_resnet_bilstm_attn.py to '0'. The terminal freezes after the error is thrown three times, and the model then seems to stop training.
Logs after running:
python tools/train.py configs/tps_resnet_bilstm_attn.py
2020-09-10 09:58:08,779 - INFO - Use GPU 0
2020-09-10 09:58:08,779 - INFO - Set cudnn deterministic False
2020-09-10 09:58:08,779 - INFO - Set cudnn benchmark True
2020-09-10 09:58:08,779 - INFO - Set seed 1111
2020-09-10 09:58:08,780 - INFO - Build model
2020-09-10 09:58:08,969 - INFO - GResNet init weights
2020-09-10 09:58:09,231 - INFO - AttHead init weights
2020-09-10 09:58:12,927 - INFO - current dataset length is 891924 in ./data/data_lmdb_release/training/MJ//MJ_test
2020-09-10 09:58:14,264 - INFO - current dataset length is 802731 in ./data/data_lmdb_release/training/MJ/MJ_valid
2020-09-10 09:58:26,377 - INFO - current dataset length is 7224586 in ./data/data_lmdb_release/training/MJ/MJ_train
2020-09-10 09:58:35,422 - INFO - current dataset length is 5522807 in ./data/data_lmdb_release/training/ST/
2020-09-10 09:58:47,099 - INFO - current dataset length is 6992 in ./data/data_lmdb_release/validation/
2020-09-10 09:58:47,100 - INFO - Start train...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/evan/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/evan/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/evan/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/home/evan/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
Running on:
Ubuntu 20.04
pytorch == 1.6.0
torchvision == 0.7.0