SwinTransformer / Transformer-SSL

This is an official implementation for "Self-Supervised Learning with Swin Transformers".
https://arxiv.org/abs/2105.04553
MIT License

dataloader error #8

Open niutransWZY opened 3 years ago

niutransWZY commented 3 years ago

When I train with moby_main.py, memory usage on my Linux machine keeps growing until training crashes. What causes this, and how can I fix it?

The error is:

Traceback (most recent call last):
  File "moby_main.py", line 236, in <module>
    main(config)
  File "moby_main.py", line 121, in main
    train_one_epoch(config, model, data_loader_train, optimizer, epoch, lr_scheduler)
  File "moby_main.py", line 151, in train_one_epoch
    scaled_loss.backward()
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2605) is killed by signal: Killed.