I just used `remote-pdb` instead of `pdb`, and that solved the problem.
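Concretely, a minimal sketch of what I mean (host and port are arbitrary choices, not values from this issue):

```python
# Instead of `import pdb; pdb.set_trace()` inside the code you want to inspect:
from remote_pdb import RemotePdb

# RemotePdb listens on a TCP socket instead of stdin, so it keeps
# working inside spawned worker processes that have no terminal.
RemotePdb("127.0.0.1", 4444).set_trace()
```

Then attach from another terminal, e.g. with `telnet 127.0.0.1 4444`.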
Hi @hannah-zhangzz,
Great that you found a solution. When debugging, it is often a good idea to set the number of workers to 0, and using a single GPU while you try to isolate a problem is probably also helpful.
In PyCharm or Visual Studio you can also do remote debugging.
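Outside of DIRECT's own configuration, the idea looks like this in plain PyTorch (a generic sketch, not DIRECT's actual settings):

```python
import pdb

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(8, 3), torch.randn(8, 1))

# num_workers=0 keeps data loading in the main process, so a plain
# pdb breakpoint inside the dataset or the training loop works again.
loader = DataLoader(dataset, batch_size=2, num_workers=0)

# Pin everything to a single device while isolating the problem.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

for batch, target in loader:
    batch, target = batch.to(device), target.to(device)
    pdb.set_trace()  # safe here: single process, stdin is attached
    break
```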
Thanks a lot for your advice. I will give it a try!
Debugging code within this framework is often frustrating. For example, when I insert `import pdb; pdb.set_trace()` into the code in recurrentvarnet.py to inspect some variables, I get the following error, which seems to be a conflict between `pdb` and the multiprocessing module of `torch`. I would like to know how to debug this multi-process code effectively. Here is the relevant error output:
```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/launch.py", line 174, in _distributed_worker
    main_func(*args)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/train.py", line 270, in setup_train
    env.engine.train(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/engine.py", line 645, in train
    self.training_loop(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/engine.py", line 300, in training_loop
    iteration_output = self._do_iteration(data, loss_fns, regularizer_fns=regularizer_fns)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/nn/mri_models.py", line 129, in _do_iteration
    output_image, output_kspace = self.forward_function(data)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/nn/recurrentvarnet/recurrentvarnet_engine.py", line 39, in forward_function
    output_kspace = self.model(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/nn/recurrentvarnet/recurrentvarnet.py", line 300, in forward
    kspace_prediction, previous_state = block(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/nn/recurrentvarnet/recurrentvarnet.py", line 404, in forward
    kspace_error = torch.where(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/site-packages/direct-1.0.5.dev0-py3.9-linux-x86_64.egg/direct/nn/recurrentvarnet/recurrentvarnet.py", line 404, in forward
    kspace_error = torch.where(
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/miniconda3/envs/score_SDE/lib/python3.9/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit
```
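For reference, here is a minimal script, independent of DIRECT, that seems to reproduce the same failure: a process started by `torch.multiprocessing.spawn` has no usable stdin, so `pdb` immediately hits EOF and `bdb` raises `BdbQuit`.

```python
import pdb

import torch.multiprocessing as mp


def worker(rank):
    # In a spawned child process there is no attached terminal,
    # so pdb cannot read commands and bdb raises BdbQuit.
    pdb.set_trace()


if __name__ == "__main__":
    mp.spawn(worker, nprocs=2)
```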
Looking forward to your response; any help is appreciated!