cabooster / SRDTrans

SRDTrans: Spatial redundancy transformer for self-supervised fluorescence image denoising
GNU General Public License v3.0

Runtime error: An attempt has been made to start a new process before the current process has finished its bootstrapping phase #4

Open nicolapapp opened 11 months ago

nicolapapp commented 11 months ago

I installed the project following the instructions. When I run train.py as instructed: `python -u train.py --datasets_folder noisy --datasets_path datasets/ --n_epochs 30 --GPU 0 --train_datasets_size 6000 --patch_x 128 --patch_t 128`

the run terminates with error: _**An attempt has been made to start a new process before the current process has finished its bootstrapping phase.**_

I have attached the output log as well as the output of `conda list`. I suspect that the code has problems with versions of torch or other libraries that are newer than those available when the project was created. Could you please publish the output of `conda list` from your own conda installation? Thanks

conda_list.txt train_run..pdf

Huxiaowan commented 11 months ago

Your PyTorch version may be too high; we recommend using pytorch==1.7.1.

abdelneuhaus commented 7 months ago

Hi, I have tried to use SRDTrans, but I have the same issue. In my case, the testing step works (using a pre-trained model), but training crashes with the same message as above. Here is my torch/torchvision version (as suggested):

image

Is it possible to have a list of the environment dependencies/libraries?

Best

Huxiaowan commented 7 months ago

We provide the latest detailed environment configuration as follows: conda_list.xlsx

davidcorcoran545 commented 5 months ago

I've had the same issue. I tried different versions of torch: 1.8.0 (as in the install instructions), 1.7.1 (as mentioned in the above comment from Aug 5th 2023), and 2.0.1 (as in the above Excel file). In each case I verified that PyTorch was installed correctly, and that the GPU driver and CUDA are enabled and accessible by PyTorch, following the instructions here.

Huxiaowan commented 5 months ago

It seems to be a multiprocessing conflict. You can try the following solutions:

  1. On Windows, each subprocess imports (i.e. executes) the main module when it starts. You need to put the training code in the main module under an `if __name__ == '__main__':` guard so that the subprocesses do not try to create further subprocesses while they are still starting up. Linux does not need the guard because its data-loading workers are normally created with fork, which does not re-import the main module.

  2. Disable multi-process data loading by setting num_workers=0. This is a suboptimal solution because it slows down data loading. We recommend running on a Linux system and checking the environment.
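
The error message itself comes from Python's `multiprocessing` module, which PyTorch's DataLoader relies on when num_workers is greater than 0. As a minimal, standard-library-only sketch of the `if __name__ == "__main__":` guard from option 1 (this is not code from train.py; `load_patch`, `worker`, and `main` are hypothetical names), the following forces the "spawn" start method that Windows uses, so the behaviour can be tried on any platform:

```python
import multiprocessing as mp


def load_patch(index):
    # Hypothetical stand-in for the work a data-loading worker would do.
    return index * index


def worker(indices):
    # Runs in the child process.
    print("worker loaded:", [load_patch(i) for i in indices])


def main():
    # "spawn" is the start method used on Windows: the child starts a fresh
    # interpreter and re-imports this file before running `worker`.
    ctx = mp.get_context("spawn")
    child = ctx.Process(target=worker, args=([0, 1, 2, 3],))
    child.start()
    child.join()
    print("worker exit code:", child.exitcode)


# Without this guard, a module-level call to main() would run again while the
# spawned child is importing this file, and multiprocessing would raise the
# "bootstrapping phase" RuntimeError.
if __name__ == "__main__":
    main()
```

Saved to a file and run as a script, only the parent process enters the guarded block; the child imports the file under a different module name and skips it.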

abdelneuhaus commented 2 weeks ago

Adding `if __name__ == '__main__':` just before `for epoch in range(0, opt.n_epochs)` worked for me (on Windows) to launch training.
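
For readers applying the same change, a rough sketch of what guarding a module-level epoch loop could look like is shown below. It is an assumption-laden illustration, not the actual contents of train.py: `opt` is replaced by a hypothetical `SimpleNamespace`, and `train_one_epoch` and `run_training` are made-up placeholders for the real per-epoch work.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the parsed command-line options in train.py.
opt = SimpleNamespace(n_epochs=3)


def train_one_epoch(epoch):
    # Placeholder for the real per-epoch work (iterating the DataLoader, etc.).
    return f"epoch {epoch} done"


def run_training(options):
    # The loop that previously sat at module level now lives in a function.
    log = []
    for epoch in range(0, options.n_epochs):
        log.append(train_one_epoch(epoch))
    return log


if __name__ == "__main__":
    # Only the process started by `python train.py` reaches this block;
    # spawned worker processes import the file and skip it.
    for line in run_training(opt):
        print(line)
```

Moving the loop into a function is optional; indenting the existing loop under the guard has the same effect. The function form just keeps the module importable without side effects.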