Open nicolapapp opened 11 months ago
Your PyTorch version may be too new. We recommend using pytorch==1.7.1.
Hi, I tried SRDTrans and ran into the same issue. In my case the testing step (using a pre-trained model) works, but training crashes with the same message as above. Here is my torch/torchvision version (as suggested):
Is it possible to have a list of the environment dependencies/libraries?
Best
We provide our latest detailed environment configuration here: conda_list.xlsx
I've had the same issue. I tried different versions of torch: 1.8.0 (as in the install instructions), 1.7.1 (as suggested in the comment above from Aug 5th 2023), and 2.0.1 (as in the Excel file above). In each case I verified that PyTorch was installed correctly and that the GPU driver and CUDA were enabled and accessible from PyTorch, following the instructions here.
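For comparing environments across reports like this one, a short script can print the relevant versions in one place. This is only a sketch: `report_environment` is a hypothetical helper name, not part of SRDTrans, and it assumes nothing beyond the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
import platform
import sys


def report_environment():
    """Collect version details worth pasting into a bug report (hypothetical helper)."""
    info = {"python": platform.python_version(), "platform": sys.platform}
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment; report that instead of failing.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cuda_available"] = torch.cuda.is_available()  # driver and GPU reachable?
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

Pasting this output next to `conda list` makes it easier to see whether two setups differ in the PyTorch build or only in the operating system.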
It seems to be a multi-process conflict issue. You can try the following solutions:
1. If you are running on Windows, each data-loading subprocess imports (i.e. executes) the main module when it starts. You need to insert an `if __name__ == '__main__':` guard in the main module to avoid creating subprocesses recursively. Linux typically starts child processes with fork, which does not re-import the main module, so the error does not appear there; Windows uses spawn, which does re-import it.
2. Do not use worker subprocesses: set num_workers=0. This is a suboptimal solution because it slows down data loading. We recommend running on a Linux system and checking the environment.
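The two suggestions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual SRDTrans train.py: `choose_num_workers` and `main` are hypothetical names, and the random tensor dataset only stands in for the real training data.

```python
import sys


def choose_num_workers(requested, platform=sys.platform):
    """Hypothetical helper for solution 2: use 0 workers on Windows."""
    return 0 if platform.startswith("win") else requested


def main(n_epochs=2):
    try:
        import torch
        from torch.utils.data import DataLoader, TensorDataset
    except ImportError:
        print("PyTorch is not installed; skipping the demo loop.")
        return

    # Toy data standing in for the real training set (assumption).
    dataset = TensorDataset(torch.randn(64, 8))
    loader = DataLoader(dataset, batch_size=16,
                        num_workers=choose_num_workers(2))

    for epoch in range(n_epochs):
        for (batch,) in loader:
            pass  # the training step would go here
        print(f"epoch {epoch} done")


# Solution 1: with spawn (the Windows start method), every worker re-imports
# this file. The guard keeps the training loop from running again in a worker.
if __name__ == "__main__":
    main()
```

With the guard in place, worker subprocesses should also start on Windows, so the `choose_num_workers` fallback is optional and can be replaced by the requested worker count.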
Adding `if __name__ == '__main__':` just before `for epoch in range(0, opt.n_epochs)` worked for me (on Windows) to launch training.
I installed everything as per the instructions. When I run train.py as instructed: python -u train.py --datasets_folder noisy --datasets_path datasets/ --n_epochs 30 --GPU 0 --train_datasets_size 6000 --patch_x 128 --patch_t 128
the run terminates with the error: _**An attempt has been made to start a new process before the current process has finished its bootstrapping phase.**_
I have attached the output log as well as the output of conda list. I suspect that the code has problems with versions of torch or other libraries newer than those available when the project was created. Could you please publish the output of conda list from your own installation? Thanks
conda_list.txt train_run..pdf