eliahuhorwitz / DeepSIM

Official PyTorch implementation of the paper: "DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample" (ICCV 2021 Oral)

Infinite Loop Error (keeps starting train.py for some reason) #14

Open civiliangame opened 2 years ago

civiliangame commented 2 years ago

Hello, I haven't made any modifications to the code. I cloned it, installed the requirements, and ran the script for training. No other options, no custom data.

As you can see, it ran train.py twice for some reason and then got out with a broken pipe error. I've included the error output below. (Output 1)

I then tried debugging this on another machine, adding the if __name__ == "__main__": guard to prevent train.py from calling itself.

It seemed like line 74 of train.py (for i, data in enumerate(dataset, start=epoch_iter):) was causing this issue.

This too caused an error, albeit a different one. I've included that one as well (Output 2)

Thank you.

Here's the error Output 1:

    (deepsim2) D:\DeepSIM>python ./train.py --dataroot ./datasets/car --primitive seg --no_instance --tps_aug 1 --name DeepSIMCar
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension:
      warn(f"Failed to load image Python extension: {e}")
    name DeepSIMCar [0]
    ------------ Options -------------
    affine_aug: none batchSize: 1 beta1: 0.5 canny_aug: 0 canny_color: 0 canny_sigma_l_bound: 1.2 canny_sigma_step: 0.3 canny_sigma_u_bound: 3
    checkpoints_dir: ./checkpoints continue_train: False cutmix_aug: 0 cutmix_max_size: 96 cutmix_min_size: 32 data_type: 32 dataroot: ./datasets/car debug: False
    display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 256 fp16: False gpu_ids: [0] input_nc: 3 instance_feat: False
    isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 256 load_features: False load_pretrain:
    local_rank: 0 lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10
    n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: DeepSIMCar ndf: 64 nef: 16 netG: global ngf: 64
    niter: 8000 niter_decay: 8000 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: True no_lsgan: False no_vgg_loss: False
    norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 primitive: seg print_freq: 100 resize_or_crop: none
    save_epoch_freq: 20000 save_latest_freq: 20000 serial_batches: False test_canny_sigma: 2 tf_log: False tps_aug: 1 tps_percent: 0.99 tps_points_per_dim: 3
    use_dropout: False verbose: False which_epoch: latest
    -------------- End ----------------
    ./train.py:11: DeprecationWarning: fractions.gcd() is deprecated. Use math.gcd() instead.
      def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0
    CustomDatasetDataLoader dataset [AlignedDataset] was created
    training images = 1
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\models\_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
      f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
      warnings.warn(msg)
    create web directory ./checkpoints\DeepSIMCar\web...
    display_delta 0 print_delta 0.0 save_delta 0
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension:
      warn(f"Failed to load image Python extension: {e}")
    name DeepSIMCar [0]
    ------------ Options -------------
    affine_aug: none batchSize: 1 beta1: 0.5 canny_aug: 0 canny_color: 0 canny_sigma_l_bound: 1.2 canny_sigma_step: 0.3 canny_sigma_u_bound: 3
    checkpoints_dir: ./checkpoints continue_train: False cutmix_aug: 0 cutmix_max_size: 96 cutmix_min_size: 32 data_type: 32 dataroot: ./datasets/car debug: False
    display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 256 fp16: False gpu_ids: [0] input_nc: 3 instance_feat: False
    isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 256 load_features: False load_pretrain:
    local_rank: 0 lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10
    n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: DeepSIMCar ndf: 64 nef: 16 netG: global ngf: 64
    niter: 8000 niter_decay: 8000 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: True no_lsgan: False no_vgg_loss: False
    norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 primitive: seg print_freq: 100 resize_or_crop: none
    save_epoch_freq: 20000 save_latest_freq: 20000 serial_batches: False test_canny_sigma: 2 tf_log: False tps_aug: 1 tps_percent: 0.99 tps_points_per_dim: 3
    use_dropout: False verbose: False which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader dataset [AlignedDataset] was created
    training images = 1
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\models\_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
      f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
    C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
      warnings.warn(msg)
    create web directory ./checkpoints\DeepSIMCar\web...
    display_delta 0 print_delta 0.0 save_delta 0
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    Traceback (most recent call last):
      File "./train.py", line 74, in <module>
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 105, in spawn_main
        for i, data in enumerate(dataset, start=epoch_iter):
        exitcode = _main(fd)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 114, in _main
        prepare(preparation_data)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 225, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
        run_name="__mp_main__")
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "D:\DeepSIM\train.py", line 74, in <module>
        for i, data in enumerate(dataset, start=epoch_iter):
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
        return self._get_iterator()
        return self._get_iterator()
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
        w.start()
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\process.py", line 112, in start
        w.start()
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
        self._popen = self._Popen(self)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\context.py", line 223, in _Popen
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\context.py", line 322, in _Popen
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
        return Popen(process_obj)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
        reduction.dump(process_obj, to_child)
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\reduction.py", line 60, in dump
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\Public\Anaconda\envs\deepsim2\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
        ForkingPickler(file, protocol).dump(obj)
        is not going to be frozen to produce an executable.''')
    BrokenPipeError: [Errno 32] Broken pipe
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Here is Error Output 2:

    (deepsim) PS E:\JM\GAN\deepsim> python ./train.py --dataroot ./datasets/car --primitive seg --no_instance --tps_aug 1 --name DeepSIMCar
    C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension:
      warn(f"Failed to load image Python extension: {e}")
    name DeepSIMCar [0]
    ------------ Options -------------
    affine_aug: none batchSize: 1 beta1: 0.5 canny_aug: 0 canny_color: 0 canny_sigma_l_bound: 1.2 canny_sigma_step: 0.3 canny_sigma_u_bound: 3
    checkpoints_dir: ./checkpoints continue_train: False cutmix_aug: 0 cutmix_max_size: 96 cutmix_min_size: 32 data_type: 32 dataroot: ./datasets/car debug: False
    display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 256 fp16: False gpu_ids: [0] input_nc: 3 instance_feat: False
    isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 256 load_features: False load_pretrain:
    local_rank: 0 lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10
    n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: DeepSIMCar ndf: 64 nef: 16 netG: global ngf: 64
    niter: 8000 niter_decay: 8000 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: True no_lsgan: False no_vgg_loss: False
    norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 primitive: seg print_freq: 100 resize_or_crop: none
    save_epoch_freq: 20000 save_latest_freq: 20000 serial_batches: False test_canny_sigma: 2 tf_log: False tps_aug: 1 tps_percent: 0.99 tps_points_per_dim: 3
    use_dropout: False verbose: False which_epoch: latest
    -------------- End ----------------
    ./train.py:16: DeprecationWarning: fractions.gcd() is deprecated. Use math.gcd() instead.
      def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0
    CustomDatasetDataLoader dataset [AlignedDataset] was created
    training images = 1
    C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torchvision\models\_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
      f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
    C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
      warnings.warn(msg)
    create web directory ./checkpoints\DeepSIMCar\web...
    display_delta 0 print_delta 0.0 save_delta 0
    C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension:
      warn(f"Failed to load image Python extension: {e}")
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    Traceback (most recent call last):
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\spawn.py", line 105, in spawn_main
      File "./train.py", line 197, in <module>
        exitcode = _main(fd)
        main()
      File "./train.py", line 81, in main
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\spawn.py", line 115, in _main
        for i, data in enumerate(dataset, start=epoch_iter):
        self = reduction.pickle.load(from_parent)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    EOFError: Ran out of input
        return self._get_iterator()
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
        w.start()
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\TWiM\.conda\envs\deepsim\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'CustomDatasetDataLoader.initialize.<locals>.<lambda>'

civiliangame commented 2 years ago

I found the problem and it seems like it's fixed for now. Documenting for the future adventurers.

Go to DeepSIM/data/custom_dataset_data_loader.py in your repository.


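    # Excerpt; imports and the CreateDataset helpers are omitted.
    class CustomDatasetDataLoader(BaseDataLoader):

        def name(self):
            return 'CustomDatasetDataLoader'

        def initialize(self, opt):
            BaseDataLoader.initialize(self, opt)
            if opt.isTrain:
                self.dataset = CreateDataset(opt)
            else:
                self.dataset = CreateDataset_test(opt)
            self.dataloader = torch.utils.data.DataLoader(
                self.dataset,
                # (other keyword arguments are not shown in this excerpt)
                shuffle=not opt.serial_batches,
                # Line 38 in the repository file: this lambda cannot be pickled.
                worker_init_fn=lambda _: np.random.seed())

        def load_data(self):
            return self.dataloader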

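Replace the entirety of line 38 (the worker_init_fn=lambda _: np.random.seed()) line) with a single closing parenthesis. It seems like the multiprocessing worker setup is what breaks things: on Windows the DataLoader workers are spawned as new processes, the loader's arguments have to be pickled for them, and a lambda defined inside a method can't be pickled (that is the AttributeError in Output 2).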
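A minimal sketch of why the lambda trips up the spawned workers, assuming the failure is the pickling step reported in Output 2. LoaderConfig, seed_worker, and can_pickle are hypothetical names made up for this illustration, and random.seed stands in for np.random.seed so that only the standard library is needed.

```python
import pickle
import random


def seed_worker(worker_id):
    """Module-level stand-in for the lambda; pickled by reference to its name."""
    random.seed()


class LoaderConfig:
    """Hypothetical stand-in for the class that builds the DataLoader."""

    def initialize_with_lambda(self):
        # Same shape as the original code: a lambda created inside a method.
        self.worker_init_fn = lambda _: random.seed()

    def initialize_with_function(self):
        # Alternative: refer to a function defined at module level.
        self.worker_init_fn = seed_worker


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


bad = LoaderConfig()
bad.initialize_with_lambda()
good = LoaderConfig()
good.initialize_with_function()

lambda_ok = can_pickle(bad.worker_init_fn)
function_ok = can_pickle(good.worker_init_fn)
print("lambda picklable:", lambda_ok)
print("module-level function picklable:", function_ok)
```

If the per-worker reseeding matters, passing a module-level function like seed_worker as worker_init_fn, instead of deleting the argument, should keep it while staying picklable. That variant is untested against this repository.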

Then, go to train.py and put everything inside a main() function that is only called under the main guard:

    def main():
        ...  # literally all of train.py here

    if __name__ == "__main__":
        main()

This worked for me and at least the code is running.
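The guard pattern can be sketched as a small standalone script. This is only an illustration of the idiom, assuming the "spawn" start method (the only one available on Windows); worker and main are hypothetical names, not functions from train.py.

```python
import multiprocessing as mp


def worker(name):
    # Runs in the child process. It must live at module level so the
    # child can find it after re-importing this file.
    print(f"hello from {name}")


def main():
    # With "spawn", the child re-imports this file under the name
    # "__mp_main__", so anything at module level runs again there.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=worker, args=("worker-0",))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # Only the original process gets here; the re-import in the child
    # skips this block, so it does not try to start processes of its own.
    exit_code = main()
    print("child exit code:", exit_code)
```

Without the guard, the call to main() would run again during the child's import, which is the "ran train.py twice" behaviour and the bootstrapping RuntimeError from Output 1.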

civiliangame commented 2 years ago

For test.py, you should also put everything inside an if __name__ == "__main__": block.