XingangPan / GAN2Shape

Code for GAN2Shape (ICLR2021 oral)
https://arxiv.org/abs/2011.00844
MIT License
573 stars 101 forks source link

!sh scripts/run_car.sh throws erros !!!! #49

Open dellshan opened 1 year ago

dellshan commented 1 year ago

/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

warnings.warn( WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


/usr/local/lib/python3.8/site-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details. warnings.warn( /usr/local/lib/python3.8/site-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details. warnings.warn( /usr/local/lib/python3.8/site-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details. warnings.warn( /usr/local/lib/python3.8/site-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details. warnings.warn( /usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs warn(f"Failed to load image Python extension: {e}") /usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs warn(f"Failed to load image Python extension: {e}") /usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs warn(f"Failed to load image Python extension: {e}") /usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs warn(f"Failed to load image Python extension: {e}") Load config from yml file: configs/car.yml Loading configs from configs/car.yml Load config from yml file: configs/car.ymlLoad config from yml file: configs/car.yml Loading configs from configs/car.yml Load config from yml file: configs/car.yml Loading configs from configs/car.yml

Loading configs from configs/car.yml {'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True} {'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True} {'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True} {'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True} Setting up Perceptual loss... /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights. warnings.warn(msg) Setting up Perceptual loss... /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights. warnings.warn(msg) Setting up Perceptual loss... /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights. warnings.warn(msg) Setting up Perceptual loss... /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( /usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights. warnings.warn(msg) Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth Traceback (most recent call last): File "run.py", line 31, in trainer = Trainer(cfgs, GAN2Shape) File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in init self.model = model(cfgs) File "/content/GAN2Shape/gan2shape/model.py", line 88, in init self.PerceptualLoss = PerceptualLoss( File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/init.py", line 21, in init self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids) File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize self.net.load_state_dict(torch.load(model_path, kw), strict=False) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load return _legacy_load(opened_file, map_location, pickle_module, pickle_load_args) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load result = unpickler.load() File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load wrap_storage=restore_location(obj, location), File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location return default_restore_location(storage, str(map_location)) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location result = fn(storage, location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize device = validate_cuda_device(location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device raise RuntimeError('Attempting to deserialize object on CUDA device ' RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device. Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth Traceback (most recent call last): File "run.py", line 31, in trainer = Trainer(cfgs, GAN2Shape) File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in init self.model = model(cfgs) File "/content/GAN2Shape/gan2shape/model.py", line 88, in init self.PerceptualLoss = PerceptualLoss( File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/init.py", line 21, in init self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids) File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize self.net.load_state_dict(torch.load(model_path, kw), strict=False) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load return _legacy_load(opened_file, map_location, pickle_module, pickle_load_args) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load result = unpickler.load() File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load wrap_storage=restore_location(obj, location), File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location return default_restore_location(storage, str(map_location)) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location result = fn(storage, location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize device = validate_cuda_device(location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device raise RuntimeError('Attempting to deserialize object on CUDA device ' RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device. Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth Traceback (most recent call last): File "run.py", line 31, in trainer = Trainer(cfgs, GAN2Shape) File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in init self.model = model(cfgs) File "/content/GAN2Shape/gan2shape/model.py", line 88, in init self.PerceptualLoss = PerceptualLoss( File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/init.py", line 21, in init self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids) File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize self.net.load_state_dict(torch.load(model_path, kw), strict=False) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load return _legacy_load(opened_file, map_location, pickle_module, pickle_load_args) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load result = unpickler.load() File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load wrap_storage=restore_location(obj, location), File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location return default_restore_location(storage, str(map_location)) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location result = fn(storage, location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize device = validate_cuda_device(location) File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device raise RuntimeError('Attempting to deserialize object on CUDA device ' RuntimeError: Attempting to deserialize object on CUDA device 3 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device. WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 42184 closing signal SIGTERM ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 42185) of binary: /usr/local/bin/python Traceback (most recent call last): File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in main() File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main launch(args) File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch run(args) File "/usr/local/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run elastic_launch( File "/usr/local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/usr/local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run.py FAILED

Failures: [1]: time : 2023-03-05_04:31:36 host : 1cdb853b957d rank : 2 (local_rank: 2) exitcode : 1 (pid: 42186) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html [2]: time : 2023-03-05_04:31:36 host : 1cdb853b957d rank : 3 (local_rank: 3) exitcode : 1 (pid: 42187) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2023-03-05_04:31:36 host : 1cdb853b957d rank : 1 (local_rank: 1) exitcode : 1 (pid: 42185) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html