kelisiya opened this issue 4 years ago
My suggestion is to modify the .toml file instead. config.py only holds default values, and they are overwritten by the values in the .toml file.
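The "defaults overridden by a TOML file" pattern can be sketched as follows. This is a minimal illustration that assumes the parsed .toml arrives as a nested dict (for example from `tomllib.loads` on Python 3.11+). `merge_config`, `DEFAULTS` and `TOML_OVERRIDES` are hypothetical names, not the repository's actual loader.

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with values from `overrides` applied on top.

    Nested tables (dicts) are merged key by key, so a TOML file only needs to
    list the values it changes; every other key keeps its default.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, standing in for the values hard-coded in config.py.
DEFAULTS = {
    "data": {"train_bg": "/data/Adobe_Dataset/background_all", "workers": 0},
    "model": {"batch_size": 16},
}

# Hypothetical contents of a parsed .toml file.
TOML_OVERRIDES = {
    "data": {"train_bg": "/path/to/your/bg", "workers": 4},
}

config = merge_config(DEFAULTS, TOML_OVERRIDES)
print(config["data"]["train_bg"])     # /path/to/your/bg  (taken from the .toml)
print(config["model"]["batch_size"])  # 16  (default kept)
```

Under this pattern, editing a default in config.py has no visible effect whenever the .toml also sets that key, because the .toml value is applied last.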
I think the code prints the config details only once by default, but your logs contain two different configurations. Does that mean the code printed the config details twice?
It would be helpful if you could provide more information about how you modified config.py.
It looks like you assigned values to global variables such as 'gpu': [0, 1]. The code overwrites these global variables at runtime, so you don't need to set them.
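The runtime overwrite described above can be sketched like this. It is a hypothetical helper, not the repository's code, and it assumes the launch-dependent values come from the `WORLD_SIZE` and `LOCAL_RANK` environment variables that `torchrun` sets for each worker process. Whatever config.py or the .toml says about these fields is replaced.

```python
import os


def apply_runtime_settings(config: dict, env=None) -> dict:
    """Overwrite launch-dependent fields in `config` (hypothetical sketch).

    Reads WORLD_SIZE and LOCAL_RANK from the environment (or from `env` if
    given), so any values written in config.py for these keys are discarded.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    config["world_size"] = world_size
    config["local_rank"] = local_rank
    config["dist"] = world_size > 1
    # Assumed layout: one GPU per process, so each worker keeps only its own index.
    config["gpu"] = [local_rank]
    return config


cfg = apply_runtime_settings({"gpu": [0, 1]}, env={"WORLD_SIZE": "2", "LOCAL_RANK": "1"})
print(cfg["gpu"], cfg["dist"])  # [1] True
```

The value written by hand (`[0, 1]`) never reaches training, which is why setting it in config.py has no effect.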
This problem seems to be related to the GPUs. My computer has 2 GPUs, but I can't use torch.distributed.launch. If I want to train the model, how should I modify the code?
I found that the error came from Docker. But I have a new question: why is the model I trained 302.7 MB, while gca-dist is only 101.3 MB?
Yes, the checkpoint will be ~300 MB, because by default it also saves the full optimizer state. I removed the Adam state_dict from the pretrained model provided in this repository to save some space.
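The roughly 3x size difference follows from how Adam stores its state. For every parameter it keeps two running averages (`exp_avg` and `exp_avg_sq` in PyTorch), each with the same shape and dtype as the parameter. The sketch below does that arithmetic and shows one way to drop the optimizer entry from a loaded checkpoint dict. The key name `"optimizer"` is a placeholder assumption; check `checkpoint.keys()` on your own file to find the real one.

```python
def estimated_checkpoint_mb(weights_mb: float, adam_buffers: int = 2) -> float:
    """Rough checkpoint size when the Adam state is saved alongside the weights.

    Adam keeps two buffers per parameter, each the size of the parameter, so
    the optimizer state is about twice the size of the weights.
    """
    return weights_mb * (1 + adam_buffers)


def strip_optimizer_state(checkpoint: dict, optimizer_keys=("optimizer",)) -> dict:
    """Return a copy of a checkpoint dict without its optimizer entries.

    The default key name is a hypothetical placeholder.
    """
    return {k: v for k, v in checkpoint.items() if k not in optimizer_keys}


print(round(estimated_checkpoint_mb(101.3), 1))  # 303.9, close to the 302.7 MB reported above

# With PyTorch installed, usage could look like this (file names are placeholders):
# ckpt = torch.load("your_checkpoint.pth", map_location="cpu")
# torch.save(strip_optimizer_state(ckpt, ("optimizer",)), "weights_only.pth")
```

The estimate is approximate. Small extras such as step counters or scheduler state add a little, and the exact figure depends on how the checkpoint is serialized.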
When I modified the *.toml file, I ran into the same FileNotFoundError: [Errno 2] No such file or directory: '/home/liyaoyi/dataset/coco_bg'. Could you give me some suggestions for solving this problem? Thanks a lot. @kelisiya @
When I modified config.py to point to my datasets, I got this error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/liyaoyi/dataset/coco_bg'
But I had modified the path. When I printed the config, it showed two different configurations:
{'data': {'augmentation': True, 'crop_size': 512, 'random_interp': False, 'test_alpha': '/data/Adobe_Dataset/Test_set/comp/alpha', 'test_merged': '/data/Adobe_Dataset/Test_set/comp/image', 'test_trimap': '/data/Adobe_Dataset/Test_set/comp/trimap', 'train_alpha': '/data/Adobe_Dataset/Training_set/all/alpha', 'train_bg': '/data/Adobe_Dataset/background_all', 'train_fg': '/data/Adobe_Dataset/Training_set/all/fg', 'workers': 0}, 'dist': False, 'gpu': [0, 1], 'is_default': True, 'local_rank': 0, 'log': {'checkpoint_path': './checkpoints', 'checkpoint_step': 10000, 'logging_level': 'DEBUG', 'logging_path': './logs/stdout', 'logging_step': 10, 'tensorboard_image_step': 500, 'tensorboard_path': './logs/tensorboard', 'tensorboard_step': 100}, 'model': {'arch': {'decoder': 'res_shortcut_decoder_22', 'discriminator': None, 'encoder': 'res_shortcut_encoder_29'}, 'batch_size': 16, 'imagenet_pretrain': True, 'imagenet_pretrain_path': './pretrain/gca-dist.pth', 'trimap_channel': 3}, 'phase': 'train', 'test': {'alpha': None, 'alpha_path': None, 'batch_size': 1, 'checkpoint': 'best_model', 'cpu': False, 'fast_eval': True, 'merged': None, 'scale': 'origin', 'trimap': None}, 'train': {'G_lr': 0.001, 'beta1': 0.5, 'beta2': 0.999, 'clip_grad': True, 'comp_weight': 0, 'gabor_weight': 0, 'grad_weight': 0, 'rec_weight': 1, 'reset_lr': False, 'resume_checkpoint': None, 'smooth_l1_weight': 0, 'total_step': 100000, 'val_step': 1000, 'warmup_step': 5000}, 'version': 'baseline', 'world_size': 1}
{'data': {'augmentation': True, 'crop_size': 512, 'random_interp': False, 'test_alpha': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/alpha_copy', 'test_merged': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/merged', 'test_trimap': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/trimaps', 'train_alpha': '/home/liyaoyi/dataset/Adobe/train/alpha', 'train_bg': '/home/liyaoyi/dataset/coco_bg', 'train_fg': '/home/liyaoyi/dataset/Adobe/train/fg', 'workers': 4}, 'dist': True, 'gpu': [0, 1], 'is_default': False, 'local_rank': 0, 'log': {'checkpoint_path': './checkpoints', 'checkpoint_step': 2000, 'logging_level': 'INFO', 'logging_path': './logs/stdout', 'logging_step': 10, 'tensorboard_image_step': 2000, 'tensorboard_path': './logs/tensorboard', 'tensorboard_step': 100}, 'model': {'arch': {'decoder': 'res_gca_decoder_22', 'discriminator': None, 'encoder': 'resnet_gca_encoder_29'}, 'batch_size': 10, 'imagenet_pretrain': True, 'imagenet_pretrain_path': 'pretrain/model_best_resnet34_En_nomixup.pth', 'trimap_channel': 3}, 'phase': 'train', 'test': {'alpha': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/alpha_copy', 'alpha_path': 'prediction', 'batch_size': 1, 'checkpoint': 'gca-dist', 'cpu': False, 'fast_eval': True, 'merged': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/merged', 'scale': 'origin', 'trimap': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/trimaps'}, 'train': {'G_lr': 0.0004, 'beta1': 0.5, 'beta2': 0.999, 'clip_grad': True, 'comp_weight': 0, 'gabor_weight': 0, 'grad_weight': 0, 'rec_weight': 1, 'reset_lr': False, 'resume_checkpoint': None, 'smooth_l1_weight': 0, 'total_step': 200000, 'val_step': 2000, 'warmup_step': 5000}, 'version': 'gca-dist', 'world_size': 1}
The two configurations contain different paths. What is causing the error?
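One way to narrow down a FileNotFoundError like this is to check, before training, which dataset paths in the configuration actually in use do not exist. In the printout above, the configuration with 'is_default': False still has 'train_bg' set to '/home/liyaoyi/dataset/coco_bg'. The helper below is hypothetical, not part of the repository. It treats any string value containing a path separator as a path that must exist.

```python
import os


def missing_data_paths(data_cfg: dict) -> dict:
    """Return {key: path} for every path-like entry in `data_cfg` that does not exist.

    Hypothetical helper: any string value containing "/" or os.sep is treated
    as a filesystem path; other values (ints, bools) are ignored.
    """
    missing = {}
    for key, value in data_cfg.items():
        looks_like_path = isinstance(value, str) and ("/" in value or os.sep in value)
        if looks_like_path and not os.path.exists(value):
            missing[key] = value
    return missing


# Example with a subset of the 'data' section printed above.
data_cfg = {
    "train_bg": "/home/liyaoyi/dataset/coco_bg",
    "train_fg": "/home/liyaoyi/dataset/Adobe/train/fg",
    "workers": 4,
}
for key, path in missing_data_paths(data_cfg).items():
    print(f"{key}: {path} does not exist")
```

Running this on the configuration the training script actually loads shows which entry in the .toml still needs to be changed.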