some error in config.py

kelisiya commented 4 years ago

When I modify config.py to point to my datasets, I get the error FileNotFoundError: [Errno 2] No such file or directory: '/home/liyaoyi/dataset/coco_bg'. But I did change the paths, and when I print the config I get two different results:

{'data': {'augmentation': True, 'crop_size': 512, 'random_interp': False, 'test_alpha': '/data/Adobe_Dataset/Test_set/comp/alpha', 'test_merged': '/data/Adobe_Dataset/Test_set/comp/image', 'test_trimap': '/data/Adobe_Dataset/Test_set/comp/trimap', 'train_alpha': '/data/Adobe_Dataset/Training_set/all/alpha', 'train_bg': '/data/Adobe_Dataset/background_all', 'train_fg': '/data/Adobe_Dataset/Training_set/all/fg', 'workers': 0}, 'dist': False, 'gpu': [0, 1], 'is_default': True, 'local_rank': 0, 'log': {'checkpoint_path': './checkpoints', 'checkpoint_step': 10000, 'logging_level': 'DEBUG', 'logging_path': './logs/stdout', 'logging_step': 10, 'tensorboard_image_step': 500, 'tensorboard_path': './logs/tensorboard', 'tensorboard_step': 100}, 'model': {'arch': {'decoder': 'res_shortcut_decoder_22', 'discriminator': None, 'encoder': 'res_shortcut_encoder_29'}, 'batch_size': 16, 'imagenet_pretrain': True, 'imagenet_pretrain_path': './pretrain/gca-dist.pth', 'trimap_channel': 3}, 'phase': 'train', 'test': {'alpha': None, 'alpha_path': None, 'batch_size': 1, 'checkpoint': 'best_model', 'cpu': False, 'fast_eval': True, 'merged': None, 'scale': 'origin', 'trimap': None}, 'train': {'G_lr': 0.001, 'beta1': 0.5, 'beta2': 0.999, 'clip_grad': True, 'comp_weight': 0, 'gabor_weight': 0, 'grad_weight': 0, 'rec_weight': 1, 'reset_lr': False, 'resume_checkpoint': None, 'smooth_l1_weight': 0, 'total_step': 100000, 'val_step': 1000, 'warmup_step': 5000}, 'version': 'baseline', 'world_size': 1}

{'data': {'augmentation': True, 'crop_size': 512, 'random_interp': False, 'test_alpha': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/alpha_copy', 'test_merged': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/merged', 'test_trimap': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/trimaps', 'train_alpha': '/home/liyaoyi/dataset/Adobe/train/alpha', 'train_bg': '/home/liyaoyi/dataset/coco_bg', 'train_fg': '/home/liyaoyi/dataset/Adobe/train/fg', 'workers': 4}, 'dist': True, 'gpu': [0, 1], 'is_default': False, 'local_rank': 0, 'log': {'checkpoint_path': './checkpoints', 'checkpoint_step': 2000, 'logging_level': 'INFO', 'logging_path': './logs/stdout', 'logging_step': 10, 'tensorboard_image_step': 2000, 'tensorboard_path': './logs/tensorboard', 'tensorboard_step': 100}, 'model': {'arch': {'decoder': 'res_gca_decoder_22', 'discriminator': None, 'encoder': 'resnet_gca_encoder_29'}, 'batch_size': 10, 'imagenet_pretrain': True, 'imagenet_pretrain_path': 'pretrain/model_best_resnet34_En_nomixup.pth', 'trimap_channel': 3}, 'phase': 'train', 'test': {'alpha': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/alpha_copy', 'alpha_path': 'prediction', 'batch_size': 1, 'checkpoint': 'gca-dist', 'cpu': False, 'fast_eval': True, 'merged': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/merged', 'scale': 'origin', 'trimap': '/home/liyaoyi/dataset/Adobe/Combined_Dataset/Test_set/trimaps'}, 'train': {'G_lr': 0.0004, 'beta1': 0.5, 'beta2': 0.999, 'clip_grad': True, 'comp_weight': 0, 'gabor_weight': 0, 'grad_weight': 0, 'rec_weight': 1, 'reset_lr': False, 'resume_checkpoint': None, 'smooth_l1_weight': 0, 'total_step': 200000, 'val_step': 2000, 'warmup_step': 5000}, 'version': 'gca-dist', 'world_size': 1}

The paths differ between the two printouts. What is causing the error?

Yaoyi-Li commented 4 years ago

My suggestion is to modify the .toml file instead. The config.py file only holds default values, which are overwritten by the .toml file. I think the code prints the config details only once by default, but your logs contain two different configurations. Does that mean the config details were printed twice at the same time? It would be helpful if you could provide more information about how you modified config.py. It looks like you assigned values to global variables such as 'gpu': [0, 1]. These global variables are overwritten by the code at runtime, so you don't need to set them.
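The override order described above (defaults first, then the .toml on top) can be sketched as a recursive dictionary merge. This is a minimal illustration, not the repository's actual loader: the `merge` helper, the `DEFAULTS` dict standing in for config.py, and the `CUSTOM` dict standing in for a parsed .toml file are all hypothetical, with values borrowed from the two printouts in this thread. Any key set in the .toml wins, so editing only config.py leaves `train_bg` pointing at whatever the .toml says.

```python
import copy

# Hypothetical defaults standing in for config.py (values from the first printout).
DEFAULTS = {
    "data": {
        "crop_size": 512,
        "train_bg": "/data/Adobe_Dataset/background_all",
        "workers": 0,
    },
    "version": "baseline",
}

# Hypothetical contents of a parsed .toml file (values from the second printout).
CUSTOM = {
    "data": {"train_bg": "/home/liyaoyi/dataset/coco_bg", "workers": 4},
    "version": "gca-dist",
}


def merge(defaults: dict, custom: dict) -> dict:
    """Return a new dict in which values from `custom` recursively override `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections such as [data] instead of replacing them.
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


config = merge(DEFAULTS, CUSTOM)
print(config["data"]["train_bg"])   # /home/liyaoyi/dataset/coco_bg (from the .toml)
print(config["data"]["crop_size"])  # 512 (default kept: not set in the .toml)
```

The merge works section by section, so a .toml only needs to list the keys it changes; everything else falls back to the defaults.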

kelisiya commented 4 years ago

This problem seems to be related to the GPUs. My computer has 2 GPUs, but I can't use torch.distributed.launch. If I want to train the model, how should I modify the code?

kelisiya commented 4 years ago

I found that the error came from Docker. But I have a new question: why is the model I trained 302.7 MB, while gca-dist is only 101.3 MB?

Yaoyi-Li commented 4 years ago

Yes, by default the checkpoint saves the full optimizer state, so it will be ~300 MB. I removed the Adam state_dict from the pretrained model provided in this repository to save space.
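The roughly 3× ratio is what one would expect from Adam, which keeps two extra tensors (first- and second-moment estimates) for every parameter, so weights plus optimizer state take about three times the space of the weights alone (101.3 MB × 3 ≈ 304 MB, close to the 302.7 MB reported). The sketch below measures this on a toy `nn.Linear` layer rather than the matting network; the checkpoint keys `state_dict` and `optimizer` are hypothetical and not necessarily the ones GCA-Matting uses.

```python
import io

import torch
from torch import nn


def serialized_size(obj) -> int:
    """Bytes needed to torch.save `obj`, measured in memory (no file is written)."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


# Toy model standing in for the matting network (hypothetical).
model = nn.Linear(1000, 1000)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# One training step so Adam allocates its per-parameter moment tensors.
loss = model(torch.randn(8, 1000)).pow(2).mean()
loss.backward()
optimizer.step()

# Hypothetical checkpoint layout: weights plus the full optimizer state.
full_ckpt = {"state_dict": model.state_dict(), "optimizer": optimizer.state_dict()}
# The same checkpoint with the Adam state dropped, keeping only the weights.
slim_ckpt = {"state_dict": model.state_dict()}

full_size = serialized_size(full_ckpt)
slim_size = serialized_size(slim_ckpt)
print(f"full: {full_size / 1e6:.1f} MB, weights only: {slim_size / 1e6:.1f} MB")
```

Dropping the optimizer state is fine for inference or fine-tuning from scratch, but resuming training exactly where it stopped needs the saved Adam moments.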

stevesrh commented 3 years ago

When I modified the .toml file, I got the same FileNotFoundError: [Errno 2] No such file or directory: '/home/liyaoyi/dataset/coco_bg'. Could you give me some suggestions for solving this problem? Thanks a lot. @kelisiya @
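One way to narrow this down is to check, before training starts, which configured data paths actually exist, and to confirm that the edited .toml is the file actually being loaded (a `train_bg` still pointing at `/home/liyaoyi/dataset/coco_bg` would suggest an unedited config is being read). The helper below is a hypothetical sketch, not part of the repository; it treats any string value containing `/` as a path, which is only a heuristic.

```python
from pathlib import Path


def missing_paths(data_config: dict) -> list[str]:
    """Return 'key: path' entries whose configured path does not exist on disk."""
    missing = []
    for key, value in sorted(data_config.items()):
        # Heuristic: only string values containing "/" are treated as paths.
        if isinstance(value, str) and "/" in value and not Path(value).exists():
            missing.append(f"{key}: {value}")
    return missing


# Hypothetical [data] section as it might be loaded from the edited .toml file.
data = {
    "train_fg": "/home/liyaoyi/dataset/Adobe/train/fg",
    "train_bg": "/home/liyaoyi/dataset/coco_bg",
    "workers": 4,
}
for entry in missing_paths(data):
    print("missing", entry)
```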

Yaoyi-Li / GCA-Matting, issue #2