[Closed] Chegva closed this issue 1 year ago
dict(type='RAWNormalize', blc=0, saturate=1024, key='img_raw'),
dict(
    type='LoadImageFromFile',
    io_backend='disk',
    key='img_rgb',
    flag='color'),
dict(
    type='Resize',
    keys=['img_raw', 'img_rgb'],
    scale=(1280, 960),
    interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=['img_rgb']),
dict(
    type='Normalize',
    keys=['img_rgb'],
    to_rgb=True,
    mean=[0, 0, 0],
    std=[1, 1, 1]),
dict(type='ImageToTensor', keys=['img_raw', 'img_rgb']),
dict(
    type='Collect',
    keys=['img_raw', 'img_rgb'],
    meta_keys=['img_raw_path', 'img_rgb_path'])
]))
DATASET = 'bdd100k'
exp_name = 'unpaired_cycler2r_bdd100k_rgb2oneplus_raw'
work_dir = './work_dirs/experiments/unpaired_cycler2r_bdd100k_rgb2oneplus_raw'
gpu_ids = range(0, 1)
2023-01-16 15:39:48,774 - mmgen - INFO - Set random seed to 2021, deterministic: False
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git
The run above keeps failing with CUDA error: out of memory, but there is clearly plenty of GPU memory free. I can't figure out what's wrong; could you help take a look?
Hi, thanks for your interest. What is your GPU configuration? You can try reducing the batch size in the config:
samples_per_gpu=2,
workers_per_gpu=4,
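For reference, these fields live in the data dict of an mmcv-style config. A minimal sketch (values here are illustrative, not the repo's defaults); lowering samples_per_gpu is the usual first step when training hits CUDA out-of-memory:

```python
# Illustrative mmcv-style config fragment (assumed values, not the
# repo's defaults). samples_per_gpu is the per-GPU batch size.
data = dict(
    samples_per_gpu=1,   # try 1 if samples_per_gpu=2 still runs out of memory
    workers_per_gpu=4,   # dataloader worker processes per GPU
)
```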
Solved. It runs now in single-GPU mode.
When I tried multi-GPU training, my settings were as follows:

parser = argparse.ArgumentParser(description='Train a GAN model')
parser.add_argument(
    '--config',
    default='/workspace/rho-vision-main/configs/unpaired_cycler2r/unpaired_cycler2r_in_bdd100k_rgb2oneplus_raw_20k.py',
    help='train config file path')
parser.add_argument('--work-dir', help='the dir to save logs and models')
parser.add_argument('--resume-from', help='the checkpoint file to resume from')
parser.add_argument(
    '--no-validate',
    action='store_true',
    help='whether not to evaluate the checkpoint during training')
group_gpus = parser.add_mutually_exclusive_group()
group_gpus.add_argument(
    '--gpus',
    default=5,
    type=int,
    help='number of gpus to use '
    '(only applicable to non-distributed training)')
group_gpus.add_argument(
    '--gpu-ids',
    default=[0, 1, 2, 3, 4],
    type=int,
    nargs='+',
    help='ids of gpus to use '
    '(only applicable to non-distributed training)')
parser.add_argument('--seed', type=int, default=2021, help='random seed')
parser.add_argument(
    '--deterministic',
    action='store_true',
    help='whether to set deterministic options for CUDNN backend.')
parser.add_argument(
    '--cfg-options',
    nargs='+',
    action=DictAction,
    help='override some settings in the used config, the key-value pair '
    'in xxx=yyy format will be merged into config file.')
parser.add_argument(
    '--launcher',
    choices=['none', 'pytorch', 'slurm', 'mpi'],
    default='pytorch',
    help='job launcher')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
    os.environ['LOCAL_RANK'] = str(args.local_rank)

At first I hit KeyError: 'RANK', so following advice found online I added os.environ['RANK'] = '0'.
Then I hit ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set. Following the same approach, I added os.environ['WORLD_SIZE'] = '5', since I had pinned five cards with os.environ['CUDA_VISIBLE_DEVICES'] = '1,3,4,5,6'.
Finally I hit ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set. Again following advice found online, I added os.environ['MASTER_ADDR'] = 'localhost' and os.environ['MASTER_PORT'] = '5678'.
After that the program no longer errors out, but once it starts the terminal produces no output at all, as if it were stuck. Have you run into this on your side, or is there something wrong with my setup? Any pointers would be appreciated.
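The silent hang described above is consistent with how torch.distributed's env:// rendezvous works: init_process_group blocks until all WORLD_SIZE ranks have joined, and hard-coding RANK = '0' in a single process registers only rank 0, so the rendezvous waits forever for the other four. A launcher such as torch.distributed.launch (or torchrun) exports a distinct RANK per worker instead. A minimal stdlib sketch of what a launcher would export for 5 workers (the port number is illustrative):

```python
world_size = 5  # one worker process per visible GPU

# What a launcher exports for EACH of the 5 worker processes.
# Setting these once, by hand, in a single process only registers rank 0;
# the env:// rendezvous then waits forever for ranks 1-4 (the hang above).
worker_envs = []
for rank in range(world_size):
    worker_envs.append({
        'RANK': str(rank),              # globally unique per process
        'LOCAL_RANK': str(rank),        # index of the process on this machine
        'WORLD_SIZE': str(world_size),  # total number of processes
        'MASTER_ADDR': 'localhost',
        'MASTER_PORT': '29500',         # illustrative port
    })

for env in worker_envs:
    print(env['RANK'], env['WORLD_SIZE'])
```

In practice this means starting the script through the launcher (e.g. python -m torch.distributed.launch --nproc_per_node=5 train.py with --launcher pytorch) rather than setting the variables manually in one process.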
Hi, for multi-GPU training you can refer to mmgeneration, but we have not tested it ourselves.
Which Python version are you using? I keep hitting environment errors when running train.py.