Justin-Tan / high-fidelity-generative-compression

Pytorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
Apache License 2.0
411 stars, 77 forks

Runtime Error #18

june1819 closed this issue 3 years ago

june1819 commented 3 years ago

RuntimeError: CUDA out of memory. Tried to allocate 210.00 MiB (GPU 0; 10.91 GiB total capacity; 9.59 GiB already allocated; 85.56 MiB free; 9.81 GiB reserved in total by PyTorch)

I decreased the settings in default_config.py in order of priority, but it doesn't work. I set: batch_size = 2; latent_channels = 44; n_residual_blocks = 3; crop_size = 64; image_dims = (3, 64, 64); latent_dims = (latent_channels, 8, 8)

My GPU is a GeForce GTX 1080 Ti.

Justin-Tan commented 3 years ago

I'm surprised you're still getting OOM errors with a batch size of 2 and a 64x64 crop size, even on a 1080. Can you monitor the memory usage by running watch -n 0.5 nvidia-smi and post the results here?
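If it's easier, you can also log allocator statistics from inside the training loop using the standard torch.cuda counters. A minimal sketch (the helper name and call site are just for illustration, not part of this repo):

import torch

def log_cuda_memory(tag='', device=0):
    # memory_allocated: memory occupied by live tensors;
    # memory_reserved: memory held by PyTorch's caching allocator.
    alloc = torch.cuda.memory_allocated(device) / 2**20
    reserved = torch.cuda.memory_reserved(device) / 2**20
    print(f'[{tag}] allocated {alloc:.1f} MiB | reserved {reserved:.1f} MiB')

# e.g. call once per logging interval inside the training loop:
# log_cuda_memory(tag=f'step {step}')

Comparing "allocated" against "reserved" helps distinguish genuine tensor growth from allocator caching, which nvidia-smi alone can't show.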

june1819 commented 3 years ago

[nvidia-smi screenshot] Before running train.py.

june1819 commented 3 years ago

[nvidia-smi screenshot] After running train.py. I'm using GPU 1.

june1819 commented 3 years ago

My default_config.py:

class DatasetPaths(object):
    OPENIMAGES = '/media/zsf/June/L3C-PyTorch-master/src/DATA_DIR1/'
    CITYSCAPES = ''
    JETS = ''

class directories(object):
    experiments = 'experiments'

class args(object):
    """ Shared config """
    name = 'hific_v0.1'
    silent = True
    n_epochs = 8
    n_steps = 1e6
    batch_size = 2
    log_interval = 1000
    save_interval = 50000
    gpu = 1
    multigpu = True
    dataset = Datasets.OPENIMAGES
    dataset_path = DatasetPaths.OPENIMAGES
    shuffle = True

    # GAN params
    discriminator_steps = 0
    model_mode = ModelModes.TRAINING
    sample_noise = False
    #noise_dim = 32
    noise_dim = 16

    # Architecture params - Table 3a) of [1]
    #latent_channels = 220
    latent_channels = 44
    #n_residual_blocks = 9          # Authors use 9 blocks, performance saturates at 5
    n_residual_blocks = 3
    lambda_B = 2**(-4)              # Loose rate
    k_M = 0.075 * 2**(-5)           # Distortion
    k_P = 1.                        # Perceptual loss
    beta = 0.15                     # Generator loss
    use_channel_norm = True
    likelihood_type = 'gaussian'    # Latent likelihood model
    normalize_input_image = False   # Normalize inputs to range [-1,1]

    # Shapes
    #crop_size = 256
    #image_dims = (3,256,256)
    crop_size = 64
    image_dims = (3,64,64)
    #latent_dims = (latent_channels,16,16)
    latent_dims = (latent_channels,8,8)

    # Optimizer params
    learning_rate = 1e-4
    weight_decay = 1e-6

    # Scheduling
    lambda_schedule = dict(vals=[2., 1.], steps=[50000])
    lr_schedule = dict(vals=[1., 0.1], steps=[500000])
    target_schedule = dict(vals=[0.20/0.14, 1.], steps=[50000])  # Rate allowance
    ignore_schedule = False

    # match target rate to lambda_A coefficient
    regime = 'low'  # -> 0.14
    target_rate_map = dict(low=0.14, med=0.3, high=0.45)
    lambda_A_map = dict(low=2**1, med=2**0, high=2**(-1))
    target_rate = target_rate_map[regime]
    lambda_A = lambda_A_map[regime]

    # DLMM
    use_latent_mixture_model = False
    mixture_components = 4
    #latent_channels_DLMM = 64
    latent_channels_DLMM = 32
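(Side note: the commented-out defaults above pair crop_size = 256 with 16x16 latents, i.e. a 16x spatial downsampling; by the same ratio, a 64-pixel crop would give 4x4 latents rather than the 8x8 set here. A quick sanity check, assuming that fixed factor:)

# Infer the latent spatial size from crop_size, assuming the 16x
# downsampling implied by the commented defaults (256 -> 16).
def latent_spatial(crop_size, factor=16):
    assert crop_size % factor == 0, 'crop_size must be divisible by the factor'
    return crop_size // factor

print(latent_spatial(256))  # 16 -> matches (latent_channels, 16, 16)
print(latent_spatial(64))   # 4  -> suggests (latent_channels, 4, 4)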
Justin-Tan commented 3 years ago

Note that the command-line arguments (including their defaults) can override the settings in default_config.py. Try passing the same values via the command line and see if that helps, e.g.

python3 train.py ... --batch_size 2 --n_residual_blocks 0
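For context, this is roughly why editing default_config.py alone may not take effect. A minimal sketch of the override pattern (the names are illustrative, not the exact train.py code):

import argparse
import default_config as config

parser = argparse.ArgumentParser()
# A flag with an argparse default appears in vars(cmd_args) even when
# the user never passes it on the command line...
parser.add_argument('--batch_size', type=int, default=8)
cmd_args = parser.parse_args()

# ...so copying the parsed values onto the config clobbers hand edits.
for k, v in vars(cmd_args).items():
    setattr(config.args, k, v)

Passing the values explicitly on the command line sidesteps this, since the parsed values then match what you want.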
june1819 commented 3 years ago

Thank you! The problem is solved.