Hi, I am using alpha-beta-CROWN to certify a model, called CNN7, on the CIFAR-10 dataset. However, I got a "CUDA out of memory" error, even though my GPU card has 24 GB of memory. I am wondering whether the CNN7 model is too big to be certified by alpha-beta-CROWN. So, is there any way for me to certify CNN7 with alpha-beta-CROWN (without a GPU card with larger memory)? The structure of CNN7 is as follows:
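(The architecture listing from the original post is not preserved here. For orientation, below is a minimal PyTorch sketch of the 7-layer CNN7 commonly used in certified-training work, e.g., in the Fast-certified-robust-training repository linked later in this thread; the exact variant in this issue may differ, for example in layer widths or the use of BatchNorm.)

```python
import torch.nn as nn

# Sketch of a typical CNN7 for CIFAR-10 (5 conv + 2 linear layers). The exact
# model in this issue is an assumption, not confirmed by the original post.
def cnn7(in_ch=3, width=64, linear_size=512, num_classes=10):
    return nn.Sequential(
        nn.Conv2d(in_ch, width, 3, stride=1, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, stride=1, padding=1), nn.ReLU(),
        nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(2 * width, 2 * width, 3, stride=1, padding=1), nn.ReLU(),
        nn.Conv2d(2 * width, 2 * width, 3, stride=1, padding=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(2 * width * 16 * 16, linear_size), nn.ReLU(),  # 32x32 -> 16x16 after the stride-2 conv
        nn.Linear(linear_size, num_classes),
    )
```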
The config file is as follows:
```yaml
# This is an example configuration file that contains most useful parameter settings.
model:
  name: cnn7  # This model is defined in model_defs.py. Add your own model definitions there.
  path: ***  # Path to PyTorch checkpoint.
data:
  dataset: CIFAR  # Dataset name. This is the standard CIFAR-10 test set, defined in the "load_verification_dataset()" function in utils.py.
  mean: [0.4914, 0.4822, 0.4465]  # Mean for normalization.
  std: [0.2023, 0.1994, 0.2010]  # Std for normalization.
  start: 0  # First example to verify in dataset.
  end: 100  # Last example to verify in dataset. We verify 100 examples in this test.
specification:
  norm: .inf  # Linf norm (can also be 2 or 1).
  # epsilon: 0.00784313725490196  # epsilon = 2./255.
  epsilon: 0.03137254901960784  # epsilon = 8./255.
attack:  # Currently, attack is only implemented for the Linf norm.
  pgd_steps: 100  # Increase for a stronger attack. A PGD attack is run before verification to filter out non-robust examples.
  pgd_restarts: 30  # Increase for a stronger attack.
solver:
  batch_size: 1  # Number of subdomains to compute in parallel in the bound solver. Decrease if you run out of memory.
  alpha-crown:
    iteration: 100  # Number of iterations for alpha-CROWN optimization. Alpha-CROWN is used to compute all intermediate-layer bounds before branch and bound starts.
    lr_alpha: 0.1  # Learning rate for alpha in alpha-CROWN. The default (0.1) is typically ok.
  beta-crown:
    lr_alpha: 0.01  # Learning rate for optimizing the alpha parameters; the default (0.01) is typically ok, but you can tune it to get a better lower bound.
    lr_beta: 0.05  # Learning rate for optimizing the beta parameters; the default (0.05) is typically ok, but you can tune it to get a better lower bound.
    iteration: 20  # Number of iterations for beta-CROWN optimization. 20 is often sufficient; 50 or 100 can also be used.
bab:
  timeout: 0  # Timeout threshold for branch and bound. Increase to verify more points.
  branching:  # Parameters for branching heuristics.
    reduceop: min  # Reduction function for the branching heuristic scores: min or max. Using max can be better on some models.
    method: kfsb  # babsr is fast but less accurate; fsb is slow but most accurate; kfsb is usually a good balance.
    candidates: 3  # Number of candidates to consider in fsb and kfsb. More leads to slower but better branching. 3 is typically good enough.
```
batch_size is already 1.
@41968 It may depend on whether this model contains many unstable ReLU neurons. If most ReLU neurons are unstable, the model can be too large for complete verifiers. How did you obtain the CNN7 model?
@shizhouxing I trained it myself using a slightly modified IBP.
@41968 I see epsilon: 0.03137254901960784 in your configuration above, but the model is trained with eps=2/255?
@shizhouxing Sorry for the confusion; the mismatch did not actually occur in my experiment. In practice, I trained with eps=2/255 and certified with epsilon=0.00784313725490196. That is, the epsilon in the config is changed to 0.00784313725490196 when certifying.
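(In other words, the override amounts to something like the following; a minimal sketch, assuming PyYAML and hypothetical file names:)

```python
import yaml

# 2/255 = 0.00784313725490196 and 8/255 = 0.03137254901960784 are the two
# epsilon values appearing in the config above. File names are placeholders.
with open("cnn7_cifar.yaml") as f:
    cfg = yaml.safe_load(f)
cfg["specification"]["epsilon"] = 2 / 255  # certify at the training budget
with open("cnn7_cifar_eps2.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```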
Do you often encounter the CUDA out-of-memory issue when verifying CNN7?
@41968 I see. It may imply that there are too many unstable ReLU neurons in the model.
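(As an aside, here is a minimal sketch of how one could count unstable ReLU neurons, i.e., neurons whose pre-activation interval straddles zero, using plain interval bound propagation in PyTorch. The model and input below are placeholders, and input normalization is omitted for brevity.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def count_unstable_relus(model, x, eps):
    """Propagate interval bounds through a Sequential model and count, for each
    ReLU, how many pre-activation neurons satisfy lower < 0 < upper, i.e. are
    unstable under an Linf perturbation of radius eps."""
    lb, ub = x - eps, x + eps
    counts = []
    for layer in model:
        if isinstance(layer, (nn.Conv2d, nn.Linear)):
            # Center/radius form of interval arithmetic: |W| maps radii.
            c, r = (lb + ub) / 2, (ub - lb) / 2
            if isinstance(layer, nn.Conv2d):
                c = F.conv2d(c, layer.weight, layer.bias,
                             layer.stride, layer.padding)
                r = F.conv2d(r, layer.weight.abs(), None,
                             layer.stride, layer.padding)
            else:
                c = F.linear(c, layer.weight, layer.bias)
                r = F.linear(r, layer.weight.abs())
            lb, ub = c - r, c + r
        elif isinstance(layer, nn.ReLU):
            counts.append(((lb < 0) & (ub > 0)).sum().item())
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
        elif isinstance(layer, nn.Flatten):
            lb, ub = layer(lb), layer(ub)
        else:
            raise NotImplementedError(type(layer))
    return counts

# Tiny stand-in model; in practice, the trained CNN7 checkpoint would be used.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
print(count_unstable_relus(model, x, eps=2 / 255))
```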
On my side, I have tried models trained with https://github.com/shizhouxing/Fast-certified-robust-training at eps=8/255, and they worked fine, but I'm not sure how much GPU memory I used. Those models can mostly be verified by IBP, though.
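(For completeness, a minimal sketch of such an IBP verification check using the auto_LiRPA library that alpha-beta-CROWN builds on, following the usage shown in the auto_LiRPA README; the model, input, and label below are placeholders.)

```python
import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Placeholder model and input; in practice, load the trained CNN7 checkpoint
# and a (normalized) CIFAR-10 test image together with its true label.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
label = 0

bounded = BoundedModule(model, torch.empty_like(x))
ptb = PerturbationLpNorm(norm=float("inf"), eps=8 / 255)
lb, ub = bounded.compute_bounds(x=(BoundedTensor(x, ptb),), method="IBP")

# Sufficient condition for robustness at this point: the lower bound of the
# true logit exceeds the upper bound of every other logit.
others = [j for j in range(ub.shape[1]) if j != label]
print("IBP-verified:", bool((lb[0, label] > ub[0, others]).all()))
```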