Koukyosyumei / AIJack

Security and Privacy Risk Simulator for Machine Learning (arXiv:2312.17667)
Apache License 2.0

Running out of GPU resource #133

Closed. iamyifan closed this issue 1 year ago.

iamyifan commented 1 year ago

Hi developer.

When using GradientInversion_Attack.group_attack(), my GPU memory usage keeps accumulating (up to 11441 MiB) until my program gets killed. Based on my understanding, the function should release the allocated GPU memory once the attacker finishes one attack. Is there any way to prevent this issue?
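For reference, the kind of manual cleanup I expected to be able to do between attacks looks roughly like this (a generic PyTorch sketch, not an AIJack API; attacker and result are placeholder names for whatever the previous attack produced):

import gc
import torch

# Drop our own references to the finished attack, collect garbage, and hand
# cached blocks back to the driver. This only helps if nothing else still
# holds references to the tensors involved.
attacker = None
result = None
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_summary())  # check what is still allocated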

Koukyosyumei commented 1 year ago

@iamyifan

I'm sorry for not getting back to you sooner. I really appreciate your continued interest in this project! I have also confirmed this problem. I tried torch.cuda.empty_cache(), but it didn't work. Have you found any solution?

Code

import torch
import matplotlib.pyplot as plt

from aijack.attack.inversion import GradientInversion_Attack

# torch.cuda.empty_cache()

# channel, hidden, num_classes, device, x_batch, y_batch, criterion,
# batch_size, and the LeNet class are defined earlier in the script.
net = LeNet(channel=channel, hideen=hidden, num_classes=num_classes).to(device)
pred = net(x_batch.to(device))
loss = criterion(pred, y_batch.to(device))

# Gradients the attacker receives from the victim, detached from the graph.
received_gradients = torch.autograd.grad(loss, net.parameters())
received_gradients = [cg.detach() for cg in received_gradients]

print("Before Attack:\n", torch.cuda.memory_summary())

# Set up the gradient inversion attacker (group attack with group_num=3).
gradinversion = GradientInversion_Attack(
    net,
    (1, 28, 28),
    num_iteration=1000,
    lr=1e2,
    log_interval=0,
    optimizer_class=torch.optim.SGD,
    distancename="l2",
    optimize_label=False,
    bn_reg_layers=[net.body[1], net.body[4], net.body[7]],
    group_num=3,
    tv_reg_coef=0.00,
    l2_reg_coef=0.0001,
    bn_reg_coef=0.001,
    gc_reg_coef=0.001,
    device=device
)

result = gradinversion.group_attack(received_gradients, batch_size=batch_size)

print("After Attack:\n", torch.cuda.memory_summary())

# Average the reconstructed batches in result[0] and show each image in grayscale.
fig = plt.figure(figsize=(3, 2))
for bid in range(batch_size):
    ax1 = fig.add_subplot(1, batch_size, bid + 1)
    ax1.imshow((sum(result[0]) / len(result[0])).detach().cpu().numpy()[bid][0], cmap="gray")
    ax1.axis("off")
plt.tight_layout()
plt.show()

Result

Before Attack:
 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  125440 B  |    1184 KB |    1414 KB |    1291 KB |
|       from large pool |       0 B  |       0 KB |       0 KB |       0 KB |
|       from small pool |  125440 B  |    1184 KB |    1414 KB |    1291 KB |
|---------------------------------------------------------------------------|
| Active memory         |  125440 B  |    1184 KB |    1414 KB |    1291 KB |
|       from large pool |       0 B  |       0 KB |       0 KB |       0 KB |
|       from small pool |  125440 B  |    1184 KB |    1414 KB |    1291 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 B  |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |    1925 KB |    2046 KB |    3338 KB |    1412 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1925 KB |    2046 KB |    3338 KB |    1412 KB |
|---------------------------------------------------------------------------|
| Allocations           |      39    |      44    |      71    |      32    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      39    |      44    |      71    |      32    |
|---------------------------------------------------------------------------|
| Active allocs         |      39    |      44    |      71    |      32    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      39    |      44    |      71    |      32    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       1    |       1    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       1    |       1    |       1    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       5    |       6    |      19    |      14    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       5    |       6    |      19    |      14    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|

After Attack:
 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |     820 MB |     821 MB |   12370 MB |   11549 MB |
|       from large pool |       0 MB |       0 MB |       0 MB |       0 MB |
|       from small pool |     820 MB |     821 MB |   12370 MB |   11549 MB |
|---------------------------------------------------------------------------|
| Active memory         |     820 MB |     821 MB |   12370 MB |   11549 MB |
|       from large pool |       0 MB |       0 MB |       0 MB |       0 MB |
|       from small pool |     820 MB |     821 MB |   12370 MB |   11549 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |     822 MB |     822 MB |     822 MB |       0 B  |
|       from large pool |       0 MB |       0 MB |       0 MB |       0 B  |
|       from small pool |     822 MB |     822 MB |     822 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |    1430 KB |    2344 KB |   14663 MB |   14662 MB |
|       from large pool |       0 KB |       0 KB |       0 MB |       0 MB |
|       from small pool |    1430 KB |    2344 KB |   14663 MB |   14662 MB |
|---------------------------------------------------------------------------|
| Allocations           |  156071    |  156120    |    1554 K  |    1398 K  |
|       from large pool |       0    |       0    |       0 K  |       0 K  |
|       from small pool |  156071    |  156120    |    1554 K  |    1398 K  |
|---------------------------------------------------------------------------|
| Active allocs         |  156071    |  156120    |    1554 K  |    1398 K  |
|       from large pool |       0    |       0    |       0 K  |       0 K  |
|       from small pool |  156071    |  156120    |    1554 K  |    1398 K  |
|---------------------------------------------------------------------------|
| GPU reserved segments |     411    |     411    |     411    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |     411    |     411    |     411    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      11    |      17    |  679371    |  679360    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      11    |      17    |  679371    |  679360    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
iamyifan commented 1 year ago

Not yet. I tried adding torch.cuda.empty_cache() in as many relevant places and functions as I could, but it doesn't seem to work for me.

lokeshn011101 commented 1 year ago

I am facing a similar issue. This is how I created the attacker object:

gradinversion_attacker = GradientInversion_Attack(
    model,
    image_shape,
    num_iteration=2,
    lr=1.0,
    log_interval=0,
    distancename="l2",
    optimize_label=False,
    group_num=1,
    tv_reg_coef=0.001,
    l2_reg_coef=0.0001,
    bn_reg_coef=0.001,
    gc_reg_coef=0.001,
    device=device,
    lossfunc=criterion,
    optimizer_class=torch.optim.SGD,
)

reconstructed_x, _ = gradinversion_attacker.attack(
    received_gradients, labels=labels, batch_size=2
)

When I initialize the attacker with num_iteration=1, it works, but the reconstructed image is just random noise. When I increase num_iteration to 2, the process kills itself with the error: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.78 GiB total capacity; 14.19 GiB already allocated; 6.94 MiB free; 14.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
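For anyone else hitting this, the knob the error message points to is set through an environment variable; a minimal sketch (the 128 MiB value is an arbitrary choice, and this only mitigates fragmentation rather than the underlying memory growth):

import os

# PYTORCH_CUDA_ALLOC_CONF is read when the caching allocator is initialized,
# so set it before the first CUDA operation (safest: before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch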

@Koukyosyumei Do you have any fixes for this? It would be really helpful if this could be fixed. Thanks!

Koukyosyumei commented 1 year ago

@lokeshn011101 I'm so sorry for the late reply. We are still investigating this issue. If you have time, please help us to fix this.

lokeshn011101 commented 1 year ago

Sure, I can help with this. I tried debugging this a few days back and found that this line is where the memory keeps increasing on every iteration.

This is because create_graph is set to True for the backward pass to happen here. I tried setting create_graph to False where the fake gradients are generated and to True where the distance is backpropagated, but this didn't seem to reduce the distance at all, and even after 50 iterations it didn't converge.

@Koukyosyumei Do you know why this is happening? I am not an expert in Torch, but my hunch is that since the graph is not created, it does not have all the connections, so when the distance is backpropagated it doesn't know the edges properly and hence doesn't propagate up to fake_x and fake_labels. But I am not sure why the distance remains almost constant even when create_graph is set to True in distance.backward(). Any thoughts on this?
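To make the hunch concrete, here is a minimal, self-contained illustration (toy tensors, not AIJack code) of why the fake gradients have to be produced with create_graph=True for the distance to backpropagate to the dummy input:

import torch

# w stands in for the model parameters, fake_x for the dummy input being optimized.
w = torch.randn(3, requires_grad=True)
fake_x = torch.randn(3, requires_grad=True)

loss = (w * fake_x).sum()
# create_graph=True keeps fake_grad connected to fake_x in the autograd graph;
# with create_graph=False the result is detached and the backward below fails.
fake_grad = torch.autograd.grad(loss, w, create_graph=True)[0]

received_grad = torch.ones(3)                # stands in for the victim's gradients
distance = ((fake_grad - received_grad) ** 2).sum()
distance.backward()                          # reaches fake_x only through that retained graph
print(fake_x.grad)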

Koukyosyumei commented 1 year ago

@lokeshn011101

Thank you very much for your suggestion! I finally figured out that retain_graph=True in this line caused a kind of memory leak, since it prevents PyTorch from automatically freeing the graph's saved buffers.
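For anyone wondering what that flag changes, here is a minimal stand-alone illustration (toy tensors, not the AIJack code path): with retain_graph=True the buffers saved for backward survive the call and keep occupying memory for as long as the graph stays reachable, which adds up over a long optimization loop.

import torch

x = torch.randn(8, requires_grad=True)
y = (x * x).sum()

# retain_graph=True keeps the graph's saved tensors alive after backward,
# so a second backward through the same graph is still possible ...
y.backward(retain_graph=True)
# ... whereas the default frees those buffers as soon as backward finishes;
# calling backward a third time here would raise an error.
y.backward()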

I created a patch for this issue (#136).

Without the patch:

before attack

 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  16762 KiB |  16882 KiB |  18054 KiB |   1291 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    122 KiB |   1184 KiB |   1414 KiB |   1291 KiB |
|---------------------------------------------------------------------------|
| Active memory         |  16762 KiB |  16882 KiB |  18054 KiB |   1291 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    122 KiB |   1184 KiB |   1414 KiB |   1291 KiB |
|---------------------------------------------------------------------------|
| Requested memory      |  16745 KiB |  16864 KiB |  18027 KiB |   1281 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    105 KiB |   1169 KiB |   1387 KiB |   1281 KiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  22528 KiB |  22528 KiB |  22528 KiB |      0 B   |
|       from large pool |  20480 KiB |  20480 KiB |  20480 KiB |      0 B   |
|       from small pool |   2048 KiB |   2048 KiB |   2048 KiB |      0 B   |
|---------------------------------------------------------------------------|

after attack

 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    836 MiB |    837 MiB |  11708 MiB |  10871 MiB |
|       from large pool |     16 MiB |     16 MiB |     16 MiB |      0 MiB |
|       from small pool |    820 MiB |    821 MiB |  11692 MiB |  10871 MiB |
|---------------------------------------------------------------------------|
| Active memory         |    836 MiB |    837 MiB |  11708 MiB |  10871 MiB |
|       from large pool |     16 MiB |     16 MiB |     16 MiB |      0 MiB |
|       from small pool |    820 MiB |    821 MiB |  11692 MiB |  10871 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |    779 MiB |    780 MiB |  11215 MiB |  10436 MiB |
|       from large pool |     16 MiB |     16 MiB |     16 MiB |      0 MiB |
|       from small pool |    763 MiB |    763 MiB |  11199 MiB |  10436 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    842 MiB |    842 MiB |    842 MiB |      0 B   |
|       from large pool |     20 MiB |     20 MiB |     20 MiB |      0 B   |
|       from small pool |    822 MiB |    822 MiB |    822 MiB |      0 B   |
|---------------------------------------------------------------------------|

With the patch (#136):

before attack

 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  16762 KiB |  16882 KiB |  18054 KiB |   1291 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    122 KiB |   1184 KiB |   1414 KiB |   1291 KiB |
|---------------------------------------------------------------------------|
| Active memory         |  16762 KiB |  16882 KiB |  18054 KiB |   1291 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    122 KiB |   1184 KiB |   1414 KiB |   1291 KiB |
|---------------------------------------------------------------------------|
| Requested memory      |  16745 KiB |  16864 KiB |  18027 KiB |   1281 KiB |
|       from large pool |  16640 KiB |  16640 KiB |  16640 KiB |      0 KiB |
|       from small pool |    105 KiB |   1169 KiB |   1387 KiB |   1281 KiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  22528 KiB |  22528 KiB |  22528 KiB |      0 B   |
|       from large pool |  20480 KiB |  20480 KiB |  20480 KiB |      0 B   |
|       from small pool |   2048 KiB |   2048 KiB |   2048 KiB |      0 B   |

after attack

 |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  16912 KiB |  18011 KiB |  11690 MiB |  11673 MiB |
|       from large pool |  16640 KiB |  16640 KiB |     16 MiB |      0 MiB |
|       from small pool |    272 KiB |   1371 KiB |  11673 MiB |  11673 MiB |
|---------------------------------------------------------------------------|
| Active memory         |  16912 KiB |  18011 KiB |  11690 MiB |  11673 MiB |
|       from large pool |  16640 KiB |  16640 KiB |     16 MiB |      0 MiB |
|       from small pool |    272 KiB |   1371 KiB |  11673 MiB |  11673 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |  16885 KiB |  17979 KiB |  11201 MiB |  11184 MiB |
|       from large pool |  16640 KiB |  16640 KiB |     16 MiB |      0 MiB |
|       from small pool |    245 KiB |   1339 KiB |  11185 MiB |  11184 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  22528 KiB |  22528 KiB |  22528 KiB |      0 B   |
|       from large pool |  20480 KiB |  20480 KiB |  20480 KiB |      0 B   |
|       from small pool |   2048 KiB |   2048 KiB |   2048 KiB |      0 B   |
Koukyosyumei commented 1 year ago

@lokeshn011101 @iamyifan

Your contributions to this issue are enormous, so I would like to make the PR (#136) co-authored with you if it is not inconvenient for you.

lokeshn011101 commented 1 year ago

@Koukyosyumei Thank you so much for finding and raising a fix for this!

And yes, it would be really great if you could add me as a co-author! I would love to contribute more to this repo in the future as well. My email to mention: lokesh19055@cse.ssn.edu.in

I think adding Co-authored-by: lokeshn011101 <lokesh19055@cse.ssn.edu.in> in the text box before squash-merging should add me and @iamyifan (with his email) as the co-authors.

Thanks again!

iamyifan commented 1 year ago

Appreciate your work :) I literally did nothing but point out this bug. You can put down my GitHub ID or email, whatever you like. Thanks a lot.