NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Function ME.utils.sparse_quantize causes memory leak #411

Open lilanxiao opened 2 years ago

lilanxiao commented 2 years ago

Describe the bug
The ME.utils.sparse_quantize function seems to cause a slow memory leak. The RAM (not GPU RAM) usage increases gradually and eventually leads to OOM in a long training schedule.

When the following code runs, the RAM usage increases slowly. The code prints the percentage of RAM in use. You need to wait a while to see the difference (15~20 minutes should be enough). Note that the code monitors the RAM usage of the entire system, so you should not use the machine for other tasks while it runs.

Here I use a DataLoader to speed things up, but you can reproduce the behavior without the DataLoader as well (e.g. by iterating through the dataset directly); see the sketch after the code below.


To Reproduce

import MinkowskiEngine as ME
import numpy as np
import psutil
from torch.utils.data import DataLoader, Dataset

def main():
    ITERS = 100000
    BS = 8
    ds = FooDataset(size=ITERS * BS, num_pts=100000)
    dl = DataLoader(ds, batch_size=BS, num_workers=8, shuffle=True)
    for i, data in enumerate(dl):
        if i % 100 == 0:
            # psutil.virtual_memory()[2] is the system-wide RAM usage in percent
            print("{:d}, memory used:".format(i), psutil.virtual_memory()[2])

class FooDataset(Dataset):
    def __init__(self, size, num_pts) -> None:
        super().__init__()
        self.num_pts = int(num_pts)
        self.size = int(size)

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        coords = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        feats = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        # NOTE: the RAM usage increases more slowly if the data has a smaller variance
        # coords = np.random.rand(self.num_pts, 3)
        # feats = np.random.rand(self.num_pts, 3)
        vox, feats = ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
        vox = vox.numpy()
        feats = feats + 0.1
        # only a small slice is returned, so the growth cannot be explained by the returned data
        return vox[:10, ...], feats[:10, ...]

if __name__ == "__main__":
    main()
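
For reference, a minimal sketch of the same loop without the DataLoader (it reuses the FooDataset class above; the function name is just illustrative):

def main_without_dataloader():
    ds = FooDataset(size=100000, num_pts=100000)
    for i in range(len(ds)):
        vox, feats = ds[i]  # calls sparse_quantize once per sample
        if i % 100 == 0:
            print("{:d}, memory used:".format(i), psutil.virtual_memory()[2])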

On my machine with 32 GB of RAM, the RAM usage increases linearly while this code runs. The growth rate is related to the variance of the data: a larger variance brings a faster increase. For instance, you see a slower increase with the two commented-out lines. If I use real point clouds instead of random numbers, the code eventually reaches OOM when it runs for an extremely long time.


Expected behavior
The RAM usage stays steady.


Desktop (please complete the following information):


Additional context
The memory leak probably comes from the C/C++ extension side, as I cannot trace it with tracemalloc from the Python side.
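
A minimal sketch of the kind of tracemalloc check I mean (illustrative, not my exact script): it only reports allocations made through the Python allocator, so memory allocated inside the C/C++ extension does not appear in the diff.

import tracemalloc
import numpy as np
import MinkowskiEngine as ME

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(1000):
    coords = np.random.rand(100000, 3) * 5
    feats = np.random.rand(100000, 3)
    ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
after = tracemalloc.take_snapshot()
# largest Python-level allocation differences; extension-side memory is invisible here
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)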

chrischoy commented 2 years ago

It's unlikely that there is a memory leak: over 10k iterations, the memory fluctuates between 21.2 and 21.7.

When there's a memory leak, the memory consumption CONSTANTLY increases. Can you share your log?

# sparse_quantize
0, memory used: 21.2
100, memory used: 21.2
200, memory used: 21.5
300, memory used: 21.3
400, memory used: 21.4
500, memory used: 21.4
600, memory used: 21.5
700, memory used: 21.5
800, memory used: 21.5
900, memory used: 21.5
1000, memory used: 21.4
1100, memory used: 21.4
1200, memory used: 21.6
1300, memory used: 21.7
1400, memory used: 21.4
1500, memory used: 21.5
1600, memory used: 21.5
1700, memory used: 21.6
1800, memory used: 21.7
1900, memory used: 21.6
2000, memory used: 21.7
...
9800, memory used: 21.8
9900, memory used: 21.6
10000, memory used: 21.7
10100, memory used: 21.6
10200, memory used: 21.6
lilanxiao commented 2 years ago

@chrischoy thank you for your reply!

That is really strange because I see different behavior on my machine. As you can see, the RAM usage indeed increases CONSTANTLY. Maybe you can try more iterations?

0, memory used: 12.6
100, memory used: 12.7
200, memory used: 12.7
300, memory used: 12.7
400, memory used: 12.8
500, memory used: 12.8
600, memory used: 12.7
700, memory used: 12.8
800, memory used: 12.9
900, memory used: 12.9
1000, memory used: 12.6
1100, memory used: 12.6
1200, memory used: 12.7
1300, memory used: 12.6
1400, memory used: 12.7
1500, memory used: 12.6
1600, memory used: 12.8
1700, memory used: 12.9
1800, memory used: 12.9
1900, memory used: 13.0
.........
16900, memory used: 13.9
17000, memory used: 13.8
17100, memory used: 13.8
17200, memory used: 13.9
17300, memory used: 13.8
17400, memory used: 13.9
17500, memory used: 13.8
17600, memory used: 13.8
17700, memory used: 13.9
17800, memory used: 14.0
17900, memory used: 13.9
18000, memory used: 13.9
18100, memory used: 14.0
18200, memory used: 13.9
18300, memory used: 13.9
18400, memory used: 13.9
18500, memory used: 13.9
18600, memory used: 14.0
18700, memory used: 14.0
18800, memory used: 13.9
18900, memory used: 14.0
19000, memory used: 13.9
19100, memory used: 13.9
19200, memory used: 14.0
19300, memory used: 14.0
19400, memory used: 14.2
19500, memory used: 14.2
19600, memory used: 14.2
19700, memory used: 14.3
19800, memory used: 14.3
19900, memory used: 14.1
20000, memory used: 14.2
20100, memory used: 14.1
20200, memory used: 14.2
20300, memory used: 14.1
....
chrischoy commented 2 years ago

I've never seen a memory leak that doesn't leak for the first 10k iterations but starts leaking after 10k iterations. This looks more like OS-level memory management and garbage collection than a memory leak, but I'll do more analysis.
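
One way to test this hypothesis (a sketch, assuming a glibc-based Linux system) is to force garbage collection and ask the allocator to return freed memory to the OS every few hundred iterations of the repro loop; if the reported usage still grows, the memory is held below the Python level:

import ctypes
import gc

def trim_memory():
    # force Python garbage collection, then ask glibc to return freed
    # arenas to the OS; malloc_trim is glibc-specific
    gc.collect()
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except OSError:
        pass  # not a glibc system, skip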

lilanxiao commented 2 years ago

OK, perhaps it is not accurate to describe the problem as a memory leak. Let me provide more information. The demo code does not allocate more and more RAM if I remove sparse_quantize. To create a comparable baseline, I replace

vox, feats = ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
vox = vox.numpy()

with NumPy functions, which do similar things but are not as efficient as sparse_quantize.

vox = np.floor(coords/0.02).astype(np.int32)
vox, index = np.unique(vox, axis=0, return_index=True)
feats = feats[index]
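
For completeness, a sketch of the modified __getitem__ with this NumPy baseline in place (the rest of the dataset class is unchanged):

    def __getitem__(self, index):
        coords = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        feats = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        # voxelize with plain NumPy instead of ME.utils.sparse_quantize
        vox = np.floor(coords / 0.02).astype(np.int32)
        vox, keep = np.unique(vox, axis=0, return_index=True)
        feats = feats[keep]
        feats = feats + 0.1
        return vox[:10, ...], feats[:10, ...]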

The RAM usage looks like this: [Figure 1: RAM usage over iterations]

I ran the two versions on the same machine, so sparse_quantize indeed seems to show some strange behavior. With sparse_quantize, the RAM usage increases by about 1.3 GB after 1e5 iterations, i.e. roughly 13 KB per iteration.

With NumPy, the RAM usage does not increase. (I don't know why it drops, by the way; maybe due to some background system activity?)


Update: I ran even more iterations and can confirm that the RAM usage increases almost linearly: [Figure 2: RAM usage over a longer run]