NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Function ME.utils.sparse_quantize causes memory leak #411

Open lilanxiao opened 2 years ago

lilanxiao commented 2 years ago

Describe the bug
The ME.utils.sparse_quantize function seems to cause a slow memory leak. The RAM (not GPU RAM) usage increases gradually and eventually leads to OOM in a long training schedule.

When the following code runs, the RAM usage increases slowly. The code prints the percentage of RAM in use. You need to wait a while to see the difference (15~20 minutes should be enough). Note that the code monitors the RAM usage of the entire system, so you should not use the machine for other tasks while it runs.

Here I use a DataLoader to speed things up, but you can reproduce the behavior without the DataLoader as well (e.g. by iterating through the dataset directly); see the sketch after the code below.


To Reproduce

import MinkowskiEngine as ME
import numpy as np
import psutil
from torch.utils.data import DataLoader, Dataset

def main():
    ITERS = 100000
    BS = 8
    ds = FooDataset(size=ITERS * BS, num_pts=100000)
    dl = DataLoader(ds, batch_size=BS, num_workers=8, shuffle=True)
    for i, data in enumerate(dl):
        if i % 100 == 0:
            # psutil.virtual_memory()[2] is the system-wide RAM usage in percent
            print("{:d}, memory used:".format(i), psutil.virtual_memory()[2])

class FooDataset(Dataset):
    def __init__(self, size, num_pts) -> None:
        super().__init__()
        self.num_pts = int(num_pts)
        self.size = int(size)

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        coords = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        feats = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        # NOTE: the RAM usage increases more slowly if the data has a smaller variance
        # coords = np.random.rand(self.num_pts, 3)
        # feats = np.random.rand(self.num_pts, 3)
        vox, feats = ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
        vox = vox.numpy()
        feats = feats + 0.1
        # only a small slice is returned, so the growth cannot be explained by the returned data
        return vox[:10, ...], feats[:10, ...]

if __name__ == "__main__":
    main()
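
For reference, a minimal sketch of the same loop without the DataLoader (it reuses the FooDataset class above; the function name is just illustrative):

def main_without_dataloader():
    ds = FooDataset(size=100000, num_pts=100000)
    for i in range(len(ds)):
        vox, feats = ds[i]  # calls sparse_quantize once per sample
        if i % 100 == 0:
            print("{:d}, memory used:".format(i), psutil.virtual_memory()[2])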

On my machine with 32 GB of RAM, the RAM usage increases linearly while this code runs. The growth rate is related to the variance of the data: a larger variance brings a faster increase. For instance, you see a slower increase with the two commented-out lines. If I use real point clouds instead of random numbers, the code eventually reaches OOM when it runs for an extremely long time.


Expected behavior
The RAM usage stays steady.


Desktop (please complete the following information):


Additional context
The memory leak probably comes from the C/C++ extension side, as I cannot trace it with tracemalloc from the Python side.
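
A minimal sketch of the kind of tracemalloc check I mean (illustrative, not my exact script): it only reports allocations made through the Python allocator, so memory allocated inside the C/C++ extension does not appear in the diff.

import tracemalloc
import numpy as np
import MinkowskiEngine as ME

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(1000):
    coords = np.random.rand(100000, 3) * 5
    feats = np.random.rand(100000, 3)
    ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
after = tracemalloc.take_snapshot()
# largest Python-level allocation differences; extension-side memory is invisible here
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)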

chrischoy commented 2 years ago

It's unlikely that there is a memory leak: over 10k iterations, the memory fluctuates between 21.2 and 21.7.

When there's a memory leak, the memory consumption CONSTANTLY increases. Can you share your log?

# sparse_quantize
0, memory used: 21.2
100, memory used: 21.2
200, memory used: 21.5
300, memory used: 21.3
400, memory used: 21.4
500, memory used: 21.4
600, memory used: 21.5
700, memory used: 21.5
800, memory used: 21.5
900, memory used: 21.5
1000, memory used: 21.4
1100, memory used: 21.4
1200, memory used: 21.6
1300, memory used: 21.7
1400, memory used: 21.4
1500, memory used: 21.5
1600, memory used: 21.5
1700, memory used: 21.6
1800, memory used: 21.7
1900, memory used: 21.6
2000, memory used: 21.7
...
9800, memory used: 21.8
9900, memory used: 21.6
10000, memory used: 21.7
10100, memory used: 21.6
10200, memory used: 21.6
lilanxiao commented 2 years ago

@chrischoy thank you for your reply!

That is really strange because I see different behavior on my machine. As you can see, the RAM usage indeed increases CONSTANTLY. Maybe you can try more iterations?

0, memory used: 12.6
100, memory used: 12.7
200, memory used: 12.7
300, memory used: 12.7
400, memory used: 12.8
500, memory used: 12.8
600, memory used: 12.7
700, memory used: 12.8
800, memory used: 12.9
900, memory used: 12.9
1000, memory used: 12.6
1100, memory used: 12.6
1200, memory used: 12.7
1300, memory used: 12.6
1400, memory used: 12.7
1500, memory used: 12.6
1600, memory used: 12.8
1700, memory used: 12.9
1800, memory used: 12.9
1900, memory used: 13.0
.........
16900, memory used: 13.9
17000, memory used: 13.8
17100, memory used: 13.8
17200, memory used: 13.9
17300, memory used: 13.8
17400, memory used: 13.9
17500, memory used: 13.8
17600, memory used: 13.8
17700, memory used: 13.9
17800, memory used: 14.0
17900, memory used: 13.9
18000, memory used: 13.9
18100, memory used: 14.0
18200, memory used: 13.9
18300, memory used: 13.9
18400, memory used: 13.9
18500, memory used: 13.9
18600, memory used: 14.0
18700, memory used: 14.0
18800, memory used: 13.9
18900, memory used: 14.0
19000, memory used: 13.9
19100, memory used: 13.9
19200, memory used: 14.0
19300, memory used: 14.0
19400, memory used: 14.2
19500, memory used: 14.2
19600, memory used: 14.2
19700, memory used: 14.3
19800, memory used: 14.3
19900, memory used: 14.1
20000, memory used: 14.2
20100, memory used: 14.1
20200, memory used: 14.2
20300, memory used: 14.1
....
chrischoy commented 2 years ago

I've never seen a memory leak that doesn't leak for the first 10k iterations but starts leaking after 10k iterations. This looks more like OS-level memory management and garbage collection than a memory leak, but I'll do more analysis.
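
One way to test this hypothesis (a sketch, assuming a glibc-based Linux system) is to force garbage collection and ask the allocator to return freed memory to the OS every few hundred iterations of the repro loop; if the reported usage still grows, the memory is held below the Python level:

import ctypes
import gc

def trim_memory():
    # force Python garbage collection, then ask glibc to return freed
    # arenas to the OS; malloc_trim is glibc-specific
    gc.collect()
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except OSError:
        pass  # not a glibc system, skip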

lilanxiao commented 2 years ago

OK, perhaps it is not accurate to describe the problem as a memory leak. Let me provide more information. The demo code does not allocate more and more RAM if I remove sparse_quantize. To create a comparable baseline, I replace

vox, feats = ME.utils.sparse_quantize(coords, feats, quantization_size=0.02)
vox = vox.numpy()

with NumPy functions, which do similar things but are not as efficient as sparse_quantize.

vox = np.floor(coords/0.02).astype(np.int32)
vox, index = np.unique(vox, axis=0, return_index=True)
feats = feats[index]
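
For completeness, a sketch of the modified __getitem__ with this NumPy baseline in place (the rest of the dataset class is unchanged):

    def __getitem__(self, index):
        coords = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        feats = np.random.rand(self.num_pts, 3) * np.random.randint(1, 10)
        # voxelize with plain NumPy instead of ME.utils.sparse_quantize
        vox = np.floor(coords / 0.02).astype(np.int32)
        vox, keep = np.unique(vox, axis=0, return_index=True)
        feats = feats[keep]
        feats = feats + 0.1
        return vox[:10, ...], feats[:10, ...]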

The RAM usage looks like this: [Figure 1: RAM usage over iterations]

I ran the two versions on the same machine, so sparse_quantize indeed seems to show some strange behavior. With sparse_quantize, the RAM usage increases by about 1.3 GB after 1e5 iterations, i.e. roughly 13 KB per iteration.

With NumPy, the RAM usage does not increase. (I don't know why it drops, by the way; maybe due to some background system activity?)


Update: I ran even more iterations and can confirm that the RAM usage increases almost linearly: [Figure 2: RAM usage over a longer run]