drprojects / point_geometric_features

Python wrapper around C++ utility to compute local geometric features of a point cloud
MIT License

Memory leak of pgeof #8

Closed gardiens closed 7 months ago

gardiens commented 8 months ago

Hello, as mentioned in this issue, I think pgeof has a memory leak too. If we take the demo.py script with a slightly faster KNN:

import gc
import tracemalloc

import numpy as np
from scipy.spatial import cKDTree

from pgeof import pgeof

tracemalloc.start()
num_iter = 200
marqueur = 0
snapshot = tracemalloc.take_snapshot()
def get_x(n_points=4000000):
    # Random synthetic cloud, rescaled to a realistic bounding box
    import torch
    x_range = [-15, 46]
    y_range = [-45, 30]
    z_range = [0, 7]
    x = torch.rand((n_points, 3))
    x[:, 0] = x[:, 0] * (x_range[1] - x_range[0]) + x_range[0]
    x[:, 1] = x[:, 1] * (y_range[1] - y_range[0]) + y_range[0]
    x[:, 2] = x[:, 2] * (z_range[1] - z_range[0]) + z_range[0]
    return x

def Query_CPU(xyz_query, xyz_search, K, r):
    # k-NN search with a radius cutoff; missing neighbors are flagged with -1
    kdtree = cKDTree(xyz_search)
    distances, neighbors = kdtree.query(xyz_query, k=K, distance_upper_bound=r, workers=-1)
    neighbors[distances == float('inf')] = -1
    return distances, neighbors

for j in range(num_iter):
    # Generate a random synthetic point cloud
    num_points = int(1e5)
    xyz = get_x(num_points).numpy()

    # Manually generating random neighbors in CSR format
    # (kept from demo.py, immediately replaced by the real k-NN below)
    nn_ptr = np.r_[0, np.random.randint(low=0, high=30, size=num_points).cumsum()]
    nn = np.random.randint(low=0, high=num_points, size=nn_ptr[-1])

    # Converting k-nearest neighbors to CSR format
    k = 20
    kneigh = Query_CPU(xyz, xyz, k, 20)
    nn_ptr = np.arange(num_points + 1) * k
    nn = kneigh[1].flatten()

    # Converting radius neighbors to CSR format
    # from sklearn.neighbors import NearestNeighbors
    # radius = 0.1
    # rneigh = NearestNeighbors(radius=radius).fit(xyz).radius_neighbors(xyz)
    # nn_ptr = np.r_[0, np.array([x.shape[0] for x in rneigh[1]]).cumsum()]
    # nn = np.concatenate(rneigh[1])

    # Make sure xyz are float32 and nn and nn_ptr are uint32
    xyz = xyz.astype('float32')
    nn_ptr = nn_ptr.astype('uint32')
    nn = nn.astype('uint32')

    # Make sure arrays are contiguous (C-order) and not Fortran-order
    xyz = np.ascontiguousarray(xyz)
    nn_ptr = np.ascontiguousarray(nn_ptr)
    nn = np.ascontiguousarray(nn)

    geof = pgeof(
        xyz, nn, nn_ptr, k_min=10, k_step=1, k_min_search=15,
        verbose=True)
    marqueur += 1
    gc.collect()

    # Periodically compare against the initial snapshot
    if marqueur > 1e1:
        snapshot2 = tracemalloc.take_snapshot()
        marqueur = 0
        top_stats = snapshot2.compare_to(snapshot, 'lineno')
        print("TOP 10 differences")
        for stat in top_stats[:10]:
            print(stat)
        current, peak = tracemalloc.get_traced_memory()
        print(f"Current memory usage is {current / 10**6} MB; peak was {peak / 10**6} MB")
        top_stats = snapshot2.statistics('traceback')
        stat = top_stats[0]
        print("%s memory blocks: %.1f MiB" % (stat.count, stat.size / 1024**2))
        for line in stat.traceback.format():
            print(line)

The memory increases slightly at each iteration. I also ran the script under valgrind and it indeed reported some memory as definitely lost.
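
Side note on the measurement itself: tracemalloc only traces allocations made through Python's allocator, so memory leaked inside the C++ extension will not show up in the snapshots above, only in the process RSS (or in valgrind). A minimal sketch for watching the RSS alongside the tracemalloc stats, assuming psutil is installed (names here are illustrative):

import os

import psutil  # assumed available; on Linux, resource.getrusage() is an alternative

_proc = psutil.Process(os.getpid())

def print_rss(tag=""):
    # Resident set size of the whole process, which does include native (C++) allocations
    print(f"{tag} RSS = {_proc.memory_info().rss / 2**20:.1f} MiB")

# e.g. call print_rss(f"iter {j}") at the end of each loop iteration above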

rjanvier commented 7 months ago

Hi, I plan to propose a new version in the coming week(s?), with some new features and based on pybind11. That should fix this issue.

drprojects commented 7 months ago

Looking forward to @rjanvier's proposed update then!

rjanvier commented 7 months ago

Hi, I decided to fix it now because I'm not ready to push the "big" update yet. The leak is confirmed and has the same root cause as the leaks from parallel-cut-pursuit. It's fixed in #9.
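
For anyone who wants to double-check the fix locally, here is a minimal sketch (not part of #9; psutil and the loop size are my own choices) that repeatedly calls pgeof on the same synthetic input and watches the process RSS, which should plateau after the first few iterations once the leak is gone:

import os

import numpy as np
import psutil  # assumed available
from pgeof import pgeof

proc = psutil.Process(os.getpid())
rng = np.random.default_rng(0)

num_points, k = 100_000, 20
xyz = rng.random((num_points, 3)).astype(np.float32)
# Random k-NN indices in CSR format, matching the layout used in demo.py
nn = rng.integers(0, num_points, size=num_points * k).astype(np.uint32)
nn_ptr = (np.arange(num_points + 1) * k).astype(np.uint32)

for i in range(100):
    pgeof(xyz, nn, nn_ptr, k_min=10, k_step=1, k_min_search=15)
    if i % 20 == 0:
        print(f"iter {i:3d}: RSS = {proc.memory_info().rss / 2**20:.1f} MiB")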

drprojects commented 7 months ago

I just reviewed and approved your PR, thanks once again for the great work @rjanvier 🙏