Multi-processing ComputeC2C distance?

xiangtaoxu commented 2 years ago

Hi,

I currently set maxSearchDist in many C2C distance computations, which is not compatible with multi-threading. I was wondering whether I can improve performance with multi-processing in Python. I have tried some tests using Python's standard multiprocessing library, but the total CPU usage never goes above 100%.

Any suggestions on whether and how I can combine multi-processing with C2C distance?

prascle commented 2 years ago

Hello,

In the CloudCompare C++ code, the comments indicate that setting maxSearchDist to a value >0 invalidates in-process parallelism (single thread). A quick investigation of the C++ code did not allow me to understand everything about the handling of parallelism here, and how single thread is forced. A priori, in this case, the parallelism uses QtConcurrent and I think there is a global management of the resource, such as a single semaphore.

I guess you tried to handle in parallel several independent problems of distance calculation between two clouds with maxSearchDist > 0, each problem using one thread. If I understood correctly, you did not succeed in getting a parallel execution of these problems: if my interpretation above is correct, this behavior is normal.

The only way I can imagine to run several independent problems in parallel would be, in this case, to run as many separate processes as there are problems, so that each process has its separate memory space. I think this is feasible in Python, but it will consume more memory.
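A minimal sketch of that idea, using only the Python standard library. The function `solve_problem` is a hypothetical stand-in for one single-threaded distance computation (it does not call CloudComPy), and the worker count is an arbitrary assumption:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def solve_problem(problem_id):
    """Hypothetical stand-in for one independent, single-threaded computation."""
    # In a real script, the two clouds would be loaded here and the
    # distance computation run; this placeholder only burns some CPU.
    checksum = sum(i * i for i in range(10_000 + problem_id))
    return problem_id, os.getpid(), checksum


def run_all(problem_ids, n_proc):
    # Each worker is a separate OS process with its own memory space.
    with ProcessPoolExecutor(max_workers=n_proc) as executor:
        return list(executor.map(solve_problem, problem_ids))


if __name__ == "__main__":
    for problem_id, pid, _ in run_all(range(6), n_proc=3):
        print(f"problem {problem_id} ran in process {pid}")
```

The printed process IDs should differ between problems, which is a quick way to check that the work is really spread over several processes.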

Paul

xiangtaoxu commented 2 years ago

Thanks Paul. I have been trying some basic embarrassingly parallel processing using multiprocessing in Python.

The challenge is that the CC objects are not picklable, so I can't pass them easily to each worker process. Currently, I can only get around that through I/O, i.e., saving the objects to disk and then reading them again in each worker process.

See below for a simple example, where CheckConnection_IO calls multiple C2C distance computation methods. This method reduces the running time by ~50% (100s to 50s) using 6 cores. If you know any way to get rid of the I/O part, I would appreciate it very much!

Xiangtao

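import time
import multiprocessing as mp
import numpy as np
import cloudComPy as cc

tic = time.time()
N_PROC = 6
# split the connected components into one chunk per worker process
mp_array = np.array_split(ConnComp_list, N_PROC)
# save each chunk to disk so that a worker process can reload it
fnames = []
for i, CC_array in enumerate(mp_array):
    fname = temp_dir + f'CC_array_{i}.bin'
    cc.SaveEntities(CC_array.tolist(), fname)
    fnames.append(fname)

# save the reference clusters once; every worker reads the same file
cluster_fname = temp_dir + 'Clusters.bin'
cc.SaveEntities(cluster_clouds, cluster_fname)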

def CheckConnection_IO(cloud_fname,ref_fname):
    meshes, clouds = cc.importFile(cloud_fname)
    meshes, ref_clouds = cc.importFile(ref_fname)
    connection_array = []

    for cloud_idx, CC_cloud in enumerate(clouds):
        connected_ref_clouds = []
        # iterate over the reference clouds reloaded from ref_fname
        for ref_idx, cloud_ref in enumerate(ref_clouds):
            if CheckConnection(CC_cloud, cloud_ref,
                               maxSearchDist=SUBSAMPLE_RES*2.):
                connected_ref_clouds.append(ref_idx)

        connection_array.append(connected_ref_clouds)

    return connection_array

pool = mp.Pool(N_PROC)

input_list = [(fnames[i],cluster_fname) for i in range(len(fnames))]

result = pool.starmap(CheckConnection_IO,input_list)

pool.close()
pool.join()
toc = time.time()
print(toc - tic)

xiangtaoxu commented 2 years ago

Update:

I got around the I/O by using global variables, which gets me the expected performance boost but might not be ideal...
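The exact solution was not posted in this thread. One common way to give every worker its own global copy of shared data is a pool initializer, sketched below under stated assumptions: `load_reference` is a hypothetical placeholder (it returns a plain set instead of reading a CloudComPy file), and `count_matches` stands in for the real per-chunk work.

```python
import multiprocessing as mp

_REF_DATA = None  # per-worker global, filled in by init_worker


def load_reference(ref_path):
    """Hypothetical loader: a real script would read the reference file here."""
    return {1, 2, 3}


def init_worker(ref_path):
    # Runs once in each worker process, so the reference data is
    # loaded once per worker instead of once per task.
    global _REF_DATA
    _REF_DATA = load_reference(ref_path)


def count_matches(values):
    # Stand-in for the per-chunk work; it reads the per-worker global.
    return sum(1 for v in values if v in _REF_DATA)


if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker,
                 initargs=("Clusters.bin",)) as pool:
        print(pool.map(count_matches, [[1, 2, 9], [4, 5], [3, 3]]))
```

Only the file path and small picklable values cross the process boundary, so the non-picklable objects never need to be sent to the workers.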

prascle commented 2 years ago

Hello Xiangtao,

I had prepared an answer, but I forgot to send it... I was focused on multithreading and was not familiar with the possibilities of the multiprocessing library, which addresses my previous remark well!

If you managed to avoid the I/O, that seems excellent to me, and I don't see how I could help you improve on it. I would be curious to know the solution you have chosen (regarding the global variables), and I would like to add an example to the CloudComPy documentation, because it can be useful whenever an algorithm is not parallel.

Paul

xiangtaoxu commented 2 years ago

Thanks, Paul. I am still testing the global variable solution, which seems to create some deadlocks... I will post some example code once I figure it out.
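The cause of these deadlocks is not identified in this thread. One known source of deadlocks with Python's multiprocessing is forking a parent process that already has running threads, since locks held by other threads at fork time stay locked in the child. Assuming that is the issue here (an assumption, not a confirmed diagnosis), a sketch of requesting the "spawn" start method, which starts each worker from a fresh interpreter:

```python
import multiprocessing as mp


def square(x):
    """Placeholder task; a real worker would load its data and compute distances."""
    return x * x


def run_with_spawn(values, n_proc=2):
    # "spawn" starts fresh interpreters instead of forking the parent,
    # so workers do not inherit the parent's threads, locks, or globals.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_proc) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(run_with_spawn([1, 2, 3]))
```

With "spawn", globals set in the parent are not visible in the workers, so any shared data has to be loaded inside each worker, and the `if __name__ == "__main__":` guard is required.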

CloudCompare / CloudComPy

Multi-processing ComputeC2C distance? #45