3dgeo-heidelberg / py4dgeo

py4dgeo - A Python library for change analysis in 4D point clouds
https://py4dgeo.readthedocs.io
MIT License

Long running time of M3C2 on entire point cloud compared to CloudCompare #128

Open chrise96 opened 2 years ago

chrise96 commented 2 years ago

With the help of this git issue I'm able to run the M3C2 algorithm with (I think) the same params as used in CloudCompare. However, the py4dgeo M3C2 implementation takes roughly 100 times longer than CloudCompare's.

CloudCompare M3C2: ~2 seconds
py4dgeo M3C2: 271 seconds

I put all the files to recreate the experiment here.

Is the time difference caused by a param that I forgot to configure in the py4dgeo implementation? Here is the config file (default settings exported from CloudCompare): m3c2_params.txt
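For context, a minimal sketch of what such a timed run might look like in py4dgeo. The file names and parameter values are placeholders, and the keyword arguments (cyl_radii, normal_radii, max_distance, registration_error) follow a reading of the py4dgeo documentation rather than the exact notebook linked above:

```python
import time

import py4dgeo

# Placeholder file names; substitute the two epochs from the experiment.
epoch1, epoch2 = py4dgeo.read_from_xyz("epoch1.xyz", "epoch2.xyz")

m3c2 = py4dgeo.M3C2(
    epochs=(epoch1, epoch2),
    corepoints=epoch1.cloud,       # all points of the first epoch as corepoints
    cyl_radii=(2.0,),              # placeholder projection (cylinder) radius
    normal_radii=(0.5, 1.0, 2.0),  # placeholder normal-estimation radii
    max_distance=10.0,             # placeholder cylinder depth
    registration_error=0.0,
)

start = time.perf_counter()
distances, uncertainties = m3c2.run()
print(f"M3C2 run(): {time.perf_counter() - start:.1f}s")
```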

dokempf commented 2 years ago

Thanks for providing all the configuration and data. I will reproduce and investigate this next week.

dokempf commented 2 years ago

Hey @chrise96 Thanks again for providing test data and configuration, this has been really helpful. I found a few things that went in favor of CloudCompare in your comparison - some can be fixed, some can be documented and some will need future work in py4dgeo (remember we are in early dev):

Here is a modified version of your notebook. It does the same thing, only that it splits py4dgeo's application of M3C2 into a few substeps: search tree construction, normal calculation, and distance calculation. All of these have always been performed, but lazily evaluated during run().
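A rough sketch of how such a split can be timed; the method names build_kdtree() and directions() reflect a reading of the py4dgeo API and may differ from what the modified notebook actually calls, and file names and radii are again placeholders:

```python
import time

import py4dgeo

epoch1, epoch2 = py4dgeo.read_from_xyz("epoch1.xyz", "epoch2.xyz")  # placeholders

m3c2 = py4dgeo.M3C2(
    epochs=(epoch1, epoch2),
    corepoints=epoch1.cloud,
    cyl_radii=(2.0,),              # placeholder values
    normal_radii=(0.5, 1.0, 2.0),
)


def timed(label, func):
    # Report the wall-clock time of one substep.
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result


# 1. Search tree construction for both epochs.
timed("KDTree epoch 1", epoch1.build_kdtree)
timed("KDTree epoch 2", epoch2.build_kdtree)

# 2. Normal calculation at the corepoints.
timed("Normals", m3c2.directions)

# 3. Distance calculation; run() reuses the results computed above.
distances, uncertainties = timed("Distances", m3c2.run)
```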

Can you run again on your end and see how performance compares?

chrise96 commented 2 years ago

Thank you for the very detailed update!

I see now indeed that the config .txt file I provided contains SubsampleEnabled=true; this must be false... I updated the branch with this change. It now takes 19.5 seconds in CloudCompare.

I didn't know about the UseSinglePass4Depth option ("Do not use multiple pass for depth" in the advanced tab of CloudCompare's M3C2 dialog).

Here is a complete screenshot of the modified notebook run (dividing the radii by 2 really speeds it up):

[Screenshot: Schermafbeelding 2022-02-14 om 14 40 02]

dokempf commented 2 years ago

I already feared it would not be as easy as the downsampling setting :disappointed:.

I am assuming you ran this on Windows - correct? I made some tests between Linux and Windows on the same machine (dual boot, no virtualization) and found the results quite surprising:

| Setup | py4dgeo Normals | CC Normals | py4dgeo Distances | CC Distances |
| --- | --- | --- | --- | --- |
| Windows 6 Threads | 13s | 6s | 127s | 30s |
| Windows 1 Thread | 28.4s | -- | 275s | -- |
| Windows 6 Threads (Blocking) | 10s | -- | 97s | -- |
| Linux 6 Threads | 2.7s | -- | 34s | -- |
| Linux 1 Thread | 14s | -- | 201s | -- |

I conclude that we have a toolchain issue on Windows that introduces a significant performance penalty. There is a multithreading-related aspect to it (Linux scales roughly optimally, Windows does not scale at all), but sequential performance is also clearly affected. The Blocking variant in the above table lets OpenMP's dynamic scheduler work on chunks of 128 corepoints. My next experiments will be to vary the Windows toolchain to get a better understanding of where the problem might be.
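To reproduce the single-threaded versus multithreaded comparison, the thread count of the OpenMP-based core can presumably be pinned via the standard OMP_NUM_THREADS environment variable; whether py4dgeo honors it has not been verified here, so treat this as an assumption:

```python
import os

# Assumption: py4dgeo's compiled core uses OpenMP and reads the standard
# OMP_NUM_THREADS variable. Set it before the library is first imported.
os.environ["OMP_NUM_THREADS"] = "1"  # "6" for the multithreaded runs

import py4dgeo

# ... set up and run M3C2 as in the sketches above, comparing wall-clock times.
```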

chrise96 commented 2 years ago

Okay, I run this on macOS.

chrise96 commented 2 years ago

The M3C2 distance results in py4dgeo are very different from the CloudCompare results. Points on some static objects, for example a street sign in the provided point clouds, do not come close to a value of 0 for the M3C2 distance. How do you choose the best configuration params for py4dgeo?
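One common source of both parameter and result mismatches (and possibly related to the "dividing the radii by 2" observation above) is that CloudCompare's m3c2_params.txt stores scales as diameters, while py4dgeo's M3C2 expects radii. A hypothetical mapping sketch, with placeholder values and with parameter names taken from a reading of both tools rather than from the linked files:

```python
import py4dgeo

# Values as they might appear in m3c2_params.txt (placeholders).
cc_normal_scale = 4.0  # NormalScale: a diameter in CloudCompare
cc_search_scale = 4.0  # SearchScale: a diameter in CloudCompare
cc_reg_error = 0.01    # RegistrationError

epoch1, epoch2 = py4dgeo.read_from_xyz("epoch1.xyz", "epoch2.xyz")  # placeholders

m3c2 = py4dgeo.M3C2(
    epochs=(epoch1, epoch2),
    corepoints=epoch1.cloud,
    normal_radii=(cc_normal_scale / 2.0,),  # halve the CloudCompare diameter
    cyl_radii=(cc_search_scale / 2.0,),     # halve the CloudCompare diameter
    registration_error=cc_reg_error,
)
distances, uncertainties = m3c2.run()
```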