Open chrise96 opened 2 years ago
Thanks for providing all the configuration and data. I will reproduce and investigate this next week.
Hey @chrise96, thanks again for providing test data and configuration; this has been really helpful. I found a few things that went in favor of CloudCompare in your comparison. Some can be fixed, some can be documented, and some will need future work in py4dgeo (remember we are in early dev):
- `CloudCompareM3C2` class in #129, but you can also just divide your radii by 2.
- `SubsampleEnabled=true`, which means that you are not using the input cloud as the set of core points, but a downsampled version of it that contains only a fraction of the points (in my testing with CC and your data, only 1%). You are explicitly telling py4dgeo to use the entire point cloud, though, with `corepoints = epoch1.cloud`. The M3C2 algorithm is linear in the number of corepoints, which makes this one particularly important. Can you double-check the number of corepoints from the CC logs?
- `UseSinglePass4Depth`, which if set to `false` enables a performance optimization that py4dgeo has not (yet) implemented (see #88). You might want to set that to `true` to better compare against py4dgeo's current state.

Here is a modified version of your notebook. It does the same thing, except that it splits py4dgeo's application of M3C2 into a few substeps: search tree construction, normal calculation, and distance calculation. All of these have always been performed, but lazily evaluated during `run()`.
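To illustrate what timing the substeps separately can look like, here is a minimal sketch using only the Python standard library. The three stage functions are hypothetical placeholders, not py4dgeo calls; in the notebook they would be replaced by the actual search tree, normal and distance steps.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Record the wall-clock seconds spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical stand-ins for the three substeps (assumption: not real py4dgeo API).
def build_searchtree():
    return sorted(range(10_000))

def compute_normals(tree):
    return [x * 0.5 for x in tree]

def compute_distances(tree, normals):
    return [a - b for a, b in zip(tree, normals)]

with timed("searchtree"):
    tree = build_searchtree()
with timed("normals"):
    normals = compute_normals(tree)
with timed("distances"):
    distances = compute_distances(tree, normals)

# Report each stage on its own line so they can be compared individually.
for stage, seconds in timings.items():
    print(f"{stage:>10}: {seconds:.4f}s")
```

Timing each stage on its own makes it possible to compare normals and distances against the CloudCompare log separately, instead of comparing only one total.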
Can you run again on your end and see how performance compares?
Thank you for the very detailed update!
I see now that the config .txt file I provided indeed contains `SubsampleEnabled=true`; this must be false... I updated the branch with this change. It now takes 19.5 seconds in CloudCompare.
I didn't know about this `UseSinglePass4Depth` option (in the Advanced tab of the M3C2 dialog in CloudCompare: "Do not use multiple pass for depth").
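A quick way to catch these two settings before benchmarking is to scan the exported parameter file. The sketch below assumes the file consists of plain `key=value` lines; the sample text is made up for illustration and contains only the two keys discussed in this thread, and the helper names are hypothetical.

```python
def parse_cc_params(text):
    """Parse `key=value` lines; skip blank lines and anything without '='."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params

def comparability_warnings(params):
    """Return notes on settings that skew a CloudCompare vs. py4dgeo timing."""
    warnings = []
    if params.get("SubsampleEnabled", "").lower() == "true":
        warnings.append("SubsampleEnabled=true: core points are a subsample of the cloud")
    if params.get("UseSinglePass4Depth", "").lower() == "false":
        warnings.append("UseSinglePass4Depth=false: enables an optimization py4dgeo lacks")
    return warnings

# Made-up sample content (assumption: real files contain more keys than these).
sample = """SubsampleEnabled=true
UseSinglePass4Depth=false
"""

for note in comparability_warnings(parse_cc_params(sample)):
    print(note)
```

With both values flipped (`SubsampleEnabled=false`, `UseSinglePass4Depth=true`) the function returns an empty list, which is the configuration suggested above for a fair comparison.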
Here is a complete screenshot of the modified notebook run (dividing radii by 2 really speeds it up):
I already feared it would not be as easy as the downsampling setting :disappointed:.
I am assuming you run this on Windows - correct? I made some tests between Linux and Windows on the same machine (dual boot, no virtualization) and found the results to be quite surprising:
Setup | py4dgeo Normals | CC Normals | py4dgeo Distances | CC Distances |
---|---|---|---|---|
Windows 6 Threads | 13s | 6s | 127s | 30s |
Windows 1 Thread | 28.4s | -- | 275s | -- |
Windows 6 Threads (Blocking) | 10s | -- | 97s | -- |
Linux 6 Threads | 2.7s | -- | 34s | -- |
Linux 1 Thread | 14s | -- | 201s | -- |
I conclude that we have a toolchain issue on Windows that introduces a significant performance penalty. There is a multithreading-related aspect to it (Linux scales roughly optimally, Windows not at all), but sequential performance is also clearly affected. The `Blocking` variant in the above table lets OpenMP's dynamic scheduler work on chunks of 128 corepoints. My next experiments will be to vary the Windows toolchain to get a better understanding of where the problem might be.
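The scaling claim can be checked directly from the table. This small sketch computes speedup (single-thread time divided by 6-thread time) and parallel efficiency (speedup divided by thread count) from the measured py4dgeo numbers; the function name is just for illustration.

```python
def speedup_and_efficiency(t_single, t_multi, threads):
    """Return (speedup, efficiency) for a run on `threads` threads."""
    speedup = t_single / t_multi
    return speedup, speedup / threads

# (1-thread seconds, 6-thread seconds) for py4dgeo, taken from the table above.
measurements = {
    "Windows normals": (28.4, 13.0),
    "Windows distances": (275.0, 127.0),
    "Linux normals": (14.0, 2.7),
    "Linux distances": (201.0, 34.0),
}

for label, (t1, t6) in measurements.items():
    s, e = speedup_and_efficiency(t1, t6, threads=6)
    print(f"{label:<18} speedup {s:.2f}x, efficiency {e:.0%}")
```

For the distance calculation this gives roughly 5.9x on Linux against roughly 2.2x on Windows with the same 6 threads.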
OK, I run on macOS.
The M3C2 distance results in py4dgeo are very different compared to the CloudCompare results. Points in some static objects, for example a street sign in the provided point clouds, do not come close to the 0 value for the M3C2 distance. How do you choose the best configuration params for py4dgeo?
With the help of this git issue I'm able to run the M3C2 algorithm with (I think) the same params used in CloudCompare. However, the M3C2 algorithm takes roughly 100x longer than the implementation in CloudCompare.
CloudCompare M3C2: ~2 seconds
py4dgeo M3C2: 271 seconds
I put all the files to recreate the experiment here.
Is the time difference caused by a param that I forgot to configure in the py4dgeo implementation? Here is the config file (default settings exported from CloudCompare): m3c2_params.txt