The problem may be that I "invert" the affine transform (the extrinsic matrix) by transposing it. That would work for a normal 3x3 matrix in our case (because we know all the matrices are size and shape preserving), but of course it doesn't work for an affine transform: there I have to use another vector as the fourth column (and clear the fourth row).
Here is an explanation: http://negativeprobability.blogspot.com/2011/11/affine-transformations-and-their.html
And I found https://stackoverflow.com/questions/2624422/efficient-4x4-matrix-inverse-affine-transform which says a similar thing. Also, looking at the source code for open3d `PointCloud.create_from_depth_image`, it seems that they're indeed doing this, but they have the Eigen `Affine3d` transform, which has an inverse. I guess I'll have to create that by hand.
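A minimal numpy sketch of what inverting such a rigid (rotation plus translation) transform by hand could look like; `invert_rigid_transform` is my own name, not an existing cwipc function:

```python
import numpy as np

def invert_rigid_transform(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid (rotation + translation) transform.

    Assumes the upper-left 3x3 block R is orthonormal, so R^-1 == R^T.
    The inverse is then [R^T | -R^T @ t] with the last row (0, 0, 0, 1).
    """
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv
```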
Automatic coarse calibration based on Aruco markers is working. It's actually working so well that I have merged it back into `master`, at a1b2855, so this can now be considered production-ready.
Multi-marker coarse alignment, for camera positions where not all cameras can see the (0, 0, 0) origin aruco marker so we use auxiliary markers, is also working well enough that I've merged it into `master`, at b923190.
So, back to fine calibration, or actually first to the analysis of the current calibration.
I'm working with the `offline-boxes` capture, because that shows the problem most clearly. I've created plots for all analysers that we have (`one2all`, `one2all-filtered`, `one2all-reverse`, `one2all-reverse-filtered` and `pairwise`). All the graphs are in `cwipc_test`.
The most informative graph *for this dataset* (but note the italics) is the pairwise graph:
The "camera numbers" here are the or
of the two contributing cameras. As a human we can easily see that camera 1 is opposite camera 4 and camera 2 is opposite camera 8. And I also know this is correct, because I know that the cameras are placed in the order 1-2-4-8 clockwise. We can probably detect this algorithmically if we want.
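For instance, a tiny sketch of how that detection could start, recovering the two contributing cameras from a pair number (a hypothetical helper, assuming the individual camera masks 1, 2, 4, 8):

```python
def split_pair(pair_mask: int) -> tuple[int, int]:
    """Recover the two contributing cameras from a pair number,
    given that individual cameras are identified by the bit masks 1, 2, 4, 8.
    E.g. 6 -> (2, 4), 9 -> (1, 8).
    """
    cameras = [bit for bit in (1, 2, 4, 8) if pair_mask & bit]
    assert len(cameras) == 2
    return cameras[0], cameras[1]
```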
As a human we can estimate the correspondences of the pairs:
We can also see that the correspondence errors that the current "algorithm" (but really "quick hack" is a better term) has come up with are wildly wrong. Not surprising: the current "algorithm" works by finding the peak in the histogram and then moving right until we get below `0.5*peak`.
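Roughly, I understand that quick hack to do something like this (my own reconstruction, with made-up names):

```python
import numpy as np

def estimate_correspondence(distances, nbins=200):
    """Sketch of the current histogram-peak heuristic (my reconstruction).

    Find the peak bin of the distance histogram, then walk right until the
    count drops below 0.5 * peak; the edge of that bin is the estimate.
    """
    counts, edges = np.histogram(distances, bins=nbins)
    peak_idx = int(np.argmax(counts))
    peak = counts[peak_idx]
    i = peak_idx
    while i < len(counts) and counts[i] >= 0.5 * peak:
        i += 1
    return edges[min(i, len(edges) - 1)]
```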
I will experiment with `mean` and `stddev` to see if I can get some more decent numbers. Then, if they work for this dataset, they should also be tried for the captured Jack dataset.
Mean and stddev by themselves are not going to work. Here are the results for the graph above:
```
camera 3: mean=0.02291982490598397, std=0.024645493357004753, peak=0.002320735082210554, corr=0.007816463821634046
camera 5: mean=0.04401491186517467, std=0.028213745113280272, peak=0.012704797435346417, corr=0.058097526853423245
camera 9: mean=0.016755202242697633, std=0.026204388020447566, peak=0.0018051266662951993, corr=0.002804512231245586
camera 6: mean=0.015378887548555181, std=0.023343767740899458, peak=0.001824420385722404, corr=0.003573071088354493
camera 10: mean=0.048489777693837444, std=0.028316377093057312, peak=0.0021639993720285154, corr=0.08227788490063927
camera 12: mean=0.0341007288961789, std=0.023926849570313418, peak=0.01910221049639875, corr=0.04007440525818792
```
The mean for the "good pairs" (6 and 9) is far too high.
And that is pretty logical, when you think about it: the long tails have an inordinate effect on the mean.
Next thing to try: first compute mean and stddev, then throw away all distances that are larger than (wild guess) `mean+stddev`, or maybe `2*mean`. Then compute mean and stddev on the points that remain.
Edit: another thing to try is to keep only the points in the range `[mean-stddev, mean+stddev]`.
The idea is that for the "bad pairs" this will throw away fewer of the points, but for the "good pairs" it will throw away more points.
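A minimal sketch of that iterative bracketing filter (my reconstruction, assuming a plain numpy array of per-point distances):

```python
import numpy as np

def bracket_filter(distances, iterations=3):
    """Repeatedly keep only distances in [mean - std, mean + std] and
    recompute mean and std on the surviving points."""
    d = np.asarray(distances, dtype=float)
    for _ in range(iterations):
        if d.size == 0:
            break
        mean, std = d.mean(), d.std()
        d = d[(d >= mean - std) & (d <= mean + std)]
    return d.mean(), d.std(), len(d)
```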
Tried that. Also tried running the filtering multiple times, to see how the mean and stddev behave. Used the bracketing filter `[mean-stddev, mean+stddev]` on two premises: for the "good pairs" `std > mean`, so we don't throw away any "good points" at the low end, while for the "bad pairs" `std < mean`, so we throw away points on both sides; therefore running the filter successively should not change `mean` too much for the bad pairs (while for the good pairs it will lower `mean`). Here are the results:
```
camera 3: peak=0.002320735082210554, corr=0.007816463821634046
camera 3: 0 filters: mean=0.02291982490598397, std=0.024645493357004753, nPoint=81262
camera 3: 1 filters: mean=0.012951448749072387, std=0.010969866908687875, nPoint=66992
camera 3: 2 filters: mean=0.009752436971104068, std=0.005658494330190411, nPoint=52946
camera 3: 3 filters: mean=0.009181153840633022, std=0.0032857770409335132, nPoint=32361
camera 5: peak=0.012704797435346417, corr=0.058097526853423245
camera 5: 0 filters: mean=0.04401491186517467, std=0.028213745113280272, nPoint=44935
camera 5: 1 filters: mean=0.04137415890726912, std=0.016101593626257675, nPoint=26525
camera 5: 2 filters: mean=0.04045734373520693, std=0.009445048042243229, nPoint=15533
camera 5: 3 filters: mean=0.040051235318581284, std=0.005529610745340443, nPoint=8873
camera 9: peak=0.0018051266662951993, corr=0.002804512231245586
camera 9: 0 filters: mean=0.016755202242697633, std=0.026204388020447566, nPoint=81399
camera 9: 1 filters: mean=0.006063694689290081, std=0.008691325161748644, nPoint=67977
camera 9: 2 filters: mean=0.003177805834766034, std=0.0024780598363417766, nPoint=59915
camera 9: 3 filters: mean=0.002515016672254931, std=0.0011297240188485047, nPoint=52181
camera 6: peak=0.001824420385722404, corr=0.003573071088354493
camera 6: 0 filters: mean=0.015378887548555181, std=0.023343767740899458, nPoint=93198
camera 6: 1 filters: mean=0.006850735824829795, std=0.00853339253154322, nPoint=79885
camera 6: 2 filters: mean=0.003812802111714147, std=0.0027496213518108494, nPoint=69301
camera 6: 3 filters: mean=0.003089897111975529, std=0.00140128710337592, nPoint=56900
camera 10: peak=0.0021639993720285154, corr=0.08227788490063927
camera 10: 0 filters: mean=0.048489777693837444, std=0.028316377093057312, nPoint=43545
camera 10: 1 filters: mean=0.04792050129221857, std=0.016245794338419692, nPoint=25355
camera 10: 2 filters: mean=0.04781840036709909, std=0.00938153612816915, nPoint=14709
camera 10: 3 filters: mean=0.04774644484942719, std=0.005421145457380683, nPoint=8482
camera 12: peak=0.01910221049639875, corr=0.04007440525818792
camera 12: 0 filters: mean=0.0341007288961789, std=0.023926849570313418, nPoint=73397
camera 12: 1 filters: mean=0.028625500662925147, std=0.011632248990853532, nPoint=52026
camera 12: 2 filters: mean=0.027763304899151773, std=0.0064978872262983645, nPoint=33259
camera 12: 3 filters: mean=0.027708708994364492, std=0.003789831220465292, nPoint=19513
```
This seems to be going in the right direction: the "bad pairs" (opposing cameras) have their `mean` staying put at high values. The "good pairs" have their mean going down significantly, towards what appears to be a correct value. The "not so good pairs" (3 and 12) also seem to end up at decent values.
Partial success. That is to say: this works pretty well for camera-pair measurements on the boxes:
These are pretty believable numbers!
Unfortunately it does not work well at all for the one-to-all-others measurements:
I think the problem is that this algorithm throws away any points that it can't match (which, in the case of this dataset, includes the mismatched "edges that are sticking out").
Let's first check how the pair-wise measurements work on the other datasets.
That didn't work very well. I've now made the pair-wise measurement symmetric but this needs work: at the moment it is far too expensive.
And it is also too aggressive in trying to put as many points into the overlapping set as it can. This can be seen with the loot datasets.
We should somehow re-enable the `max_distance` capping of the kdtree distance finder (I disabled it for now) but still count the points that go over it.
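A sketch of what that could look like (assuming a SciPy cKDTree here; the actual code may well use Open3D's KDTreeFlann instead):

```python
import numpy as np
from scipy.spatial import cKDTree

def pair_distances_with_cap(src_points, dst_points, max_distance):
    """Nearest-neighbour distances with an upper bound, while still
    counting the points that exceed the cap instead of silently dropping them."""
    tree = cKDTree(dst_points)
    dists, _ = tree.query(src_points, k=1, distance_upper_bound=max_distance)
    over = np.isinf(dists)  # points whose nearest neighbour is beyond the cap
    return dists[~over], int(over.sum())
```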
For future reference: when we get back to finding the "best" algorithm to align the pointclouds we should look at point-to-plane ICP with a robust kernel. From https://www.open3d.org/docs/latest/tutorial/pipelines/robust_kernels.html#Vanilla-ICP-vs-Robust-ICP I get the impression that the robust kernel is a way to deal with noise. The referenced page uses generated noise, but of course our sensors are also noisy...
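For the record, the Open3D tutorial linked above does roughly the following (the parameter values here are placeholders, not tuned for our data):

```python
import open3d as o3d

def robust_p2plane_icp(source, target, threshold, init_transform, sigma=0.05):
    # Point-to-plane ICP needs normals on the target cloud.
    target.estimate_normals()
    # A robust kernel (Tukey loss) down-weights correspondences with
    # large residuals, i.e. noisy or outlier points.
    loss = o3d.pipelines.registration.TukeyLoss(k=sigma)
    estimation = o3d.pipelines.registration.TransformationEstimationPointToPlane(loss)
    return o3d.pipelines.registration.registration_icp(
        source, target, threshold, init_transform, estimation)
```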
Copied from Slack:
Folks, in your research on registration algorithms, have you come across any that allow "pinning" of one of the variables? I.e. asking the algorithm to find the optimal transformation while specifying, for example, that the y-translation must be zero? Because if that exists then we could do fine calibration in two steps:
Actually, thinking a bit more, we not only want to pin the y-translation to 0 but also the x-rotation and z-rotation. So the only free variables should be y-rotation, x-translation and z-translation.
We might want to change the loss function so as to only consider a 2D error - which would effectively mean that in every iteration, the algorithm would be forced to change only the parameters that are considered in the loss function, because the others would have no impact on the error. There might be other ways of writing it as an optimization problem, though; we should check it out.
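One way to write it as an optimization problem would be to parametrize the transform by just the free variables, roughly like this (a sketch, not anything that exists in cwipc):

```python
import numpy as np

def constrained_transform(theta_y, tx, tz):
    """Build a rigid transform with only the free variables we want
    (rotation about y, translation in x and z); the y-translation and
    the x/z rotations are pinned to zero. An optimizer would then
    search over just these 3 parameters."""
    c, s = np.cos(theta_y), np.sin(theta_y)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, 0.0, s],
                          [0.0, 1.0, 0.0],
                          [-s, 0.0, c]])
    T[0, 3] = tx
    T[2, 3] = tz
    return T
```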
The fixer managed to make them all upright, but that's where the good news stops. They're still off by quite a bit, and moreover (and worse): the analysis algorithm produces way too optimistic values.
Inspecting this issue again after half a year of inactivity, but a lot of actually using the current registration setup in production. The comment quoted above (from 11-Dec-2023) seems to be the thing that is bothering us most at the moment: often, when running `cwipc_register --fine`, the script will report that it has managed to align all point clouds to within a few millimeters. But actual inspection of the captured point cloud clearly shows that some areas (and often important areas like the head) are off by 5-10cm.
The "solution" we are currently using is to simply try again with the subject human in a different pose, and hoping for the best.
Fixing this, or at least showing the operator something (for example a graph of the p2p distance distribution) from which they can tell this has happened, is at the moment of paramount importance.
Once we have addressed the issue above (the bad numbers coming out of our analysis) my feeling is that we should move to a "mixed" upper strategy. Right now our upper strategy is either pairwise or one-to-all-others, but maybe we should first do one round of pairwise, and after that a round of one-to-all-others.
If we do the pairwise round in the right order (i.e. most overlapping pair first) I think that should get us out of the "local minimum problem" with the boxes.
The right order should be easy to compute: for each pair, compute the upper bound of the percentage/fraction of points that could possibly overlap. High-to-low is the right order.
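A sketch of how that ordering could be computed (my own suggestion, using the fraction of points falling inside the other cloud's bounding box as the upper bound on possible overlap):

```python
import numpy as np
from itertools import combinations

def pair_order_by_overlap(clouds):
    """Order camera pairs by an upper bound on their overlap fraction.

    clouds: list of (N, 3) numpy arrays, one per camera.
    For each pair, the fraction of points of one cloud inside the
    axis-aligned bounding box of the other bounds the possible overlap.
    Returns the pairs sorted high-to-low.
    """
    def bbox_fraction(a, b):
        lo, hi = b.min(axis=0), b.max(axis=0)
        inside = np.all((a >= lo) & (a <= hi), axis=1)
        return inside.mean()

    pairs = []
    for i, j in combinations(range(len(clouds)), 2):
        bound = max(bbox_fraction(clouds[i], clouds[j]),
                    bbox_fraction(clouds[j], clouds[i]))
        pairs.append(((i, j), bound))
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```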
We need to fix the alignment once and for all.
The original issue is https://gitlab.com/VRTogether_EU/cwipc/cwipc_util/-/issues/41 but this link is dead now. There is an old repo for experiments at https://github.com/cwi-dis/pointclouds-alignment
New experiment data is at https://github.com/cwi-dis/cwipc_test/tree/master/pointcloud-registration-test
Current plan of attack:
- `cameraconfig.json`. Also record there the current misalignment, because it is a good value for voxelization (later, during production).