anuranbaka / OpenDTAM

An open source implementation of DTAM

about real-time camera input #12

Open amiltonwong opened 10 years ago

amiltonwong commented 10 years ago

Hi, Paul and all other contributors,

Thank you so much for putting such great effort into implementing the DTAM algorithm. I think this is an important step toward giving more of the computer vision community access to the dense reconstruction domain. Having installed and experimented with the OpenDTAM code, I hope you can answer a few questions. Your preliminary code uses the traj_30_seconds dataset for its demos. How about using live camera input? That would be a very practical scenario, and I am trying to write that interface. But I'm not so familiar with the architecture of OpenDTAM, so could you suggest which part I should work on first and anything worth noting?

Best regards, Milton

anuranbaka commented 10 years ago

Absolutely! Thank you so much, I'm actually being a bit swamped by the level of interest this has been generating over the last week.

A live camera would be really nice and I think it would not be too hard to do. The basic problem is that DTAM has a chicken-and-egg issue: the tracking system needs a depthmap in order to find the pose of the camera, but the pose needs to be known to build a depthmap. I dodged the issue by using the ground-truth poses that ship with the synthetic data.

In the original paper PTAM is used to generate the initial two poses, which produces a reconstruction accurate up to scale (monocular reconstruction inherently cannot recover scale without an outside reference or a trick like depth from focus). So to use real camera input, we need some equivalent way to start up the system. After the two initial poses, the system should be stable in the sense that it should produce reconstructions accurate up to scale.

How to actually do this (1 and 2 are not implemented, so if you can help with that it would be great):

  1. Calibrate a camera and undistort images before further processing.
  2. Somehow find the relative pose between the first few frames (this should be in the usual OpenCV convention: point_in_pixels = cameraMatrix * [R|T] * point_in_world_coordinates, with the appropriate homogeneous<--->euclidean transformations on input and output). My code automatically rearranges the matrices before use to work in inverse depth space, but that should all be hidden from the user.
  3. Fill in R(3x3 double), T(3x1 double), cameraMatrix(3x3 double), and RGB pixel data and initialize a CostVolume with the first image.

    CostVolume cv(  initialImage, 
                   (FrameID)<some unique number>,
                   <# of layers, 32 is good>, 
                   <1/near distance>, 
                   <1/far distance>, 
                   initialR, 
                   initialT, 
                   cameraMatrix);
  4. Use the appropriate R,T, RGB data to update the cost volume with at least one more frame (right now more than two additional frames is a bad plan because of occlusion, but I have a fix for that coming in the next few days).

    cv.updateCost(image, R, T);
  5. Attach an optimizer to the CostVolume (syntax for this will likely change eventually).

    Optimizer optimizer(cv);//signature may change to empty
    optimizer.initOptimization();//this line may be removed
  6. (Optional) Optimize the map:

    Ptr<DepthmapDenoiseWeightedHuber> dp = 
       createDepthmapDenoiseWeightedHuber(cv.baseImageGray,cv.cvStream);
    DepthmapDenoiseWeightedHuber& denoise=*dp;
    GpuMat a,d;
    a.create(cv.loInd.size(),cv.loInd.type());
    //cv.loInd.copyTo(a, cv.cvStream);
    cv.cvStream.enqueueCopy(cv.loInd,a);
    
    bool doneOptimizing;
    do{
       for (int i = 0; i < 10; i++) {
           d=denoise(a,optimizer.epsilon,optimizer.getTheta());
       }
       doneOptimizing=optimizer.optimizeA(d,a);//signature may change to optimizeA(a, d, cv)
    }while(!doneOptimizing);
  7. Attach a tracker to the cost volume and track the next frame (this whole bit may be simplified in the near future):

    Track tracker(cv);
    tracker.depth=optimizer.depthMap();
  8. Align frames until we have enough for the next cost volume (maybe 3 times?)

    tracker.addFrame(nextImage);
    tracker.align();
    
    Mat nextR, nextT;
    LieToRT(tracker.pose,nextR,nextT);
  9. Repeat from step 3
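
The pose convention from step 2 can be sanity-checked with a few lines of plain C++ (illustrative only, no OpenDTAM or OpenCV types):

```cpp
// Tiny check of the convention
//   point_in_pixels = cameraMatrix * [R|T] * point_in_world,
// with the homogeneous divide at the end. Row-major arrays, no libraries.
#include <cmath>

// Projects world point X (3-vector) to pixel (u,v) with K (3x3, row-major),
// R (3x3, row-major), T (3-vector).
void projectPoint(const double K[9], const double R[9], const double T[3],
                  const double X[3], double& u, double& v)
{
    // Camera-frame coordinates: Xc = R*X + T
    double Xc[3];
    for (int i = 0; i < 3; i++)
        Xc[i] = R[3*i]*X[0] + R[3*i+1]*X[1] + R[3*i+2]*X[2] + T[i];

    // Homogeneous pixel coordinates: p = K*Xc, then divide by p[2]
    double p[3];
    for (int i = 0; i < 3; i++)
        p[i] = K[3*i]*Xc[0] + K[3*i+1]*Xc[1] + K[3*i+2]*Xc[2];
    u = p[0] / p[2];
    v = p[1] / p[2];
}
```

With f = 500, principal point (320, 240), identity pose, the world point (1, 0, 2) should land at pixel (570, 240).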

This should be enough to get the basics going in a real world non-realtime situation.

Doing it in real time requires cutting some corners right now though, since I haven't written a gpu version of the tracker. The "OpenDTAM" class is meant to be a scheduler that achieves this intelligently, but it is incomplete, so for now you will need to roll your own if you want realtime performance.
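If you do roll your own, the core of such a scheduler is just a tracking thread draining a frame queue while promoting occasional keyframes to the slower mapping stage. A minimal sketch with standard C++11 threading, using placeholder types rather than OpenDTAM classes:

```cpp
// Minimal hand-rolled scheduler sketch: frames are pushed from a capture
// loop; a worker thread "tracks" every frame and promotes every Nth frame
// as a keyframe for the (slower) mapping stage. Frame is a placeholder,
// not an OpenDTAM type.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Frame { int id; /* image, R, T would live here */ };

class SimpleScheduler {
public:
    void pushFrame(const Frame& f) {
        std::lock_guard<std::mutex> lk(m_);
        frames_.push(f);
        cond_.notify_one();
    }
    void stop() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cond_.notify_one();
    }
    // Tracker loop: returns once stop() was called and the queue is drained.
    void run(int keyframeEvery,
             std::vector<int>& tracked, std::vector<int>& keyframes) {
        for (;;) {
            Frame f;
            {
                std::unique_lock<std::mutex> lk(m_);
                cond_.wait(lk, [&]{ return done_ || !frames_.empty(); });
                if (frames_.empty()) return;   // done_ set and nothing left
                f = frames_.front();
                frames_.pop();
            }
            tracked.push_back(f.id);           // stand-in for tracker.align()
            if (f.id % keyframeEvery == 0)
                keyframes.push_back(f.id);     // stand-in for cv.updateCost()
        }
    }
private:
    std::mutex m_;
    std::condition_variable cond_;
    std::queue<Frame> frames_;
    bool done_ = false;
};
```

A real version would run the mapping stage on its own thread (or the GPU stream) instead of inline, but the queue-and-promote shape is the same.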

The expected eventual interfaces for the OpenDTAM class are: Automatic mode (will require a bootstrap procedure to be written) :

VideoCapture camera;
OpenDTAM odm(camera,cameraMatrix, calibrationParams);
odm.start(cpuRange(2,4)); //launches a bunch of threads that take over 
                          //cores 2 and 3 of the cpu and do all the work

Manual Mode:

OpenDTAM odm(camera,cameraMatrix);
odm.addFrameWithPose(image0, R0, T0); //threads start implicitly
odm.addFrameWithPose(image1, R1, T1);
while(1){
    camera >> nextImage;
    odm.addFrame(nextImage);
}

Hopefully I'll get a paper published in the next year on how the internals of OpenDTAM actually work. Until then I'll probably just document things when people express interest or as I get to it.

amiltonwong commented 10 years ago

Thank you very much, Paul. I'll try on it.

Tsugikuni-Yoriichi commented 5 years ago

Thanks for the detailed information on this page; it is very helpful. I think your conversation has already laid out a way to solve the initialization problem, but I still have a few questions. Could you give some suggestions? I would be grateful for your help.

(1) You mentioned estimating the initial R and T with a feature-based method. Do you mean that I only need to estimate the poses of the first 2 frames, and from then on your steps 4-8 solve the initialization problem?

(2) I found that the call "cv.cvStream.enqueueCopy(cv.loInd,a);" has no reference and shows as red in CLion. Has it been replaced by some other function?

Thanks again for your great work on this project.