Absolutely! Thank you so much. I'm actually a bit swamped by the level of interest this has been generating over the last week.
A live camera would be really nice, and I think it would not be too hard to do. The basic problem is that DTAM has a chicken-and-egg issue: the tracking system needs a depthmap in order to find the pose of the camera, but the pose needs to be known to build a depthmap. I dodged the issue by using the ground-truth poses that come with the synthetic data.
In the original paper, PTAM is used to generate the initial two poses, which produces a reconstruction accurate up to scale (monocular reconstruction inherently cannot recover scale without an outside reference or a trick like depth from focus). So to use real camera input, we need some equivalent way to start the system up. After the two initial poses, the system should be stable, in the sense that it should keep producing reconstructions accurate up to scale.
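If you want to avoid pulling in PTAM, a minimal feature-based bootstrap along these lines should work. This is only a sketch: it assumes a reasonably recent OpenCV (3.x or later) for ORB::create, cv::findEssentialMat, and cv::recoverPose, none of which are part of OpenDTAM, and bootstrapPose is just an illustrative helper name:

```
#include <opencv2/opencv.hpp>
using namespace cv;

// Sketch only: estimate the relative pose of the second frame from ORB
// feature matches. The recovered T has unit norm, i.e. it is correct only
// up to the monocular scale ambiguity described above. Depending on
// OpenDTAM's pose convention, R/T may also need inverting.
void bootstrapPose(const Mat& img0, const Mat& img1, const Mat& cameraMatrix,
                   Mat& R, Mat& T)
{
    Ptr<ORB> orb = ORB::create(2000);
    std::vector<KeyPoint> kp0, kp1;
    Mat desc0, desc1;
    orb->detectAndCompute(img0, noArray(), kp0, desc0);
    orb->detectAndCompute(img1, noArray(), kp1, desc1);

    BFMatcher matcher(NORM_HAMMING, true); // cross-check to prune bad matches
    std::vector<DMatch> matches;
    matcher.match(desc0, desc1, matches);

    std::vector<Point2f> p0, p1;
    for (size_t i = 0; i < matches.size(); i++) {
        p0.push_back(kp0[matches[i].queryIdx].pt);
        p1.push_back(kp1[matches[i].trainIdx].pt);
    }

    Mat inliers;
    Mat E = findEssentialMat(p0, p1, cameraMatrix, RANSAC, 0.999, 1.0, inliers);
    recoverPose(E, p0, p1, cameraMatrix, R, T, inliers); // T has unit norm
}
```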
How to actually do this (steps 1 and 2 are not implemented, so if you can help with that it would be great; a skeleton tying the steps together follows the list):
1. Fill in R (3x3 double), T (3x1 double), cameraMatrix (3x3 double), and the RGB pixel data, then initialize a CostVolume with the first image:

```
CostVolume cv(initialImage,
              (FrameID)<some unique number>,
              <# of layers, 32 is good>,
              <1/near distance>,
              <1/far distance>,
              initialR,
              initialT,
              cameraMatrix);
```
2. Use the appropriate R, T, and RGB data to update the cost volume with at least one more frame (right now more than two additional frames is a bad plan because of occlusion, but I have a fix for that coming in the next few days):

```
cv.updateCost(image, R, T);
```
3. Attach an optimizer to the CostVolume (syntax for this will likely change eventually):

```
Optimizer optimizer(cv);      // signature may change to empty
optimizer.initOptimization(); // this line may be removed
```
4. (Optional) Optimize the map:

```
Ptr<DepthmapDenoiseWeightedHuber> dp =
    createDepthmapDenoiseWeightedHuber(cv.baseImageGray, cv.cvStream);
DepthmapDenoiseWeightedHuber& denoise = *dp;
GpuMat a, d;
a.create(cv.loInd.size(), cv.loInd.type());
//cv.loInd.copyTo(a, cv.cvStream); // newer-OpenCV equivalent of the line below
cv.cvStream.enqueueCopy(cv.loInd, a); // OpenCV 2.4 gpu-module API
bool doneOptimizing;
do {
    for (int i = 0; i < 10; i++) {
        d = denoise(a, optimizer.epsilon, optimizer.getTheta());
    }
    doneOptimizing = optimizer.optimizeA(d, a); // signature may change to optimizeA(a, d, cv)
} while (!doneOptimizing);
```
5. Attach a tracker to the cost volume and track the next frame (this whole bit may be simplified in the near future):

```
Track tracker(cv);
tracker.depth = optimizer.depthMap();
```
6. Align frames until we have enough for the next cost volume (maybe 3 times?):

```
tracker.addFrame(nextImage);
tracker.align();
Mat nextR, nextT;
LieToRT(tracker.pose, nextR, nextT);
```
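Putting the steps together, a skeleton of one build/optimize/track cycle might look like the following. This is a sketch only: grabFrame(), the pose variables, and the near/far values are placeholders I'm introducing for illustration, not OpenDTAM names; the CostVolume/Optimizer/Track calls are the ones quoted in the steps above:

```
// Sketch: one pass of the cycle from steps 1-6.
Mat R0, T0, R1, T1;            // first two poses, e.g. from a bootstrap
double nearInv = 1.0 / 0.5;    // 1/near distance (example value)
double farInv  = 1.0 / 10.0;   // 1/far distance (example value)

Mat image0 = grabFrame(), image1 = grabFrame(); // grabFrame() is hypothetical
CostVolume cv(image0, (FrameID)0, 32, nearInv, farInv, R0, T0, cameraMatrix);
cv.updateCost(image1, R1, T1);            // steps 1-2

Optimizer optimizer(cv);                  // step 3
optimizer.initOptimization();
// ...run the denoise/optimizeA loop from step 4 here...

Track tracker(cv);                        // step 5
tracker.depth = optimizer.depthMap();

Mat nextR, nextT;
for (int i = 0; i < 3; i++) {             // step 6: "maybe 3 times?"
    tracker.addFrame(grabFrame());
    tracker.align();
    LieToRT(tracker.pose, nextR, nextT);
}
// nextR/nextT then seed the next CostVolume, and the cycle repeats.
```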
This should be enough to get the basics going in a real-world, non-real-time setting.
Doing it in real time requires cutting some corners right now, though, since I haven't written a GPU version of the tracker. The OpenDTAM class is meant to be a scheduler that handles this intelligently, but it is incomplete, so for now you will need to roll your own if you want real-time performance (a rough sketch of what that could look like is below).
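Until that scheduler exists, a hand-rolled version could be as simple as this sketch: the capture thread tracks every frame on the CPU and queues posed frames, while a mapping thread drains the queue into the cost volume and re-runs the optimizer. Everything here (PosedFrame, the queue, the thread split) is my own illustration, not OpenDTAM API:

```
#include <atomic>
#include <mutex>
#include <queue>
#include <thread>
#include <opencv2/core.hpp>

// Sketch of a minimal two-thread scheduler. The tracker runs on the
// capture thread; this mapping thread drains the queue, folds frames
// into the cost volume, and re-optimizes.
struct PosedFrame { cv::Mat image, R, T; };

std::queue<PosedFrame> pending;
std::mutex pendingMutex;
std::atomic<bool> running(true);

void mappingThread() {
    while (running) {
        PosedFrame f;
        {
            std::lock_guard<std::mutex> lock(pendingMutex);
            if (pending.empty()) continue; // or block on a condition variable
            f = pending.front();
            pending.pop();
        }
        // cv.updateCost(f.image, f.R, f.T);
        // ...re-run the optimizer and publish the new depth map
        //    back to the tracker...
    }
}

// Capture/tracking side, per frame:
//   tracker.addFrame(frame); tracker.align();
//   LieToRT(tracker.pose, R, T);
//   { std::lock_guard<std::mutex> lock(pendingMutex);
//     pending.push({frame, R, T}); }
```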
The expected eventual interfaces for the OpenDTAM class are:

Automatic mode (will require a bootstrap procedure to be written):
```
VideoCapture camera;
OpenDTAM odm(camera, cameraMatrix, calibrationParams);
odm.start(cpuRange(2,4)); // launches a bunch of threads that take over
                          // cores 2 and 3 of the CPU and do all the work
```
Manual Mode:
```
OpenDTAM odm(camera, cameraMatrix);
odm.addFrameWithPose(image0, R0, T0); // threads start implicitly
odm.addFrameWithPose(image1, R1, T1);
while (1) {
    camera >> nextImage;
    odm.addFrame(nextImage);
}
```
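In Manual Mode, the first two poses could come from the feature-based bootstrap sketched earlier. Hypothetical glue (bootstrapPose is the illustrative helper from above, not an OpenDTAM function):

```
// First camera at the origin; second pose from two-view geometry.
// The unit-norm T1 fixes the (arbitrary) global scale of the map.
Mat R0 = Mat::eye(3, 3, CV_64F), T0 = Mat::zeros(3, 1, CV_64F);
Mat R1, T1;
bootstrapPose(image0, image1, cameraMatrix, R1, T1);
odm.addFrameWithPose(image0, R0, T0);
odm.addFrameWithPose(image1, R1, T1);
```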
Hopefully I'll get a paper published in the next year on how the internals of OpenDTAM actually work. Until then I'll probably just document things when people express interest or as I get to it.
Thank you very much, Paul. I'll try it out.
Thanks for the detailed information on this page; it is very helpful. I think this conversation has already worked out a way to solve the initialization problem, but I still have a couple of questions. Any suggestions would be much appreciated.
(1) You mentioned estimating the initial R and T with a feature-based method. Do you mean that I only need to estimate the poses of the first two frames, and that the remaining steps then take care of initialization?
(2) I found that cv.cvStream.enqueueCopy(cv.loInd, a); has no reference and shows red in CLion. Has it been replaced by some other function?
Thanks again for your great work on this project.
Hi, Paul and all other contributors,
Thank you so much for the great effort you have put into implementing the DTAM algorithm. I think this is an important step toward giving the wider computer vision community access to dense reconstruction. Having installed and experimented with the OpenDTAM code, I hope you can answer a few questions. The current code uses the traj_30_seconds dataset for its demos; how about using live camera input? That would be a very practical scenario, and I am trying to write that interface. Since I'm not so familiar with the architecture of OpenDTAM, could you suggest which part I should work on first and anything worth noting?
Best regards, Milton