alicevision / Meshroom

3D Reconstruction Software
http://alicevision.org

Remove input link in SfM, set input file cameraInit.sfm instead, results in fewer output images in PrepareDenseScene [question] #1167

Closed SamMaoYS closed 3 years ago

SamMaoYS commented 3 years ago

Describe the problem
I am working on reconstructing a scene with known camera poses. I followed the link Using known camera positions, but it doesn't work for me.

I first tested by running the entire default pipeline, which completed successfully and returned a mesh.

I then copied the input file path from the SfM node's input attribute, removed the input link of the StructureFromMotion node, pasted the file path back into the input attribute field, and ran the SfM node again. The SfM node works fine, but the next node, PrepareDenseScene, only runs a single chunk from rangeStart 0 with rangeSize 40, and its output only contains 0.log, 0.statistics, 0.status and some images, whereas originally it had 0.log, 0.statistics, 0.status as well as 1.log, 1.statistics, 1.status.

This makes the DepthMap node report an error: Cannot find image file coresponding to the view ...

Screenshots
Screenshot from 2020-11-27 10-12-03

Log

Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/PrepareDenseScene/be0e32caa28e9c1fb254f2491bfdf65df70903ec"
 * input = "/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/StructureFromMotion/63b8b60f94cb20221543f6a6db0f6083f0e7188b/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/DepthMap/cf48f551759adf18dc2818c6e500d06cade9acb8"
 * rangeSize = 3
 * rangeStart = 0
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[09:51:17.704761][info] CUDA-Enabled GPU.
Device information:
    - id:                      0
    - name:                    GeForce RTX 2060 SUPER
    - compute capability:      7.5
    - total device memory:     7973 MB 
    - device memory available: 6550 MB 
    - per-block shared memory: 49152
    - warp size:               32
    - max threads per block:   1024
    - max threads per SM(X):   1024
    - max block sizes:         {1024,1024,64}
    - max grid sizes:          {2147483647,65535,65535}
    - max 2D array texture:    {131072,65536}
    - max 3D array texture:    {16384,16384,16384}
    - max 2D linear texture:   {131072,65000,2097120}
    - max 2D layered texture:  {32768,32768,2048}
    - number of SM(x)s:        34
    - registers per SM(x):     65536
    - registers per block:     65536
    - concurrent kernels:      yes
    - mapping host memory:     yes
    - unified addressing:      yes
    - texture alignment:       512 byte
    - pitch alignment:         32 byte

[09:51:17.870226][info] Supported CUDA-Enabled GPU detected.
[09:51:17.878596][fatal] Cannot find image file coresponding to the view '1072481432' in folder '/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/PrepareDenseScene/be0e32caa28e9c1fb254f2491bfdf65df70903ec/'.


This is very confusing to me, since I didn't change anything: I simply removed an input link and set the input file path back, and that makes the pipeline fail. I would like to ask: does the input link have some additional functionality beyond setting the input file path?

Thank you!

natowi commented 3 years ago

It is not as easy as just adding a new input path to the sfm node. It requires some tinkering to get it to work with your own camera poses. You need to convert your known camera poses from ScanNet format to the Meshroom sfm (json) format. "Cannot find image file corresponding to the view" suggests that the images are not correctly referenced (image view ids/names do not match).
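For illustration, a minimal sketch of such a conversion, assuming the AliceVision sfm JSON layout (a "views" list whose viewId/poseId entries must match the "poses" list, with the rotation stored as 9 values and the camera center as 3 values; the exact conventions should be checked against a cameras.sfm written by a normal StructureFromMotion run):

    import json

    # Hypothetical helper: inject known poses into the sfm file written by CameraInit.
    # `known_poses` maps viewId -> (rotation as a flat list of 9 floats, camera center
    # as a list of 3 floats); the rotation/center convention must match what a regular
    # StructureFromMotion run writes into cameras.sfm.
    def add_known_poses(camera_init_sfm, output_sfm, known_poses):
        with open(camera_init_sfm) as f:
            sfm = json.load(f)
        sfm["poses"] = []
        for view in sfm["views"]:
            view_id = view["viewId"]
            if view_id not in known_poses:
                continue  # views without a known pose stay unposed
            rotation, center = known_poses[view_id]
            sfm["poses"].append({
                "poseId": view["poseId"],
                "pose": {
                    "transform": {
                        # AliceVision stores numeric values as strings in the sfm JSON
                        "rotation": [str(v) for v in rotation],
                        "center": [str(v) for v in center],
                    },
                    "locked": "1",
                },
            })
        with open(output_sfm, "w") as f:
            json.dump(sfm, f, indent=4)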

I then copied the input file path from the SfM node's input attribute, removed the input link of the StructureFromMotion node, pasted the file path back into the input attribute field, and ran the SfM node again.

That is not what the documentation says. Carefully read the second part.

SamMaoYS commented 3 years ago

Thank you @natowi, I did the camera pose conversion to the Meshroom sfm format. What I did was first run the CameraInit node to get an initial sfm file, then add the corresponding pose for each view ID.

Following the Use known pose wiki, I disconnected the link from CameraInit to FeatureExtraction and set the input file of FeatureExtraction to my newly generated camera sfm file. But a problem occurs in the ImageMatching node.

I think that due to motion blur or some other issues, not all input images are used, and the output of the FeatureExtraction node doesn't have descriptors for all images in the cameraInit.sfm file. So ImageMatching reports an error: Can't find descriptor of view 353573065 in ....

Is this expected when doing the reconstruction with known poses if some images are dropped? One thing I will test is to delete some views from the sfm file, so it only contains the images that are in the FeatureExtraction output. But the original CameraInit output file contains all the input images and does not crash ImageMatching, so I am not sure if I am heading in the right direction.
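For what it's worth, a minimal sketch of that filtering test, assuming FeatureExtraction writes one <viewId>.<describerType>.desc file per extracted view (e.g. 1072481432.sift.desc):

    import glob
    import json
    import os

    # Hypothetical helper: keep only the views for which FeatureExtraction actually
    # produced a descriptor file, and drop the poses of removed views.
    def drop_views_without_descriptors(input_sfm, features_folder, output_sfm):
        with open(input_sfm) as f:
            sfm = json.load(f)
        described = {os.path.basename(p).split(".")[0]
                     for p in glob.glob(os.path.join(features_folder, "*.desc"))}
        sfm["views"] = [v for v in sfm["views"] if v["viewId"] in described]
        if "poses" in sfm:
            kept = {v["poseId"] for v in sfm["views"]}
            sfm["poses"] = [p for p in sfm["poses"] if p["poseId"] in kept]
        with open(output_sfm, "w") as f:
            json.dump(sfm, f, indent=4)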

As you mentioned, some tinkering is needed to get it to work; do you mean the tinkering mentioned in the wiki Using known camera positions, or are there additional steps I need to go through?

Thank you very much!

Log in ImageMatching

Program called with the following parameters:
 * featuresFolders =  = [/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/FeatureExtraction/7522b1f5228652063423763dd116e2ac69aa1ec2]
 * input = "/media/sam/HDD-Ubuntu/Meshroom/log2sfm/test/cameras.sfm"
 * matchingMode = "a/a" (default)
 * maxDescriptors = 500
 * method =  Unknown Type "20EImageMatchingMethod"
 * minNbImages = 200
 * nbMatches = 50
 * nbNeighbors = 50 (default)
 * output = "/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/ImageMatching/3c2ba0193dcf101767cfe2af2a6b5f21e96cf85c/imageMatches.txt"
 * outputCombinedSfM = "" (default)
 * tree = "/home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/aliceVision/share/aliceVision/vlfeat_K80L3.SIFT.tree"
 * verboseLevel = "info"
 * weights = ""

[18:26:46.296359][fatal] Can't find descriptor of view 353573065 in:
    - /home/sam/Development/Lab/ScanNet/multiscan/Meshroom-2020.1.0/cache/MeshroomCache/FeatureExtraction/7522b1f5228652063423763dd116e2ac69aa1ec2

View 353573065 is in the generated cameras.sfm, but doesn't have a descriptor in the FeatureExtraction output.

natowi commented 3 years ago

I think due to motion blur or some other issues, not all input images will be used

That is exactly what I mean. The first default test run in Meshroom is to make sure all images have enough features and matches to be reconstructed. All cameras should be reconstructed.

The success of the known camera positions method described can vary, depending on your use case. There can be different issues depending on whether you use your own algorithm to calculate the positions, use known poses from a robot (accuracy problem), or use a fixed rig for which you calculated the camera positions in a first run and now use the known positions to speed up the process (that's the most reliable/tested use case for now).

There should be an option to handle unmatched images, but I'd have to check what works.

Best start your test by removing the blurred views.

-- It would be great if you could share your conversion script on Github.

Can you share (public link or via the private mailing list) your dataset with the new .sfm file so I can test it on my own computer? That is easier than guessing at issues in the blue.

SamMaoYS commented 3 years ago

I generated a new .sfm file by replacing the poses in the cameras.sfm produced by the StructureFromMotion node in the first Meshroom pipeline run with my own camera poses, so the dropped images do not have pose data.

But the same problem still exists. I also noticed that the output of FeatureExtraction changed quite a lot, as shown in the screenshots below. Originally, it goes through a much broader image range, outputs multiple .status files, and produces many more .desc descriptor files. However, when I disconnect the input link from the CameraInit node and set the input to my own generated sfm file, it seems to loop over the rangeSize (40) only once, so only 40 descriptor files are generated by FeatureExtraction, even though I have around 500 valid images in the input. This makes the ImageMatching node stop with the error Can't find descriptor of view xxx in ....

Screenshots
FeatureExtraction output in the original pipeline without known poses
Screenshot from 2020-11-28 18-33-03

FeatureExtraction output with the sfm file generated from known poses
Screenshot from 2020-11-28 18-33-24

I also sent supporting materials to the private mailing list. Here is the script I used to convert my own camera poses to the sfm file: https://github.com/SamMaoYS/SfmConverter.git The generated .sfm files with known camera poses are inside the data folder.

Thank you very much for helping!

natowi commented 3 years ago

Thanks, I'll take a look and report back.

natowi commented 3 years ago

What camera did you use?

natowi commented 3 years ago

OK, I reduced the sfm file to 40 images to make it manageable. Without known poses, only 4/40 images were reconstructed with the default settings (using the sfm file without the poses, replacing the cameraInit). With the cameraInit + known poses and default node settings, 13/40 images were reconstructed. Enabling "Matching from Known camera poses" produces an error, most likely because there are not enough features to match. (The dataset is quite bad: blurry images, images taken close together, not many features in the environment to detect.)


I did not figure out why there is the limit of a rangeSize of 40.

SamMaoYS commented 3 years ago

What camera did you use?

I used an iPad Pro 2nd Gen camera: I held the iPad and walked around the scene to record a video of the room, then extracted frames from the video, which results in blurry images taken close together, as you mentioned.

Regarding the limit of a rangeSize of 40, could you offer me some comments or suggestions about the potential cause, so I know where to look? I am just a beginner with the Meshroom pipeline and not familiar with the source code structure. It would also be great if you had time to help me with this rangeSize problem.

Thank you very much for your time and help!

SamMaoYS commented 3 years ago

@natowi, hi, thank you for answering my questions. I noticed that #1240 reported that "FeatureExtraction step generates 40 x .desc and 40 x .feat. Only 1 chunk is done." After these days of trying to reconstruct scenes with known camera poses, I found that the problem is in the class DynamicNodeSize.

In the CameraInit node, the node size is computed as size = desc.DynamicNodeSize('viewpoints'), where viewpoints has the ListAttribute type. In FeatureExtraction the size is size = desc.DynamicNodeSize('input'), where input has the File type. Because the CameraInit input is a ListAttribute, its size is set to the total number of viewpoints, whereas FeatureExtraction takes its size from the previous node. In Using-known-camera-positions we disconnect CameraInit from FeatureExtraction, so the size of FeatureExtraction is wrong, and as a result only one chunk (40 images) is processed. The number 40 comes from Parallelization: the default blockSize is 40, and it only iterates once when the node size is 0.

Currently, my workaround is to set the blockSize larger than the total number of input images. The problem I am now facing is in the FeatureMatching node: if I set Match From Known Camera Poses to True, I get the following error messages.

Log

[07:37:04.176247][info] Number of pairs: 5565
[07:37:04.176569][info] There are 106 views and 5565 image pairs.
[07:37:04.176573][info] Load features and descriptors
Loading regions
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[07:37:04.192841][info] Supported CUDA-Enabled GPU detected.

[07:37:05.782409][info] Putative matches from known poses: 2926 image pairs.
Compute pairwise fundamental guided matching:
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Bus error (core dumped)

In StructureFromMotion, "Lock Scene previously reconstructed" and "Force Lock of all Intrinsic Camera Parameters" work fine if I don't use "Match From Known Camera Poses" in the FeatureMatching node. I am not sure how much it will influence the reconstruction quality if I don't use known camera poses in FeatureMatching.
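To make the rangeSize behaviour described above concrete, here is a rough sketch of the chunking arithmetic (not Meshroom's actual implementation): the node size normally comes from the upstream ListAttribute, and Parallelization splits it into blocks of blockSize = 40; when the link is removed and the size collapses, only a single block is produced.

    import math

    # Rough sketch of the chunking arithmetic, not Meshroom's actual code.
    # node_size is normally the number of viewpoints propagated from CameraInit;
    # with a plain File input and no upstream link it no longer reflects the
    # image count, so only one (rangeStart, rangeSize) block gets processed.
    def chunk_ranges(node_size, block_size=40):
        nb_blocks = max(1, math.ceil(node_size / block_size))
        return [(i * block_size, block_size) for i in range(nb_blocks)]

    print(chunk_ranges(500))  # ~500 images -> 13 chunks: (0, 40), (40, 40), ...
    print(chunk_ranges(1))    # collapsed size -> a single chunk (0, 40)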

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue is closed due to inactivity. Feel free to re-open if new information is available.