alicevision / Meshroom

3D Reconstruction Software
http://alicevision.org

Different performance between Meshrooms Windows and Linux #2124

Open zell180 opened 1 year ago

zell180 commented 1 year ago

Hello all, we are experiencing very different behavior regarding the StructureFromMotion step in different environments.

We use the same dataset in the attached example. The test environment is Meshroom 2023.1.0 on Windows 11 with an Intel i9 12900KF.

The production environment is Ubuntu 20.04.04 LTS with 4x Intel Xeon Gold 5218.

Operations in general are slightly slower on the Xeon, as the table below shows, but why does SfM take 10 times longer in production?

| NODE | XEON GOLD 5218 | INTEL i9 12900KF | DELTA |
| --- | --- | --- | --- |
| CameraInit | 0,13 | 0,07 | 0,05 |
| FeatureExtraction | 8,77 | 7,34 | 1,43 |
| ImageMatching | 0,08 | 0,06 | 0,02 |
| FeatureMatching | 0,51 | 0,49 | 0,02 |
| StructureFromMotion | 76,39 | 7,13 | 69,26 |
| SfMAlignment | 0,10 | 0,09 | 0,01 |
| PrepareDenseScene | 5,17 | 4,84 | 0,33 |
| DepthMap | 17,97 | 11,69 | 6,28 |
| DepthMapFilter | 8,85 | 7,63 | 1,22 |
| Meshing | 31,81 | 25,23 | 6,58 |
| MeshFiltering | 3,50 | 2,99 | 0,51 |
| Texturing | 27,97 | 15,86 | 12,10 |
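To put the table in perspective, the per-node slowdown can be computed as the ratio of the two columns. The sketch below is only an illustration: the values are copied from the table above (decimal commas converted to points), and `TIMINGS` and `slowdown` are names made up for this example.

```python
# Timings copied from the table above: (Xeon Gold 5218, i9 12900KF).
# The names used here are illustrative only; they are not part of Meshroom.
TIMINGS = {
    "CameraInit": (0.13, 0.07),
    "FeatureExtraction": (8.77, 7.34),
    "ImageMatching": (0.08, 0.06),
    "FeatureMatching": (0.51, 0.49),
    "StructureFromMotion": (76.39, 7.13),
    "SfMAlignment": (0.10, 0.09),
    "PrepareDenseScene": (5.17, 4.84),
    "DepthMap": (17.97, 11.69),
    "DepthMapFilter": (8.85, 7.63),
    "Meshing": (31.81, 25.23),
    "MeshFiltering": (3.50, 2.99),
    "Texturing": (27.97, 15.86),
}


def slowdown(timings):
    """Return {node: xeon_time / i9_time}, largest ratio first."""
    ratios = {node: xeon / i9 for node, (xeon, i9) in timings.items()}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


for node, ratio in slowdown(TIMINGS).items():
    print(f"{node:20s} {ratio:5.2f}x")
```

StructureFromMotion comes out at roughly 10.7x, while every other node stays below 2x, which is what singles out that step.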

I attach the logs of the two environments. Is there any way to optimize the operation?

Thank you

LOG DEV

[2023-07-24 11:07:08.363085] [0x000010a8] [trace] Embedded OCIO configuration file: 'C:\Users\user\Desktop\Meshroom-2023.1.0\aliceVision/share/aliceVision/config.ocio' found.
Program called with the following parameters:

Hardware :
Detected core count : 8
OpenMP will use 8 cores
Detected available memory : 52967 Mo

[11:07:08.367085][warning] The number of intrinsics is incoherent:
[11:07:08.367085][warning] 2 intrinsics declared and 1 intrinsics used.
Loading features
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[11:07:09.436085][info] Fuse matches into tracks:

0 | 1.5e+04
0.5 | 1.1e+04
1 | 5.1e+03
1.5 | 2e+03
2 | 8.1e+02
2.5 | 4.2e+02
3 | 1.8e+02
3.5 | 97
4 | 38
4.5 | 31
5 | 19
5.5 | 9
6 | 8
6.5 | 6
7 | 6
7.5 | 0
8 | 1
8.5 | 0
9

[11:07:16.853593][info] # landmarks: 11812
[11:07:16.853593][info] # overall observations: 34142
[11:07:16.853593][info] Landmarks observations length min: 2, mean: 2.89045, median: 2, max: 6
[11:07:16.853593][info] Histogram of observations length:

2 | 6015
3 | 2933
4 | 1510
5 | 851
6 | 503
7

[11:07:16.854592][info] Landmarks per view min: 0, mean: 0, median: 0, max: 0
[11:07:16.854592][info] Histogram of nb landmarks per view:

0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
1

Compute scene structure color
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[11:07:17.488593][info] Structure from motion took (s): 8.140000
[11:07:17.488593][info] Generating HTML report...
[11:07:17.498593][info] Export SfMData to disk: C:/Users/user/MeshroomCache/StructureFromMotion/2c3dd681fcd589009e23623dc173c06f1fe7483d/sfm.abc
[11:07:17.510593][info] Structure from Motion results:

LOG PROD

[2023-07-19 08:40:51.463769] [0x00007fcd87b15000] [trace] Embedded OCIO configuration file: '/Meshroom-2023.1.0-av3.0.0-centos7-cuda11.3.1/aliceVision/share/aliceVision/config.ocio' found.
Program called with the following parameters:

Hardware :
Detected core count : 128
User upper limit on core count : 128
OpenMP will use 128 cores
Detected available memory : 183428 Mo
User upper limit on memory available : 8796093022207 Mo

Loading features
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[08:40:51.945808][info] Fuse matches into tracks:

0 | 2.4e+04
0.5 | 1.1e+04
1 | 4.2e+03
1.5 | 1.5e+03
2 | 6.2e+02
2.5 | 3e+02
3 | 1.5e+02
3.5 | 76
4 | 48
4.5 | 25
5 | 15
5.5 | 8
6 | 7
6.5 | 0
7 | 1
7.5 | 1
8

[08:42:07.349841][info] # landmarks: 15036
[08:42:07.349851][info] # overall observations: 42296
[08:42:07.349857][info] Landmarks observations length min: 2, mean: 2.81298, median: 2, max: 6
[08:42:07.349866][info] Histogram of observations length:

2 | 8178
3 | 3598
4 | 1747
5 | 920
6 | 593
7

[08:42:07.350189][info] Landmarks per view min: 0, mean: 0, median: 0, max: 0
[08:42:07.350202][info] Histogram of nb landmarks per view:

0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
1

Compute scene structure color
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[08:42:07.989449][info] Structure from motion took (s): 76.394000
[08:42:07.989558][info] Generating HTML report...
[08:42:08.005434][info] Export SfMData to disk: /test/data/168975324090_Pioltello_LL_dispari_not_available/SD_OUT/ms_out/StructureFromMotion/7f9c2cf20d68b31efc56e47937f5465cec9bdd77/sfm.abc
[08:42:08.015161][info] Structure from Motion results:
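The timestamps in the two logs already hint at where the time goes. Here is a small sketch that measures the gap between the "Fuse matches into tracks:" message and the "# landmarks:" message in each log. The timestamps are copied from the excerpts above, and `elapsed` is a helper written only for this example. Since the excerpts do not show what was logged in between, the gap covers everything that ran between those two messages.

```python
from datetime import datetime


def elapsed(start, end):
    """Seconds between two HH:MM:SS.ffffff log timestamps from the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# "Fuse matches into tracks:" -> "# landmarks: ..." in each log excerpt.
dev_gap = elapsed("11:07:09.436085", "11:07:16.853593")
prod_gap = elapsed("08:40:51.945808", "08:42:07.349841")

print(f"dev : {dev_gap:.3f} s")
print(f"prod: {prod_gap:.3f} s")
```

The gap is about 7.4 s in the Windows log and about 75.4 s in the production log, so nearly all of the 76.39 s is spent between those two messages.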

zell180 commented 1 year ago

Does no one have an opinion about this behaviour?

msanta commented 1 year ago

Not sure why there is such a large time difference. Doing a compare between the logs I can see a variety of differences, such as the number of landmarks found at the end of the SFM step. I would have expected this to be the same given the same input data.

Is the feature detection and matching producing identical results on both systems?

I would try to eliminate any potential differences in the process. For the SfM step, use the same initial image pair (on Windows it used aaa.tiff and fff.tiff, while on Linux it used ddd.tiff and aaa.tiff). Not sure if there is a way to limit the number of CPUs used, but if there is, then try to use 8 for both.

zell180 commented 1 year ago

> Not sure why there is such a large time difference. Doing a compare between the logs I can see a variety of differences, such as the number of landmarks found at the end of the SFM step. I would have expected this to be the same given the same input data.
>
> Is the feature detection and matching producing identical results on both systems?
>
> I would try to eliminate any potential differences in the process. For the SfM step, use the same initial image pair (on Windows it used aaa.tiff and fff.tiff, while on Linux it used ddd.tiff and aaa.tiff). Not sure if there is a way to limit the number of CPUs used, but if there is, then try to use 8 for both.

About the same initial pair: I tried setting a fixed pair and the elapsed time is the same. I was just looking for a way to limit the CPU count. Does anyone know how to do that?

zell180 commented 1 year ago

I ran a new test with an i7-1165G7, with the same photos and the same settings, on Windows. With normal and ultra respectively I got 9 s and 12 s for StructureFromMotion. I think the number of cores must be the problem. How can I limit it on that node? @fabiencastan

natowi commented 1 year ago

I think there is a way to add this limitation, but I don't know the details. Related PRs: https://github.com/alicevision/AliceVision/pull/1304 https://github.com/alicevision/Meshroom/pull/1836

zell180 commented 1 year ago

OK, I'm 99.9% sure that the problem is the large number of cores. I just tried an i5-8250U with 16 GB RAM on Ubuntu 22.04 and the task took only 13.4 seconds. Can someone please help me reduce the number of cores, or suggest a solution?

natowi commented 1 year ago

Maybe https://manpages.ubuntu.com/manpages/trusty/man1/cpulimit.1.html

The -c flag sets the number of CPU cores the program thinks are available. Usually this is detected for us, but can be over-ridden.

zell180 commented 1 year ago

I got: Unrecognized argument -c 8 for meshroom_batch

servantftechnicolor commented 1 year ago

Just tested on several computers with 16/40/96/256 cores, and they all have the same runtime: around 1 min 20 s for 46 images.

zell180 commented 1 year ago

> Just tested on several computers with 16/40/96/256 cores, and they all have the same runtime: around 1 min 20 s for 46 images.

I have only 6 images and the runtime gap is crazy. What should I check or fix?

zell180 commented 1 year ago

So there is no way to reduce the number of CPUs?

msanta commented 1 year ago

How did you run the command?

msanta commented 1 year ago

The cpulimit command should be used something like this: cpulimit -c 8 -l 100 -- /path/to/meshroom_batch <options for meshroom_batch>. However, meshroom_batch is going to start another process for each step and I don't think these will have the CPU limit applied.

Interestingly, I had a look at the .status files from a project and there is a --maxCores flag specified when running the individual commands. E.g.: "commandLine": "aliceVision_featureMatching --input \"...\" --featuresFolders \"...\" --imagePairsList \"...\" --describerTypes dspsift --photometricMatchingMethod ANN_L2 --geometricEstimator acransac --geometricFilterType fundamental_matrix --distanceRatio 0.8 --maxIteration 2048 --geometricError 0.0 --knownPosesGeometricErrorMax 5.0 --minRequired2DMotion -1.0 --maxMatches 0 --savePutativeMatches False --crossMatching False --guidedMatching False --matchFromKnownCameraPoses False --exportDebugFiles False --verboseLevel info --output \"..\" --rangeStart 0 --rangeSize 20 --maxMemory=9223372036854771712 --maxCores=16"
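To check which limit was actually passed to a given node, the value can be pulled out of the recorded command line. This is a minimal sketch that assumes only what is shown above: that the status file is JSON with a "commandLine" string containing --maxCores=N. Treating "commandLine" as a top-level key is an assumption, `max_cores_from_status` is a hypothetical helper, and the sample is a shortened copy of the command quoted above.

```python
import json
import re


def max_cores_from_status(status_text):
    """Return the integer passed to --maxCores in the commandLine, or None."""
    status = json.loads(status_text)
    # Assumption: "commandLine" sits at the top level of the status JSON.
    match = re.search(r"--maxCores[= ](\d+)", status.get("commandLine", ""))
    return int(match.group(1)) if match else None


# Shortened sample modelled on the command line quoted above.
sample = json.dumps({
    "commandLine": "aliceVision_featureMatching --verboseLevel info "
                   "--maxMemory=9223372036854771712 --maxCores=16"
})
print(max_cores_from_status(sample))  # 16
```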

zell180 commented 1 year ago

I successfully launched meshroom_batch with the instructions you kindly provided but, as you suspected, the limit is ignored in the Meshroom subprocesses. The step with bad performance is StructureFromMotion, and it seems that it is not possible to pass this parameter to it. Is it possible to make some modification to enable that?

msanta commented 1 year ago

You can use cpulimit on a running process by specifying the process ID with the -p option: cpulimit -c 8 -l 100 -p 1234. That should let you see if reducing the CPU resources has an impact.

The -e and -P options might be worth trying out. cpulimit will wait for a process with the given name (-e) or path (-P) and then throttle it. For example, I can run the command to target Meshroom and it waits until I have started the app.

$ cpulimit -c 1 -l 100 -e Meshroom
Warning: no target process found. Waiting for it...
Process 60909 detected

However, I have had no luck when targeting the aliceVision_incrementalSfM program by name or path (but specifying the process ID works of course).
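As far as I understand, cpulimit throttles the CPU time of a process by pausing and resuming it, rather than hiding cores from it. A different technique, not the one used above, is to restrict the CPU affinity of the parent before it starts. On Linux the affinity mask is inherited by child processes, so it would also apply to the per-step programs. Below is a minimal Linux-only sketch using Python's `os.sched_setaffinity`. `pick_cpus` and `run_pinned` are names made up for this example. Whether aliceVision_incrementalSfM then reports or uses fewer cores is an assumption that would have to be checked in its log.

```python
import os
import subprocess


def pick_cpus(available, n):
    """Return the n lowest-numbered CPU ids from the available ids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return set(sorted(available)[:n])


def run_pinned(command, n):
    """Run command (a list of arguments) restricted to at most n CPUs.

    Linux only: relies on os.sched_getaffinity / os.sched_setaffinity.
    """
    allowed = pick_cpus(os.sched_getaffinity(0), n)

    def _pin():
        # Runs in the child just before exec; the affinity mask is then
        # inherited by every process the child starts afterwards.
        os.sched_setaffinity(0, allowed)

    return subprocess.run(command, preexec_fn=_pin, check=False)


# Example with a placeholder path (options omitted):
# run_pinned(["/path/to/meshroom_batch"], 8)
```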

zell180 commented 1 year ago

I used cpulimit -c 8 -l 100 -e aliceVision_incrementalSfM and got: Warning: no target process found. Waiting for it... Process 1144209 detected

Bingo, I think! But in the aliceVision_incrementalSfM log I see that the core count is still 128. I think we need to find a way to pass --maxCores to aliceVision_incrementalSfM.

[5/12] StructureFromMotion

FlachyJoe commented 1 year ago

--maxCores is set here for all the command lines: https://github.com/alicevision/Meshroom/blob/ea26d89844456883fc995a76115dd701cc51ec10/meshroom/core/desc.py#L680

It reads the value from Meshroom's cgroup. Does aliceVision_incrementalSfM really have access to all the cores?
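For reference, a Linux cpuset is written in the kernel's list format, e.g. 0-3,8-11. The sketch below only shows how such a string maps to a core count. It is not Meshroom's actual code, `count_cpus` is a hypothetical name, and that the cgroup value involved here is a cpuset list is an assumption on my part.

```python
def count_cpus(cpuset_list):
    """Count the CPUs in a Linux cpuset list string such as "0-3,8-11"."""
    total = 0
    for part in cpuset_list.strip().split(","):
        if not part:
            continue  # empty string or stray comma
        if "-" in part:
            first, last = part.split("-")
            total += int(last) - int(first) + 1  # inclusive range
        else:
            total += 1  # single CPU id
    return total


print(count_cpus("0-127"))     # 128
print(count_cpus("0-3,8-11"))  # 8
```

If the process is not placed in a restricted cpuset, a value derived this way would simply be the full core count, which would be consistent with the 128 seen in the production log.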

zell180 commented 1 year ago

If I read the aliceVision_incrementalSfM help, I see that the parameter is not --maxCores but --maxMemoryAvailable and --maxCoresAvailable. Could this be the problem?