ETH3D / dataset-pipeline

Pipeline for creating multi-view benchmark datasets from laser scans and images.
BSD 3-Clause "New" or "Revised" License

Overview

The ETH3D dataset processing tools consist of a number of programs for creating 3D reconstruction evaluation datasets from images and laser scans. This includes tools for laser scan processing (outlier removal, scan alignment, ...) and image alignment with respect to the laser scans (by optimizing for color consistency among images and the scans). The tools additionally include support for semantic labeling of point clouds and limited support for scan-image alignment for depth images, which was not used for the ETH3D benchmark.

If you use this code for research, please cite our paper:

T. Schöps, J. L. Schönberger, S. Galliani, T. Sattler, K. Schindler, M. Pollefeys, A. Geiger, "A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos", Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [Bibtex][PDF][Supplementary]

The pipeline for processing a dataset is as follows:

First, examples of using the pipeline are described for the case of individual images (such as taken by a DSLR camera), and for the case of images taken by a fixed camera rig. Then all pipeline steps are described in detail below.

Building

Building was tested on Ubuntu 18.04 and 16.04. Later Ubuntu versions are expected to work with little effort.

The following external dependencies are required:

The code was tested with OpenCV 4.1.2, PCL 1.8.1, and Qt 5.

The code can be built using CMake, for example as follows:

mkdir build_RelWithDebInfo
cd build_RelWithDebInfo
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make -j

Note: Problems occur when running the code on Ubuntu 18.04 and later if PCL is built from source, but not when using the libpcl-dev apt package.

A Dockerfile is provided to build everything from scratch on Ubuntu 20.04.

Step-by-step example (individual images)

For this example, the pipeline is run on the "terrace" DSLR training dataset of the ETH3D benchmark.

Step-by-step example (camera rig images)

This example shows the case of images taken from a camera rig with fixed extrinsics, i.e., fixed relative camera poses. As for the previous example, the "terrace" dataset is used, however now with different images. Most of the steps are identical to the previous example.

Processing Pipeline

In this section, the steps of the processing pipeline are described in detail.

Data acquisition

The dataset pipeline expects a set of images and colored point clouds (as PLY files) as input. This section contains a few general hints about recording this input data.

The laser scanner scan origin should be at the position (0, 0, 0) of each scan file. The pipeline has been tested with point clouds from a Faro Focus X 330 laser scanner (configured to record up to 28 million points per scan). One to four point clouds have been used per scene. In principle, using more clouds might be helpful for improved occlusion handling, but this has not been tested. Since care is taken in the point cloud processing to not mix colors from different scans, using more scans might lead to different point-neighbor distances in the point clouds used for image alignment than intended, which potentially affects the results. In case this becomes an issue, it may be an option to use many scans for occlusion handling only, and a reduced number for the rest of the pipeline.

If the input images are taken from a video, use of a global shutter may be helpful since rolling shutter is not modeled. It is helpful to fix the camera intrinsics (focus and potentially aperture) while recording the data for a scene. Images should not be taken too far away from the positions from which laser scans are taken, otherwise occlusion handling might become an issue (unless the scene is observed completely by the laser scans). The pipeline has mainly been tested and tuned on high-resolution DSLR images. Images such as the multi-camera rig images of the ETH3D benchmark may lead to worse results: for those datasets, we had to strongly sub-select ones for which good results could be obtained. In general, strongly textured scenes are more likely to work well than scenes with many homogeneous or reflective surfaces. The pipeline contains special support for images taken from a fixed camera rig, ensuring that constant rig extrinsics are used. Notice that the pipeline has not been tested with scenes containing images with different resolutions.

File system layout

For the most part, the organization of the files can be defined freely. However, the pipeline expects the following file system layout for image files: all images of a certain camera should be within a distinct directory (camera_A and camera_B in the example below). All of these image directories should be within a common directory. Optionally, in the same directory the mask folders masks_for_cameras and masks_for_images can be placed. In masks_for_cameras, a file named like the camera directory with the extension .png can be created to define a mask for all images of that camera (for example, to mask out black regions in a fisheye image). In masks_for_images, subdirectories named like the camera directories can be created, and within each subdirectory, images named like the camera images with extension .png can be created to define a mask for an individual image. Below is an example for two cameras, camera_A and camera_B:

- camera_A
  - image_A_1.jpg
  - image_A_2.jpg
- camera_B
  - image_B_1.jpg
  - image_B_2.jpg
- masks_for_cameras (optional)
  - camera_A.png
  - camera_B.png
- masks_for_images (optional)
  - camera_A
    - image_A_1.png
    - image_A_2.png
  - camera_B
    - image_B_1.png
    - image_B_2.png
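
As an illustration of the naming convention above, the following sketch derives the two optional mask paths that apply to one image. The struct and function names are hypothetical and not part of the pipeline; the sketch only encodes the directory and file names described in this section and assumes "/" as the path separator.

```cpp
#include <cstddef>
#include <string>

// Paths of the two optional masks that apply to one image.
struct MaskPaths {
  std::string camera_mask;  // <base>/masks_for_cameras/<camera>.png
  std::string image_mask;   // <base>/masks_for_images/<camera>/<image name>.png
};

// Replaces the extension of a file name with ".png" (appends it if there is none).
std::string WithPngExtension(const std::string& file_name) {
  const std::size_t dot = file_name.find_last_of('.');
  const std::string stem =
      (dot == std::string::npos) ? file_name : file_name.substr(0, dot);
  return stem + ".png";
}

// base_dir is the common directory that contains all camera directories.
MaskPaths MaskPathsFor(const std::string& base_dir, const std::string& camera,
                       const std::string& image_file) {
  MaskPaths paths;
  paths.camera_mask = base_dir + "/masks_for_cameras/" + camera + ".png";
  paths.image_mask = base_dir + "/masks_for_images/" + camera + "/" +
                     WithPngExtension(image_file);
  return paths;
}
```

For the layout above, MaskPathsFor("data", "camera_A", "image_A_1.jpg") yields data/masks_for_cameras/camera_A.png and data/masks_for_images/camera_A/image_A_1.png. Both files are optional, so a caller would still need to check whether they exist.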

Laser scan outlier removal

The laser scans should be cleaned of outliers and noise. In principle, this can be done with any suitable tool. The dataset pipeline includes a tool for removing some outliers automatically, and a point cloud editor for manual outlier removal. The latter is necessary because reflections can distort scene geometry in ways that still look like a valid surface, but that a human can clearly identify as outliers.

Automatic point cloud cleanup

The PointCloudCleaner tool can automatically remove some outliers based on looking at the distribution of the K-nearest-neighbor distances. This uses a modified version of PCL's StatisticalOutlierRemoval which uses local neighborhoods. Notice that the result is far from perfect, but may be helpful as a first step.

Usage:

PointCloudCleaner --in <in_file.ply> --filter <knn,factor> [--filter <knn2,factor2>, ...]

The result is a file in_file.ply.inliers.ply, which contains the points which were classified as valid, and a file in_file.ply.outliers.ply, which contains the remaining points. It is worth looking at both files, since it is possible that valid points get classified as outliers. For the ETH3D point clouds, the used parameter values were --filter 270,1.15 --filter 20,1.15. This runs two iterations of the algorithm, one to remove outliers at a larger scale and one to remove outliers at a smaller scale.
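
To make the idea of filtering by K-nearest-neighbor distances more concrete, here is a minimal sketch of the classic global variant of statistical outlier removal. It is not the algorithm used by PointCloudCleaner, which is a modified version working on local neighborhoods, and the meaning of the tool's factor parameter may differ from the stddev_factor used here. All names are hypothetical, and the neighbor search is brute force for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 {
  double x, y, z;
};

// Mean distance from points[i] to its k nearest neighbors (brute force).
double MeanKnnDistance(const std::vector<Point3>& points, std::size_t i,
                       std::size_t k) {
  std::vector<double> distances;
  distances.reserve(points.size());
  for (std::size_t j = 0; j < points.size(); ++j) {
    if (j == i) continue;
    const double dx = points[j].x - points[i].x;
    const double dy = points[j].y - points[i].y;
    const double dz = points[j].z - points[i].z;
    distances.push_back(std::sqrt(dx * dx + dy * dy + dz * dz));
  }
  k = std::min(k, distances.size());
  if (k == 0) return 0.0;
  std::partial_sort(distances.begin(), distances.begin() + k, distances.end());
  double sum = 0.0;
  for (std::size_t n = 0; n < k; ++n) sum += distances[n];
  return sum / static_cast<double>(k);
}

// Returns one flag per point: true for inliers, false for outliers.
// A point is an outlier if its mean neighbor distance exceeds
// (global mean + stddev_factor * global standard deviation).
std::vector<bool> ClassifyInliers(const std::vector<Point3>& points,
                                  std::size_t k, double stddev_factor) {
  const std::size_t count = points.size();
  std::vector<bool> is_inlier(count, true);
  if (count == 0) return is_inlier;
  std::vector<double> mean_distance(count);
  double mean = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    mean_distance[i] = MeanKnnDistance(points, i, k);
    mean += mean_distance[i];
  }
  mean /= static_cast<double>(count);
  double variance = 0.0;
  for (double d : mean_distance) variance += (d - mean) * (d - mean);
  variance /= static_cast<double>(count);
  const double threshold = mean + stddev_factor * std::sqrt(variance);
  for (std::size_t i = 0; i < count; ++i) {
    is_inlier[i] = mean_distance[i] <= threshold;
  }
  return is_inlier;
}
```

Running such a filter twice with a large and a small neighbor count mirrors the two-scale idea described above. For scans with millions of points, a spatial index such as a k-d tree would be needed instead of the brute-force search.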

Manual point cloud cleanup

Point cloud editor

The PointCloudEditor tool was written for the purpose of manually removing outliers from point clouds: it is fairly fast and offers a convenient tool for selecting points within a polygon. However, since the program was intended for research use only, it is not very polished and not self-explanatory. Usage instructions are provided in the following, followed by hints on removing laser scan outliers based on our experience with the ETH3D dataset.

The PointCloudEditor tool also has support for semantic labeling, and very limited support for meshes, including CSG operations using the Cork library. The semantic labeling functionality is described later in this section, although it is not part of the dataset processing pipeline. The mesh functionality is described in the DatasetInspector section.

PointCloudEditor instructions

After starting the program, PLY files can be opened either by using drag'n'drop to drag them onto the editor window, or by using the Open button on the top left. For the use case of removing outliers from a laser scan, only the scan itself and its corresponding .outliers.ply file should be opened. This allows moving points between the two files (move wrong points into the outliers file, and pull valid points classified as outliers back into the main file). All editing operations apply to the object which is selected in the list on the top left. Attention: a common mistake while more than one object is loaded is to attempt an operation on an object which is not selected (while the selected object is potentially set to invisible). Always make sure that the correct object is selected.

Objects can be hidden with the button on the right side of their list entry. Objects can be saved and closed using the right-click menu in the point cloud list.

As a general note, key presses are only registered if the focus is set on the 3D view. Thus, if a key press does not have any effect, left-click the 3D view and retry. Furthermore, there is no undo/redo functionality; save often!

The basic controls for the 3D view are:

The controls in the "Display" section on the left side of the window are:

The relevant editing tools on the left side of the window are:

In addition, the "Statistical outlier detection" section can be used to run the outlier detection algorithm which is also used by the PointCloudCleaner tool. The parameters can be set in the edit boxes below the button. Detected outliers will be selected.

The editing controls are (remember that the focus has to be on the 3D view for key presses to be registered):

Hints on removing outliers in laser scans

This section contains hints and recommendations for removing outliers based on our experience with the ETH3D dataset. Outliers in the laser scans produced by the scanner used for this dataset mostly arose from surfaces which are reflective or translucent for the laser beam, but also, for example, from moving objects. Furthermore, a very common artifact was interpolation between two non-connected surfaces, apparently caused by parts of the laser beam being reflected by both. It is sometimes very hard to spot where a valid surface ends and an interpolation artifact begins. If outlier-free results are desired, the scans should be screened very accurately because it is extremely easy to miss some outliers. However, it may not be necessary to remove all outliers, depending on the application; cleaning up a laser scan with millions of points thoroughly by hand can take a very long time.

In general it is recommended to hide the .outliers.ply file while editing, but it can be helpful to toggle its visibility to assess which points are outliers and which are not: something which looks like a valid surface on its own can be shown to belong to a foreground-background interpolation artifact by this, for example.

Outliers can be points randomly flying in the air, or distortions of surfaces. For distortions, the severity should be judged and if it is only very small, it may be better to leave the points in. There are always some very small inaccuracies which can for example also arise from different materials. Glass surfaces can cause different measurement results:

For ETH3D, all such points were removed completely, for three reasons: we did not want to make any statement about the correct reconstruction of glass, the measurements are often uncertain, and the ETH3D evaluation scheme does not correctly handle points which are measured through translucent surfaces that may be correct reconstructions. This includes valid surfaces measured through glass. Some of these parts are very easy to miss. A technique to find them is to look at all surfaces once from the point of view of the laser scanner. This can reveal potential surfaces measured through glass that must be deleted.

Thin objects such as fences are another problematic element. First, often they are not measured correctly if some part of the laser beam is also reflected by their background. Second, if they are missing in the laser scan but their background (as seen from the laser scan) is there, and the thin object is reconstructed correctly by a reconstruction method, then this reconstruction will likely wrongly be classified as incorrect by the ETH3D evaluation. This is because the reconstruction is probably within the free-space region of the background points. To resolve this, either it has to be ensured that thin objects are represented completely in the laser scans, or their background (as seen from the laser scan) has to be deleted. Another option would be to mark a volume enclosing the thin object as not to be evaluated.

Another pitfall is if something moves in-between the laser scan and the image recording. For example, the images might depict a newly parked car which was not there when the laser scan was made. In this case, a reconstruction of the car would be a correct result of a reconstruction algorithm. However, in the evaluation those points would wrongly be classified as inaccurate because they are in the free-space of the car's background which was observed by the laser scan. In this case, the background of the car must be deleted to resolve this.

In general, it can be helpful to keep the intended evaluation scheme in mind to detect these kinds of issues.

Since the automatic outlier detection tool is imperfect, it might be desirable to move some valid points back from the outlier file to the laser scan file. In particular, due to the way the tool works, many borders of surfaces are cut off, for example rectangular corners get rounded off. There is also a characteristic artifact which is that at the boundary of a more densely sampled surface to a more sparsely sampled surface (due to observing them at different angles), the part of the sparsely sampled surface which is directly next to the denser one is incorrectly classified as an outlier.

As a final recommendation, it turned out to be helpful to go over all point clouds at least twice, since it is extremely easy to miss outliers. Having more experience may also help in seeing more of them.

Semantic labeling of point clouds

Semantic labeling is not part of the dataset processing pipeline, but the PointCloudEditor tool includes support for it and its usage is explained here. To start with labeling, a label definition file has to be loaded with the button on the bottom left in the editor window. Each label is described by a unique index (between 0 and 255; not necessarily sequential), a name, and a color. An example for this file is:

# Each line defines a label with the following attributes (numbers in [0, 255]):
# Index       Name Red Green Blue
      0  unlabeled  70    70   70
      1   building 200     0    0
      2 vegetation   0   200    0
      8     ground 200   200  200
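
A label definition file in this format can be read with a few lines of code. The sketch below is not taken from PointCloudEditor; the names are hypothetical, and it assumes that lines starting with # are comments, that fields are separated by whitespace, and that label names contain no spaces.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct LabelDefinition {
  int index;
  std::string name;
  int red, green, blue;
};

// True if value fits into the documented range [0, 255].
bool InByteRange(int value) { return value >= 0 && value <= 255; }

// Parses all label lines from input; comment, empty, and malformed lines are skipped.
std::vector<LabelDefinition> ParseLabelDefinitions(std::istream& input) {
  std::vector<LabelDefinition> labels;
  std::string line;
  while (std::getline(input, line)) {
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos || line[first] == '#') continue;
    std::istringstream fields(line);
    LabelDefinition label;
    if (!(fields >> label.index >> label.name >> label.red >> label.green >>
          label.blue)) {
      continue;  // Not enough fields or a non-numeric value.
    }
    if (InByteRange(label.index) && InByteRange(label.red) &&
        InByteRange(label.green) && InByteRange(label.blue)) {
      labels.push_back(label);
    }
  }
  return labels;
}
```

Applied to the example file above (for instance through a std::ifstream), this returns four entries with the indices 0, 1, 2, and 8. Checking that indices are unique is left out for brevity.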

Upon loading the file, the related UI elements show up below the label definition loading button:

The selected label can be assigned to the currently selected points by pressing the L ("Label") key.

Labels are saved and loaded as a separate file next to the point cloud PLY file. Note that a suitable label definition file must be loaded before loading a labeled point cloud. The label file of a point cloud has the same filename as the point cloud, but with the extension changed to "labels". It is a binary file containing the label indices of all points as a single uint8_t buffer. It can be loaded as follows in C++:

#include <cstdint>
#include <cstdio>
#include <vector>

// label_path (const char*) and point_cloud are assumed to be provided by the caller.
std::vector<uint8_t> labels;
FILE* labels_file = fopen(label_path, "rb");
if (!labels_file) {
  // Error: Cannot open file.
} else {
  // One label byte is expected per point.
  labels.resize(point_cloud->size());
  size_t read_amount = fread(labels.data(), sizeof(uint8_t), labels.size(), labels_file);
  fclose(labels_file);
  if (read_amount != labels.size()) {
    // Error: The label file size does not match the point cloud size.
  }
}

Cube Map Rendering

In this step, cube map face images are rendered from the laser scans. The scans are expected to be cleaned of outliers at this point; no further editing will be done on them. They should be named scan1.ply, scan2.ply, and so on. The cube map images are later used for an SfM (Structure-from-Motion) reconstruction together with the images of the dataset. Notice that using a cube map is an arbitrary choice: the only goal is to provide images of the laser scans (with depth maps) to the SfM program. Since cube map faces use the pinhole projection, they are easy to include in the SfM reconstruction. However, for difficult scenes it may be preferable to render the images in such a way that an individual image shows more of the scene than a cube map face, which would make it more likely that the image can be registered in the SfM reconstruction. Ideally, only a single image per laser scan would be used. Alternatively, the information about the relative poses of the images should be passed to the SfM program in case it is able to make use of it. However, for the ETH3D dataset none of this was necessary: we used individual cube map face images.

Usage of the CubeMapRenderer program is as follows:

CubeMapRenderer -c <laser_scan.ply> -o <output_base_path> --size <image_side_length>

The program will write a file <output_base_path>.intrinsics.txt, image files <output_base_path>.<face_name>.png, and depth maps <output_base_path>.<face_name>.depth, with <face_name> being up, down, left, right, front, or back. These files must be kept in the same directory. For the ETH3D laser scans with up to 28 million points per scan, we used an image side length of 2048. In order to be recognized correctly later, the output_base_path should be named after the corresponding laser scan, in the form <output_folder>/scan<X>.ply, for example <output_folder>/scan1.ply.
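
The following sketch lists the output files described above for one base path, which can be handy for checking that a rendering run is complete. It also computes the focal length of an ideal cube map face, which follows from its 90 degree field of view. The function names are hypothetical, and the actual contents of the intrinsics file written by CubeMapRenderer are not specified here, so the focal length is a geometric assumption rather than a statement about that file.

```cpp
#include <string>
#include <vector>

// Names of the six cube map faces as listed in this section.
const char* const kCubeMapFaces[] = {"up",    "down",  "left",
                                     "right", "front", "back"};

// Lists every file that is described to be written for one output base path.
std::vector<std::string> ExpectedCubeMapOutputs(
    const std::string& output_base_path) {
  std::vector<std::string> files;
  files.push_back(output_base_path + ".intrinsics.txt");
  for (const char* face : kCubeMapFaces) {
    files.push_back(output_base_path + "." + face + ".png");
    files.push_back(output_base_path + "." + face + ".depth");
  }
  return files;
}

// Focal length in pixels of a pinhole camera with a 90 degree field of view:
// tan(45 degrees) = (side / 2) / f, hence f = side / 2.
double CubeFaceFocalLength(int image_side_length) {
  return image_side_length / 2.0;
}
```

For the base path <output_folder>/scan1.ply this gives 13 files, and an image side length of 2048 corresponds to an ideal focal length of 1024 pixels.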

External SfM

An SfM reconstruction of the cube map face images created in the previous step and all images of the dataset must be created. This can be done with any suitable SfM program. We recommend COLMAP, but do not provide any integration with it to avoid its GPL license. We only use COLMAP's file format specification: the resulting (sparse) SfM reconstruction must be provided in the human-readable COLMAP text format. Only camera models which are supported by the dataset processing pipeline can be used:

In case you would like to add your own camera model, you could "grep" the code for the class and type name of an existing model to find the places where it needs to be handled.

SfM Scale Estimation

The SfMScaleEstimator tool is provided to estimate the scale of SfM reconstructions by comparing the depth of SfM keypoints in the cube map face images to the laser scan depth. Usage of the tool is as follows:

SfMScaleEstimator -s <sfm_model_path> -si <sfm_image_path> -i <scans_path> -o <output_path> --cube_map_face_camera_id <camera_id>

Here, sfm_model_path is the path containing cameras.txt, images.txt, etc. of the SfM model in COLMAP's text format, sfm_image_path is the base path for images referenced in images.txt, scans_path is the path of the directory containing the laser scans, and output_path is an arbitrary output path. The camera ID for the camera used for the cube map faces also needs to be given (by default, 1 is assumed).
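
The principle of comparing SfM keypoint depths to laser scan depths can be sketched as follows. This is not the estimator implemented in SfMScaleEstimator, whose details are not described here; it is one simple robust choice, the median of the per-keypoint depth ratios, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct DepthPair {
  double sfm_depth;   // keypoint depth in the unscaled SfM model
  double scan_depth;  // laser scan depth at the same cube map pixel
};

// Returns the median of scan_depth / sfm_depth over all pairs with positive
// depths, or 0 if there is no such pair.
double EstimateScale(const std::vector<DepthPair>& pairs) {
  std::vector<double> ratios;
  for (const DepthPair& pair : pairs) {
    if (pair.sfm_depth > 0.0 && pair.scan_depth > 0.0) {
      ratios.push_back(pair.scan_depth / pair.sfm_depth);
    }
  }
  if (ratios.empty()) return 0.0;
  const std::size_t middle = ratios.size() / 2;
  std::nth_element(ratios.begin(), ratios.begin() + middle, ratios.end());
  return ratios[middle];  // Upper median for an even number of ratios.
}
```

Multiplying all SfM camera positions and point coordinates by the returned factor would bring the model to the metric scale of the scans. The median is used because individual keypoints can be mismatched or lie on depth discontinuities.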

The result files are a MeshLab project file with the estimated initial alignment of the laser scans, as well as a scaled COLMAP model. Notice that the MeshLab project file contains relative paths to the laser scan files, so its location relative to these files should not be changed afterwards (without adjusting the paths in the file). The MeshLab project can be opened in MeshLab to verify that all scans are registered, and that the scans are close enough together for further refinement using ICP. If a scan was not registered, it can be manually inserted into the MeshLab project.

Scan Pose Refinement

The ICPScanAligner tool is provided to refine the laser scan poses using point-to-plane ICP. Usage of the tool is as follows:

ICPScanAligner -i <input_meshlab_project> -o <output_meshlab_project> -d <max_correspondence_distance> --max_iterations <max_iterations> --convergence_threshold <convergence_threshold> --number_of_scales <number_of_scales>

This can be applied to the output of SfMScaleEstimator to refine the laser scan poses, for example with the following parameters:

ICPScanAligner -i <input_meshlab_project> -o <output_meshlab_project> -d 0.01 --max_iterations 100 --convergence_threshold 1e-10 --number_of_scales 4
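
For readers unfamiliar with point-to-plane ICP, the sketch below evaluates the quantity that such a method typically minimizes in each iteration: the squared distance of each source point to the tangent plane of its matched target point. It is a generic illustration with hypothetical names, not code from ICPScanAligner, and it assumes the common convention that matches farther apart than the maximum correspondence distance are ignored.

```cpp
#include <cmath>
#include <vector>

struct Vec3 {
  double x, y, z;
};

struct Correspondence {
  Vec3 source;         // point of the scan being aligned, in its current pose
  Vec3 target;         // matched point in the reference scan
  Vec3 target_normal;  // unit normal at the target point
};

// Sum of squared point-to-plane distances over all correspondences whose
// point-to-point distance does not exceed max_correspondence_distance.
double PointToPlaneCost(const std::vector<Correspondence>& matches,
                        double max_correspondence_distance) {
  double cost = 0.0;
  for (const Correspondence& m : matches) {
    const Vec3 d = {m.source.x - m.target.x, m.source.y - m.target.y,
                    m.source.z - m.target.z};
    const double distance = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (distance > max_correspondence_distance) continue;
    const double along_normal = d.x * m.target_normal.x +
                                d.y * m.target_normal.y +
                                d.z * m.target_normal.z;
    cost += along_normal * along_normal;
  }
  return cost;
}
```

Because only the component along the normal is penalized, points can slide along flat surfaces without cost, which is why this variant usually converges faster than point-to-point ICP on scans of planar structures.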

Additional parameters supported by ICPScanAligner are:

Scan normal estimation

A surface reconstruction of the laser scans will later be used for occlusion handling. For creating this reconstruction, it is helpful to estimate normal vectors for the laser scan point clouds. This can be done with the provided NormalEstimator tool, which can be used as follows:

NormalEstimator -i <meshlab_project_input_path> -o <ply_file_output_path> --neighbor_count <neighbor_count>

As input, the MeshLab project file with the refined laser scan poses from the previous step can be used. The tool will write a single merged point cloud as output in case the project contains multiple laser scans. The neighbor_count setting defaults to 8.

External Surface Reconstruction

In this step, a surface reconstruction (i.e., triangle mesh) of the laser scans must be created, which will later be used for occlusion handling. In principle, any suitable reconstruction method may be used. It can be helpful if the method is able to fill in unobserved surfaces such that occlusions by those surfaces are also accounted for. We used Poisson surface reconstruction (MIT License) for the ETH3D dataset. Later processing stages expect the output to be in PLY format.

PoissonRecon could be used as follows:

PoissonRecon --in <point_cloud_with_normals.ply> --out <surface_mesh.ply> --depth 11 --color 16 --density

Notice that the best value for the --depth parameter depends on the size of the "interesting" part of the scene relative to the bounding box of the scene.
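
The dependence on the bounding box can be made concrete with a small calculation. In an octree, every additional level halves the cell size, so the finest cells at depth d have an edge length of roughly the bounding box extent divided by 2^d. The helper below is a hypothetical illustration of this relation; it assumes a cubic bounding box and ignores any padding that PoissonRecon may add around the input points.

```cpp
#include <cmath>

// Approximate edge length of the finest octree cell for a cubic bounding box
// with the given edge length: each depth level halves the cell size.
double FinestCellSize(double bounding_box_extent, int depth) {
  return bounding_box_extent / std::pow(2.0, depth);
}
```

For example, with --depth 11 a scene whose bounding box spans about 20 m is resolved at roughly 1 cm, whereas a few distant points that enlarge the bounding box to 200 m reduce the resolution to roughly 10 cm. In such cases it may help to increase the depth or to crop far-away points first.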

Also notice that the output is required to be strongly subdivided if camera models with lens distortion are used. This is because a later step uses vertex-based distortion to render into images with lens distortion, i.e., the distortion is approximated at the resolution of the mesh. The higher the mesh resolution, the better the approximation.

Splat Creation

When using, for example, Poisson surface reconstruction, it is likely that laser scan points belonging to thin objects will not be well represented in the mesh (e.g., a small cable with only a few scan points on it might be missing in the mesh). The provided SplatCreator tool may help resolve this by creating an additional mesh containing splats for points which do not have a nearby surface in the mesh. The tool can be used as follows:

SplatCreator --point_normal_cloud_path <point_cloud_with_normals.ply> --mesh_path <surface_mesh.ply> --output_path <splats.ply> --distance_threshold <distance_threshold>

The distance_threshold defaults to 0.02.

Dataset Inspection

Dataset inspector

The DatasetInspector tool allows viewing the datasets, and drawing image masks. By viewing the images with the laser scans rendered into them, the quality of the image alignment can be verified, and images which are too far off in the initial state can be removed (or their pose can be improved). Image masking may help in getting better results, however, since it is labour-intensive, it may be a good strategy to run the process without masking first and only do it if it is expected to improve the results noticeably. Two types of masks are supported: a) masking out image regions in which the projected scan geometry is incorrect (for example, due to incorrect occlusion handling), and b) masking out image regions where the geometry is correct, but the color is not suited for dense image alignment (for example, due to lens flare, reflective objects, etc.). Mask type a) has the same effect on the registration pipeline as b), while additionally excluding the affected scan geometry from evaluation in case it is masked out from all images.

Tool usage

The DatasetInspector can be run as in the following example:

DatasetInspector \
    --scan_alignment_path scan_alignment.mlp \
    --occlusion_mesh_path surface.ply \
    --occlusion_splats_path splats.ply \
    --multi_res_point_cloud_directory_path multi_res_point_cloud_cache \
    --image_base_path . \
    --state_path state_path

Furthermore, various optimization parameters can be specified which affect the pose and intrinsics optimization of ImageRegistrator and might thus also affect the visualizations in DatasetInspector. Thus, if those parameters are used they must be given to both ImageRegistrator and DatasetInspector. Usually it should not be strictly necessary to change these parameters: for the ETH3D benchmark, the default settings were used both for the DSLR and the multi-camera rig datasets. However, the approach was originally tuned on the DSLR datasets only and was significantly less robust on the multi-camera rig images. Tuning these parameters could potentially improve the robustness. A list of those parameters follows:

The first time the DatasetInspector is run on a dataset, it will take a while to start. This is because the multi-resolution point cloud has not been cached yet and thus needs to be generated. This takes especially long for video datasets with many images. Notice that after making changes to the images in the inspector, the folders specified for --multi_res_point_cloud_directory_path and --cache_observations (if the ImageRegistrator has been run) should be deleted such that they are re-generated using the new state.

Pose verification and improvement

In the graphical user interface of DatasetInspector, the list on the left allows selecting an image to view. An (M) behind the image filename indicates that the image has a mask. To the right of the image list, the image scale can be selected. In most cases, this can be left at the default, which in general shows combined information from all scales. Below the scale selection, different display modes can be selected:

Below this list, the "Show masks" checkbox can be toggled to show the image masks as translucent overlays.

The image view can be translated while holding the middle mouse button (that is, the mouse wheel). Zooming can be done using the mouse wheel.

If an image is registered badly, the easiest option is to exclude it. There is currently no support for this in the DatasetInspector GUI, but the corresponding lines can be manually deleted from images.txt in the COLMAP model.

An alternative requiring more manual effort is to try to improve the image's pose instead. There are two options for this in DatasetInspector. The first is to use the positioning controls on the bottom right to incrementally move the image to the correct place. The step sizes for the movement can be specified in the edit boxes above the buttons. The second option is to use the tool "Localize image". You must use the "scan reprojection" view mode for this to work. After clicking the button, control points for the image registration must be given. For each correspondence, first select a scan point by left-clicking, then left-click the position in the image where this point belongs. During the second step, it can make sense to change to the "image only" view mode. The Esc key aborts the process. To finish the registration, you must specify at least 6 correspondences as described above and then press the Return key. If the correspondences were good (it helps to distribute them over all of the image), the image pose should now be improved. The new optimization state can be saved using the "Save state" button.

Notice that in case a rig image is moved, an inconsistency is created since the images will no longer conform to the rig extrinsics. The "Distr. rel pose" button can be used to distribute the relative image poses of the currently selected rig image set to all other images of this rig.

If you want to add images which have not been registered in the initial registration step, you can insert them manually in the optimization state text file by using an existing image as a template. Make sure to give the new image a unique ID. Then the new image's pose can be set in DatasetInspector as described above. Notice that the pose initialization must be good in order for the refinement step to work.

Image masking and occlusion

In some cases, certain parts of images should not be used for the dense image alignment step. For example, reflections are not modeled and might degrade the image registration. In other cases, some parts should even be treated as erroneous and the benchmark processing should behave as if they do not exist. For example, a tree moving in the wind will not match the way it was captured in the laser scan. In such cases, the images can be masked. This is also done with the DatasetInspector tool. However, since it can be a large effort to mask images, it might be a good strategy to run the process without masking first and only do it if it is expected to improve the result significantly.

For masking, first the masks should be set to visible by ticking the "Show masks" checkbox. The mask drawing tools can be activated with the "Draw eval + obs mask", "Draw obs mask", and "Mask eraser" buttons on the top right. All of these tools behave the same: they allow drawing a polygon on the image by successively left-clicking all of its corner points. Backspace undoes the last point, Esc removes all points. Return finishes the polygon and draws the mask on the image. The "eval + obs mask" is displayed in red and will treat the image region as erroneous. The "obs mask" is displayed in green and will only exclude the image region from the image pose and intrinsics refinement step. Drawing a mask will set the image mask to modified, as indicated by a star behind the image name in the image list. To save the mask for the current image, the "Save image mask" button must be clicked. With the "Save camera mask" button, the current image's mask can be saved as a camera mask (such that it is applied to all images of this camera in addition to the individual image masks). This is however rarely necessary. It can be used to mask out camera problems (for example black borders or dirt on the sensor that remains in a fixed place on the image).

Examples for what should be masked out with "green" (obs mask):

Note that looking at the images alone is not sufficient to find all such problems. For example, it could happen that there is a strong shadow boundary in a laser scan in a place where no boundary is visible in the image. In this case, the part of the image with the boundary in the scan should still be masked even though nothing is visible there by looking at the image only (since masking the scan points is not implemented). Use the "scan reprojection" mode to find these places.

Examples for what should be masked out with "red" (eval + obs mask):

It can become very tedious to mask out all of these regions by hand. The "Label transfer" function is sometimes helpful to reduce the effort. Notice that it relies on accurate image poses and therefore it might be better to first run the pose refinement once, then do the masking using this function, and then re-run the pose refinement. To use label transfer, first select the image which the mask labels should be transferred to. Then click the "Label transfer" button and select the image from which the labels should be taken (by default the previous image is selected). This source image should ideally be spatially close to the target image and also temporally close if it matters (for example if shadows move over time in an outdoor scene). The labels will then be transferred via the scan geometry and the image poses. The result must be checked for errors. The transfer function can also be used multiple times and will accumulate labels in this case, while not allowing "green" labels to overwrite "red" ones.
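
The accumulation rule for repeated label transfers can be summarized per pixel as in the sketch below. The enum and function are hypothetical and do not reflect how DatasetInspector stores masks; the sketch also assumes that a transferred "red" label may replace an existing "green" one and that transferring no label never erases an existing one.

```cpp
// Per-pixel mask state. kObs corresponds to the "green" obs mask,
// kEvalObs to the "red" eval + obs mask.
enum class MaskLabel { kNone, kObs, kEvalObs };

// Combines the label already present at a pixel with a newly transferred one.
// "Red" always wins, so a "green" label can never overwrite a "red" one.
MaskLabel AccumulateLabel(MaskLabel existing, MaskLabel transferred) {
  if (existing == MaskLabel::kEvalObs || transferred == MaskLabel::kEvalObs) {
    return MaskLabel::kEvalObs;
  }
  if (existing == MaskLabel::kObs || transferred == MaskLabel::kObs) {
    return MaskLabel::kObs;
  }
  return MaskLabel::kNone;
}
```

Since this rule does not depend on the order of the transfers, labels from several source images can be accumulated in any sequence with the same result, but the outcome should still be checked for errors as noted above.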

As noted above, an alternative way to fix occlusion problems is to edit the occlusion mesh directly. This can be more efficient because it fixes the problem for all images at once. However, since the edits are made to the surface mesh, they will be lost in case the laser scan registration is improved afterwards (and thus the surface mesh must be re-generated). Furthermore, it can be hard to see where objects that are missing in the laser scan need to be placed.

To start editing, click the "Edit occlusion meshes" button. The point cloud editor will open with the relevant files loaded. Note that this is a subwindow of the dataset inspector in this case, so it will automatically also close if the dataset inspector window is closed.

Both the "splats.ply" file and the surface mesh file are used for occlusion testing if they are specified in the respective program arguments. Thus, in principle both can be edited. The point cloud editor behaves similarly for mesh editing as for point cloud editing. In this use case, it can be helpful to first use the "Set up direction" tool while the surface mesh is selected to fix the camera up direction (this is a transient setting and must be redone on each start). Vertices can be selected in the same way as points of a cloud. However, vertex visibility in the mesh is taken into account, so vertices that are occluded by other parts of the same mesh will not be selected accidentally. Vertices can be deleted with the Del key, which also deletes all of their adjacent faces. Vertices can be moved by pressing G, moving the mouse, and left-clicking to finish; a right-click instead cancels the move. The M ("move") key is not implemented for mesh vertices.

While this can be used to make edits, it can be very tedious. An alternative is to use CSG operations via the CSG tool. To use this tool, click the "CSG Tool" button. A cube will show up at the look-at point of the 3D view (if nothing becomes visible, try zooming out or in). The cube can be moved by pressing the G key, moving the mouse, and finishing with a left-click (remember that the 3D view must have focus to receive key presses). The cube can be rotated in the same way using the R key and scaled using the S key. For scaling, the X, Y, or Z key can be pressed after pressing S to constrain the scaling to the corresponding local axis. The cube remains subdivided according to the edge length given in the edit box below the "CSG Tool" button.

Pressing Return adds the cube to the bottommost mesh in the object list on the left (union operation). Holding Control while pressing Return subtracts the cube from that mesh (A minus B operation). In both cases, the result is added as a new mesh at the bottom of the mesh list. The Cork library, which is used for the CSG operations, sometimes returns wrong results, so the result should be verified there before closing the original mesh.

Note that for some parts of the processing pipeline, the surface mesh must be strongly subdivided. A mesh that is subdivided too coarsely does not directly degrade the accuracy, but it leads to wrong results in the visibility estimation. As a rule of thumb, it might be helpful to keep the level of subdivision roughly equal to that of the original mesh.

Further, notice that the visibility estimation finds occlusion boundaries by finding edges for which one side faces the camera and the other side faces away from it. Thus, take care not to create rough surfaces while editing, since they introduce many artificial occlusion boundaries.
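The edge criterion described above can be illustrated with a minimal sketch. It is not the pipeline's implementation: all names are hypothetical, and it assumes that both triangles adjacent to the edge are wound consistently (counter-clockwise when seen from outside), so that the cross product of two triangle edges gives the outward normal.

```cpp
// Minimal 3D vector type for the sketch (hypothetical, not from the pipeline).
struct Vec3 {
  double x, y, z;
};

Vec3 Sub(const Vec3& a, const Vec3& b) {
  return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 Cross(const Vec3& a, const Vec3& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

double Dot(const Vec3& a, const Vec3& b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true if the triangle (a, b, c) faces the camera, i.e. its normal
// (from counter-clockwise winding) points into the half-space of the camera.
bool FacesCamera(const Vec3& a, const Vec3& b, const Vec3& c,
                 const Vec3& camera) {
  const Vec3 normal = Cross(Sub(b, a), Sub(c, a));
  return Dot(normal, Sub(camera, a)) > 0.0;
}

// The edge (v0, v1) is shared by the triangles (v0, v1, left_opposite) and
// (v1, v0, right_opposite). It is an occlusion boundary candidate if exactly
// one of the two triangles faces the camera.
bool IsOcclusionBoundaryEdge(const Vec3& v0, const Vec3& v1,
                             const Vec3& left_opposite,
                             const Vec3& right_opposite, const Vec3& camera) {
  const bool left_faces = FacesCamera(v0, v1, left_opposite, camera);
  const bool right_faces = FacesCamera(v1, v0, right_opposite, camera);
  return left_faces != right_faces;
}
```

On a smooth surface seen from the front, both triangles face the camera and the edge is not reported. On a rough surface, small folds flip the orientation of individual triangles, which is why each fold produces additional boundary edges.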

After an edit is made to the surface mesh (or the splats), the update can be transferred to the dataset inspector by clicking "Reload occlusion meshes" there. The meshes do not need to be saved for this to work (but it might still be a good idea to save them often). This operation typically takes a few seconds.

Image Registrator

This tool implements refinement of image intrinsics and poses by direct alignment with the laser scans. A typical invocation looks as in the following example:

ImageRegistrator \
    --scan_alignment_path scan_alignment.mlp \
    --occlusion_mesh_path surface.ply \
    --occlusion_splats_path splats.ply \
    --multi_res_point_cloud_directory_path multi_res_point_cloud_cache \
    --image_base_path . \
    --state_path colmap_model \
    --output_folder_path output \
    --observations_cache_path observations_cache \
    --camera_ids_to_ignore 0

The following program arguments are most important:

The following additional arguments are supported:

Furthermore, the optimization parameters described in the section about DatasetInspector can be adjusted.

Below is an example refinement result for the DSLR dataset used in the step-by-step example above: on the left is a visualization of the initial state ("depth map over image, no occ" mode in DatasetInspector), on the right is a visualization of the final state. Notice that additional masking or editing of the occlusion mesh would be helpful here to better handle the chair legs. Large parts of them had to be deleted from the laser scan since they were reflective and thus measured wrongly by the scanner.

Left: initial state. Right: final state.

Ground Truth Creation

The final step is to create the ground truth files for a given scene. The GroundTruthCreator tool implements functionality to create ground truth depth maps and to limit the laser scans to parts which are observed by at least two images, such that they can be used to evaluate multi-view reconstructions in 3D. The tool can also render occlusion depth maps and render the laser scans on top of the dataset images. Usage of the tool is as in the following example:

GroundTruthCreator \
    --scan_alignment_path scan_alignment.mlp \
    --occlusion_mesh_path surface.ply \
    --occlusion_splats_path splats.ply \
    --image_base_path . \
    --state_path scale_1_state \
    --output_folder_path ground_truth \
    --rotate_first_scan_upright 1 \
    --scan_point_radius 2 \
    --write_point_cloud 1 \
    --write_depth_maps 1 \
    --write_occlusion_depth 0 \
    --write_scan_renderings 0

The first set of parameters specifies the required input paths, as for ImageRegistrator. If --rotate_first_scan_upright is set to 1 (true), the coordinate system of the result is rotated such that the first laser scan is oriented upright. The --write_<...> parameters control which types of output are created. If --write_scan_renderings is set to 1 (true), the --scan_point_radius parameter controls the size of the rendered points for the laser scans; set it to 0 to render each point as a single pixel.
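The restriction of the laser scans to points observed by at least two images can be pictured as a simple filter. The sketch below is not taken from GroundTruthCreator: the `ScanPoint` type, the function name, and the precomputed per-point observation counts are hypothetical, and the actual visibility test (which uses the occlusion mesh and splats) is not shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical point type for the sketch.
struct ScanPoint {
  float x, y, z;
};

// Keeps the points that are visible in at least min_observations images.
// observation_counts[i] is assumed to hold the number of images in which
// points[i] is visible (computing it is outside the scope of this sketch).
// Returns an empty vector if the two inputs differ in size.
std::vector<ScanPoint> KeepObservedPoints(
    const std::vector<ScanPoint>& points,
    const std::vector<int>& observation_counts, int min_observations) {
  std::vector<ScanPoint> kept;
  if (points.size() != observation_counts.size()) return kept;
  for (std::size_t i = 0; i < points.size(); ++i) {
    if (observation_counts[i] >= min_observations) {
      kept.push_back(points[i]);
    }
  }
  return kept;
}
```

With `min_observations` set to 2, this corresponds to the rule stated above: a point seen in only one image cannot be triangulated by a multi-view method, so it is left out of the 3D evaluation.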

The occlusion depth and ground truth depth images are written as raw float buffers of the same size as the image. In C++ they could be loaded as in the following example, given the image size:

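// gt_depth_map must already have the image size, with contiguous float storage.
FILE* ground_truth_depth_file = fopen(ground_truth_depth_file_path.c_str(), "rb");
if (ground_truth_depth_file) {  // fopen returns NULL if the file cannot be opened
  size_t num_read = fread(gt_depth_map.data, sizeof(float), gt_depth_map.rows * gt_depth_map.cols, ground_truth_depth_file);  // num_read is smaller than rows * cols if the file is truncated
  fclose(ground_truth_depth_file);
}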

The ground truth depth map is set to infinity at pixels for which no depth is available.
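For use without an image library, a self-contained variant of the loading code might look like the following sketch. The function names are hypothetical. It assumes that the file contains exactly width * height 32-bit floats in row-major order and in the byte order of the machine that reads it, with no header; only the raw float layout, the image size, and the use of infinity for missing depth are stated above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Reads a raw float depth map of known size into depth (row-major).
// Returns false if the size is invalid, the file cannot be opened, or the
// file holds fewer than width * height floats.
// Assumption: native byte order and no file header.
bool LoadRawDepthMap(const std::string& path, int width, int height,
                     std::vector<float>* depth) {
  if (width <= 0 || height <= 0) return false;
  const std::size_t num_pixels =
      static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
  depth->resize(num_pixels);
  FILE* file = std::fopen(path.c_str(), "rb");
  if (!file) return false;
  const std::size_t num_read =
      std::fread(depth->data(), sizeof(float), num_pixels, file);
  std::fclose(file);
  return num_read == num_pixels;
}

// Counts the pixels that have a ground truth depth. Pixels without depth are
// stored as infinity, so only finite values are counted.
std::size_t CountValidDepths(const std::vector<float>& depth) {
  std::size_t count = 0;
  for (float d : depth) {
    if (std::isfinite(d)) ++count;
  }
  return count;
}
```

The depth at pixel (x, y) is then `depth[y * width + x]` under the row-major assumption. Checking `std::isfinite` before using a value avoids propagating the infinity marker into error statistics.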