gkiavash / Master-Thesis-Structure-from-Motion


Study papers 1: SfM, BA, and feature matching #2

Closed gkiavash closed 1 year ago

gkiavash commented 1 year ago

* : additional resources

gkiavash commented 1 year ago

Multi-View Optimization of Local Feature Geometry

3) Method

  1. Overview
    • Build a tentative-matches graph G = (V, E) with keypoints as nodes and matches as edges, optionally weighted (e.g., by the cosine similarity of descriptors).
    • For each edge (match), perform a two-view refinement using a patch alignment network. This network annotates the edges of the tentative-matches graph with geometric transformations Tu→v, Tv→u.
    • Partition the graph into components (i.e., feature tracks) and find a global consensus by optimizing a non-linear least-squares problem over the keypoint locations, given the estimated two-view transformations.
  2. Two-view refinement
    • Given local patches Pu, Pv around the initial keypoint locations u, v ∈ R², the network predicts the flow du→v of the central pixel from one patch to the other, and vice versa as dv→u.
    • A Siamese architecture is used for feature extraction, followed by a correlation layer computing dot-product similarity.
    • Each patch has dimensions h×w; a CNN produces a d-dimensional descriptor per pixel (h×w×d).
    • Dot-product similarity between the two patches yields a correlation volume in R^(hw×hw).
    • The new displacement is regressed from this correlation volume.
  3. Multi-view refinement

    • Displacement chains without loops can accumulate large errors.
    • Create a track for each 3D point across all images.
    • Feature matching is imperfect: a single incorrect match can merge two tracks, so the connected components are partitioned into smaller, more reliable subsets based on descriptor cosine similarity.
    • Approximate the full flow field between two patches by a 3 × 3 displacement grid and use bi-square interpolation.
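The two-view refinement above (per-pixel descriptors, dot-product correlation, displacement readout) can be sketched in a few lines of NumPy. This is an illustrative simplification: a soft-argmax stands in for the paper's learned regressor, and the function name and shapes are assumptions, not the authors' code.

```python
import numpy as np

def softargmax_displacement(feat_u, feat_v):
    """Estimate the flow d_{u->v} of the central pixel of patch u.

    feat_u, feat_v: (h, w, d) dense per-pixel descriptors of the two patches.
    Returns a 2D displacement (dy, dx) in pixels relative to the patch centre.
    """
    h, w, d = feat_u.shape
    centre = feat_u[h // 2, w // 2]                  # descriptor of the central pixel
    # dot-product similarity of the centre against every pixel of the other patch
    scores = (feat_v.reshape(-1, d) @ centre).reshape(h, w)
    # soft-argmax: softmax-weighted average of pixel coordinates
    p = np.exp(scores - scores.max())
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    dy = (p * ys).sum() - h // 2
    dx = (p * xs).sum() - w // 2
    return np.array([dy, dx])
```

With a distinctive central descriptor the soft-argmax recovers the shift between patches; the real network regresses sub-pixel flow from the full hw×hw correlation volume rather than a single row of it.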

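The multi-view consensus step can be sketched as a linear least-squares problem once the two-view transformations are approximated by pure displacements (a simplification; the paper optimizes a non-linear objective over keypoint locations). The function name, the residual convention, and the gauge-fixing anchor are all illustrative assumptions:

```python
import numpy as np

def refine_track(n_nodes, edges):
    """Least-squares consensus for one feature track.

    Solves for per-keypoint corrections delta (one 2D vector per node) such
    that delta_v - delta_u ~= r_uv for every match (u, v), where r_uv is the
    residual displacement reported by the two-view refinement network.
    Node 0 is pinned (delta_0 = 0) to remove the global-translation freedom.

    edges: list of (u, v, r_uv, weight).
    Returns: (n_nodes, 2) array of refined corrections.
    """
    rows, rhs = [], []
    anchor = np.zeros(n_nodes)
    anchor[0] = 1.0                      # delta_0 = 0 fixes the gauge
    rows.append(anchor)
    rhs.append(np.zeros(2))
    for u, v, r_uv, w in edges:
        row = np.zeros(n_nodes)
        row[v], row[u] = w, -w           # w * (delta_v - delta_u) = w * r_uv
        rows.append(row)
        rhs.append(w * np.asarray(r_uv, dtype=float))
    A, b = np.stack(rows), np.stack(rhs)
    delta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return delta
```

Redundant edges (loops in the track) are what make this consensus useful: inconsistent pairwise displacements get averaged in the least-squares sense instead of accumulating along a chain.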
    Evaluation:

gkiavash commented 1 year ago

Deep Two-View Structure-from-Motion Revisited

  1. An optical flow estimation network that predicts dense correspondences between two frames;
  2. A normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences;
  3. A scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.

Related Works

  1. Type 1: monocular depth estimation network and a pose regression network, self-supervisory
    • SfMLearner estimates a mask to exclude dynamic objects; GeoNet utilizes an optical flow module to mask out these outliers by comparing against the rigid flow.
  2. Type 2: require two image frames to estimate depth maps and camera poses at test time
    • DeMoN concatenates a pair of frames and uses multiple stacked encoder-decoder networks to regress camera poses and depth maps, implicitly utilizing multi-view geometry.


3) Method

Our method is able to find better matching points and therefore more accurate poses and depth maps, especially for textureless and occluded areas. At the same time, it follows the wisdom of classic methods to avoid ill-posed problems.

  1. Optical Flow Estimation
    • DICL-Flow is used to generate dense matching points between two consecutive frames. It uses a displacement-invariant matching cost learning strategy and a soft-argmin projection layer, ensuring the network learns dense matching points rather than regressing image flow.
  2. Essential Matrix Estimation
    • Use matching points to compute camera poses (previous deep learning-based methods regress the camera poses from input images)
    • The noisy dense matches from optical flow must be robustly filtered.
    • Using SIFT keypoint locations to generate a mask was found to work well across all datasets; only the optical flow matches at locations within the mask are kept, which avoids distraction from dynamic objects. Idea: optical flow is more accurate in texture-rich areas.
  3. Scale-Invariant Depth Estimation
    • Matching is performed again, with the search space reduced to the epipolar lines computed from the relative camera poses. This is similar to multi-view stereo (MVS) matching, with one important difference: the absolute scale is unknown at inference.
    • Plane-sweep-based networks require a consistent scale between training and testing.
  4. Loss Function:
    • Refer to the image:
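The classical step the paper revives, computing relative pose from matches instead of regressing it, can be sketched with the textbook normalized 8-point algorithm. This is a hedged stand-in for the paper's robust estimation pipeline; `essential_eight_point` is an illustrative name, and no outlier rejection is included:

```python
import numpy as np

def essential_eight_point(x1, x2):
    """Linear 8-point estimate of the essential matrix.

    x1, x2: (n, 2) matched points in *normalized* camera coordinates
            (K^{-1} already applied), n >= 8.
    Returns E with the rank-2, equal-singular-value constraint enforced.
    """
    n = x1.shape[0]
    h1 = np.hstack([x1, np.ones((n, 1))])
    h2 = np.hstack([x2, np.ones((n, 1))])
    # each row encodes x2^T E x1 = 0 as a linear equation in vec(E)
    A = np.einsum('ni,nj->nij', h2, h1).reshape(n, 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # project onto the essential-matrix manifold: singular values (1, 1, 0)
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

In practice the dense-but-noisy flow matches would first be filtered (e.g., by the SIFT-location mask above) and the estimate wrapped in RANSAC before decomposing E into R, t.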

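The epipolar search-space reduction in the depth module can be illustrated by sweeping candidate depths along a pixel's viewing ray: every reprojection lands on the epipolar line, so matching becomes a 1D search. A minimal sketch under assumed pinhole geometry (function names are illustrative, not the paper's API):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def candidates_along_epipolar(x1, K, R, t, depths):
    """Reproject pixel x1 of view 1 into view 2 at a sweep of depths.

    Because the absolute scale is unknown at test time, `depths` can be
    relative: scaling both the depths and the baseline t by the same factor
    yields the same candidate pixels along the epipolar line.
    """
    ray = np.linalg.inv(K) @ np.array([x1[0], x1[1], 1.0])
    out = []
    for z in depths:
        X2 = R @ (z * ray) + t          # point at depth z, in camera-2 frame
        p = K @ X2
        out.append(p[:2] / p[2])        # perspective projection into view 2
    return np.array(out)
```

Discretizing `depths` (relative, not metric) and scoring feature similarity at each candidate pixel is the scale-invariant analogue of plane-sweep MVS matching described above.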
4) Experiments

gkiavash commented 1 year ago

BA-NET: DENSE BUNDLE ADJUSTMENT NETWORKS

Solve SfM problem via feature-metric bundle adjustment (BA), which explicitly enforces multi-view geometry constraints in the form of feature-metric error
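The feature-metric error can be sketched as comparing learned feature maps at warped locations instead of raw pixel intensities. A minimal NumPy illustration: the bilinear sampler and the `warp` callback are simplifications (in BA-Net the warp is the differentiable projection parameterized by pose and depth, and the whole objective is minimized inside the network):

```python
import numpy as np

def bilinear(feat, x, y):
    """Sample a (H, W, C) feature map at a sub-pixel location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * feat[y0, x0]
            + ax * (1 - ay) * feat[y0, x0 + 1]
            + (1 - ax) * ay * feat[y0 + 1, x0]
            + ax * ay * feat[y0 + 1, x0 + 1])

def featuremetric_error(feat1, feat2, pts1, warp):
    """Sum of feature-space residuals ||F2(warp(p)) - F1(p)||^2.

    `warp` maps a pixel of image 1 into image 2; in bundle adjustment it
    would be the reprojection pi(T, d, p) driven by pose T and depth d.
    """
    err = 0.0
    for (x, y) in pts1:
        u, v = warp(x, y)
        r = bilinear(feat2, u, v) - bilinear(feat1, x, y)
        err += float(r @ r)
    return err
```

Minimizing this residual over poses and depths, rather than a photometric or reprojection residual, is what "feature-metric BA" refers to: the features are learned so the error surface is smoother than raw intensities.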

Introduction

3 BUNDLE ADJUSTMENT REVISITED

4 THE BA-NET ARCHITECTURE


5 EVALUATION

gkiavash commented 1 year ago

Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

Related work

4. Approach

4.1. Featuremetric optimization

Direct alignment:

Learned representation:

4.2. Keypoint adjustment


4.3. Bundle adjustment

5. Experiments

gkiavash commented 1 year ago

Metrics: Known Ground Truth:

gkiavash commented 1 year ago

Deep Patch Visual Odometry

1. Introduction

3. Approach

Our approach has two main modules:

3.1. Feature and Patch Extraction

3.2. Update Operator