Uncertainty in Computer Vision

Bootstrapping Neural Processes
Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision (CVPR2020 Workshop)
Bayesian Semantic Instance Segmentation in Open Set World (ECCV2018)
Uncertainty-aware Instance Segmentation using Dropout Sampling
Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving (ICCV2019)
Probabilistic Deep Learning for Instance Segmentation
Efficient Uncertainty Estimation for Semantic Segmentation in Videos (ECCV2018)
Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks

Bootstrapping Neural Processes (NeurIPS2020)

Goal

construct a stochastic process that describes the data well
data-driven way of defining stochastic processes
- define a flexible class of stochastic processes well suited for highly non-trivial functions that are not easily represented by existing stochastic processes
address the drop in robustness under model-data mismatch, where the test data distribution differs from the one used to train the model

Model: Bootstrapping Neural Process(BNP)

: an extension of the Neural Process (NP) that uses the bootstrap to induce functional uncertainty, a "data-driven" way of computing the uncertainty of theta

utilize bootstrap to construct multiple resampled datasets
combine the predictions computed from them (functional uncertainty is then naturally induced by the uncertainty in the bootstrap procedure)
- Bootstrap: a technique to model uncertainty in parameter estimation by simulating population distribution via resampling (estimate the sampling distribution of theta from multiple datasets resampled from X)
- Residual Bootstrap: fixes X and only resamples the residuals of predictions
- resolves the issue of missing x in bootstrap datasets
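
The difference between the two bootstrap variants above can be sketched in a few lines. This is a minimal illustration on a least-squares line fit, not the BNP model itself: the function names, the toy data, and the `predict` stand-in are all assumptions made for the example. The paired bootstrap resamples (x, y) pairs, so some x values can be missing from a resampled dataset, while the residual bootstrap keeps x fixed and only resamples the residuals.

```python
import numpy as np

def paired_bootstrap(x, y, rng):
    """Resample (x, y) pairs with replacement; some x values may go missing."""
    idx = rng.integers(0, len(x), size=len(x))
    return x[idx], y[idx]

def residual_bootstrap(x, y, predict, rng):
    """Keep x fixed and resample only the residuals of the predictions."""
    y_hat = predict(x)
    residuals = y - y_hat
    idx = rng.integers(0, len(x), size=len(x))
    return x, y_hat + residuals[idx]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.1 * rng.standard_normal(20)

# Hypothetical stand-in for the model: a least-squares line fitted to the data.
slope, intercept = np.polyfit(x, y, deg=1)

def predict(values):
    return slope * values + intercept

# Refit on many residual-bootstrap datasets; the spread of the refitted
# slopes is a data-driven estimate of the uncertainty of the parameter.
slopes = []
for _ in range(200):
    xb, yb = residual_bootstrap(x, y, predict, rng)
    slopes.append(np.polyfit(xb, yb, deg=1)[0])
slope_std = float(np.std(slopes))
```

Every residual-bootstrap dataset contains all of the original x values, which is the property the note above refers to.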

How?

Point-wise Uncertainty -> induces stochasticity in function realization
Global latent variable capturing functional uncertainty (a global uncertainty in the overall structure of the function)

Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision (CVPR2020 Workshop)

Uncertainty

Epistemic uncertainty: uncertainty in the DNN model parameters -> hard to estimate because of the vast dimensionality of the parameter space, so it is often disregarded
Aleatoric uncertainty: inherent and irreducible data noise -> efficiently estimated by letting a DNN directly output the parameters of a certain probability distribution (modeling the conditional distribution of the target given the input)
Scalable methods for estimating epistemic uncertainty
1. MC dropout
2. Ensemble
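
One common way to combine the two kinds of uncertainty, sketched below under stated assumptions: each ensemble member (or each MC-dropout forward pass) is assumed to output a Gaussian mean and variance per input, and the members are treated as a uniform mixture. The function name and the toy numbers are hypothetical; the paper's exact evaluation protocol is not reproduced here.

```python
import numpy as np

def combine_gaussian_predictions(means, variances):
    """Combine M Gaussian predictions of shape (M, N) into one estimate.

    Uses the law of total variance for a uniform mixture of Gaussians.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    pred_mean = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)  # average predicted data noise
    epistemic = means.var(axis=0)       # disagreement between the members
    return pred_mean, aleatoric, epistemic

# Toy example: 3 members, 2 inputs. The members disagree on the first input only.
means = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]]
variances = [[0.1, 0.3], [0.1, 0.3], [0.1, 0.3]]
pred_mean, aleatoric, epistemic = combine_gaussian_predictions(means, variances)
total_variance = aleatoric + epistemic
```

In this toy case the epistemic term is non-zero only for the first input, where the members disagree, while the aleatoric term is whatever noise level the members predict.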

Bayesian Semantic Instance Segmentation in Open Set World (ECCV2018)

Input (RGB image)
-> (Object Detector) AND (Boundary Detector) -> (Simulated Annealing Optimization) -> Output (a set of regions which are perceptually grouped and are each associated either to a known detection or an unknown object class)

Uncertainty-aware Instance Segmentation using Dropout Sampling

Dropout Sampling for Instance Segmentation

to capture both semantic and spatial uncertainty using Mask-RCNN
1. Apply dropout to the fully-connected layers of Mask-RCNN, which are responsible for providing class scores and bounding box locations for each detection in the image
2. Compute uncertainty scores for each observation O
  - semantic uncertainty U_sem(O): the average softmax score of the observation
  - spatial uncertainty U_spl(O): the mean IoU
  - U_n(O): based on the number of forward passes
  - Hybrid Metric U_h(O) = U_sem(O) * U_spl(O) * U_n(O)
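
The scores listed above can be sketched as follows for a single observation, i.e. a group of detections of the same object collected over several dropout forward passes. Two details are assumptions made for this example and are not stated in the notes: the mean IoU is taken between each box and the mean box of the observation, and U_n(O) is taken as the fraction of forward passes in which the object was detected. The function names are hypothetical.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0

def observation_scores(softmax_scores, boxes, num_passes):
    """Return (U_sem, U_spl, U_n, U_h) for one observation."""
    boxes = np.asarray(boxes, dtype=float)
    u_sem = float(np.mean(softmax_scores))      # average softmax score
    mean_box = boxes.mean(axis=0)               # assumption: compare to the mean box
    u_spl = float(np.mean([iou(box, mean_box) for box in boxes]))
    u_n = min(1.0, len(boxes) / num_passes)     # assumption: detection frequency
    u_h = u_sem * u_spl * u_n                   # hybrid metric as a product
    return u_sem, u_spl, u_n, u_h

# Toy observation: detected in 3 of 4 forward passes.
scores = observation_scores(
    softmax_scores=[0.9, 0.8, 0.85],
    boxes=[(10, 10, 50, 50), (12, 11, 52, 49), (9, 10, 49, 51)],
    num_passes=4,
)
```

With these definitions each score lies in [0, 1], and a higher value means the forward passes agree more.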

Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving (ICCV2019)

improves detection accuracy while supporting real-time operation by modeling the bounding box (bbox) of YOLOv3, which is the most representative of one-stage detectors, with Gaussian parameters and redesigning the loss function
predicts the localization uncertainty, which indicates the reliability of the bbox
Gaussian Modeling
- YOLOv3 outputs: bbox coordinates (t_x, t_y, t_w, t_h), objectness score (whether an object is present or not in the bbox), class scores (the category of the object, 0~1)
- Gaussian YOLOv3 adds Gaussian parameters (a mean and a variance) for each bbox coordinate
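
To illustrate why a Gaussian output gives a usable localization uncertainty, here is a minimal sketch of a per-coordinate Gaussian negative log-likelihood. It is a generic formulation, not the paper's exact loss; the function names, the `eps` constant, and the choice to summarize uncertainty as the mean variance over the four coordinates are assumptions made for the example.

```python
import numpy as np

def gaussian_nll(target, mean, var, eps=1e-9):
    """Negative log-likelihood of target under N(mean, var), per coordinate."""
    target = np.asarray(target, dtype=float)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float) + eps  # eps keeps the log and division finite
    return 0.5 * np.log(2.0 * np.pi * var) + 0.5 * (target - mean) ** 2 / var

def localization_uncertainty(var):
    """Assumed summary: mean predicted variance over (t_x, t_y, t_w, t_h)."""
    return float(np.mean(var))

# Toy box: ground truth and predicted mean/variance for (t_x, t_y, t_w, t_h).
target = [0.50, 0.50, 0.20, 0.30]
mean = [0.48, 0.55, 0.20, 0.10]
var = [0.01, 0.01, 0.01, 0.05]
loss = float(gaussian_nll(target, mean, var).sum())
uncertainty = localization_uncertainty(var)
```

Under this loss, a coordinate that is predicted badly is penalized less if its predicted variance is large, so the network has an incentive to report high variance where its localization is unreliable.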

Probabilistic Deep Learning for Instance Segmentation

Probabilistic convolutional neural networks: predict distributions of predictions instead of point estimates and quantify the inherent uncertainty of predictions
Limitations of proposal-based segmentation methods: in the bio-medical domain, bounding boxes frequently contain large parts of other instances of the same category, which deteriorates segmentation performance -> existing work limits its analysis of results to a probabilistic object detection benchmark
Limitations of proposal-free models: binary output misses information like confidence scores or uncertainty measures

Uncertainties that make up the predictive uncertainty

Data Uncertainty(i.e. aleatoric uncertainty)
- due to noise in the observation and measurement process of data or ambiguities in the annotation process (does not decrease with more training data)
Model Uncertainty(i.e. epistemic uncertainty)
- due to uncertainty about the model architecture and parameters
- commonly estimated by approximating the unknown parameter distribution with a simple variational distribution
  - Gaussian distribution (but high computational complexity and memory consumption)
  - Bernoulli distribution => Dropout
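
The Bernoulli case above corresponds to keeping dropout active at test time and sampling several forward passes. A minimal sketch, assuming a tiny two-layer network with ReLU and hypothetical weights `w1` and `w2`; it is not the architecture used in the paper.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, drop_prob, num_samples, rng):
    """Sample forward passes with Bernoulli dropout masks on the hidden layer.

    Returns the mean prediction and the variance across the sampled passes,
    which serves as a model (epistemic) uncertainty estimate.
    """
    outputs = []
    for _ in range(num_samples):
        hidden = np.maximum(0.0, x @ w1)               # ReLU hidden layer
        mask = rng.random(hidden.shape) >= drop_prob   # Bernoulli keep mask
        hidden = hidden * mask / (1.0 - drop_prob)     # inverted-dropout scaling
        outputs.append(hidden @ w2)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.var(axis=0)

rng = np.random.default_rng(0)
x = np.ones((4, 3))    # 4 inputs with 3 features each
w1 = np.ones((3, 16))  # hypothetical weights
w2 = np.ones((16, 1))
mean_pred, model_var = mc_dropout_predict(x, w1, w2, drop_prob=0.5,
                                          num_samples=50, rng=rng)
```

With `drop_prob=0.0` every pass is identical and the variance collapses to zero, which shows that the uncertainty here comes entirely from the sampled masks.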

Efficient Uncertainty Estimation for Semantic Segmentation in Videos (ECCV2018)

Model Description

propose temporal aggregation(TA) and region-based temporal aggregation(RTA)
Temporal Aggregation
- approximates the sampling procedure of MC dropout by calculating the moving average of the outputs in consecutive frames
- to obtain the correct aggregation for moving objects in video, optical flow is used to track the motion of each pixel in the frame, and each pixel's output is aggregated according to the flow
Region-based Temporal Aggregation
- needed because optical flow estimation can be poor in some regions
- dynamically assigns a multiplying factor (alpha), which decides the weight of incoming data, depending on the reconstruction error of every pixel
  - for regions that have a wrong optical flow estimation (i.e., the reconstruction error is greater than a threshold), a larger multiplying factor is used so that the pixels rely more on the current prediction than on the previous predictions
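
The aggregation step described above can be sketched as a per-pixel moving average. This is a minimal illustration under several assumptions: the previous aggregated output is taken to be already warped to the current frame by optical flow (the warping itself is not shown), and the function name, the two alpha values, and the threshold are hypothetical rather than the paper's settings.

```python
import numpy as np

def region_based_temporal_aggregation(current, warped_previous, recon_error,
                                      alpha_low=0.2, alpha_high=0.8,
                                      threshold=0.1):
    """Per-pixel moving average of network outputs across frames.

    alpha is the weight of the incoming (current) output. Pixels whose
    reconstruction error exceeds the threshold get the larger alpha, so
    they rely more on the current frame than on the aggregated history.
    """
    current = np.asarray(current, dtype=float)
    warped_previous = np.asarray(warped_previous, dtype=float)
    alpha = np.where(np.asarray(recon_error) > threshold, alpha_high, alpha_low)
    if current.ndim > alpha.ndim:
        alpha = alpha[..., None]  # broadcast over a trailing class dimension
    return alpha * current + (1.0 - alpha) * warped_previous

# Toy 2x2 frame: only the top-right pixel has an unreliable flow estimate.
current = np.ones((2, 2))
warped_previous = np.zeros((2, 2))
recon_error = np.array([[0.0, 0.5], [0.0, 0.0]])
aggregated = region_based_temporal_aggregation(current, warped_previous,
                                               recon_error)
```

Using the same alpha for every pixel reduces this to plain temporal aggregation.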

Uncertainty Metrics

Pixel-level Metric
Frame-level Metric
- Kendall tau
- Ranking IoU
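
For reference, Kendall tau measures how similarly two score lists rank the same items. A minimal pure-Python sketch (the tau-a variant, with no tie correction) follows; applying it to per-frame uncertainty scores from two methods is an assumption made for this example, and the variable names are hypothetical.

```python
def kendall_tau(scores_a, scores_b):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(scores_a)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy per-frame uncertainty scores from two hypothetical methods.
reference_scores = [0.1, 0.4, 0.3, 0.9]
approx_scores = [0.2, 0.5, 0.6, 0.8]
tau = kendall_tau(reference_scores, approx_scores)
```

A value of 1 means the two methods order the frames identically, and -1 means the orders are exactly reversed.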

Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks

Employ a Bayesian Neural Network
1. Concrete Dropout(CDP)
2. Kronecker-factored Laplace Approximation(LAP)

JisuHann / One-day-One-paper
