Abstract
The regulatory approval and broad clinical deployment of medical AI have been hampered by the perception that deep learning models fail in unpredictable and possibly catastrophic ways. A lack of statistically rigorous uncertainty quantification is a significant factor undermining trust in AI results. Recent developments in distribution-free uncertainty quantification present practical solutions for these issues by providing reliability guarantees for black-box models on arbitrary data distributions as formally valid finite-sample prediction intervals. Our work applies these new uncertainty quantification methods -- specifically conformal prediction -- to a deep-learning model for grading the severity of spinal stenosis in lumbar spine MRI. We demonstrate a technique for forming ordinal prediction sets that are guaranteed to contain the correct stenosis severity within a user-defined probability (confidence interval). On a dataset of 409 MRI exams processed by the deep-learning model, the conformal method provides tight coverage with small prediction set sizes. Furthermore, we explore the potential clinical applicability of flagging cases with high uncertainty predictions (large prediction sets) by quantifying an increase in the prevalence of significant imaging abnormalities (e.g. motion artifacts, metallic artifacts, and tumors) that could degrade confidence in predictive performance when compared to a random sample of cases.
Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
Authors: Zhehan Kan, Shuoshuo Chen, Zeng Li, Zhihai He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We observe that human poses exhibit strong group-wise structural correlation and spatial coupling between keypoints due to the biological constraints of different body parts. This group-wise structural correlation can be explored to improve the accuracy and robustness of human pose estimation. In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. During the inference stage, the feedback information from the verification network allows us to perform further optimization of pose prediction, which significantly improves the performance of human pose estimation. Specifically, we partition the keypoints into groups according to the biological structure of human body. Within each group, the keypoints are further partitioned into two subsets, high-confidence base keypoints and low-confidence terminal keypoints. We develop a self-constrained prediction-verification network to perform forward and backward predictions between these keypoint subsets. One fundamental challenge in pose estimation, as well as in generic prediction tasks, is that there is no mechanism for us to verify if the obtained pose estimation or prediction results are accurate or not, since the ground truth is not available. Once successfully learned, the verification network serves as an accuracy verification module for the forward pose prediction. During the inference stage, it can be used to guide the local optimization of the pose estimation results of low-confidence keypoints with the self-constrained loss on high-confidence keypoints as the objective function. Our extensive experimental results on benchmark MS COCO and CrowdPose datasets demonstrate that the proposed method can significantly improve the pose estimation results.
Keyword: scaling
Variational energy based XPINNs for phase field analysis in brittle fracture
Abstract
Modeling fracture is computationally expensive even in computational simulations of two-dimensional problems. Hence, scaling up the available approaches to be directly applied to large components or systems crucial for real applications become challenging. In this work. we propose domain decomposition framework for the variational physics-informed neural networks to accurately approximate the crack path defined using the phase field approach. We show that coupling domain decomposition and adaptive refinement schemes permits to focus the numerical effort where it is most needed: around the zones where crack propagates. No a priori knowledge of the damage pattern is required. The ability to use numerous deep or shallow neural networks in the smaller subdomains gives the proposed method the ability to be parallelized. Additionally, the framework is integrated with adaptive non-linear activation functions which enhance the learning ability of the networks, and results in faster convergence. The efficiency of the proposed approach is demonstrated numerically with three examples relevant to engineering fracture mechanics. Upon the acceptance of the manuscript, all the codes associated with the manuscript will be made available on Github.
Difference in Euclidean Norm Can Cause Semantic Divergence in Batch Normalization
Authors: Zhennan Wang, Kehan Li, Runyi Yu, Yian Zhao, Pengchong Qiao, Guoli Song, Fan Xu, Jie Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In this paper, we show that the difference in Euclidean norm of samples can make a contribution to the semantic divergence and even confusion, after the spatial translation and scaling transformation in batch normalization. To address this issue, we propose an intuitive but effective method to equalize the Euclidean norms of sample vectors. Concretely, we $l_2$-normalize each sample vector before batch normalization, and therefore the sample vectors are of the same magnitude. Since the proposed method combines the $l_2$ normalization and batch normalization, we name our method as $L_2$BN. The $L_2$BN can strengthen the compactness of intra-class features and enlarge the discrepancy of inter-class features. In addition, it can help the gradient converge to a stable scale. The $L_2$BN is easy to implement and can exert its effect without any additional parameters and hyper-parameters. Therefore, it can be used as a basic normalization method for neural networks. We evaluate the effectiveness of $L_2$BN through extensive experiments with various models on image classification and acoustic scene classification tasks. The experimental results demonstrate that the $L_2$BN is able to boost the generalization ability of various neural network models and achieve considerable performance improvements.
Keyword: calibration
DCT-Net: Domain-Calibrated Translation for Portrait Stylization
Abstract
This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations.
VMRF: View Matching Neural Radiance Fields
Authors: Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural Radiance Fields (NeRF) have demonstrated very impressive performance in novel view synthesis via implicitly modelling 3D representations from multi-view 2D images. However, most existing studies train NeRF models with either reasonable camera pose initialization or manually-crafted camera pose distributions which are often unavailable or hard to acquire in various real-world data. We design VMRF, an innovative view matching NeRF that enables effective NeRF training without requiring prior knowledge in camera poses or camera pose distributions. VMRF introduces a view matching scheme, which exploits unbalanced optimal transport to produce a feature transport plan for mapping a rendered image with randomly initialized camera pose to the corresponding real image. With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images. Extensive experiments over a number of synthetic and real datasets show that the proposed VMRF outperforms the state-of-the-art qualitatively and quantitatively by large margins.
Spike Calibration: Fast and Accurate Conversion of Spiking Neural Network for Object Detection and Segmentation
Authors: Yang Li, Xiang He, Yiting Dong, Qingqun Kong, Yi Zeng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Spiking neural network (SNN) has been attached to great importance due to the properties of high biological plausibility and low energy consumption on neuromorphic hardware. As an efficient method to obtain deep SNN, the conversion method has exhibited high performance on various large-scale datasets. However, it typically suffers from severe performance degradation and high time delays. In particular, most of the previous work focuses on simple classification tasks while ignoring the precise approximation to ANN output. In this paper, we first theoretically analyze the conversion errors and derive the harmful effects of time-varying extremes on synaptic currents. We propose the Spike Calibration (SpiCalib) to eliminate the damage of discrete spikes to the output distribution and modify the LIPooling to allow conversion of the arbitrary MaxPooling layer losslessly. Moreover, Bayesian optimization for optimal normalization parameters is proposed to avoid empirical settings. The experimental results demonstrate the state-of-the-art performance on classification, object detection, and segmentation tasks. To the best of our knowledge, this is the first time to obtain SNN comparable to ANN on these tasks simultaneously. Moreover, we only need 1/50 inference time of the previous work on the detection task and can achieve the same performance under 0.492$\times$ energy consumption of ANN on the segmentation task.
Multi-View Object Pose Refinement With Differentiable Renderer
Authors: Ivan Shugurov, Ivan Pavlov, Sergey Zakharov, Slobodan Ilic
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data. It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame. We have opted for the use of multiple frames with known relative camera transformations, as it allows introduction of geometrical constraints via an interpretable ICP-like loss function. The loss function is implemented with a differentiable renderer and is optimized iteratively. We also demonstrate that a full detection and refinement pipeline, which is trained solely on synthetic data, can be used for auto-labeling real data. We perform quantitative evaluation on LineMOD, Occlusion, Homebrewed and YCB-V datasets and report excellent performance in comparison to the state-of-the-art methods trained on the synthetic and real data. We demonstrate empirically that our approach requires only a few frames and is robust to close camera locations and noise in extrinsic camera calibration, making its practical usage easier and more ubiquitous.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Improving Trustworthiness of AI Disease Severity Rating in Medical Imaging with Ordinal Conformal Prediction Sets
Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
Keyword: scaling
Variational energy based XPINNs for phase field analysis in brittle fracture
Difference in Euclidean Norm Can Cause Semantic Divergence in Batch Normalization
Keyword: calibration
DCT-Net: Domain-Calibrated Translation for Portrait Stylization
VMRF: View Matching Neural Radiance Fields
Spike Calibration: Fast and Accurate Conversion of Spiking Neural Network for Object Detection and Segmentation
Multi-View Object Pose Refinement With Differentiable Renderer