Abstract
Machine learning models deployed on medical imaging tasks must be equipped with out-of-distribution detection capabilities in order to avoid erroneous predictions. It is unsure whether out-of-distribution detection models reliant on deep neural networks are suitable for detecting domain shifts in medical imaging. Gaussian Processes can reliably separate in-distribution data points from out-of-distribution data points via their mathematical construction. Hence, we propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has not been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions. To facilitate future work our code is publicly available.
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
Safe, Learning-Based MPC for Highway Driving under Lane-Change Uncertainty: A Distributionally Robust Approach
Authors: Mathijs Schuurmans, Alexander Katriniok, Christopher Meissen, H. Eric Tseng, Panagiotis Patrinos
Abstract
We present a case study applying learning-based distributionally robust model predictive control to highway motion planning under stochastic uncertainty of the lane change behavior of surrounding road users. The dynamics of road users are modelled using Markov jump systems, in which the switching variable describes the desired lane of the vehicle under consideration. We assume the switching probabilities of the underlying Markov chain to be unknown. As the vehicle is observed and thus, samples from the Markov chain are drawn, the transition probabilities are estimated along with an ambiguity set which accounts for misestimations of these probabilities. Correspondingly, a distributionally robust optimal control problem is formulated over a scenario tree, and solved in receding horizon. As a result, a motion planning procedure is obtained which through observation of the target vehicle gradually becomes less conservative while avoiding overconfidence in estimates obtained from small sample sizes. We present an extensive numerical case study, comparing the effects of several different design aspects on the controller performance and safety.
Keyword: confidence
SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform
Abstract
Currently, it is a hot research topic to realize accurate, efficient, and real-time identification of massive spectral data with the help of deep learning and IoT technology. Deep neural networks played a key role in spectral analysis. However, the inference of deeper models is performed in a static manner, and cannot be adjusted according to the device. Not all samples need to allocate all computation to reach confident prediction, which hinders maximizing the overall performance. To address the above issues, we propose a Spectral data Classification framework with Adaptive Inference. Specifically, to allocate different computations for different samples while better exploiting the collaboration among different devices, we leverage Early-exit architecture, place intermediate classifiers at different depths of the architecture, and the model outputs the results when the prediction confidence reaches a preset threshold. We propose a training paradigm of self-distillation learning, the deepest classifier performs soft supervision on the shallow ones to maximize their performance and training speed. At the same time, to mitigate the vulnerability of performance to the location and number settings of intermediate classifiers in the Early-exit paradigm, we propose a Position-Adaptive residual network. It can adjust the number of layers in each block at different curve positions, so it can focus on important positions of the curve (e.g.: Raman peak), and accurately allocate the appropriate computational budget based on task performance and computing resources. To the best of our knowledge, this paper is the first attempt to conduct optimization by adaptive inference for spectral detection under the IoT platform. We conducted many experiments, the experimental results show that our proposed method can achieve higher performance with less computational budget than existing methods.
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
Authors: Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Abstract
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain enormous amounts of parameters, which restrict their deployment to real-world applications. To reduce the model size, researchers prune these models based on the weights' importance scores. However, such scores are usually estimated on mini-batches during training, which incurs large variability/uncertainty due to mini-batch sampling and complicated training dynamics. As a result, some crucial weights could be pruned by commonly used pruning methods because of such uncertainty, which makes training unstable and hurts generalization. To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation. In particular, for the weights with low importance scores but high uncertainty, PLATON tends to retain them and explores their capacity. We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification to validate the effectiveness of PLATON. Results demonstrate that PLATON manifests notable improvement under different sparsity levels. Our code is publicly available at https://github.com/QingruZhang/PLATON.
AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Motion estimation approaches typically employ sensor fusion techniques, such as the Kalman Filter, to handle individual sensor failures. More recently, deep learning-based fusion approaches have been proposed, increasing the performance and requiring less model-specific implementations. However, current deep fusion approaches often assume that sensors are synchronised, which is not always practical, especially for low-cost hardware. To address this limitation, in this work, we propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors. Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources. Our approach first employs a Mixture Density Network (MDN) to estimate the probability distributions of the 6-DoF poses for every camera in the system. Then a novel transformer-based fusion module, AFT-VO, is introduced, which combines these asynchronous pose estimations, along with their confidences. More specifically, we introduce Discretiser and Source Encoding techniques which enable the fusion of multi-source asynchronous signals. We evaluate our approach on the popular nuScenes and KITTI datasets. Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.
Efficient Private SCO for Heavy-Tailed Data via Clipping
Authors: Chenhan Jin, Kaiwen Zhou, Bo Han, James Cheng, Ming-Chang Yang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Prior work on this problem is restricted to the gradient descent (GD) method, which is inefficient for large-scale problems. In this paper, we resolve this issue and derive the first high-probability bounds for private stochastic method with clipping. For general convex problems, we derive excess population risks $\Tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n \epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\Tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under bounded or unbounded domain assumption, respectively (here $n$ is the sample size, $d$ is the dimension of the data, $\beta$ is the confidence level and $\epsilon$ is the private level). Then, we extend our analysis to the strongly convex case and non-smooth case (which works for generalized smooth objectives with H$\ddot{\text{o}}$lder-continuous gradients). We establish new excess risk bounds without bounded domain assumption. The results above achieve lower excess risks and gradient complexities than existing methods in their corresponding cases. Numerical experiments are conducted to justify the theoretical improvement.
PARTICUL: Part Identification with Confidence measure using Unsupervised Learning
Abstract
In this paper, we present PARTICUL, a novel algorithm for unsupervised learning of part detectors from datasets used in fine-grained recognition. It exploits the macro-similarities of all images in the training set in order to mine for recurring patterns in the feature space of a pre-trained convolutional neural network. We propose new objective functions enforcing the locality and unicity of the detected parts. Additionally, we embed our detectors with a confidence measure based on correlation scores, allowing the system to estimate the visibility of each part. We apply our method on two public fine-grained datasets (Caltech-UCSD Bird 200 and Stanford Cars) and show that our detectors can consistently highlight parts of the object while providing a good measure of the confidence in their prediction. We also demonstrate that these detectors can be directly used to build part-based fine-grained classifiers that provide a good compromise between the transparency of prototype-based approaches and the performance of non-interpretable methods.
Safe, Learning-Based MPC for Highway Driving under Lane-Change Uncertainty: A Distributionally Robust Approach
Authors: Mathijs Schuurmans, Alexander Katriniok, Christopher Meissen, H. Eric Tseng, Panagiotis Patrinos
Abstract
We present a case study applying learning-based distributionally robust model predictive control to highway motion planning under stochastic uncertainty of the lane change behavior of surrounding road users. The dynamics of road users are modelled using Markov jump systems, in which the switching variable describes the desired lane of the vehicle under consideration. We assume the switching probabilities of the underlying Markov chain to be unknown. As the vehicle is observed and thus, samples from the Markov chain are drawn, the transition probabilities are estimated along with an ambiguity set which accounts for misestimations of these probabilities. Correspondingly, a distributionally robust optimal control problem is formulated over a scenario tree, and solved in receding horizon. As a result, a motion planning procedure is obtained which through observation of the target vehicle gradually becomes less conservative while avoiding overconfidence in estimates obtained from small sample sizes. We present an extensive numerical case study, comparing the effects of several different design aspects on the controller performance and safety.
Process Knowledge-Infused AI: Towards User-level Explainability, Interpretability, and Safety
Abstract
AI systems have been widely adopted across various domains in the real world. However, in high-value, sensitive, or safety-critical applications such as self-management for personalized health or food recommendation with a specific purpose (e.g., allergy-aware recipe recommendations), their adoption is unlikely. Firstly, the AI system needs to follow guidelines or well-defined processes set by experts; the data alone will not be adequate. For example, to diagnose the severity of depression, mental healthcare providers use Patient Health Questionnaire (PHQ-9). So if an AI system were to be used for diagnosis, the medical guideline implied by the PHQ-9 needs to be used. Likewise, a nutritionist's knowledge and steps would need to be used for an AI system that guides a diabetic patient in developing a food plan. Second, the BlackBox nature typical of many current AI systems will not work; the user of an AI system will need to be able to give user-understandable explanations, explanations constructed using concepts that humans can understand and are familiar with. This is the key to eliciting confidence and trust in the AI system. For such applications, in addition to data and domain knowledge, the AI systems need to have access to and use the Process Knowledge, an ordered set of steps that the AI system needs to use or adhere to.
Iso-CapsNet: Isomorphic Capsule Network for Brain Graph Representation Learning
Abstract
Brain graph representation learning serves as the fundamental technique for brain diseases diagnosis. Great efforts from both the academic and industrial communities have been devoted to brain graph representation learning in recent years. The isomorphic neural network (IsoNN) introduced recently can automatically learn the existence of sub-graph patterns in brain graphs, which is also the state-of-the-art brain graph representation learning method by this context so far. However, IsoNN fails to capture the orientations of sub-graph patterns, which may render the learned representations to be useless for many cases. In this paper, we propose a new Iso-CapsNet (Isomorphic Capsule Net) model by introducing the graph isomorphic capsules for effective brain graph representation learning. Based on the capsule dynamic routing, besides the subgraph pattern existence confidence scores, Iso-CapsNet can also learn other sub-graph rich properties, including position, size and orientation, for calculating the class-wise digit capsules. We have compared Iso-CapsNet with both classic and state-of-the-art brain graph representation approaches with extensive experiments on four brain graph benchmark datasets. The experimental results also demonstrate the effectiveness of Iso-CapsNet, which can out-perform the baseline methods with significant improvements.
Keyword: scaling
A penalisation method for batch multi-objective Bayesian optimisation with application in heat exchanger design
Authors: Andrei Paleyes, Henry B. Moss, Victor Picheny, Piotr Zulawski, Felix Newman
Abstract
We present HIghly Parallelisable Pareto Optimisation (HIPPO) -- a batch acquisition function that enables multi-objective Bayesian optimisation methods to efficiently exploit parallel processing resources. Multi-Objective Bayesian Optimisation (MOBO) is a very efficient tool for tackling expensive black-box problems. However, most MOBO algorithms are designed as purely sequential strategies, and existing batch approaches are prohibitively expensive for all but the smallest of batch sizes. We show that by encouraging batch diversity through penalising evaluations with similar predicted objective values, HIPPO is able to cheaply build large batches of informative points. Our extensive experimental validation demonstrates that HIPPO is at least as efficient as existing alternatives whilst incurring an order of magnitude lower computational overhead and scaling easily to batch sizes considerably higher than currently supported in the literature. Additionally, we demonstrate the application of HIPPO to a challenging heat exchanger design problem, stressing the real-world utility of our highly parallelisable approach to MOBO.
Keyword: calibration
Asymptotic-Preserving Neural Networks for multiscale hyperbolic models of epidemic spread
Authors: Giulia Bertaglia, Chuan Lu, Lorenzo Pareschi, Xueyu Zhu
Abstract
When investigating epidemic dynamics through differential models, the parameters needed to understand the phenomenon and to simulate forecast scenarios require a delicate calibration phase, often made even more challenging by the scarcity and uncertainty of the observed data reported by official sources. In this context, Physics-Informed Neural Networks (PINNs), by embedding the knowledge of the differential model that governs the physical phenomenon in the learning process, can effectively address the inverse and forward problem of data-driven learning and solving the corresponding epidemic problem. In many circumstances, however, the spatial propagation of an infectious disease is characterized by movements of individuals at different scales governed by multiscale PDEs. This reflects the heterogeneity of a region or territory in relation to the dynamics within cities and in neighboring zones. In presence of multiple scales, a direct application of PINNs generally leads to poor results due to the multiscale nature of the differential model in the loss function of the neural network. To allow the neural network to operate uniformly with respect to the small scales, it is desirable that the neural network satisfies an Asymptotic-Preservation (AP) property in the learning process. To this end, we consider a new class of AP Neural Networks (APNNs) for multiscale hyperbolic transport models of epidemic spread that, thanks to an appropriate AP formulation of the loss function, is capable to work uniformly at the different scales of the system. A series of numerical tests for different epidemic scenarios confirms the validity of the proposed approach, highlighting the importance of the AP property in the neural network when dealing with multiscale problems especially in presence of sparse and partially observed systems.
Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
Abstract
The class of deep deterministic off-policy algorithms is effectively applied to solve challenging continuous control problems. However, current approaches use random noise as a common exploration method that has several weaknesses, such as a need for manual adjusting on a given task and the absence of exploratory calibration during the training process. We address these challenges by proposing a novel guided exploration method that uses a differential directional controller to incorporate scalable exploratory action correction. An ensemble of Monte Carlo Critics that provides exploratory direction is presented as a controller. The proposed method improves the traditional exploration scheme by changing exploration dynamically. We then present a novel algorithm exploiting the proposed directional controller for both policy and critic modification. The presented algorithm outperforms modern reinforcement learning algorithms across a variety of problems from DMControl suite.
Learning neural state-space models: do we need a state estimator?
Authors: Marco Forgione, Manas Mejari, Dario Piga
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
In recent years, several algorithms for system identification with neural state-space models have been introduced. Most of the proposed approaches are aimed at reducing the computational complexity of the learning problem, by splitting the optimization over short sub-sequences extracted from a longer training dataset. Different sequences are then processed simultaneously within a minibatch, taking advantage of modern parallel hardware for deep learning. An issue arising in these methods is the need to assign an initial state for each of the sub-sequences, which is required to run simulations and thus to evaluate the fitting loss. In this paper, we provide insights for calibration of neural state-space training algorithms based on extensive experimentation and analyses performed on two recognized system identification benchmarks. Particular focus is given to the choice and the role of the initial state estimation. We demonstrate that advanced initial state estimation techniques are really required to achieve high performance on certain classes of dynamical systems, while for asymptotically stable ones basic procedures such as zero or random initialization already yield competitive performance.
Uncertainty Calibration for Deep Audio Classifiers
Authors: Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao
Abstract
Although deep Neural Networks (DNNs) have achieved tremendous success in audio classification tasks, their uncertainty calibration are still under-explored. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. In this work, we investigate the uncertainty calibration for deep audio classifiers. In particular, we empirically study the performance of popular calibration methods: (i) Monte Carlo Dropout, (ii) ensemble, (iii) focal loss, and (iv) spectral-normalized Gaussian process (SNGP), on audio classification datasets. To this end, we evaluate (i-iv) for the tasks of environment sound and music genre classification. Results indicate that uncalibrated deep audio classifiers may be over-confident, and SNGP performs the best and is very efficient on the two datasets of this paper.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
Distributional Gaussian Processes Layers for Out-of-Distribution Detection
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
Safe, Learning-Based MPC for Highway Driving under Lane-Change Uncertainty: A Distributionally Robust Approach
Keyword: confidence
SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
Efficient Private SCO for Heavy-Tailed Data via Clipping
PARTICUL: Part Identification with Confidence measure using Unsupervised Learning
Safe, Learning-Based MPC for Highway Driving under Lane-Change Uncertainty: A Distributionally Robust Approach
Process Knowledge-Infused AI: Towards User-level Explainability, Interpretability, and Safety
Iso-CapsNet: Isomorphic Capsule Network for Brain Graph Representation Learning
Keyword: scaling
A penalisation method for batch multi-objective Bayesian optimisation with application in heat exchanger design
Keyword: calibration
Asymptotic-Preserving Neural Networks for multiscale hyperbolic models of epidemic spread
Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
Learning neural state-space models: do we need a state estimator?
Uncertainty Calibration for Deep Audio Classifiers