Abstract
Semantic simultaneous localization and mapping is a subject of increasing interest in robotics and AI that directly influences the autonomous vehicles industry, the military industry, and more. One of the challenges in this field is to obtain object classification jointly with robot trajectory estimation. Considering view-dependent semantic measurements, there is a coupling between different classes, resulting in a combinatorial number of hypotheses. A common solution is to prune hypotheses that have a sufficiently low probability and to retain only a limited number of hypotheses. However, after pruning and renormalization, the updated probability is overconfident with respect to the original probability. This is especially problematic for systems that require high accuracy. If the prior probability of the classes is independent, the original normalization factor can be computed efficiently without pruning hypotheses. To the best of our knowledge, this is the first work to present these results. If the prior probability of the classes is dependent, we propose a lower bound on the normalization factor that ensures cautious results. The bound is calculated incrementally and with efficiency similar to that of the independent case. After pruning and updating based on the bound, this belief is shown empirically to be close to the original belief.
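The overconfidence that follows pruning and renormalization can be seen in a toy example. This is a minimal sketch and not the paper's algorithm: the hypothesis names and weights are invented for illustration, and `normalize` and `prune_and_renormalize` are hypothetical helpers.

```python
# Toy illustration (hypothetical numbers, not taken from the paper):
# pruning low-probability hypotheses and renormalizing over the
# survivors inflates the probability of every retained hypothesis.

def normalize(weights):
    """Divide each unnormalized weight by the sum of all weights."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def prune_and_renormalize(weights, keep):
    """Keep the `keep` heaviest hypotheses, then renormalize over them only."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return normalize(dict(ranked[:keep]))

# Unnormalized weights for four joint class hypotheses of two objects.
weights = {"cat,cat": 4.0, "cat,dog": 3.0, "dog,cat": 2.0, "dog,dog": 1.0}

full = normalize(weights)                        # normalizer is 10
pruned = prune_and_renormalize(weights, keep=2)  # normalizer is 7

for h in pruned:
    print(f"{h}: original {full[h]:.3f}, after pruning {pruned[h]:.3f}")
```

Each retained hypothesis ends up with a larger probability than it had in the full belief, because the normalizer shrinks when the pruned mass is discarded.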
Keyword: overconfidence
There is no result
Keyword: confidence
Easy Batch Normalization
Authors: Arip Asadulaev, Alexander Panfilov, Andrey Filchenkov
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
It was shown that adversarial examples improve object recognition. But what about their opposite side, easy examples? Easy examples are samples that the machine learning model classifies correctly with high confidence. In our paper, we are making the first step toward exploring the potential benefits of using easy examples in the training procedure of neural networks. We propose to use an auxiliary batch normalization for easy examples for the standard and robust accuracy improvement.
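The definition of an easy example given in the abstract can be written as a short check. This is only a sketch: the function name `is_easy` and the threshold of 0.9 are assumptions made for illustration, since the abstract does not say how "high confidence" is quantified.

```python
# Sketch of the "easy example" definition: classified correctly AND with
# high confidence. The 0.9 threshold is an assumed value, not the paper's.

def is_easy(probs, label, threshold=0.9):
    """True if the top prediction equals `label` with probability >= threshold."""
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted == label and probs[predicted] >= threshold

# Each item is (predicted class probabilities, true label).
batch = [
    ([0.95, 0.03, 0.02], 0),  # correct and confident
    ([0.55, 0.40, 0.05], 0),  # correct but not confident
    ([0.02, 0.97, 0.01], 2),  # confident but wrong
]

easy_flags = [is_easy(probs, label) for probs, label in batch]
print(easy_flags)
```

Only the first sample satisfies both conditions.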
Time Is MattEr: Temporal Self-supervision for Video Transformers
Authors: Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, Jinwoo Shin
Abstract
Understanding temporal dynamics of video is an essential aspect of learning better video representations. Recently, transformer-based architectural designs have been extensively explored for video tasks due to their capability to capture long-term dependency of input sequences. However, we found that these Video Transformers are still biased to learn spatial dynamics rather than temporal ones, and debiasing the spurious correlation is critical for their performance. Based on the observations, we design simple yet effective self-supervised tasks for video models to learn temporal dynamics better. Specifically, for debiasing the spatial bias, our method learns the temporal order of video frames as extra self-supervision and enforces the randomly shuffled frames to have low-confidence outputs. Also, our method learns the temporal flow direction of video tokens among consecutive frames for enhancing the correlation toward temporal dynamics. Under various video action recognition tasks, we demonstrate the effectiveness of our method and its compatibility with state-of-the-art Video Transformers.
Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada
Authors: Jagna Nieuwazny, Karol Nowakowski, Michal Ptaszynski, Fumito Masui
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
This paper is our attempt at answering a twofold question covering the areas of ethics and authorship analysis. Firstly, since the methods used for performing authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether it would be possible for an author identification system to correctly attribute works to authors if in the course of years they have undergone a major psychological transition. Secondly, and from the point of view of the evolution of an author's ethical values, we checked what it would mean if the authorship attribution system encounters difficulties in detecting single authorship. We set out to answer those questions through performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works of Arata Osada, a Japanese educator and specialist in the history of education, with half of them being books written before World War II and the other half in the 1950s, in between which he underwent a transformation in terms of political opinions. As a result, we were able to confirm that in the case of texts authored by Arata Osada in a time span of more than 10 years, while the classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, confidence scores of the predictions remain at a similar level as in the case of a shorter time span. This indicates that the classifier was in many instances tricked into deciding that texts written over a time span of multiple years were actually written by two different people, which in turn leads us to believe that such a change can affect authorship analysis, and that historical events have great impact on a person's ethical outlook as expressed in their writings.
On the development of a Bayesian optimisation framework for complex unknown systems
Authors: Mike Diessner, Yu Guan, Kevin J. Wilson, Richard D. Whalley
Abstract
Bayesian optimisation provides an effective method to optimise expensive black box functions. It has recently been applied to problems in fluid dynamics. This paper studies and compares common Bayesian optimisation algorithms empirically on a range of synthetic test functions. It investigates the choice of acquisition function and number of training samples, exact calculation of acquisition functions and Monte Carlo based approaches and both single-point and multi-point optimisation. The test functions considered cover a wide selection of challenges and therefore serve as an ideal test bed to understand the performance of Bayesian optimisation and to identify general situations where Bayesian optimisation performs well and poorly. This knowledge can be utilised in applications, including those in fluid dynamics, where objective functions are unknown. The results of this investigation show that the choices to be made are less relevant for relatively simple functions, while optimistic acquisition functions such as Upper Confidence Bound should be preferred for more complex objective functions. Furthermore, results from the Monte Carlo approach are comparable to results from analytical acquisition functions. In instances where the objective function allows parallel evaluations, the multi-point approach offers a quicker alternative, yet it may potentially require more objective function evaluations.
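The Upper Confidence Bound acquisition function named in the abstract is commonly written as the posterior mean plus a multiple of the posterior standard deviation. The sketch below assumes a maximisation problem and takes the posterior means and standard deviations as given (in practice they would come from a surrogate model such as a Gaussian process, which is not shown). The function names, the candidate values, and the weight `beta` are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of Upper Confidence Bound (UCB) for maximisation.
# All numbers are hypothetical; the surrogate model is not shown.

def upper_confidence_bound(mean, std, beta=2.0):
    """Optimistic score: predicted value plus beta times the uncertainty."""
    return mean + beta * std

def select_next(means, stds, beta=2.0):
    """Return the index of the candidate with the highest UCB score."""
    scores = [upper_confidence_bound(m, s, beta) for m, s in zip(means, stds)]
    return max(range(len(scores)), key=lambda i: scores[i])

# Posterior mean and standard deviation at three candidate points.
means = [0.5, 0.6, 0.4]
stds = [0.1, 0.02, 0.5]

explore = select_next(means, stds, beta=2.0)  # favours the uncertain point
exploit = select_next(means, stds, beta=0.0)  # reduces to the highest mean
print(explore, exploit)
```

A larger `beta` makes the choice more optimistic about uncertain regions, while `beta=0` ignores uncertainty altogether.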
Bounding generalization error with input compression: An empirical study with infinite-width networks
Authors: Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor
Abstract
Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.
Keyword: scaling
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B.G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic and class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category that is required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a novel state-of-the-art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/VL-PLM.
MoEC: Mixture of Expert Clusters
Abstract
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering the progress for MoE models to improve performance by scaling up. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach to enable expert layers to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. We further propose a cluster-level expert dropout strategy specifically designed for the expert cluster structure. Our experiments reveal that MoEC could improve performance on machine translation and natural language understanding tasks, and raise the performance upper bound for scaling up experts under limited data. We also verify that MoEC plays a positive role in mitigating overfitting and sparse data allocation.
A Physical-Constraint-Preserving Finite Volume WENO Method for Special Relativistic Hydrodynamics on Unstructured Meshes
Authors: Yaping Chen, Kailiang Wu
Subjects: Numerical Analysis (math.NA); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
Abstract
This paper presents a highly robust third-order accurate finite volume weighted essentially non-oscillatory (WENO) method for special relativistic hydrodynamics on unstructured triangular meshes. We rigorously prove that the proposed method is physical-constraint-preserving (PCP), namely, always preserves the positivity of the pressure and the rest-mass density as well as the subluminal constraint on the fluid velocity. The method is built on a highly efficient compact WENO reconstruction on unstructured meshes, a simple PCP limiter, the provably PCP property of the Harten--Lax--van Leer flux, and third-order strong-stability-preserving time discretization. Due to the relativistic effects, the primitive variables (namely, the rest-mass density, velocity, and pressure) are highly nonlinear implicit functions in terms of the conservative variables, making the design and analysis of our method nontrivial. To address the difficulties arising from the strong nonlinearity, we adopt a novel quasilinear technique for the theoretical proof of the PCP property. Three provable convergence-guaranteed iterative algorithms are also introduced for the robust recovery of primitive quantities from admissible conservative variables. We also propose a slight modification to an existing WENO reconstruction to ensure the scaling invariance of the nonlinear weights and thus to accommodate the homogeneity of the evolution operator, leading to the advantages of the modified WENO reconstruction in resolving multi-scale wave structures. Extensive numerical examples are presented to demonstrate the robustness, expected accuracy, and high resolution of the proposed method.
Keyword: calibration
Assaying Out-Of-Distribution Generalization in Transfer Learning
Authors: Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello
Abstract
Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller scale studies.
Computer Vision to the Rescue: Infant Postural Symmetry Estimation from Incongruent Annotations
Authors: Xiaofei Huang, Michael Wan, Lingfei Luan, Bethany Tunik, Sarah Ostadabbas
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Bilateral postural symmetry plays a key role as a potential risk marker for autism spectrum disorder (ASD) and as a symptom of congenital muscular torticollis (CMT) in infants, but current methods of assessing symmetry require laborious clinical expert assessments. In this paper, we develop a computer vision based infant symmetry assessment system, leveraging 3D human pose estimation for infants. Evaluation and calibration of our system against ground truth assessments is complicated by our findings from a survey of human ratings of angle and symmetry, that such ratings exhibit low inter-rater reliability. To rectify this, we develop a Bayesian estimator of the ground truth derived from a probabilistic graphical model of fallible human raters. We show that the 3D infant pose estimation model can achieve 68% area under the receiver operating characteristic curve performance in predicting the Bayesian aggregate labels, compared to only 61% from a 2D infant pose estimation model and 60% from a 3D adult pose estimation model, highlighting the importance of 3D poses and infant domain knowledge in assessing infant body symmetry. Our survey analysis also suggests that human ratings are susceptible to higher levels of bias and inconsistency, and hence our final 3D pose-based symmetry assessment system is calibrated but not directly supervised by Bayesian aggregate human ratings, yielding higher levels of consistency and lower levels of inter-limb assessment bias.
SphereFed: Hyperspherical Federated Learning
Authors: Xin Dong, Sai Qian Zhang, Ang Li, H.T. Kung
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent and identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining learned representations of data points to be on a unit hypersphere shared by clients. Specifically, all clients learn their local representations by minimizing the loss with respect to a fixed classifier whose weights span the unit hypersphere. After federated training of the global model, this classifier is further calibrated with a closed-form solution by minimizing a mean squared loss. We show that the calibration solution can be computed efficiently and in a distributed manner without direct access to local data. Extensive experiments indicate that our SphereFed approach is able to improve the accuracy of multiple existing federated learning algorithms by a considerable margin (up to 6% on challenging datasets) with enhanced computation and communication efficiency across datasets and model architectures.
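One way a closed-form least-squares fit can be obtained without pooling raw data is for each client to share only summary statistics. The abstract does not say whether SphereFed works this way, so the following is an illustrative sketch under that assumption, reduced to a single scalar weight. The helper names and the data are hypothetical.

```python
# Illustrative sketch (an assumption, not the method from the abstract):
# fit y ~ w * x by least squares using only per-client sums, so raw
# samples never leave the clients.

def local_stats(samples):
    """Computed on the client: sum of x*x and sum of x*y over local samples."""
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    return sxx, sxy

def aggregate_fit(stats):
    """Computed on the server: closed-form minimizer of the squared error."""
    sxx = sum(s[0] for s in stats)
    sxy = sum(s[1] for s in stats)
    return sxy / sxx

# Hypothetical local datasets, all generated by y = 2 * x.
client_a = [(1.0, 2.0), (2.0, 4.0)]
client_b = [(3.0, 6.0)]

w = aggregate_fit([local_stats(client_a), local_stats(client_b)])
print(w)
```

The server recovers the same weight it would obtain from the pooled data, because the least-squares solution depends on the data only through these sums.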
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
Hybrid Belief Pruning with Guarantees for Viewpoint-Dependent Semantic SLAM
Keyword: overconfidence
There is no result
Keyword: confidence
Easy Batch Normalization
Time Is MattEr: Temporal Self-supervision for Video Transformers
Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada
On the development of a Bayesian optimisation framework for complex unknown systems
Bounding generalization error with input compression: An empirical study with infinite-width networks
Keyword: scaling
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
MoEC: Mixture of Expert Clusters
A Physical-Constraint-Preserving Finite Volume WENO Method for Special Relativistic Hydrodynamics on Unstructured Meshes
Keyword: calibration
Assaying Out-Of-Distribution Generalization in Transfer Learning
Computer Vision to the Rescue: Infant Postural Symmetry Estimation from Incongruent Annotations
SphereFed: Hyperspherical Federated Learning