Abstract
This paper presents Bayesian methods that support conservative dependability claims for a software-based safety-critical system, particularly when evidence suggests the software's executions are not statistically independent. We formalise informal notions of "doubting" that the software's executions are independent, and incorporate such doubts into dependability assessments. We study the extent to which an assumption of independent executions can undermine conservatism in assessments, and identify conditions under which this impact is, or is not, significant. These techniques, novel extensions of conservative Bayesian inference (CBI) methods, are illustrated in two applications: the assessment of a nuclear power-plant safety protection system and the assessment of autonomous vehicle (AV) safety. Our analyses reveal: 1) the minimum confidence an assessor should possess before subjecting a system to operational testing; without it, such testing is shown to be futile, since no amount of favourable operational testing evidence will increase one's confidence in the system being sufficiently dependable; 2) the independence assumption supports optimistic claims in certain situations, and conservative claims in others; 3) in some scenarios, upon observing a system operate without failure, an assessor's confidence in the system being sufficiently dependable is lower than it would be had the system exhibited some failures; 4) posterior confidence in a system being sufficiently dependable is very sensitive to failures: each additional failure significantly increases the amount of operational testing evidence required to support a dependability claim.
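To make finding 4 concrete, the sketch below uses the classical (non-conservative) Beta-Binomial model. It assumes independent demands, a uniform Beta(1, 1) prior on the probability of failure on demand (pfd), and a hypothetical target bound `p`. It is not the CBI method of the paper and cannot reproduce effects that depend on doubting independence (such as finding 3); it only shows how each observed failure raises the number of demands needed to reach a given posterior confidence that pfd <= p.

```python
import math
from typing import Optional


def posterior_confidence(n: int, r: int, p: float) -> float:
    """P(pfd <= p | r failures in n demands), assuming independent demands
    and a uniform Beta(1, 1) prior on the probability of failure on demand.
    """
    # The posterior is Beta(r + 1, n - r + 1). For integer parameters its CDF
    # at p equals P(Binomial(n + 1, p) >= r + 1), i.e. one minus a short
    # binomial lower tail that is cheap to sum for small r.
    lower_tail = sum(
        math.comb(n + 1, k) * p**k * (1 - p) ** (n + 1 - k) for k in range(r + 1)
    )
    return 1.0 - lower_tail


def demands_needed(r: int, p: float, target: float, max_n: int = 10**6) -> Optional[int]:
    """Smallest number of demands n (with r failures among them) for which
    the posterior confidence that pfd <= p reaches `target`."""
    n = r  # at least r demands are needed to observe r failures
    while n <= max_n:
        if posterior_confidence(n, r, p) >= target:
            return n
        n += 1
    return None  # target not reachable within max_n demands


for failures in range(3):
    print(failures, demands_needed(failures, p=1e-3, target=0.99))
```

With `p = 1e-3` and a 99% target, this simplified model needs roughly 4,600 failure-free demands, and allowing one or two failures pushes the requirement noticeably higher. The closed form relies on the standard identity linking the CDF of a Beta distribution with integer parameters to a binomial tail.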
Progressive Cross-modal Knowledge Distillation for Human Action Recognition
Authors: Jianyuan Ni, Anne H.H. Ngu, Yan Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, the accuracy of wearable sensor-based HAR still lags far behind that of systems based on visual modalities (e.g., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve HAR accuracy, but how to take advantage of multi-modal data for wearable sensor-based HAR has rarely been explored. Currently, wearable devices such as smartwatches can capture only a limited range of non-visual modality data, which hinders multi-modal HAR because visual and non-visual modality data cannot be used simultaneously. Another major challenge lies in how to efficiently utilize multi-modal data on wearable devices with limited computation resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data, from a smartwatch to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequence) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to eliminate the performance gap between teacher and student models. We also design a novel loss function, called Adaptive-Confidence Semantic (ACS), that allows the student model to adaptively select which of the teacher models, or the ground-truth label, it should mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD achieves competitive performance compared to previous single-sensor HAR methods.
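The abstract does not spell out the ACS loss, so the following is only a hypothetical sketch of the general idea of letting a student choose what to mimic. It combines standard Hinton-style distillation (a temperature-softened KL term scaled by T^2) with an assumed selection rule: among teachers whose top prediction matches the ground-truth label, mimic the one most confident on that label, and fall back to plain cross-entropy with the label when no teacher is correct. The names `adaptive_distillation_loss` and `temperature` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def adaptive_distillation_loss(student_logits: Sequence[float],
                               teacher_logits_list: Sequence[Sequence[float]],
                               label: int,
                               temperature: float = 4.0) -> float:
    # Assumed selection rule (hypothetical, not the paper's ACS definition):
    # only teachers whose top-1 prediction equals the label are trusted.
    trusted = [t for t in teacher_logits_list if argmax(t) == label]
    if not trusted:
        # No trustworthy teacher: mimic the ground truth via cross-entropy.
        return -math.log(softmax(student_logits)[label])
    # Among trusted teachers, mimic the one most confident on the true class.
    target = max((softmax(t, temperature) for t in trusted), key=lambda p: p[label])
    student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient scale comparable across temperatures,
    # as in standard Hinton-style distillation.
    return temperature ** 2 * kl_divergence(target, student_soft)


teachers = [[2.0, 0.0, 0.0], [5.0, 0.0, 0.0]]  # e.g. two teacher models' logits
print(adaptive_distillation_loss([5.0, 0.0, 0.0], teachers, label=0))  # 0.0
```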
Keyword: scaling
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc
Abstract
Weak supervision has been applied to various Natural Language Understanding tasks in recent years. However, because of the technical challenges of scaling weak supervision to long-form documents spanning up to hundreds of pages, its applications in the document understanding space have been limited. At Lexion, we built a weak supervision-based system tailored for long-form (10-200 pages) PDF documents. We use this platform to build dozens of language understanding models and have applied it successfully to various domains, from commercial agreements to corporate formation documents. In this paper, we demonstrate the effectiveness of supervised learning with weak supervision in a setting with limited time, workforce, and training data: we built 8 high-quality machine learning models in the span of one week, with a small team of just 3 annotators working with a dataset of under 300 documents. We share some details about our overall architecture, how we use weak supervision, and the results we achieve. We also include the dataset for researchers who would like to experiment with alternative approaches or refine ours. Furthermore, we shed some light on the additional complexities that arise when working with poorly scanned long-form documents in PDF format, and on some of the techniques that help us achieve state-of-the-art performance on such data.
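As a generic illustration of weak supervision (not Lexion's architecture, which the abstract only outlines), heuristic labeling functions can vote on a text span, abstaining when they have no opinion, and the votes can be aggregated into a noisy training label. The sketch below uses simple majority voting; the clause labels and keyword rules are hypothetical. Production systems often replace majority voting with a learned label model that estimates each function's accuracy.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN: Optional[str] = None
LabelingFunction = Callable[[str], Optional[str]]


# Hypothetical keyword heuristics; a real system would use many more,
# including layout and regex features.
def lf_governing_law_heading(text: str) -> Optional[str]:
    return "GOVERNING_LAW" if "governing law" in text.lower() else ABSTAIN


def lf_governed_by_phrase(text: str) -> Optional[str]:
    return "GOVERNING_LAW" if "governed by the laws of" in text.lower() else ABSTAIN


def lf_termination(text: str) -> Optional[str]:
    return "TERMINATION" if "terminate" in text.lower() else ABSTAIN


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_governing_law_heading,
    lf_governed_by_phrase,
    lf_termination,
]


def majority_vote(text: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Aggregate non-abstaining votes; abstain on no votes or a tie."""
    votes = [label for label in (lf(text) for lf in lfs) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels: no reliable weak label
    return ranked[0][0]


clause = ("Governing Law. This Agreement is governed by the laws of Delaware, "
          "and either party may terminate it on notice.")
print(majority_vote(clause, LABELING_FUNCTIONS))  # GOVERNING_LAW (2 votes vs 1)
```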
Performance Analysis and Optimization for RIS-Assisted Multi-User Massive MIMO Systems with Imperfect Hardware
Abstract
The paper studies a reconfigurable intelligent surface (RIS)-assisted multi-user uplink massive multiple-input multiple-output (MIMO) system with imperfect hardware. Phase noise is considered at the RIS, while radio frequency impairments and low-resolution analog-to-digital converters are considered at the base station. The paper derives closed-form approximate expressions for the ergodic achievable rate under Rician fading channels. For the cases of an infinite number of antennas and an infinite number of reflecting elements, asymptotic data rates are derived to provide new design insights. The derived power scaling laws indicate that, while guaranteeing a required system performance, the users' transmit power can be scaled down at most by the factor 1/M as M tends to infinity, or by the factor 1/(MN) as both M and N tend to infinity, where M is the number of antennas and N is the number of reflecting units. Furthermore, a genetic-algorithm-based method is proposed to solve the phase shift optimization problem with the aim of maximizing the system sum rate. The optimization problem with discrete phase shifts is also considered. Finally, numerical results are provided to validate the analytical results.
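The genetic-algorithm idea can be illustrated on a much simpler toy problem than the paper's: a single-user, single-antenna link with a direct channel `h_d` and N RIS elements `g`, each restricted to `levels` discrete phase shifts, where the goal is to maximize log2(1 + snr * |h_d + sum_n g_n e^{j theta_n}|^2). Hardware impairments, multiple users, and the massive-MIMO array are deliberately omitted, and the population size, mutation rate, and channel values are arbitrary assumptions. Elitism keeps the best individual across generations, so the result never falls below the best initial candidate (which includes the all-zero phase configuration).

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def achievable_rate(phase_idx: Sequence[int], h_d: complex, g: Sequence[complex],
                    snr: float, levels: int) -> float:
    """Rate of a toy single-user link: direct path h_d plus RIS paths g_n
    with discrete phase shifts 2*pi*k/levels."""
    combined = h_d + sum(
        gn * cmath.exp(1j * 2 * math.pi * k / levels) for gn, k in zip(g, phase_idx)
    )
    return math.log2(1 + snr * abs(combined) ** 2)


def ga_optimize(h_d: complex, g: Sequence[complex], snr: float, levels: int = 4,
                pop_size: int = 30, generations: int = 100,
                mutation_rate: float = 0.05, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(g)

    def fitness(ind: List[int]) -> float:
        return achievable_rate(ind, h_d, g, snr, levels)

    # Seed the population with the all-zero configuration plus random ones.
    population = [[0] * n] + [
        [rng.randrange(levels) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[:2]                 # carried over unchanged
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: redraw each phase index with a small probability.
            child = [rng.randrange(levels) if rng.random() < mutation_rate else k
                     for k in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


rng = random.Random(1)
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
h_d = complex(0.5, -0.3)
best_phases, best_rate = ga_optimize(h_d, g, snr=10.0)
print(best_phases, round(best_rate, 3))
```

A quick sanity check is that the rate can never exceed the coherent-combining bound log2(1 + snr * (|h_d| + sum_n |g_n|)^2), which follows from the triangle inequality.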
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Authors: Sindhu B Hegde, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we explore an interesting question: what can be obtained from an $8\times8$ pixel video sequence? Surprisingly, it turns out to be quite a lot. We show that when we process this $8\times8$ video with the right set of audio and image priors, we can obtain a full-length, $256\times256$ video. We achieve this $32\times$ scaling of an extremely low-resolution input using our novel audio-visual upsampling network. The audio prior helps to recover the elemental facial details and precise lip shapes, and a single high-resolution target identity image prior provides rich appearance details. Our approach is an end-to-end multi-stage framework. The first stage produces a coarse intermediate output video that can then be used to animate the single target identity image and generate realistic, accurate, and high-quality outputs. Our approach is simple and performs exceedingly well (an $8\times$ improvement in FID score) compared to previous super-resolution methods. We also extend our model to talking-face video compression, and show that we obtain a $3.5\times$ improvement in terms of bits/pixel over the previous state of the art. The results from our network are thoroughly analyzed through extensive ablation experiments (in the paper and supplementary material). We also provide a demo video along with code and models on our website: \url{this http URL}.
Last-iterate Convergence to Trembling-hand Perfect Equilibria
Authors: Martino Bernasconi, Alberto Marchesi, Francesco Trovò
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
Designing efficient algorithms to find Nash equilibrium (NE) refinements in sequential games is of paramount importance in practice. Indeed, it is well known that the NE has several weaknesses, since it may prescribe sub-optimal actions in those parts of the game that are never reached at the equilibrium. NE refinements, such as the extensive-form perfect equilibrium (EFPE), amend such weaknesses by accounting for the possibility of players' mistakes. This is crucial in real-world applications, which usually involve players with bounded rationality, and it also turns out to be useful for boosting the performance of superhuman agents in recreational games like Poker. Nevertheless, only a few works have addressed the problem of computing NE refinements. Most of them propose algorithms that find exact NE refinements by means of linear programming and thus lack the potential to scale up to real-world-size games. On the other hand, existing iterative algorithms that exploit the tree structure of sequential games only provide convergence guarantees to approximate refinements. In this paper, we provide the first efficient last-iterate algorithm that provably converges to an EFPE in two-player zero-sum sequential games with imperfect information. Our algorithm works by tracking a sequence of equilibria of suitably defined, regularized-perturbed games. To do so, it uses a procedure tailored to converge in the last iterate to the equilibria of such games. Crucially, the updates of this procedure can be computed efficiently by visiting the game tree, making our algorithm potentially more scalable than its linear-programming-based competitors. Finally, we evaluate our algorithm on a standard testbed of games, showing that it produces strategies that are much more robust to players' mistakes than those of state-of-the-art NE-computation algorithms.
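Last-iterate convergence in a regularized game can be seen in a toy normal-form example; the paper works with sequential games, game trees, and perturbed strategies, none of which is modelled here. In the sketch, both players run entropy-regularized multiplicative weights updates, whose extra factor x^(1 - eta * tau) pulls strategies toward uniform; the step size `eta` and regularization weight `tau` are hypothetical choices. On matching pennies, unregularized multiplicative weights is known to spiral away from the equilibrium, whereas with these parameters the regularized iterates themselves, not just their averages, converge to the uniform strategy. In general the regularized equilibrium is biased toward uniform play, which is one reason to track a sequence of regularized games, as the abstract describes, rather than a single one.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(weights: Sequence[float]) -> List[float]:
    total = sum(weights)
    return [w / total for w in weights]


def regularized_mwu(
    A: Sequence[Sequence[float]],
    x: Sequence[float],
    y: Sequence[float],
    eta: float = 0.1,
    tau: float = 1.5,
    iters: int = 500,
) -> Tuple[List[float], List[float]]:
    """Entropy-regularized MWU for a zero-sum matrix game.

    The row player maximizes x^T A y and the column player minimizes it.
    Returns the last iterate, not an average of iterates.
    """
    m, n = len(A), len(A[0])
    x, y = list(x), list(y)
    decay = 1.0 - eta * tau  # entropy regularization shrinks log-weights toward uniform
    for _ in range(iters):
        row_payoff = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        col_loss = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        # Simultaneous updates: both players respond to the previous iterate.
        x, y = (
            _normalize([x[i] ** decay * math.exp(eta * row_payoff[i]) for i in range(m)]),
            _normalize([y[j] ** decay * math.exp(-eta * col_loss[j]) for j in range(n)]),
        )
    return x, y


matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
x_last, y_last = regularized_mwu(matching_pennies, [0.9, 0.1], [0.2, 0.8])
print(x_last, y_last)  # both close to [0.5, 0.5]
```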
Distributed Out-of-Memory SVD on CPU/GPU Architectures
Authors: Ismael Boureima, Manish Bhattarai, Maksim E. Eren, Nick Solovyev, Hristo Djidjev, Boian S. Alexandrov
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
We propose an efficient, distributed, out-of-memory implementation of the truncated singular value decomposition (t-SVD) for heterogeneous (CPU+GPU) high performance computing (HPC) systems. Various implementations of SVD have been proposed, but most estimate only the singular values, because also estimating the singular vectors can significantly increase the time and memory complexity of the algorithm. In this work, we propose an implementation of SVD based on the power method, which estimates truncated singular values and singular vectors. Memory utilization bottlenecks in the power method are typically associated with the computation of the Gram matrix $\mathbf{A}^T\mathbf{A}$, which can be significant when $\mathbf{A}$ is large and dense, or when $\mathbf{A}$ is super-large and sparse. The proposed implementation is optimized for out-of-memory problems, where the memory required to factorize a given matrix exceeds the available GPU memory. We reduce the memory complexity of $\mathbf{A}^T\mathbf{A}$ with a batching strategy in which the intermediate factors are computed block by block. We also hide the I/O latency of both host-to-device (H2D) and device-to-host (D2H) batch copies by overlapping each batch copy with compute using CUDA streams. Furthermore, we use optimized NCCL-based communicators to reduce the latency of collective communications (both intra-node and inter-node). In addition, sparse and dense matrix multiplications are significantly accelerated with GPU cores (or tensor cores when available), resulting in an implementation with good scaling. We demonstrate the scalability of our distributed out-of-core SVD algorithm by successfully decomposing a dense matrix of size 1 TB and a sparse matrix of size 128 PB with 1e-6 sparsity.
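The block-wise Gram-matrix idea can be sketched in a single process: accumulate $\mathbf{A}^T\mathbf{A}$ one row block at a time (so only one block of $\mathbf{A}$ has to be resident), then run power iteration with deflation on the small $n \times n$ Gram matrix to obtain the leading singular values as square roots of its eigenvalues. This is a pure-Python illustration under the assumption that $\mathbf{A}^T\mathbf{A}$ itself fits in memory; it has none of the paper's GPU batching, CUDA streams, or NCCL communication, and `block_rows` and `iters` are hypothetical parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def blocked_gram(A: Matrix, block_rows: int) -> Matrix:
    """Accumulate G = A^T A one block of rows at a time: G = sum_b B_b^T B_b."""
    n = len(A[0])
    G = [[0.0] * n for _ in range(n)]
    for start in range(0, len(A), block_rows):
        block = A[start:start + block_rows]  # only this block is needed now
        for row in block:
            for i, r_i in enumerate(row):
                if r_i == 0.0:
                    continue  # skipping zeros helps when A is sparse
                for j, r_j in enumerate(row):
                    G[i][j] += r_i * r_j
    return G


def top_singular_values(A: Matrix, k: int, block_rows: int = 2, iters: int = 200) -> List[float]:
    """Leading k singular values of A via power iteration with deflation on A^T A."""
    G = blocked_gram(A, block_rows)
    n = len(G)
    sigmas = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n  # fixed start; assumed not orthogonal to the target
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        sigmas.append(math.sqrt(max(lam, 0.0)))  # sigma = sqrt(eigenvalue of A^T A)
        # Deflation: subtract the found eigen-component before the next pass.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return sigmas


A = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
print(top_singular_values(A, k=2, block_rows=1))
```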
Keyword: calibration
Deep Learning-Based Discrete Calibrated Survival Prediction
Authors: Patrick Fuhlert, Anne Ernst, Esther Dietrich, Fabian Westhaeusser, Karin Kloiber, Stefan Bonn
Abstract
Deep neural networks for survival prediction outperform classical approaches in discrimination, i.e., the ordering of patients according to their time-of-event. Conversely, classical approaches like the Cox Proportional Hazards model display much better calibration, i.e., the correct temporal prediction of events of the underlying distribution. Especially in the medical domain, where it is critical to predict the survival of a single patient, both discrimination and calibration are important performance metrics. Here we present Discrete Calibrated Survival (DCS), a novel deep neural network for discriminated and calibrated survival prediction that outperforms competing survival models in discrimination on three medical datasets, while achieving the best calibration among all discrete-time models. The enhanced performance of DCS can be attributed to two novel features: the variable temporal output node spacing and a novel loss term that optimizes the use of uncensored and censored patient data. We believe that DCS is an important step towards the clinical application of deep-learning-based survival prediction with state-of-the-art discrimination and good calibration.
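The abstract does not define DCS's loss term, so the sketch below shows only the standard discrete-time survival likelihood that such models build on: the network predicts a hazard h_k per time bin, an observed event in bin k contributes log h_k plus the log-probabilities of surviving all earlier bins, and a censored patient contributes only survival terms. Treating a censored patient as having survived their censoring bin is one common convention, assumed here; DCS's variable node spacing and novel loss are not reproduced.

```python
import math
from typing import List, Sequence


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """S(t_k) = prod_{j <= k} (1 - h_j) for per-bin hazards h_j."""
    survival, curve = 1.0, []
    for h in hazards:
        survival *= 1.0 - h
        curve.append(survival)
    return curve


def discrete_survival_nll(hazards: Sequence[float], bin_index: int,
                          event_observed: bool) -> float:
    """Negative log-likelihood of one patient under a discrete-time hazard model."""
    # Log-probability of surviving every bin before the observed bin.
    log_lik = sum(math.log(1.0 - h) for h in hazards[:bin_index])
    if event_observed:
        log_lik += math.log(hazards[bin_index])  # event occurs in this bin
    else:
        # Assumed convention: a censored patient survived their censoring bin.
        log_lik += math.log(1.0 - hazards[bin_index])
    return -log_lik


h = [0.1, 0.2, 0.3]  # hypothetical network output for one patient
print(survival_curve(h))
print(discrete_survival_nll(h, 2, event_observed=True))
print(discrete_survival_nll(h, 2, event_observed=False))
```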
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations
Authors: Gabriel Van Zandycke, Vladimir Somers, Maxime Istasse, Carlo Del Don, Davide Zambrano
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
With the recent development of Deep Learning applied to Computer Vision, sport video understanding has gained a lot of attention, providing much richer information for both sport consumers and leagues. This paper introduces DeepSportradar-v1, a suite of computer vision tasks, datasets and benchmarks for automated sport understanding. The main purpose of this framework is to close the gap between academic research and real world settings. To this end, the datasets provide high-resolution raw images, camera parameters and high quality annotations. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided. To encourage further research on advanced methods for sport understanding, a competition is organized as part of the MMSports workshop from the ACM Multimedia 2022 conference, where participants have to develop state-of-the-art methods to solve the above tasks. The four datasets, development kits and baselines are publicly available.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Conservative Bayesian Assessment of Software-based Systems Exhibiting Correlated Executions
Progressive Cross-modal Knowledge Distillation for Human Action Recognition
Keyword: scaling
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc
Performance Analysis and Optimization for RIS-Assisted Multi-User Massive MIMO Systems with Imperfect Hardware
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Last-iterate Convergence to Trembling-hand Perfect Equilibria
Distributed Out-of-Memory SVD on CPU/GPU Architectures
Keyword: calibration
Deep Learning-Based Discrete Calibrated Survival Prediction
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations