Finding Islands of Predictability in Action Forecasting
Abstract
We address dense action forecasting: the problem of predicting future action sequences over long durations based on partial observation. Our key insight is that future action sequences are more accurately modeled with variable, rather than one, levels of abstraction, and that the optimal level of abstraction can be dynamically selected during the prediction process. Our experiments show that most parts of future action sequences can be predicted confidently in fine detail only in small segments of future frames, which are effectively ``islands'' of high model prediction confidence in a ``sea'' of uncertainty. We propose a combination of a Bayesian neural network and a hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels. We evaluate this approach on standard datasets against existing state-of-the-art systems and demonstrate that our ``islands of predictability'' approach maintains fine-grained action predictions while also making accurate abstract predictions where systems were previously unable to do so, and thus results in substantial, monotonic increases in accuracy.
Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction
Authors: YuXuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D bounding boxes are a widespread intermediate representation in many computer vision applications. However, predicting them is a challenging task, largely due to partial observability, which motivates the need for a strong sense of uncertainty. While many recent methods have explored better architectures for consuming sparse and unstructured point cloud data, we hypothesize that there is room for improvement in the modeling of the output distribution and explore how this can be achieved using an autoregressive prediction head. Additionally, we release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications, where 3D bounding box prediction has largely been underexplored. We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset.
DroneARchery: Human-Drone Interaction through Augmented Reality with Haptic Feedback and Multi-UAV Collision Avoidance Driven by Deep Reinforcement Learning
Authors: Ekaterina Dorzhieva, Ahmed Baza, Ayush Gupta, Aleksey Fedoseev, Miguel Altamirano Cabrera, Ekaterina Karmanova, Dzmitry Tsetserukou
Abstract
We propose a novel concept of augmented reality (AR) human-drone interaction driven by RL-based swarm behavior to achieve intuitive and immersive control of a swarm formation of unmanned aerial vehicles. The DroneARchery system we developed allows the user to quickly deploy a swarm of drones, generating flight paths that simulate archery. The haptic interface LinkGlide delivers a tactile stimulus of the bowstring tension to the forearm to increase the precision of aiming. The swarm of released drones dynamically avoids collisions with each other, with the drone following the user, and with external obstacles, using behavior control based on deep reinforcement learning. The developed concept was tested in a scenario with a human, where the user shoots from a virtual bow with a real drone to hit the target. The human operator observes the ballistic trajectory of the drone in AR and achieves a realistic and highly recognizable experience of the bowstring tension through the haptic display. The experimental results revealed that the system improves trajectory prediction accuracy by 63.3% through applying AR technology and conveying haptic feedback of pulling force. DroneARchery users highlighted the naturalness (4.3 out of 5 on a Likert scale) and increased confidence (4.7 out of 5) when controlling the drone. We have designed the tactile patterns to present four sliding distances (tension) and three applied force levels (stiffness) of the haptic display. Users demonstrated the ability to distinguish tactile patterns produced by the haptic display representing varying bowstring tension (average recognition rate of 72.8%) and stiffness (average recognition rate of 94.2%). The novelty of the research is the development of an AR-based approach for drone control that does not require special skills and training from the operator.
Confidence estimation of classification based on the distribution of the neural network output layer
Authors: Abdel Aziz Taha, Leonhard Hennig, Petr Knoth
Abstract
One of the most common problems preventing the application of prediction models in the real world is lack of generalization: the accuracy of models measured on a benchmark does not repeat itself on future data, e.g. in real business settings. Relatively few methods exist that estimate the confidence of prediction models. In this paper, we propose novel methods that, given a neural network classification model, estimate the uncertainty of particular predictions generated by this model. Furthermore, we propose a method that, given a model and a confidence level, calculates a threshold that separates the predictions generated by this model into two subsets, one of which meets the given confidence level. In contrast to other methods, the proposed methods do not require any changes to existing neural networks, because they simply build on the output logit layer of a common neural network. In particular, the methods infer the confidence of a particular prediction based on the distribution of the logit values corresponding to this prediction. The proposed methods constitute a tool that is recommended for filtering predictions in the process of knowledge extraction, e.g. based on web scraping, where prediction subsets are identified that maximize precision at the cost of recall, which is less important due to the availability of data. The method has been tested on different tasks, including relation extraction, named entity recognition, and image classification, to show the significant increase in accuracy achieved.
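The thresholding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the confidence score used here (the gap between the two largest logits) and the function names `logit_margin` and `pick_threshold` are assumptions made for illustration, and the threshold is simply chosen on held-out validation predictions.

```python
from typing import Optional, Sequence


def logit_margin(logits: Sequence[float]) -> float:
    """Hypothetical confidence score: gap between the two largest logits."""
    top, second = sorted(logits, reverse=True)[:2]
    return top - second


def pick_threshold(scores: Sequence[float],
                   correct: Sequence[bool],
                   target: float) -> Optional[float]:
    """Return the smallest threshold t such that the predictions with
    score >= t reach the target accuracy on validation data, or None
    if no threshold does."""
    ranked = sorted(zip(scores, correct), key=lambda p: p[0], reverse=True)
    best = None
    hits = 0
    for i, (score, ok) in enumerate(ranked):
        hits += int(ok)
        # Only evaluate once every prediction sharing this score is included,
        # so the retained subset is exactly {score >= threshold}.
        last_of_group = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_group and hits / (i + 1) >= target:
            best = score
    return best


# Toy validation set: four predictions with their logits and correctness.
val_logits = [[4.0, 1.0, 0.5], [2.0, 1.8, 0.1], [3.0, 0.5, 0.2], [1.0, 0.9, 0.8]]
val_correct = [True, False, True, False]
val_scores = [logit_margin(l) for l in val_logits]
threshold = pick_threshold(val_scores, val_correct, target=0.9)
```

Predictions whose score falls below `threshold` would be discarded, trading recall for precision as the abstract describes.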
HGARN: Hierarchical Graph Attention Recurrent Network for Human Mobility Prediction
Abstract
Human mobility prediction is a fundamental task essential for various applications, including urban planning, transportation services, and location recommendation. Existing approaches often ignore activity information crucial for reasoning about human preferences and routines, or adopt a simplified representation of the dependencies between time, activities and locations. To address these issues, we present Hierarchical Graph Attention Recurrent Network (HGARN) for human mobility prediction. Specifically, we construct a hierarchical graph based on all users' historical mobility records and employ a Hierarchical Graph Attention Module to capture complex time-activity-location dependencies. This way, HGARN can learn representations with rich contextual semantics to model user preferences at the global level. We also propose a model-agnostic history-enhanced confidence (MaHec) label to focus our model on each user's individual-level preferences. Finally, we introduce a Recurrent Encoder-Decoder Module, which employs recurrent structures to jointly predict users' next activities (as an auxiliary task) and locations. For model evaluation, we test the performance of HGARN against existing SOTAs in recurring and explorative settings. The recurring setting focuses more on assessing models' capabilities to capture users' individual-level preferences. In contrast, the results in the explorative setting tend to reflect the power of different models to learn users' global-level preferences. Overall, our model outperforms other baselines significantly in the main, recurring, and explorative settings based on two real-world human mobility data benchmarks. Source codes of HGARN are available at https://github.com/YihongT/HGARN.
Federated Best Arm Identification with Heterogeneous Clients
Authors: Zhirui Chen, P. N. Karthik, Vincent Y. F. Tan, Yeow Meng Chee
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Abstract
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients, when each client has access to a subset of arms and each arm yields independent Gaussian observations. The reward from an arm at any given time is defined as the average of the observations generated at this time across all the clients that have access to the arm. The end goal is to identify the best arm (the arm with the largest mean reward) of each client with the least expected stopping time, subject to an upper bound on the error probability (i.e., the fixed-confidence regime). We provide a lower bound on the growth rate of the expected time to find the best arm of each client. Furthermore, we show that for any algorithm whose upper bound on the expected time to find the best arms matches with the lower bound up to a multiplicative constant, the ratio of any two consecutive communication time instants must be bounded, a result that is of independent interest. We then provide the first-known lower bound on the expected number of communication rounds required to find the best arms. We propose a novel algorithm based on the well-known Track-and-Stop strategy that communicates only at exponential time instants, and derive asymptotic upper bounds on its expected time to find the best arms and the expected number of communication rounds, where the asymptotics is one of vanishing error probabilities.
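To make "communicates only at exponential time instants" concrete, here is a small sketch of one such schedule. The specific form `ceil((1 + growth) ** r)` and the names `communication_instants` and `growth` are assumptions for illustration; the abstract does not state the exact schedule used.

```python
import math
from typing import List


def communication_instants(horizon: int, growth: float = 1.0) -> List[int]:
    """Distinct instants ceil((1 + growth) ** r), r = 0, 1, ..., up to horizon.

    Hypothetical schedule: clients would talk to the server only at these
    time steps, so the number of rounds grows logarithmically in the horizon.
    """
    if growth <= 0:
        raise ValueError("growth must be positive")
    instants: List[int] = []
    r = 0
    while True:
        t = math.ceil((1 + growth) ** r)
        if t > horizon:
            break
        if not instants or t != instants[-1]:  # skip repeats for small r
            instants.append(t)
        r += 1
    return instants


schedule = communication_instants(horizon=100, growth=1.0)
# Ratio of consecutive instants; it stays bounded for a geometric schedule.
ratios = [b / a for a, b in zip(schedule, schedule[1:])]
```

With `growth=1.0` the instants double each round, so consecutive instants have ratio 2, in line with the bounded-ratio property the abstract mentions.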
Keyword: scaling
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Authors: Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
Abstract
Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed system expertise to carefully design model-parallel execution strategies that suit the model architectures and cluster setups. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches the space for high-performing strategies by leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers and cluster setups with more heterogeneous accelerators and bandwidth. We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54x and 1.77x higher throughput than state-of-the-art model-parallel systems, respectively.
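HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning
Abstract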
While the Turkish language is listed among low-resource languages, literature on Turkish automatic speech recognition (ASR) is relatively old. In this paper, we present HuBERT-TR, a speech representation model for Turkish based on HuBERT. HuBERT-TR achieves state-of-the-art results on several Turkish ASR datasets. We investigate pre-training HuBERT for Turkish with large-scale data curated from online resources. We pre-train HuBERT-TR using over 6,500 hours of speech data curated from YouTube that includes extensive variability in terms of quality and genre. We show that pre-trained models within a multi-lingual setup are inferior to language-specific models: our Turkish model HuBERT-TR base performs better than its 10 times larger multi-lingual counterpart XLS-R-1B. Moreover, we study the effect of scaling on ASR performance by scaling our models up to 1B parameters. Our best model yields a state-of-the-art word error rate of 4.97% on the Turkish Broadcast News dataset. Models are available at huggingface.co/asafaya.
A $μ$-mode approach for exponential integrators: actions of $\varphi$-functions of Kronecker sums
Authors: Marco Caliari, Fabio Cassini, Franco Zivcovich
Abstract
We present a novel method for computing actions of the so-called $\varphi$-functions for a Kronecker sum $K$ of $d$ arbitrary matrices $A_\mu$. It is based on the approximation of the integral representation of the $\varphi$-functions by Gaussian quadrature formulas combined with a scaling and squaring technique. The resulting algorithm, which we call PHIKS, evaluates the required actions by means of $\mu$-mode products involving exponentials of the small sized matrices $A_\mu$, without using the large sized matrix $K$ itself. PHIKS, which profits from the highly efficient level 3 BLAS, is designed to compute different $\varphi$-functions applied on the same vector or a linear combination of actions of $\varphi$-functions applied on different vectors. In addition, due to the underlying scaling and squaring technique, the desired quantities are available simultaneously at suitable time scales. All these features allow the effective usage of PHIKS in the exponential integration context. In particular, we tested our newly designed method on popular exponential Runge-Kutta integrators of stiff order from one to four, in comparison with state-of-the-art algorithms for computing actions of $\varphi$-functions. Our numerical experiments with discretized semilinear evolutionary 2D or 3D advection-diffusion-reaction, Allen-Cahn, and Brusselator equations show the superiority of the $\mu$-mode approach of PHIKS.
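For orientation, the standard definitions behind this abstract can be written out as below. This is a sketch in textbook conventions, not quoted from the paper: the sizes $n_\mu$, the quadrature nodes and weights $\theta_i$, $w_i$, and the order-$d$ tensor $V$ with $\mathrm{vec}(V) = v$ are notation assumed here, and the scaling and squaring step is omitted.

```latex
\begin{align*}
\varphi_0(X) &= e^{X}, \qquad
\varphi_\ell(X) = \frac{1}{(\ell-1)!}\int_0^1 e^{(1-\theta)X}\,\theta^{\ell-1}\,d\theta,
\quad \ell \ge 1, \\
K &= A_d \oplus \cdots \oplus A_1
   = \sum_{\mu=1}^{d} I_{n_d} \otimes \cdots \otimes I_{n_{\mu+1}} \otimes A_\mu
     \otimes I_{n_{\mu-1}} \otimes \cdots \otimes I_{n_1}, \\
e^{K} v &= \left(e^{A_d} \otimes \cdots \otimes e^{A_1}\right) v
   = \mathrm{vec}\!\left(V \times_1 e^{A_1} \times_2 \cdots \times_d e^{A_d}\right), \\
\varphi_\ell(K)\,v &\approx \frac{1}{(\ell-1)!}\sum_{i} w_i\,\theta_i^{\ell-1}\,
   e^{(1-\theta_i)K} v .
\end{align*}
```

The third line holds because the summands of a Kronecker sum commute, so the exponential of $K$ factors into a Kronecker product of small exponentials. Each term $e^{(1-\theta_i)K} v$ in the quadrature sum is then itself a Kronecker-sum exponential action with matrices $(1-\theta_i)A_\mu$, which is why only $\mu$-mode products with small matrices are needed.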
An Empirical Evaluation of Multivariate Time Series Classification with Input Transformation across Different Dimensions
Authors: Leonardos Pantiskas, Kees Verstoep, Mark Hoogendoorn, Henri Bal
Abstract
In current research, machine and deep learning solutions for the classification of temporal data are shifting from single-channel datasets (univariate) to problems with multiple channels of information (multivariate). The majority of these works are focused on the method novelty and architecture, and the format of the input data is often treated implicitly. Particularly, multivariate datasets are often treated as a stack of univariate time series in terms of input preprocessing, with scaling methods applied across each channel separately. In this evaluation, we aim to demonstrate that the additional channel dimension is far from trivial and different approaches to scaling can lead to significantly different results in the accuracy of a solution. To that end, we test seven different data transformation methods on four different temporal dimensions and study their effect on the classification accuracy of five recent methods. We show that, for the large majority of tested datasets, the best transformation-dimension configuration leads to an increase in the accuracy compared to the result of each model with the same hyperparameters and no scaling, ranging from 0.16 to 76.79 percentage points. We also show that if we keep the transformation method constant, there is a statistically significant difference in accuracy results when applying it across different dimensions, with accuracy differences ranging from 0.23 to 47.79 percentage points. Finally, we explore the relation of the transformation methods and dimensions to the classifiers, and we conclude that there is no prominent general trend, and the optimal configuration is dataset- and classifier-specific.
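A small sketch can show why the dimension along which scaling is applied matters. It compares z-normalization over the whole dataset with z-normalization per channel; these two choices, the `[sample][channel][timestep]` layout, and the function names are assumptions for illustration and are not necessarily among the configurations the paper tests.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Dataset = List[List[List[float]]]  # indexed as [sample][channel][timestep]


def _stats(values: List[float]) -> Tuple[float, float]:
    """Mean and population std, with std replaced by 1.0 for constant input."""
    return fmean(values), (pstdev(values) or 1.0)


def scale_globally(data: Dataset) -> Dataset:
    """Z-normalize using one mean/std computed over every value."""
    mu, sigma = _stats([v for s in data for c in s for v in c])
    return [[[(v - mu) / sigma for v in c] for c in s] for s in data]


def scale_per_channel(data: Dataset) -> Dataset:
    """Z-normalize each channel with its own mean/std (over samples and time)."""
    n_channels = len(data[0])
    stats = [_stats([v for s in data for v in s[ch]]) for ch in range(n_channels)]
    return [
        [[(v - stats[ch][0]) / stats[ch][1] for v in c] for ch, c in enumerate(s)]
        for s in data
    ]


# Two samples, two channels on very different scales, two time steps.
data = [[[1.0, 3.0], [100.0, 300.0]], [[2.0, 4.0], [200.0, 400.0]]]
per_channel = scale_per_channel(data)
global_scaled = scale_globally(data)
```

In this toy dataset, per-channel scaling puts both channels on the same footing, while global scaling leaves the small-valued channel almost flat, so a downstream classifier would see quite different inputs.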
Neural Routing in Meta Learning
Authors: Jicang Cai, Saeed Vahidian, Weijia Wang, Mohsen Joneidi, Bill Lin
Abstract
Meta-learning, often referred to as learning-to-learn, is a promising notion raised to mimic human learning by exploiting the knowledge of prior tasks while being able to adapt quickly to novel tasks. A plethora of models has emerged in this context and improved learning efficiency, robustness, etc. The question that arises here is: can we emulate other aspects of human learning and incorporate them into the existing meta learning algorithms? Inspired by the widely recognized finding in neuroscience that distinct parts of the brain are highly specialized for different types of tasks, we aim to improve the model performance of the current meta learning algorithms by selectively using only parts of the model conditioned on the input tasks. In this work, we describe an approach that investigates task-dependent dynamic neuron selection in deep convolutional neural networks (CNNs) by leveraging the scaling factor in the batch normalization (BN) layer associated with each convolutional layer. The problem is intriguing because the idea of helping different parts of the model to learn from different types of tasks may help us train better filters in CNNs and improve the model generalization performance. We find that the proposed approach, neural routing in meta learning (NRML), outperforms one of the well-known existing meta learning baselines on few-shot classification tasks on the most widely used benchmark datasets.
Keyword: calibration
NOCaL: Calibration-Free Semi-Supervised Learning of Odometry and Camera Intrinsics
Authors: Ryan Griffiths, Jack Naylor, Donald G. Dansereau
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
There are a multitude of emerging imaging technologies that could benefit robotics. However, the need for bespoke models, calibration and low-level processing represents a key barrier to their adoption. In this work we present NOCaL, Neural odometry and Calibration using Light fields, a semi-supervised learning architecture capable of interpreting previously unseen cameras without calibration. NOCaL learns to estimate camera parameters, relative pose, and scene appearance. It employs a scene-rendering hypernetwork pretrained on a large number of existing cameras and scenes, and adapts to previously unseen cameras using a small supervised training set to enforce metric scale. We demonstrate NOCaL on rendered and captured imagery using conventional cameras, demonstrating calibration-free odometry and novel view synthesis. This work represents a key step toward automating the interpretation of general camera geometries and emerging imaging technologies.
SAILOR: Scaling Anchors via Insights into Latent Object
Authors: Dušan Malić, Christian Fruhwirth-Reisinger, Horst Possegger, Horst Bischof
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
LiDAR 3D object detection models are inevitably biased towards their training dataset. The detector clearly exhibits this bias when employed on a target dataset, particularly towards object sizes. However, object sizes vary heavily between domains due to, for instance, different labeling policies or geographical locations. State-of-the-art unsupervised domain adaptation approaches outsource methods to overcome the object size bias. Mainstream size adaptation approaches exploit target domain statistics, contradicting the original unsupervised assumption. Our novel unsupervised anchor calibration method addresses this limitation. Given a model trained on the source data, we estimate the optimal target anchors in a completely unsupervised manner. The main idea stems from an intuitive observation: by varying the anchor sizes for the target domain, we inevitably introduce noise or even remove valuable object cues. The latent object representation, perturbed by the anchor size, is closest to the learned source features only under the optimal target anchors. We leverage this observation for anchor size optimization. Our experimental results show that, without any retraining, we achieve competitive results even compared to state-of-the-art weakly-supervised size adaptation approaches. In addition, our anchor calibration can be combined with such existing methods, making them completely unsupervised.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Finding Islands of Predictability in Action Forecasting
Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction
DroneARchery: Human-Drone Interaction through Augmented Reality with Haptic Feedback and Multi-UAV Collision Avoidance Driven by Deep Reinforcement Learning
Confidence estimation of classification based on the distribution of the neural network output layer
HGARN: Hierarchical Graph Attention Recurrent Network for Human Mobility Prediction
Federated Best Arm Identification with Heterogeneous Clients
Keyword: scaling
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning
A $μ$-mode approach for exponential integrators: actions of $\varphi$-functions of Kronecker sums
An Empirical Evaluation of Multivariate Time Series Classification with Input Transformation across Different Dimensions
Neural Routing in Meta Learning
Keyword: calibration
NOCaL: Calibration-Free Semi-Supervised Learning of Odometry and Camera Intrinsics
SAILOR: Scaling Anchors via Insights into Latent Object