Abstract
Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by partial observability. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm Value Upper Confidence Limit (Value-UCL) that selects what failure modes to prioritize and which state to recover to such that the expected performance improves maximally in every training episode. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially from 71\% to 92.4\% in simulation and from 75\% to 90\% on a real robot.
Reconstruction-guided attention improves the robustness and shape processing of neural networks
Authors: Seoyoung Ahn, Hossein Adeli, Gregory J. Zelinsky
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model using the challenging out-of-distribution digit recognition dataset, MNIST-C, where 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models including feedforward CNNs and adversarially trained networks. Our model is particularly robust to blur, noise, and occlusion corruptions, where shape perception plays an important role. Ablation studies further reveal two complementary roles of spatial and feature-based attention in robust object recognition, with the former largely consistent with spatial masking benefits in the attention literature (the reconstruction serves as a mask) and the latter mainly contributing to the model's inference speed (i.e., number of time steps to reach a certain confidence threshold) by reducing the space of possible object hypotheses. We also observed that the model sometimes hallucinates a non-existing pattern out of noise, leading to highly interpretable human-like errors. Our study shows that modeling reconstruction-based feedback endows AI systems with a powerful attention mechanism, which can help us understand the role of generating perception in human visual processing.
Falsification before Extrapolation in Causal Effect Estimation
Authors: Zeshan Hussain, Michael Oberst, Ming-Chieh Shih, David Sontag
Abstract
Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g. from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To facilitate hypothesis testing in settings where causal effect transportation across datasets is necessary, we give conditions under which a doubly-robust estimator of group average treatment effects is asymptotically normal, even when flexible machine learning methods are used for estimation of nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real world datasets, and show that it compares favorably to standard meta-analysis techniques.
Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training
Authors: Saehyung Lee, Hyungyu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Several recent studies have shown that the use of extra in-distribution data can lead to a high level of adversarial robustness. However, there is no guarantee that it will always be possible to obtain sufficient extra data for a selected dataset. In this paper, we propose a biased multi-domain adversarial training (BiaMAT) method that induces training data amplification on a primary dataset using publicly available auxiliary datasets, without requiring the class distribution match between the primary and auxiliary datasets. The proposed method can achieve increased adversarial robustness on a primary dataset by leveraging auxiliary datasets via multi-domain learning. Specifically, data amplification on both robust and non-robust features can be accomplished through the application of BiaMAT as demonstrated through a theoretical and empirical analysis. Moreover, we demonstrate that while existing methods are vulnerable to negative transfer due to the distributional discrepancy between auxiliary and primary data, the proposed method enables neural networks to flexibly leverage diverse image datasets for adversarial training by successfully handling the domain discrepancy through the application of a confidence-based selection strategy. The pre-trained models and code are available at: \url{https://github.com/Saehyung-Lee/BiaMAT}.
Keyword: scaling
Scalable and Equivariant Spherical CNNs by Discrete-Continuous (DISCO) Convolutions
Authors: Jeremy Ocampo, Matthew A. Price, Jason D. McEwen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Abstract
No existing spherical convolutional neural network (CNN) framework is both computationally scalable and rotationally equivariant. Continuous approaches capture rotational equivariance but are often prohibitively computationally demanding. Discrete approaches offer more favorable computational performance but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high-resolution. While our framework can be applied to any compact group, we specialize to the sphere. Our DISCO spherical convolutions not only exhibit $\text{SO}(3)$ rotational equivariance but also a form of asymptotic $\text{SO}(3)/\text{SO}(2)$ rotational equivariance, which is more desirable for many applications (where $\text{SO}(n)$ is the special orthogonal group representing rotations in $n$-dimensions). Through a sparse tensor implementation we achieve linear scaling in number of pixels on the sphere for both computational cost and memory usage. For 4k spherical images we realize a saving of $10^9$ in computational cost and $10^4$ in memory usage when compared to the most efficient alternative equivariant spherical convolution. We apply the DISCO spherical CNN framework to a number of benchmark dense-prediction problems on the sphere, such as semantic segmentation and depth estimation, on all of which we achieve the state-of-the-art performance.
Performance Analysis of MIMO-NOMA Systems with Randomly Deployed Users
Authors: Zheng Shi, Guanghua Yang, Yaru Fu, Hong Wang, Shaodan Ma
Abstract
This paper investigates the performance of Multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) systems with randomly deployed users, where the randomly deployed NOMA users follow Poisson point process (PPP), the spatial correlation between MIMO channels are characterized by using Kronecker model, and the composite channel model is used to capture large-scale fading as well as small-scale fading. The spatial randomness of users' distribution, the spatial correlation among antennas and large-scale fading will severally impact the system performance, but they are seldom considered in prior literature for MIMO-NOMA systems, and the consideration of all these impact factors challenges the analysis. Based on zero-forcing (ZF) detection, the exact expressions for both the average outage probability and the average goodput are derived in closed-form. Moreover, the asymptotic analyses are conducted for both high signal-to-noise ratio (SNR) (/small cell radius) and low SNR (/large cell radius) to gain more insightful results. In particular, the diversity order is given by $\delta = {{N_r} - {M} + 1}$, the average outage probability of $k$-th nearest user to the base station follows a scaling law of $O\left({D^{\alpha \left( {{N_r} - {M} + 1} \right) + 2k}}\right)$, the average goodput scales as $O({D^{2}})$ and $O({D^{-2}})$ as $D \to 0$ and $D \to \infty$, respectively, where $N_r$, $M$, $\alpha$ and $D$ stand for the number of receive antennas, the total number of data streams, the path loss exponent and the cell radius, respectively. The analytical results are finally validated through the numerical analysis.
Hybrid Stochastic Synapses Enabled by Scaled Ferroelectric Field-effect Transistors
Authors: A N M Nafiul Islam, Arnob Saha, Zhouhang Jiang, Kai Ni, Abhronil Sengupta
Abstract
Achieving brain-like density and performance in neuromorphic computers necessitates scaling down the size of nanodevices emulating neuro-synaptic functionalities. However, scaling nanodevices results in reduction of programming resolution and emergence of stochastic non-idealities. While prior work has mainly focused on binary transitions, in this work we leverage the stochastic switching of a three-state ferroelectric field effect transistor (FeFET) to implement a long-term and short-term 2-tier stochastic synaptic memory with a single device. Experimental measurements are performed on a scaled 28nm high-$k$ metal gate technology based device to develop a probabilistic model of the hybrid stochastic synapse. In addition to the advantage of ultra-low programming energies afforded by scaling, our hardware-algorithm co-design analysis reveals the efficacy of the 2-tier memory in comparison to binary stochastic synapses in on-chip learning tasks -- paving the way for algorithms exploiting multi-state devices with probabilistic transitions beyond deterministic ones.
Abstract
Visual-inertial navigation systems are powerful in their ability to accurately estimate localization of mobile systems within complex environments that preclude the use of global navigation satellite systems. However, these navigation systems are reliant on accurate and up-to-date temporospatial calibrations of the sensors being used. As such, online estimators for these parameters are useful in resilient systems. This paper presents an extension to existing Kalman Filter based frameworks for estimating and calibrating the extrinsic parameters of multi-camera IMU systems. In addition to extending the filter framework to include multiple camera sensors, the measurement model was reformulated to make use of measurement data that is typically made available in fiducial detection software. A secondary filter layer was used to estimate time translation parameters without closed-loop feedback of sensor data. Experimental calibration results, including the use of cameras with non-overlapping fields of view, were used to validate the stability and accuracy of the filter formulation when compared to offline methods. Finally the generalized filter code has been open-sourced and is available online.
SmartMocap: Joint Estimation of Human and Camera Motion using Uncalibrated RGB Cameras
Authors: Nitin Saini, Chun-hao P. Huang, Michael J. Black, Aamir Ahmad
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this paper, we propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions unlike existing methods which represent bodies in the camera coordinate. Second, we learn a probability distribution of short human motion sequences ($\sim$1sec) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints on the images. Finally, we show that our method can work on a variety of datasets ranging from aerial cameras to smartphones. It also gives more accurate results compared to the state-of-the-art on the task of monocular human mocap with a static camera. Our code is available for research purposes on https://github.com/robot-perception-group/SmartMocap.
DTact: A Vision-Based Tactile Sensor that Measures High-Resolution 3D Geometry Directly from Darkness
Abstract
Vision-based tactile sensors that can measure 3D geometry of the contacting objects are crucial for robots to perform dexterous manipulation tasks. However, the existing sensors are usually complicated to fabricate and delicate to extend. In this work, we novelly take advantage of the reflection property of semitransparent elastomer to design a robust, low-cost, and easy-to-fabricate tactile sensor named DTact. DTact measures high-resolution 3D geometry accurately from the darkness shown in the captured tactile images with only a single image for calibration. In contrast to previous sensors, DTact is robust under various illumination conditions. Then, we build prototypes of DTact that have non-planar contact surfaces with minimal extra efforts and costs. Finally, we perform two intelligent robotic tasks including pose estimation and object recognition using DTact, in which DTact shows large potential in applications.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Efficiently Learning Recoveries from Failures Under Partial Observability
Reconstruction-guided attention improves the robustness and shape processing of neural networks
Falsification before Extrapolation in Causal Effect Estimation
Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training
Keyword: scaling
Scalable and Equivariant Spherical CNNs by Discrete-Continuous (DISCO) Convolutions
Performance Analysis of MIMO-NOMA Systems with Randomly Deployed Users
Hybrid Stochastic Synapses Enabled by Scaled Ferroelectric Field-effect Transistors
Keyword: calibration
Online Multi Camera-IMU Calibration
SmartMocap: Joint Estimation of Human and Camera Motion using Uncalibrated RGB Cameras
DTact: A Vision-Based Tactile Sensor that Measures High-Resolution 3D Geometry Directly from Darkness