Abstract
To boost performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the data. In this paper, we propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models through dropout sampling. Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations. Furthermore, this simple method can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Regret Analysis for Hierarchical Experts Bandit Problem
Authors: Qihan Guo (1), Siwei Wang (1), Jun Zhu (1) ((1) Tsinghua University)
Abstract
We study an extension of the standard bandit problem in which there are R layers of experts. Multi-layered experts make selections layer by layer, and only the experts in the last layer can play arms. The goal of the learning policy is to minimize the total regret in this hierarchical experts setting. We first analyze the case in which the total regret grows linearly with the number of layers. Then we focus on the case in which all experts play the Upper Confidence Bound (UCB) strategy and give several sub-linear upper bounds for different circumstances. Finally, we design experiments to support the regret analysis for the general case of the hierarchical UCB structure and show the practical significance of our theoretical results. This article offers many insights into reasonable hierarchical decision structures.
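For reference, the UCB strategy named in this abstract can be illustrated with the standard single-layer UCB1 rule. The sketch below does not implement the paper's hierarchical experts structure; the function names `ucb1_index` and `select_arm` and the exploration constant 2 are assumptions chosen for illustration.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # an arm that was never pulled is tried first
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(means, counts):
    """Return the position of the arm with the largest UCB1 index.

    means[i]  -- empirical mean reward of arm i (hypothetical input)
    counts[i] -- number of times arm i has been pulled so far
    """
    total = sum(counts)
    indices = [ucb1_index(m, n, total) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

An arm with few pulls receives a large bonus, so it can be selected even when its empirical mean is lower than that of a well-explored arm.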
Best Policy Identification in Linear MDPs
Abstract
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. We first derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms. One of these exhibits a sample complexity upper bounded by ${\cal O}({\frac{d}{(\varepsilon+\Delta)^2}} (\log(\frac{1}{\delta})+d))$ where $\Delta$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $\delta$), and matches existing minimax and gap-dependent lower bounds. We extend our algorithm to episodic linear MDPs.
A Modified UDP for Federated Learning Packet Transmissions
Abstract
This paper introduces a Modified User Datagram Protocol (UDP) for Federated Learning to ensure efficiency and reliability in the model parameter transport process, maximizing the potential of the global model in each Federated Learning round. In developing and testing this protocol, the NS3 simulator is used to simulate packet transport over the network, and Google TensorFlow is used to create a custom Federated Learning environment. In this preliminary implementation, the simulation contains three nodes: two client nodes and one server node. The results obtained in this paper provide confidence in the capabilities of the protocol for the future of Federated Learning. In future work, the Modified UDP will be tested on a larger Federated Learning system with a TensorFlow model containing more parameters, and a comparison between the traditional UDP and the Modified UDP will be simulated. Optimization of the Modified UDP will also be explored to improve efficiency while ensuring reliability.
Uncertainty Quantification of Sparse Travel Demand Prediction with Spatial-Temporal Graph Neural Networks
Authors: Dingyi Zhuang, Shenhao Wang, Haris N. Koutsopoulos, Jinhua Zhao
Abstract
Origin-Destination (O-D) travel demand prediction is a fundamental challenge in transportation. Recently, spatial-temporal deep learning models have demonstrated tremendous potential to enhance prediction accuracy. However, few studies have tackled the uncertainty and sparsity issues in fine-grained O-D matrices. This presents a serious problem, because a vast number of zeros deviate from the Gaussian assumption underlying deterministic deep learning models. To address this issue, we design a Spatial-Temporal Zero-Inflated Negative Binomial Graph Neural Network (STZINB-GNN) to quantify the uncertainty of sparse travel demand. It analyzes spatial and temporal correlations using diffusion and temporal convolution networks, which are then fused to parameterize the probabilistic distributions of travel demand. The STZINB-GNN is examined using two real-world datasets with various spatial and temporal resolutions. The results demonstrate the superiority of STZINB-GNN over benchmark models, especially under high spatial-temporal resolutions, because of its high accuracy, tight confidence intervals, and interpretable parameters. The sparsity parameter of the STZINB-GNN has a physical interpretation for various transportation applications.
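To make the zero-inflated negative binomial distribution concrete, the sketch below evaluates its probability mass function in plain Python. It shows only the textbook distribution, not the graph neural network that parameterizes it; the parameter names `n`, `p`, and `pi` and the (n, p) parameterization are assumptions, since the abstract does not state the paper's notation.

```python
import math


def nb_pmf(k, n, p):
    """Negative binomial pmf: probability of k failures before n successes.

    p is the success probability, assumed to lie strictly between 0 and 1;
    n may be any positive real number.
    """
    log_coeff = math.lgamma(k + n) - math.lgamma(k + 1) - math.lgamma(n)
    return math.exp(log_coeff + n * math.log(p) + k * math.log(1.0 - p))


def zinb_pmf(k, n, p, pi):
    """Zero-inflated NB pmf.

    With probability pi the count is a structural zero; otherwise it is
    drawn from NB(n, p). pi plays the role of a sparsity parameter.
    """
    base = (1.0 - pi) * nb_pmf(k, n, p)
    return pi + base if k == 0 else base
```

A larger `pi` moves probability mass onto zero, which is how such a model can accommodate O-D matrices dominated by zero entries.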
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Abstract
In this paper, we develop a novel regularization method for deep neural networks by penalizing the trace of the Hessian. This regularizer is motivated by a recent bound on the generalization error. The Hutchinson method is a classical unbiased estimator for the trace of a matrix, but it is very time-consuming on deep learning models. Hence, a dropout scheme is proposed to efficiently implement the Hutchinson method. We then discuss a connection to the linear stability of a nonlinear dynamical system and to flat/sharp minima. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as Jacobian, confidence penalty, label smoothing, cutout, and mixup.
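The classical Hutchinson estimator mentioned in this abstract can be sketched in a few lines. The code below is the textbook version with Rademacher probe vectors, not the dropout-based scheme the paper proposes; the names `hutchinson_trace` and `make_matvec` are hypothetical, and a small explicit matrix stands in for the Hessian-vector product a deep learning framework would supply.

```python
import random


def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Estimate tr(A) as the average of v^T A v over Rademacher vectors v.

    matvec -- function mapping a vector v (list of floats) to A v
    dim    -- dimension of the square matrix A
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Each entry of v is +1 or -1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples


def make_matvec(matrix):
    """Wrap an explicit matrix (list of rows) as a matrix-vector product."""
    return lambda v: [sum(a * x for a, x in zip(row, v)) for row in matrix]
```

The estimator only needs matrix-vector products, which is why it is attractive when the matrix itself is too large to form; the cost concern raised in the abstract comes from the number of such products required.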
Keyword: scaling
Modular Extremely Large-Scale Array Communication: Near-Field Modelling and Performance Analysis
Abstract
This paper investigates wireless communications based on a new antenna array architecture, termed modular extremely large-scale array (XL-array), where an extremely large number of antenna elements are regularly arranged on a common platform in a modular manner. Each module consists of a flexible/moderate number of antenna elements, and different modules are separated with an inter-module spacing that is typically much larger than the inter-element spacing/signal wavelength for ease of deployment. By properly modelling the variations of signal phase, amplitude and projected aperture across different array modules/elements, we develop a new channel model and analyze the signal-to-noise ratio (SNR) performance of modular XL-array based communications. Under the practical non-uniform spherical wave (NUSW) model, the closed-form expression of the maximum achievable SNR is derived in terms of key geometric parameters, including the total planar array size, module separation distances along each dimension, as well as the user's location in the three-dimensional (3D) space. In addition, the asymptotic SNR scaling laws are revealed as the number of modules along different dimensions goes to infinity. Moreover, we show that our developed near-field modelling and performance analysis include the existing ones for the collocated XL-array, the far-field uniform plane wave (UPW) model, as well as the one-dimensional (1D) modular extremely large-scale uniform linear array (XL-ULA) as special cases. Extensive simulation results are provided to validate our obtained results.
Uncertainty Quantification for Traffic Forecasting: A Unified Approach
Authors: Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, James J.Q. Yu
Abstract
Uncertainty is an essential consideration for time series forecasting tasks. In this work, we specifically focus on quantifying the uncertainty of traffic forecasting. To achieve this, we develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty. We first leverage a spatio-temporal model to model the complex spatio-temporal correlations of traffic data. Subsequently, two independent sub-neural networks maximizing the heterogeneous log-likelihood are developed to estimate aleatoric uncertainty. For estimating epistemic uncertainty, we combine the merits of variational inference and deep ensembling by integrating the Monte Carlo dropout and the Adaptive Weight Averaging re-training methods, respectively. Finally, we propose a post-processing calibration approach based on Temperature Scaling, which improves the model's generalization ability to estimate uncertainty. Extensive experiments are conducted on four public datasets, and the empirical results suggest that the proposed method outperforms state-of-the-art methods in terms of both point prediction and uncertainty quantification.
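As background for the calibration step, the sketch below shows temperature scaling in its classic classification form: logits are divided by a scalar temperature fitted on held-out data. The abstract does not describe how the paper adapts this idea to traffic forecasting, so this is not the paper's procedure; the function names, the grid search, and the grid range are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes validation NLL."""
    if grid is None:
        grid = [0.5 + 0.1 * i for i in range(46)]  # 0.5 to 5.0 (assumed range)
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

A fitted temperature above 1 softens overconfident predictions without changing which class is predicted, since dividing by a positive scalar preserves the ordering of the logits.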
Keyword: calibration
Self-Knowledge Distillation via Dropout
Authors: Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
To boost performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the data. In this paper, we propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models through dropout sampling. Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations. Furthermore, this simple method can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
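One way to read the idea of distilling between dropout-sampled models is sketched below: two independent dropout masks are applied to the same feature vector, each view is classified, and the two predictive distributions are pulled together with forward and reverse KL terms. This is an illustrative reading of the abstract, not the authors' implementation; where dropout is applied, the linear classifier, the unweighted sum of the two KL terms, and all function names are assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(features, rate, rng):
    """Inverted dropout: zero each feature with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sd_dropout_loss(features, weights, rate=0.5, seed=0):
    """Forward plus reverse KL between two dropout-sampled predictions.

    weights -- rows of a hypothetical linear classifier, one row per class
    """
    rng = random.Random(seed)

    def predict(feats):
        logits = [sum(w * x for w, x in zip(row, feats)) for row in weights]
        return softmax(logits)

    p = predict(dropout(features, rate, rng))  # first sampled sub-model
    q = predict(dropout(features, rate, rng))  # second sampled sub-model
    return kl_divergence(p, q) + kl_divergence(q, p)
```

In training, a term like this would be added to the usual classification loss; no extra trainable module is involved, because both views share the same weights.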
Uncertainty Quantification for Traffic Forecasting: A Unified Approach
Authors: Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, James J.Q. Yu
Abstract
Uncertainty is an essential consideration for time series forecasting tasks. In this work, we specifically focus on quantifying the uncertainty of traffic forecasting. To achieve this, we develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty. We first leverage a spatio-temporal model to model the complex spatio-temporal correlations of traffic data. Subsequently, two independent sub-neural networks maximizing the heterogeneous log-likelihood are developed to estimate aleatoric uncertainty. For estimating epistemic uncertainty, we combine the merits of variational inference and deep ensembling by integrating the Monte Carlo dropout and the Adaptive Weight Averaging re-training methods, respectively. Finally, we propose a post-processing calibration approach based on Temperature Scaling, which improves the model's generalization ability to estimate uncertainty. Extensive experiments are conducted on four public datasets, and the empirical results suggest that the proposed method outperforms state-of-the-art methods in terms of both point prediction and uncertainty quantification.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
Self-Knowledge Distillation via Dropout
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Regret Analysis for Hierarchical Experts Bandit Problem
Best Policy Identification in Linear MDPs
A Modified UDP for Federated Learning Packet Transmissions
Uncertainty Quantification of Sparse Travel Demand Prediction with Spatial-Temporal Graph Neural Networks
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Keyword: scaling
Modular Extremely Large-Scale Array Communication: Near-Field Modelling and Performance Analysis
Uncertainty Quantification for Traffic Forecasting: A Unified Approach
Keyword: calibration
Self-Knowledge Distillation via Dropout
Uncertainty Quantification for Traffic Forecasting: A Unified Approach