New submissions for Fri, 12 Aug 22

Keyword: out of distribution detection

There is no result

Keyword: out-of-distribution detection

Self-Knowledge Distillation via Dropout

Authors: Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2208.05642
Pdf link: https://arxiv.org/pdf/2208.05642
Abstract To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the data. In this paper, we propose a simple and effective self-knowledge distillation method using a dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models through a dropout sampling. Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations. Furthermore, this simple method can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
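The idea of distilling between dropout-sampled posteriors can be illustrated with a small sketch. This is not the authors' implementation: the linear classifier head, the helper names, and the choice to simply sum the forward and reverse KL terms are all assumptions made for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dropout(features, rate, rng):
    # Inverted dropout: zero each feature with probability `rate`, rescale the rest.
    keep = 1.0 - rate
    return [f / keep if rng.random() < keep else 0.0 for f in features]

def linear(features, weights):
    # Hypothetical classifier head: one weight row per class.
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def sd_dropout_loss(features, weights, rate, rng):
    # Two independent dropout masks give two sampled posteriors p and q.
    p = softmax(linear(dropout(features, rate, rng), weights))
    q = softmax(linear(dropout(features, rate, rng), weights))
    # Assumption: penalize disagreement with forward plus reverse KL.
    return kl_divergence(p, q) + kl_divergence(q, p)
```

In training, a term like this would presumably be added to the usual classification loss; with a dropout rate of zero both samples coincide and the term vanishes.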
Keyword: expected calibration error

There is no result

Keyword: overconfident

There is no result

Keyword: overconfidence

There is no result

Keyword: confidence

Regret Analysis for Hierarchical Experts Bandit Problem
Authors: Qihan Guo (1), Siwei Wang (1), Jun Zhu (1) ((1) Tsinghua University)
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2208.05622
Pdf link: https://arxiv.org/pdf/2208.05622
Abstract We study an extension of the standard bandit problem in which there are R layers of experts. Multi-layered experts make selections layer by layer, and only the experts in the last layer can play arms. The goal of the learning policy is to minimize the total regret in this hierarchical experts setting. We first analyze the case in which total regret grows linearly with the number of layers. Then we focus on the case in which all experts play the Upper Confidence Bound (UCB) strategy and give several sub-linear upper bounds for different circumstances. Finally, we design some experiments to support the regret analysis for the general case of a hierarchical UCB structure and show the practical significance of our theoretical results. This article gives many insights into reasonable hierarchical decision structures.
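For readers unfamiliar with the UCB strategy mentioned above, the following is a minimal sketch of the classical single-layer UCB1 selection rule. It does not model the hierarchical experts setting of the paper; the function name and argument layout are assumptions for illustration.

```python
import math

def ucb1_select(counts, sums, t):
    """Return the arm with the largest UCB1 index at round t (1-based).

    counts[i] is how often arm i was played, sums[i] its total observed reward.
    """
    # Play every arm once before using the index.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    def index(arm):
        mean = sums[arm] / counts[arm]
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])  # exploration bonus
        return mean + bonus

    return max(range(len(counts)), key=index)
```

The bonus shrinks as an arm is played more often, so rarely played arms are revisited even when their empirical mean is lower.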
Best Policy Identification in Linear MDPs
Authors: Jerome Taupin, Yassir Jedra, Alexandre Proutiere
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2208.05633
Pdf link: https://arxiv.org/pdf/2208.05633
Abstract We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. We first derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms. One of these exhibits a sample complexity upper bounded by ${\cal O}({\frac{d}{(\varepsilon+\Delta)^2}} (\log(\frac{1}{\delta})+d))$ where $\Delta$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $\delta$), and matches existing minimax and gap-dependent lower bounds. We extend our algorithm to episodic linear MDPs.
A Modified UDP for Federated Learning Packet Transmissions
Authors: Bright Kudzaishe Mahembe, Clement Nyirenda
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2208.05737
Pdf link: https://arxiv.org/pdf/2208.05737
Abstract This paper introduces a Modified User Datagram Protocol (UDP) for Federated Learning to ensure efficiency and reliability in the model parameter transport process, maximizing the potential of the Global model in each Federated Learning round. In developing and testing this protocol, the NS3 simulator is utilized to simulate the packet transport over the network, and Google TensorFlow is used to create a custom Federated Learning environment. In this preliminary implementation, the simulation contains three nodes: two client nodes and one server node. The results obtained in this paper provide confidence in the capabilities of the protocol in the future of Federated Learning. Therefore, in the future, the Modified UDP will be tested on a larger Federated Learning system with a TensorFlow model containing more parameters, and a comparison between the traditional UDP protocol and the Modified UDP protocol will be simulated. Optimization of the Modified UDP will also be explored to improve efficiency while ensuring reliability.
Uncertainty Quantification of Sparse Travel Demand Prediction with Spatial-Temporal Graph Neural Networks
Authors: Dingyi Zhuang, Shenhao Wang, Haris N. Koutsopoulos, Jinhua Zhao
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2208.05908
Pdf link: https://arxiv.org/pdf/2208.05908
Abstract Origin-Destination (O-D) travel demand prediction is a fundamental challenge in transportation. Recently, spatial-temporal deep learning models have demonstrated tremendous potential to enhance prediction accuracy. However, few studies have tackled the uncertainty and sparsity issues in fine-grained O-D matrices. This presents a serious problem, because a vast number of zeros deviate from the Gaussian assumption underlying the deterministic deep learning models. To address this issue, we design a Spatial-Temporal Zero-Inflated Negative Binomial Graph Neural Network (STZINB-GNN) to quantify the uncertainty of the sparse travel demand. It analyzes spatial and temporal correlations using diffusion and temporal convolution networks, which are then fused to parameterize the probabilistic distributions of travel demand. The STZINB-GNN is examined using two real-world datasets with various spatial and temporal resolutions. The results demonstrate the superiority of STZINB-GNN over benchmark models, especially under high spatial-temporal resolutions, because of its high accuracy, tight confidence intervals, and interpretable parameters. The sparsity parameter of the STZINB-GNN has a physical interpretation for various transportation applications.
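To make the zero-inflated negative binomial distribution concrete, here is a minimal sketch of its probability mass function. The parameterization (shape `n`, success probability `p`, zero-inflation weight `pi`) is an assumption chosen for illustration and may differ from the one used in the paper; in the model described above, such parameters would be produced by the network rather than fixed by hand.

```python
import math

def nb_pmf(k, n, p):
    # Negative binomial pmf: Gamma(k+n) / (k! Gamma(n)) * p^n * (1-p)^k.
    log_pmf = (math.lgamma(k + n) - math.lgamma(k + 1) - math.lgamma(n)
               + n * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zinb_pmf(k, n, p, pi):
    # With probability pi the count is a structural zero;
    # otherwise it is drawn from the negative binomial.
    base = (1.0 - pi) * nb_pmf(k, n, p)
    return pi + base if k == 0 else base
```

A larger `pi` moves probability mass onto zero, which is how the distribution accommodates sparse O-D matrices.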
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Authors: Yucong Liu, Shixing Yu, Tong Lin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2208.05924
Pdf link: https://arxiv.org/pdf/2208.05924
Abstract In this paper we develop a novel regularization method for deep neural networks by penalizing the trace of the Hessian. This regularizer is motivated by a recent guarantee bound of the generalization error. The Hutchinson method is a classical unbiased estimator for the trace of a matrix, but it is very time-consuming on deep learning models. Hence a dropout scheme is proposed to efficiently implement the Hutchinson method. Then we discuss a connection to linear stability of a nonlinear dynamical system and flat/sharp minima. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as Jacobian, confidence penalty, label smoothing, cutout, and mixup.
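The plain Hutchinson estimator referred to above can be sketched as follows. This shows only the classical estimator with Rademacher probe vectors, not the dropout-based scheme proposed in the paper; the `matvec` callback and the function name are assumptions for illustration.

```python
import random

def hutchinson_trace(matvec, dim, num_samples, rng):
    """Estimate tr(A) as the average of v^T A v over random sign vectors v.

    matvec(v) must return the product A v as a list of length dim.
    """
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # Rademacher probe
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples
```

Only matrix-vector products are needed, which is why the estimator suits Hessians: a Hessian-vector product can be computed without forming the full matrix.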
Keyword: scaling

Modular Extremely Large-Scale Array Communication: Near-Field Modelling and Performance Analysis
Authors: Xinrui Li, Haiquan Lu, Yong Zeng, Shi Jin, Rui Zhang
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2208.05691
Pdf link: https://arxiv.org/pdf/2208.05691
Abstract This paper investigates wireless communications based on a new antenna array architecture, termed modular extremely large-scale array (XL-array), where an extremely large number of antenna elements are regularly arranged on a common platform in a modular manner. Each module consists of a flexible/moderate number of antenna elements, and different modules are separated with an inter-module spacing that is typically much larger than the inter-element spacing/signal wavelength for ease of deployment. By properly modelling the variations of signal phase, amplitude and projected aperture across different array modules/elements, we develop the new channel model and analyze the signal-to-noise ratio (SNR) performance of the modular XL-array based communications. Under the practical non-uniform spherical wave (NUSW) model, the closed-form expression of the maximum achievable SNR is derived in terms of key geometric parameters, including the total planar array size, module separation distances along each dimension, as well as the user's location in the three-dimensional (3D) space. Besides, the asymptotic SNR scaling laws are revealed as the number of modules along different dimensions goes to infinity. Moreover, we show that our developed near-field modelling and performance analysis include the existing ones for the collocated XL-array, the far-field uniform plane wave (UPW) model, as well as the one-dimensional (1D) modular extremely large-scale uniform linear array (XL-ULA) as special cases. Extensive simulation results are provided to validate our obtained results.
Uncertainty Quantification for Traffic Forecasting: A Unified Approach
Authors: Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, James J.Q. Yu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2208.05875
Pdf link: https://arxiv.org/pdf/2208.05875
Abstract Uncertainty is an essential consideration for time series forecasting tasks. In this work, we specifically focus on quantifying the uncertainty of traffic forecasting. To achieve this, we develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty. We first leverage a spatio-temporal model to model the complex spatio-temporal correlations of traffic data. Subsequently, two independent sub-neural networks maximizing the heterogeneous log-likelihood are developed to estimate aleatoric uncertainty. For estimating epistemic uncertainty, we combine the merits of variational inference and deep ensembling by integrating the Monte Carlo dropout and the Adaptive Weight Averaging re-training methods, respectively. Finally, we propose a post-processing calibration approach based on Temperature Scaling, which improves the model's generalization ability to estimate uncertainty. Extensive experiments are conducted on four public datasets, and the empirical results suggest that the proposed method outperforms state-of-the-art methods in terms of both point prediction and uncertainty quantification.
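As a rough illustration of post-hoc calibration in the spirit of temperature scaling, the sketch below fits a single scalar that rescales predicted standard deviations on held-out data. This is a regression analogue assumed for illustration, not the calibration procedure of the paper; the function names are hypothetical. For a Gaussian likelihood, the scale that minimizes the negative log-likelihood has the closed form used here.

```python
import math

def fit_std_scale(y_true, mu, sigma):
    # Closed-form minimizer of the Gaussian NLL over a scalar s applied as s * sigma:
    # s^2 equals the mean squared standardized residual.
    z2 = [((y - m) / s) ** 2 for y, m, s in zip(y_true, mu, sigma)]
    return math.sqrt(sum(z2) / len(z2))

def gaussian_nll(y_true, mu, sigma):
    # Average Gaussian negative log-likelihood.
    terms = [0.5 * math.log(2.0 * math.pi * s * s) + (y - m) ** 2 / (2.0 * s * s)
             for y, m, s in zip(y_true, mu, sigma)]
    return sum(terms) / len(terms)
```

The scale would be fitted on a validation split and then applied to test-time predictions, leaving the point predictions unchanged.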
Keyword: calibration

Self-Knowledge Distillation via Dropout
Authors: Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2208.05642
Pdf link: https://arxiv.org/pdf/2208.05642
Abstract To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the data. In this paper, we propose a simple and effective self-knowledge distillation method using a dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models through a dropout sampling. Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations. Furthermore, this simple method can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
Uncertainty Quantification for Traffic Forecasting: A Unified Approach
Authors: Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, James J.Q. Yu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2208.05875
Pdf link: https://arxiv.org/pdf/2208.05875
Abstract Uncertainty is an essential consideration for time series forecasting tasks. In this work, we specifically focus on quantifying the uncertainty of traffic forecasting. To achieve this, we develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty. We first leverage a spatio-temporal model to model the complex spatio-temporal correlations of traffic data. Subsequently, two independent sub-neural networks maximizing the heterogeneous log-likelihood are developed to estimate aleatoric uncertainty. For estimating epistemic uncertainty, we combine the merits of variational inference and deep ensembling by integrating the Monte Carlo dropout and the Adaptive Weight Averaging re-training methods, respectively. Finally, we propose a post-processing calibration approach based on Temperature Scaling, which improves the model's generalization ability to estimate uncertainty. Extensive experiments are conducted on four public datasets, and the empirical results suggest that the proposed method outperforms state-of-the-art methods in terms of both point prediction and uncertainty quantification.

