Abstract
Unsupervised clustering algorithms for vectors have been widely used in machine learning. Many applications, including the biological data we study in this paper, contain boundary datapoints that show combined properties of two underlying clusters and can lower the performance of traditional clustering algorithms. We develop a confident clustering method that aims to diminish the influence of these datapoints and improve the clustering results. Concretely, for a list of datapoints, we give two clustering results. The first-round clustering attempts to classify only pure vectors with high confidence. Based on it, we classify more vectors with less confidence in the second round. We validate our algorithm on single-cell RNA-seq data, which is a powerful and widely used tool in biology. Our confident clustering shows high accuracy on our tested datasets. In addition, unlike traditional clustering methods in single-cell analysis, the confident clustering shows high stability under different choices of parameters.
A BCS-GDE Multi-objective Optimization Algorithm for Combined Cooling, Heating and Power Model with Decision Strategies
Abstract
District energy systems can not only reduce energy consumption but also set energy supply dispatching schemes according to demand. In addition to economic cost, energy consumption and pollutant emissions deserve particular attention when evaluating combined cooling, heating and power (CCHP) models. In this paper, the CCHP model is established with three objectives: economic cost, primary energy consumption, and pollutant emissions. The mathematical expression of the CCHP system is proposed, and a multi-objective optimization model with constraints is established. According to different usage requirements, two decision-making strategies are designed, which can adapt to different scenarios. Besides, a generalized differential evolution with the best compromise solution processing mechanism (BCS-GDE) algorithm is proposed to optimize the CCHP model for the first time. The algorithm provides the optimal energy scheduling scheme by optimizing the production capacity of equipment with different capacities. The simulation is conducted in three application scenarios: hotels, offices, and residential buildings. The simulation results show that the model established in this paper can reduce economic cost by 72%, primary energy consumption by 73%, and pollutant emissions by 88%. In addition, the Wilcoxon signed-rank test shows that BCS-GDE is significantly better than OMOPSO, NSGA-II, and SPEA2 with greater than 95% confidence.
Action parsing using context features
Authors: Nagita Mehrseresht
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
We propose an action parsing algorithm to parse a video sequence containing an unknown number of actions into its action segments. We argue that context information, particularly the temporal information about other actions in the video sequence, is valuable for action segmentation. The proposed parsing algorithm temporally segments the video sequence into action segments. The optimal temporal segmentation is found using a dynamic programming search algorithm that optimizes the overall classification confidence score. The classification score of each segment is determined using local features calculated from that segment as well as context features calculated from other candidate action segments of the sequence. Experimental results on the Breakfast activity dataset show improved segmentation accuracy compared to existing state-of-the-art parsing techniques.
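The dynamic programming search mentioned in this abstract can be illustrated with a minimal sketch. It assumes a purely additive, local per-segment score; the paper's context features, which depend on other candidate segments, are not modeled here. The names `best_segmentation`, `purity_score`, and the toy frame labels are hypothetical and only serve the illustration.

```python
from typing import Callable, List, Tuple


def best_segmentation(
    n_frames: int,
    score: Callable[[int, int], float],
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split frames [0, n_frames) into contiguous segments that maximize
    the sum of per-segment scores (assumed additive and local)."""
    neg_inf = float("-inf")
    best = [neg_inf] * (n_frames + 1)  # best[e]: best total score for frames [0, e)
    back = [0] * (n_frames + 1)        # back[e]: start of the last segment ending at e
    best[0] = 0.0
    for end in range(1, n_frames + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(start, end)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    end = n_frames
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    segments.reverse()
    return best[n_frames], segments


# Hypothetical toy input: one label per frame.
FRAMES = "aaabbbbcc"


def purity_score(start: int, end: int) -> float:
    """Toy score: majority-label count minus minority count, minus a
    fixed 0.5 penalty per segment to discourage over-segmentation."""
    window = FRAMES[start:end]
    majority = max(window.count(label) for label in set(window))
    return majority - (len(window) - majority) - 0.5


total, segments = best_segmentation(len(FRAMES), purity_score, max_len=len(FRAMES))
print(total, segments)
```

On the toy input, the search recovers the three homogeneous runs as segments. The cost is O(n_frames * max_len) score evaluations, which is why a bound on segment length is a common design choice in this kind of search.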
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Authors: Wenxuan Wang, Wenxiang Jiao, Shuo Wang, Zhaopeng Tu, Michael R. Lyu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system. However, its quality is still not satisfactory due to off-target issues. In this paper, we aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation. By carefully examining the translation output and model confidence, we identify two uncertainties that are responsible for the off-target issues, namely, extrinsic data uncertainty and intrinsic model uncertainty. Based on the observations, we propose two light-weight and complementary approaches: denoising the training data for model training, and masking out the vocabulary of the off-target languages in inference. Extensive experiments on both balanced and unbalanced datasets show that our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines. Qualitative analyses provide insights into where our approaches reduce off-target translations.
Estimation of binary time-frequency masks from ambient noise
Authors: José Luis Romero, Michael Speckbacher
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Functional Analysis (math.FA); Statistics Theory (math.ST)
Abstract
We investigate the retrieval of a binary time-frequency mask from a few observations of filtered white ambient noise. Confirming household wisdom in acoustic modeling, we show that this is possible by inspecting the average spectrogram of ambient noise. Specifically, we show that the lower quantile of the average of $\mathcal{O}(\log(|\Omega|/\varepsilon))$ masked spectrograms is enough to identify a rather general mask $\Omega$ with confidence at least $\varepsilon$, up to shape details concentrated near the boundary of $\Omega$. As an application, the expected measure of the estimation error is dominated by the perimeter of the time-frequency mask. The estimator requires no knowledge of the noise variance, and only a very qualitative profile of the filtering window, but no exact knowledge of it.
Keyword: scaling
Second-order uniformly asymptotic-preserving space-time-ImEx schemes for hyperbolic balance laws with stiff relaxation
Authors: Louis Reboul (CMAP), Teddy Pichard (CMAP), Marc Massot (CMAP)
Abstract
We consider hyperbolic systems of conservation laws with relaxation source terms leading to a diffusive asymptotic limit under a parabolic scaling. We introduce a new class of second-order in time and space numerical schemes, which are uniformly asymptotic preserving schemes. The proposed Implicit-Explicit (ImEx) approach does not follow the usual path relying on the method of lines, either with multi-step methods or Runge-Kutta methods, or on equations semi-discretized in time, but is inspired by the Lax-Wendroff approach with the proper level of implicit treatment of the source term. As a result, it yields a very compact stencil in space and time, and we are able to rigorously show that both the second-order accuracy and the stability conditions are independent of the fast scales in the asymptotic regime, including the study of boundary conditions. We provide an original derivation of $\ell^2$ and $\ell^\infty$ stability conditions of the scheme that do not deteriorate the second-order accuracy without relying on a limiter of any type in the linear case, in particular for shock solutions, and extend such results to the nonlinear case, showing the novelty of the method. The prototype system for the linear case is the hyperbolic heat equation, whereas the barotropic Euler equations of gas dynamics with friction serve as the prototype for the nonlinear case. The method is also able to yield very accurate steady solutions in the nonlinear case when the relaxation coefficient in the source term depends on space. A thorough numerical assessment of the proposed strategy is provided by investigating smooth solutions, solutions with shocks, and solutions leading to a steady state with a space-dependent relaxation coefficient.
Self-Paced Multi-Agent Reinforcement Learning
Authors: Wenshuai Zhao, Joni Pajarinen
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Curriculum reinforcement learning (CRL) aims to speed up learning of a task by gradually changing the difficulty of the task from easy to hard through control of factors such as initial state or environment dynamics. While automating CRL is well studied in the single-agent setting, in multi-agent reinforcement learning (MARL) an open question is whether controlling the number of agents together with other factors in a principled manner is beneficial; prior approaches typically rely on hand-crafted heuristics. In addition, how the tasks evolve as the number of agents changes remains understudied, which is critical for scaling to more challenging tasks. We introduce self-paced MARL (SPMARL), which enables optimizing the number of agents together with other environment factors in a principled way, and show that usual assumptions, such as that fewer agents always make the task easier, are not generally valid. The curriculum induced by SPMARL reveals the evolution of tasks w.r.t. the number of agents, and experiments show that SPMARL improves performance when the number of agents sufficiently influences task difficulty.
The role of the Big Geographic Sort in the circulation of misinformation among U.S. Reddit users
Authors: Lia Bozarth, Daniele Quercia, Licia Capra, Sanja Scepanovic
Abstract
Past research has attributed the online circulation of misinformation to two main factors - individual characteristics (e.g., a person's information literacy) and social media effects (e.g., algorithm-mediated information diffusion) - and has overlooked a third one: the critical mass created by the offline self-segregation of Americans into like-minded geographical regions such as states (a phenomenon called "The Big Sort"). We hypothesized that this latter factor matters for the online spreading of misinformation not least because online interactions, despite having the potential of being global, end up being localized: interaction probability is known to rapidly decay with distance. Upon analysis of more than 8M Reddit comments containing news links spanning four years, from January 2016 to December 2019, we found that Reddit did not work as a "hype machine" for misinformation (as opposed to what previous work reported for other platforms, circulation was not mainly caused by platform-facilitated network effects) but worked as a supply-and-demand system: misinformation news items scaled linearly with the number of users in each state (with a scaling exponent beta = 1, and a goodness of fit R2 = 0.95). Furthermore, deviations from such a universal pattern were best explained by state-level personality and cultural factors (R2 = {0.12, 0.39}), rather than socioeconomic conditions (R2 = {0.15, 0.29}) or, as one would expect, political characteristics (R2 = {0.06, 0.21}). Higher-than-expected circulation of any type of news (including reputable news) was found in states characterised by residents who tend to be less diligent in terms of their personality (low in conscientiousness) and by loose cultures understating the importance of adherence to norms (low in cultural tightness).
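A scaling exponent and goodness of fit of the kind quoted in this abstract are commonly estimated by ordinary least squares on log-transformed counts. The sketch below assumes that procedure; the abstract does not state how the fit was actually done, and the function name `fit_power_law` and the state-level numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a * x**beta by least squares on (log x, log y).

    Returns (beta, a, r2), where r2 is the coefficient of determination
    of the linear fit in log-log space.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    ss_res = sum((ly - (intercept + beta * lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return beta, math.exp(intercept), 1.0 - ss_res / ss_tot


# Hypothetical data: users per state and misinformation items per state.
users = [1_000, 5_000, 20_000, 80_000, 300_000]
items = [21, 98, 410, 1_580, 6_100]
beta, prefactor, r2 = fit_power_law(users, items)
print(f"beta={beta:.3f} prefactor={prefactor:.4f} R2={r2:.3f}")
```

An exponent close to 1 means the item count grows in proportion to the user count, which is the linear "supply-and-demand" reading given in the abstract; exponents above or below 1 would indicate super- or sub-linear scaling.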
EXODUS: Stable and Efficient Training of Spiking Neural Networks
Authors: Felix Christian Bauer (1), Gregor Lenz (1), Saeid Haghighatshoar (1), Sadique Sheik (1) ((1) SynSense)
Abstract
Spiking Neural Networks (SNNs) are gaining significant traction in machine learning tasks where energy-efficiency is of utmost importance. Training such networks using the state-of-the-art back-propagation through time (BPTT) is, however, very time-consuming. Previous work by Shrestha and Orchard [2018] employs an efficient GPU-accelerated back-propagation algorithm called SLAYER, which speeds up training considerably. SLAYER, however, does not take into account the neuron reset mechanism while computing the gradients, which we argue to be the source of numerical instability. To counteract this, SLAYER introduces a gradient scale hyperparameter across layers, which needs manual tuning. In this paper, (i) we modify SLAYER and design an algorithm called EXODUS that accounts for the neuron reset mechanism and applies the Implicit Function Theorem (IFT) to calculate the correct gradients (equivalent to those computed by BPTT), (ii) we eliminate the need for ad-hoc scaling of gradients, thus reducing the training complexity tremendously, and (iii) we demonstrate, via computer simulations, that EXODUS is numerically stable and achieves comparable or better performance than SLAYER, especially in various tasks with SNNs that rely on temporal features. Our code is available at https://github.com/synsense/sinabs-exodus.
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Abstract
Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam, has been challenging because there were no rigorously proven SDE approximations for these methods. This paper derives the SDE approximations for RMSprop and Adam, giving theoretical guarantees of their correctness as well as experimental validation of their applicability to common large-scale vision and language settings. A key practical result is the derivation of a $\textit{square root scaling rule}$ to adjust the optimization hyperparameters of RMSprop and Adam when changing batch size, and its empirical validation in deep learning settings.
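The abstract names the square root scaling rule without spelling it out. The sketch below encodes one commonly cited form for Adam, treated here as an assumption to be checked against the paper: when the batch size is multiplied by kappa, the learning rate is multiplied by sqrt(kappa), each 1 - beta is multiplied by kappa, and epsilon is divided by sqrt(kappa). The class `AdamHyperparams` and the function `sqrt_scale` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdamHyperparams:
    lr: float
    beta1: float
    beta2: float
    eps: float


def sqrt_scale(h: AdamHyperparams, kappa: float) -> AdamHyperparams:
    """Rescale Adam hyperparameters for a batch size multiplied by kappa.

    Assumed form of the rule (not stated in the abstract):
      lr -> lr * sqrt(kappa), 1 - beta -> kappa * (1 - beta),
      eps -> eps / sqrt(kappa).
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    beta1 = 1.0 - kappa * (1.0 - h.beta1)
    beta2 = 1.0 - kappa * (1.0 - h.beta2)
    if not (0.0 <= beta1 < 1.0 and 0.0 <= beta2 < 1.0):
        raise ValueError("kappa is too large for the given betas")
    root = math.sqrt(kappa)
    return AdamHyperparams(lr=h.lr * root, beta1=beta1, beta2=beta2, eps=h.eps / root)


base = AdamHyperparams(lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8)
print(sqrt_scale(base, kappa=4.0))  # batch size 4x larger
```

Under this assumed form the rule stops being applicable once kappa * (1 - beta) reaches 1, which the sketch reports as an error rather than silently clamping.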
ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities
Abstract
Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs makes it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs: stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
Keyword: calibration
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Abstract
Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias". Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we theorize the quantification of maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problems under covariate shifts, neither incurring additional online serving costs nor compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
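The calibration ratio defined in this abstract, and the way selection by the model's own predictions inflates it, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's VAD algorithm: every item has the same true click rate, predictions are unbiased with Gaussian noise, and the "selected set" is simply the top-k items by predicted rate. The function names and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def calibration(predicted: Sequence[float], true_rates: Sequence[float]) -> float:
    """Ratio of the average predicted click rate to the average true click rate."""
    if not predicted or len(predicted) != len(true_rates):
        raise ValueError("inputs must be non-empty and of equal length")
    return (sum(predicted) / len(predicted)) / (sum(true_rates) / len(true_rates))


def simulate_selection_bias(
    n_items: int = 1000,
    top_k: int = 100,
    true_rate: float = 0.1,
    noise_sd: float = 0.02,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (calibration on all items, calibration on the top-k selected items)."""
    rng = random.Random(seed)
    true_rates: List[float] = [true_rate] * n_items
    # Unbiased but noisy predictions: the noise has zero mean.
    predicted = [true_rate + rng.gauss(0.0, noise_sd) for _ in range(n_items)]
    overall = calibration(predicted, true_rates)
    # Select the items the model itself ranks highest.
    top = sorted(range(n_items), key=lambda i: predicted[i], reverse=True)[:top_k]
    selected = calibration([predicted[i] for i in top], [true_rates[i] for i in top])
    return overall, selected


overall, selected = simulate_selection_bias()
print(f"all items: {overall:.3f}, selected top-k: {selected:.3f}")
```

Calibration over all items stays near 1, while calibration over the selected items exceeds 1 even though each individual prediction is unbiased: ranking by the prediction preferentially picks items whose noise happened to be positive.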
Towards Consistency in Adversarial Classification
Abstract
In this paper, we study the problem of consistency in the context of adversarial examples. Specifically, we tackle the following question: can surrogate losses still be used as a proxy for minimizing the $0/1$ loss in the presence of an adversary that alters the inputs at test-time? Different from the standard classification task, this question cannot be reduced to a point-wise minimization problem, and calibration need not be sufficient to ensure consistency. In this paper, we expose some pathological behaviors specific to the adversarial problem, and show that no convex surrogate loss can be consistent or calibrated in this context. It is therefore necessary to design another class of surrogate functions that can be used to solve the adversarial consistency issue. As a first step towards designing such a class, we identify sufficient and necessary conditions for a surrogate loss to be calibrated in both the adversarial and standard settings. Finally, we give some directions for building a class of losses that could be consistent in the adversarial framework.
Converting Artificial Neural Networks to Spiking Neural Networks via Parameter Calibration
Authors: Yuhang Li, Shikuang Deng, Xin Dong, Shi Gu
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Spiking Neural Network (SNN), originating from the neural behavior in biology, has been recognized as one of the next-generation neural networks. Conventionally, SNNs can be obtained from pre-trained Artificial Neural Networks (ANNs) by replacing the non-linear activation with spiking neurons without changing the parameters. In this work, we argue that simply copying and pasting the weights of ANN to SNN inevitably results in activation mismatch, especially for ANNs that are trained with batch normalization (BN) layers. To tackle the activation mismatch issue, we first provide a theoretical analysis by decomposing local conversion error into clipping error and flooring error, and then quantitatively measure how this error propagates throughout the layers using second-order analysis. Motivated by the theoretical results, we propose a set of layer-wise parameter calibration algorithms, which adjust the parameters to minimize the activation mismatch. Extensive experiments for the proposed algorithms are performed on modern architectures and large-scale tasks including ImageNet classification and MS COCO detection. We demonstrate that our method can handle the SNN conversion with batch normalization layers and effectively preserve high accuracy even with 32 time steps. For example, our calibration algorithms can increase accuracy by up to 65% when converting VGG-16 with BN layers.
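The decomposition into clipping error and flooring error can be made concrete with a small numeric sketch. It assumes a common rate-coding approximation that the abstract itself does not spell out: over T time steps with firing threshold v_th, a spiking neuron's average output is modeled as the ReLU activation clipped to [0, v_th] and then floored onto a grid of step v_th / T. The function names and the values of v_th and T are hypothetical.

```python
import math
from typing import Tuple


def snn_rate_output(activation: float, v_th: float, steps: int) -> float:
    """Assumed SNN average output: clip to [0, v_th], then floor to a multiple of v_th / steps."""
    clipped = min(max(activation, 0.0), v_th)
    spikes = min(math.floor(clipped * steps / v_th), steps)
    return spikes * v_th / steps


def conversion_error(activation: float, v_th: float, steps: int) -> Tuple[float, float]:
    """Split (SNN output - ReLU output) into (clipping error, flooring error)."""
    relu = max(activation, 0.0)
    clipped = min(relu, v_th)
    clipping_error = clipped - relu  # nonzero only when the activation exceeds v_th
    flooring_error = snn_rate_output(activation, v_th, steps) - clipped  # quantization loss
    return clipping_error, flooring_error


for a in (0.37, 1.5):
    clip_err, floor_err = conversion_error(a, v_th=1.0, steps=4)
    print(f"activation={a}: clipping={clip_err:+.2f}, flooring={floor_err:+.2f}")
```

In this model the flooring error shrinks as the number of time steps grows, while the clipping error depends only on the threshold, which is one way to see why adjusting parameters per layer could reduce the mismatch at a fixed, small number of time steps.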
Prototypical Calibration for Few-shot Learning of Language Models
Authors: Zhixiong Han, Yaru Hao, Li Dong, Furu Wei
Abstract
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations. In this work, we propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification, instead of greedy decoding. Concretely, our method first adopts a Gaussian mixture distribution to estimate the prototypical clusters for all categories. Then we assign each cluster to the corresponding label by solving a weighted bipartite matching problem. Given an example, its prediction is calibrated by the likelihood of prototypical clusters. Experimental results show that prototypical calibration yields a 15% absolute improvement on a diverse set of tasks. Extensive analysis across different scales also indicates that our method calibrates the decision boundary as expected, greatly improving the robustness of GPT to templates, permutations, and class imbalance.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis
A BCS-GDE Multi-objective Optimization Algorithm for Combined Cooling, Heating and Power Model with Decision Strategies
Action parsing using context features
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Estimation of binary time-frequency masks from ambient noise
Keyword: scaling
Second-order uniformly asymptotic-preserving space-time-ImEx schemes for hyperbolic balance laws with stiff relaxation
Self-Paced Multi-Agent Reinforcement Learning
The role of the Big Geographic Sort in the circulation of misinformation among U.S. Reddit users
EXODUS: Stable and Efficient Training of Spiking Neural Networks
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities
Keyword: calibration
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Towards Consistency in Adversarial Classification
Converting Artificial Neural Networks to Spiking Neural Networks via Parameter Calibration
Prototypical Calibration for Few-shot Learning of Language Models