New submissions for Mon, 5 Jun 23

Keyword: efficient

Towards Fair Disentangled Online Learning for Changing Environments

Authors: Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Christan Grant, Feng Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01007
Pdf link: https://arxiv.org/pdf/2306.01007
Abstract In the problem of online learning for changing environments, data are sequentially received one after another over time, and their distribution assumptions may vary frequently. Although existing methods demonstrate the effectiveness of their learning algorithms by providing a tight bound on either dynamic regret or adaptive regret, most of them completely ignore learning with model fairness, defined as the statistical parity across different subpopulations (e.g., race and gender). Another drawback is that when adapting to a new environment, an online learner needs to update model parameters with a global change, which is costly and inefficient. Inspired by the sparse mechanism shift hypothesis, we claim that changing environments in online learning can be attributed to partial changes in learned parameters that are specific to environments and the rest remain invariant to changing environments. To this end, in this paper, we propose a novel algorithm under the assumption that data collected at each time can be disentangled with two representations, an environment-invariant semantic factor and an environment-specific variation factor. The semantic factor is further used for fair prediction under a group fairness constraint. To evaluate the sequence of model parameters generated by the learner, a novel regret is proposed in which it takes a mixed form of dynamic and static regret metrics followed by a fairness-aware long-term constraint. The detailed analysis provides theoretical guarantees for loss regret and violation of cumulative fairness constraints. Empirical evaluations on real-world datasets demonstrate our proposed method sequentially outperforms baseline methods in model accuracy and fairness.
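Note: the abstract defines model fairness as statistical parity across subpopulations. As a minimal sketch of that quantity only (not the paper's algorithm or its long-term constraint), the gap in positive-prediction rates between groups can be computed as below. The function name `statistical_parity_gap` and the toy data are hypothetical.

```python
def statistical_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    predictions: list of 0/1 model outputs (hypothetical toy data here).
    groups: list of group labels, one per prediction.
    """
    rates = {}
    for g in set(groups):
        members = [p for p, gi in zip(predictions, groups) if gi == g]
        rates[g] = sum(members) / len(members)
    values = list(rates.values())
    return max(values) - min(values)


# Toy example: group "a" receives positives 3/4 of the time, group "b" 1/4.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = statistical_parity_gap(preds, grps)
print(gap)
```

A gap of zero means every group receives positive predictions at the same rate; a fairness constraint of the kind the abstract mentions would bound a quantity like this one.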
How to Estimate Model Transferability of Pre-Trained Speech Models?
Authors: Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shou-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.01015
Pdf link: https://arxiv.org/pdf/2306.01015
Abstract In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates using the extracted representations. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers by making a temporal independent hypothesis. We evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer) and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model settings using public data. Experimental results show a high Spearman's rank correlation and low $p$-value between our estimation framework and fine-tuning ground truth. Our proposed transferability framework requires less computational time and resources, making it a resource-saving and time-efficient approach for tuning speech foundation models.
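Note: the evaluation above compares estimated transferability scores against fine-tuning results via Spearman's rank correlation. A minimal sketch of that comparison follows, assuming one score and one fine-tuned accuracy per candidate model. The numbers are made up for illustration and the helper names are hypothetical; this is not the paper's scoring method.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one transferability score and one fine-tuned accuracy
# per candidate model.
scores = [0.2, 0.9, 0.5, 0.7]
accuracy = [61.0, 88.0, 70.0, 79.0]
rho = spearman(scores, accuracy)
print(rho)
```

A rho close to 1 indicates that ranking candidates by the cheap score would select the same models as ranking them by actual fine-tuning.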
ITR: A grammar-based graph compressor supporting fast neighborhood queries
Authors: Enno Adler, Stefan Böttcher, Rita Hartel
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2306.01028
Pdf link: https://arxiv.org/pdf/2306.01028
Abstract Neighborhood queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. We present a compression scheme called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges based on RePair and apply the scheme to RDF graphs. We show that ITR speeds up neighborhood queries to only a few milliseconds and thereby outperforms existing solutions while providing a compression size comparable to existing RDF graph compressors.
Improving the Robustness of Summarization Systems with Dual Augmentation
Authors: Xiuying Chen, Guodong Long, Chongyang Tao, Mingzhe Li, Xin Gao, Chengqi Zhang, Xiangliang Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.01090
Pdf link: https://arxiv.org/pdf/2306.01090
Abstract A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models have a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of the summarization systems and explore improving the robustness by data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor is in the latent space, where the attacked inputs bring more variations to the hidden states. Hence, we construct adversarial decoder input and devise manifold softmixing operation in hidden space to introduce more diversity. Experimental results on Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets.
Large-Batch, Neural Multi-Objective Bayesian Optimization
Authors: Navid Ansari, Hans-Peter Seidel, Vahid Babaei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2306.01095
Pdf link: https://arxiv.org/pdf/2306.01095
Abstract Bayesian optimization provides a powerful framework for global optimization of black-box, expensive-to-evaluate functions. However, it has a limited capacity in handling data-intensive problems, especially in multi-objective settings, due to the poor scalability of default Gaussian Process surrogates. We present a novel Bayesian optimization framework specifically tailored to address these limitations. Our method leverages a Bayesian neural networks approach for surrogate modeling. This enables efficient handling of large batches of data, modeling complex problems, and generating the uncertainty of the predictions. In addition, our method incorporates a scalable, uncertainty-aware acquisition strategy based on the well-known, easy-to-deploy NSGA-II. This fully parallelizable strategy promotes efficient exploration of uncharted regions. Our framework allows for effective optimization in data-intensive environments with a minimum number of iterations. We demonstrate the superiority of our method by comparing it with state-of-the-art multi-objective optimizations. We perform our evaluation on two real-world problems - airfoil design and color printing - showcasing the applicability and efficiency of our approach. Code is available at: https://github.com/an-on-ym-ous/lbn_mobo
A Neural RDE-based model for solving path-dependent PDEs
Authors: Bowen Fang, Hao Ni, Yue Wu
Subjects: Machine Learning (cs.LG); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2306.01123
Pdf link: https://arxiv.org/pdf/2306.01123
Abstract The concept of the path-dependent partial differential equation (PPDE) was first introduced in the context of path-dependent derivatives in financial markets. Its semilinear form was later identified as a non-Markovian backward stochastic differential equation (BSDE). Compared to the classical PDE, the solution of a PPDE involves an infinite-dimensional spatial variable, making it challenging to approximate, if not impossible. In this paper, we propose a neural rough differential equation (NRDE)-based model to learn PPDEs, which effectively encodes the path information through the log-signature feature while capturing the fundamental dynamics. The proposed continuous-time model for the PPDE solution offers the benefits of efficient memory usage and the ability to scale with dimensionality. Several numerical experiments, provided to validate the performance of the proposed model in comparison to the strong baseline in the literature, are used to demonstrate its effectiveness.
Numerical Investigation of the Fractional Oscillation Equations under the Context of Variable Order Caputo Fractional Derivative via Fractional Order Bernstein Wavelets
Authors: Ashish Rayal, Bhagawati Prasad Joshi, Mukesh Pandey, Delfim F. M. Torres
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01124
Pdf link: https://arxiv.org/pdf/2306.01124
Abstract This article describes an approximation technique based on fractional order Bernstein wavelets for the numerical simulations of fractional oscillation equations under variable order, and the fractional order Bernstein wavelets are derived by means of fractional Bernstein polynomials. The oscillation equation describes electrical circuits and exhibits a wide range of nonlinear dynamical behaviors. The proposed variable order model is of current interest in a lot of application areas in engineering and applied sciences. The purpose of this study is to analyze the behavior of the fractional force-free and forced oscillation equations under the variable-order fractional operator. The basic idea behind using the approximation technique is that it converts the proposed model into non-linear algebraic equations with the help of collocation nodes for easy computation. Different cases of the proposed model are examined under the selected variable order parameters for the first time in order to show the precision and performance of the mentioned scheme. The dynamic behavior and results are presented via tables and graphs to ensure the validity of the mentioned scheme. Further, the behavior of the obtained solutions for the variable order is also depicted. From the calculated results, it is observed that the mentioned scheme is extremely simple and efficient for examining the behavior of nonlinear random (constant or variable) order fractional models occurring in engineering and science.
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Authors: Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01154
Pdf link: https://arxiv.org/pdf/2306.01154
Abstract Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures. Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in learning dynamics. We can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with the wider counterparts. Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers. We also conduct numerical experiments to support our theoretical results. The code for our experiments can be found at https://github.com/cjyaras/lawofparsimony.
BAMF-SLAM: Bundle Adjusted Multi-Fisheye Visual-Inertial SLAM Using Recurrent Field Transforms
Authors: Wei Zhang, Sen Wang, Xingliang Dong, Rongwei Guo, Norbert Haala
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01173
Pdf link: https://arxiv.org/pdf/2306.01173
Abstract In this paper, we present BAMF-SLAM, a novel multi-fisheye visual-inertial SLAM system that utilizes Bundle Adjustment (BA) and recurrent field transforms (RFT) to achieve accurate and robust state estimation in challenging scenarios. First, our system directly operates on raw fisheye images, enabling us to fully exploit the wide Field-of-View (FoV) of fisheye cameras. Second, to overcome the low-texture challenge, we explore the tightly-coupled integration of multi-camera inputs and complementary inertial measurements via a unified factor graph and jointly optimize the poses and dense depth maps. Third, for global consistency, the wide FoV of the fisheye camera allows the system to find more potential loop closures, and powered by the broad convergence basin of RFT, our system can perform very wide baseline loop closing with little overlap. Furthermore, we introduce a semi-pose-graph BA method to avoid the expensive full global BA. By combining relative pose factors with loop closure factors, the global states can be adjusted efficiently with modest memory footprint while maintaining high accuracy. Evaluations on TUM-VI, Hilti-Oxford and Newer College datasets show the superior performance of the proposed system over prior works. In the Hilti SLAM Challenge 2022, our VIO version achieves second place. In a subsequent submission, our complete system, including the global BA backend, outperforms the winning approach.
A Yee-like finite-element scheme for Maxwell's equations on unstructured grids
Authors: Herbert Egger, Bogdan Radu
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01182
Pdf link: https://arxiv.org/pdf/2306.01182
Abstract A novel finite element scheme is studied for solving the time-dependent Maxwell's equations on unstructured grids efficiently. Similar to the traditional Yee scheme, the method has one degree of freedom for most edges and a sparse inverse mass matrix. This allows for an efficient realization by explicit time-stepping without solving linear systems. The method is constructed by algebraic reduction of another underlying finite element scheme which involves two degrees of freedom for every edge. Mass-lumping and additional modifications are used in the construction of this method to allow for the mentioned algebraic reduction in the presence of source terms and lossy media later on. A full error analysis of the underlying method is developed which by construction also carries over to the reduced scheme and allows convergence rates to be proved for the latter. The efficiency and accuracy of both methods are illustrated by numerical tests. The proposed schemes and their analysis can be extended to structured grids and in special cases the reduced method turns out to be algebraically equivalent to the Yee scheme. The analysis of this paper highlights possible difficulties in extensions of the Yee scheme to non-orthogonal or unstructured grids, discontinuous material parameters, and non-smooth source terms, and also offers potential remedies.
Labeled Interleaving Distance for Reeb Graphs
Authors: Fangfei Lan, Salman Parsa, Bei Wang
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Algebraic Topology (math.AT)
Arxiv link: https://arxiv.org/abs/2306.01186
Pdf link: https://arxiv.org/pdf/2306.01186
Abstract Merge trees, contour trees, and Reeb graphs are graph-based topological descriptors that capture topological changes of (sub)level sets of scalar fields. Comparing scalar fields using their topological descriptors has many applications in topological data analysis and visualization of scientific data. Recently, Munch and Stefanou introduced a labeled interleaving distance for comparing two labeled merge trees, which enjoys a number of theoretical and algorithmic properties. In particular, the labeled interleaving distance between merge trees can be computed in polynomial time. In this work, we define the labeled interleaving distance for labeled Reeb graphs. We then prove that the (ordinary) interleaving distance between Reeb graphs equals the minimum of the labeled interleaving distance over all labelings. We also provide an efficient algorithm for computing the labeled interleaving distance between two labeled contour trees (which are special types of Reeb graphs that arise from simply-connected domains). In the case of merge trees, the notion of the labeled interleaving distance was used by Gasparovic et al. to prove that the (ordinary) interleaving distance on the set of (unlabeled) merge trees is intrinsic. As our final contribution, we present counterexamples showing that, on the contrary, the (ordinary) interleaving distance on (unlabeled) Reeb graphs (and contour trees) is not intrinsic. It turns out that, under mild conditions on the labelings, the labeled interleaving distance is a metric on isomorphism classes of Reeb graphs, analogous to the ordinary interleaving distance. This provides new metrics on large classes of Reeb graphs.
Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials
Authors: Ali Kamali, Kaveh Laksari
Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft)
Arxiv link: https://arxiv.org/abs/2306.01204
Pdf link: https://arxiv.org/pdf/2306.01204
Abstract Soft biological tissues often have complex mechanical properties due to variation in structural components. In this paper, we develop a novel UNet-based neural network model for inversion in elasticity (El-UNet) to infer the spatial distributions of mechanical parameters from strain maps as input images, normal stress boundary conditions, and domain physics information. We show superior performance, both in terms of accuracy and computational cost, by El-UNet compared to fully-connected physics-informed neural networks in estimating unknown parameters and stress distributions for isotropic linear elasticity. We characterize different variations of El-UNet and propose a self-adaptive spatial loss weighting approach. To validate our inversion models, we performed various finite-element simulations of isotropic domains with heterogeneous distributions of material parameters to generate synthetic data. El-UNet is faster and more accurate than the fully-connected physics-informed implementation in resolving the distribution of unknown fields. Among the tested models, the self-adaptive spatially weighted models had the most accurate reconstructions in equal computation times. The learned spatial weighting distribution visibly corresponded to regions that the unweighted models were resolving inaccurately. Our work demonstrates a computationally efficient inversion algorithm for elasticity imaging using convolutional neural networks and presents a potential fast framework for three-dimensional inverse elasticity problems that have proven unachievable through previously proposed methods.
CSMAAFL: Client Scheduling and Model Aggregation in Asynchronous Federated Learning
Authors: Xiang Ma, Qun Wang, Haijian Sun, Rose Qingyang Hu, Yi Qian
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01207
Pdf link: https://arxiv.org/pdf/2306.01207
Abstract Asynchronous federated learning aims to solve the straggler problem in heterogeneous environments, i.e., clients have small computational capacities that could cause aggregation delay. The principle of asynchronous federated learning is to allow the server to aggregate the model once it receives an update from any client rather than waiting for updates from multiple clients or waiting a specified amount of time in the synchronous mode. Due to the asynchronous setting, the stale model problem could occur, where the slow clients could utilize an outdated local model for their local data training. Consequently, when these locally trained models are uploaded to the server, they may impede the convergence of the global training. Therefore, effective model aggregation strategies play a significant role in updating the global model. Besides, client scheduling is also critical when heterogeneous clients with diversified computing capacities are participating in the federated learning process. This work first investigates the impact of the convergence of asynchronous federated learning mode when adopting the aggregation coefficient in synchronous mode. The effective aggregation solutions that can achieve the same convergence result as in the synchronous mode are then proposed, followed by an improved aggregation method with client scheduling. The simulation results in various scenarios demonstrate that the proposed algorithm converges with a similar level of accuracy as the classical synchronous federated learning algorithm but effectively accelerates the learning process, especially in its early stage.
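Note: the abstract describes a server that aggregates as soon as any single client update arrives, and the stale-model problem this creates. The sketch below illustrates one generic way to do this, a staleness-discounted mixing weight. The discount rule `base_weight / (1 + staleness)` and all names are assumptions chosen for illustration; they are not the aggregation coefficients proposed in the paper.

```python
def async_aggregate(global_model, client_model, staleness, base_weight=0.5):
    """Mix one client's update into the global model as soon as it arrives.

    staleness: number of global updates applied since the client downloaded
    its copy of the model. Older (staler) updates get a smaller weight.
    The discount rule here is an illustrative assumption.
    """
    alpha = base_weight / (1 + staleness)
    return [(1 - alpha) * g + alpha * c
            for g, c in zip(global_model, client_model)]


# A fresh update (staleness 0) moves the global model halfway to the client.
fresh = async_aggregate([0.0, 0.0], [1.0, 2.0], staleness=0)
# The same update arriving one round late moves it only a quarter of the way.
stale = async_aggregate([0.0, 0.0], [1.0, 2.0], staleness=1)
print(fresh, stale)
```

Unlike the synchronous mode, no barrier is needed here: each call touches only the current global model and one client's parameters.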
A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits
Authors: Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01237
Pdf link: https://arxiv.org/pdf/2306.01237
Abstract Algorithms for offline bandits must optimize decisions in uncertain environments using only offline data. A compelling and increasingly popular objective in offline bandits is to learn a policy which achieves low Bayesian regret with high confidence. An appealing approach to this problem, inspired by recent offline reinforcement learning results, is to maximize a form of lower confidence bound (LCB). This paper proposes a new approach that directly minimizes upper bounds on Bayesian regret using efficient conic optimization solvers. Our bounds build on connections among Bayesian regret, Value-at-Risk (VaR), and chance-constrained optimization. Compared to prior work, our algorithm attains superior theoretical offline regret bounds and better results in numerical simulations. Finally, we provide some evidence that popular LCB-style algorithms may be unsuitable for minimizing Bayesian regret in offline bandits.
Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
Authors: Minshuo Chen, Yu Bai, H. Vincent Poor, Mengdi Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01243
Pdf link: https://arxiv.org/pdf/2306.01243
Abstract In real-world reinforcement learning (RL) systems, various forms of impaired observability can complicate matters. These situations arise when an agent is unable to observe the most recent state of the system due to latency or lossy channels, yet the agent must still make real-time decisions. This paper introduces a theoretical investigation into efficient RL in control systems where agents must act with delayed and missing state observations. We establish near-optimal regret bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in both the delayed and missing observation settings. Despite impaired observability posing significant challenges to the policy class and planning, our results demonstrate that learning remains efficient, with the regret bound optimally depending on the state-action size of the original system. Additionally, we provide a characterization of the performance of the optimal policy under impaired observability, comparing it to the optimal value obtained with full observability.
Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
Authors: Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, Yves Le Traon
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01250
Pdf link: https://arxiv.org/pdf/2306.01250
Abstract The costly human effort required to prepare the training data of machine learning (ML) models hinders their practical development and usage in software engineering (ML4Code), especially for those with limited budgets. Therefore, efficiently training models of code with less human effort has become an emergent problem. Active learning is such a technique to address this issue that allows developers to train a model with reduced data while producing models with desired performance, which has been well studied in computer vision and natural language processing domains. Unfortunately, there is no such work that explores the effectiveness of active learning for code models. In this paper, we bridge this gap by building the first benchmark to study this critical problem - active code learning. Specifically, we collect 11 acquisition functions (which are used for data selection in active learning) from existing works and adapt them for code-related tasks. Then, we conduct an empirical study to check whether these acquisition functions maintain performance for code data. The results demonstrate that feature selection highly affects active learning and using output vectors to select data is the best choice. For the code summarization task, active code learning is ineffective, producing models with over a 29.64% gap compared to the expected performance. Furthermore, we explore future directions of active code learning with an exploratory study. We propose to replace distance calculation methods with evaluation metrics and find a correlation between these evaluation-based distance methods and the performance of code models.
A stable imaging functional for anisotropic periodic media in electromagnetic inverse scattering
Authors: Dinh-Liem Nguyen, Trung Truong
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01262
Pdf link: https://arxiv.org/pdf/2306.01262
Abstract The paper is concerned with the inverse scattering problem for Maxwell's equations in three dimensional anisotropic periodic media. We study a new imaging functional for fast and stable reconstruction of the shape of anisotropic periodic scatterers from boundary measurements of the scattered field for a number of incident fields. This imaging functional is simple to implement and very robust against noise in the data. Its implementation is non-iterative, computationally cheap, and does not involve solving any ill-posed problems. The resolution and stability analysis of the imaging functional is investigated. Our numerical study shows that this imaging functional is more stable than that of the factorization method and more efficient than that of the orthogonality sampling method in reconstructing periodic scatterers.
Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes
Authors: Weizhe Chen, Roni Khardon, Lantao Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01263
Pdf link: https://arxiv.org/pdf/2306.01263
Abstract Robotic Information Gathering (RIG) is a foundational research topic that answers how a robot (team) collects informative data to efficiently build an accurate model of an unknown target function under robot embodiment constraints. RIG has many applications, including but not limited to autonomous exploration and mapping, 3D reconstruction or inspection, search and rescue, and environmental monitoring. A RIG system relies on a probabilistic model's prediction uncertainty to identify critical areas for informative data collection. Gaussian Processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data is typically non-stationary -- different locations do not have the same degree of variability. As a result, the prediction uncertainty does not accurately reveal prediction error, limiting the success of RIG algorithms. We propose a family of non-stationary kernels named Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a non-stationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification over the commonly used stationary kernels and the leading non-stationary kernels. The improved uncertainty quantification guides the downstream informative planner to collect more valuable data around the high-error area, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with significant spatial variations, enabling the model to characterize salient environmental features.
Self Contrastive Learning for Session-based Recommendation
Authors: Zhengxiang Shi, Xi Wang, Aldo Lipani
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01266
Pdf link: https://arxiv.org/pdf/2306.01266
Abstract Session-based recommendation, which aims to predict the next item of users' interest as per an existing sequence interaction of items, has attracted growing applications of Contrastive Learning (CL) with improved user and item representations. However, these contrastive objectives: (1) serve a similar role as the cross-entropy loss while ignoring the item representation space optimisation; and (2) commonly require complicated modelling, including complex positive/negative sample constructions and extra data augmentation. In this work, we introduce Self-Contrastive Learning (SCL), which simplifies the application of CL and enhances the performance of state-of-the-art CL-based recommendation techniques. Specifically, SCL is formulated as an objective function that directly promotes a uniform distribution among item representations and efficiently replaces all the existing contrastive objective components of state-of-the-art models. Unlike previous works, SCL eliminates the need for any positive/negative sample construction or data augmentation, leading to enhanced interpretability of the item representation space and facilitating its extensibility to existing recommender systems. Through experiments on three benchmark datasets, we demonstrate that SCL consistently improves the performance of state-of-the-art models with statistical significance. Notably, our experiments show that SCL improves the performance of two best-performing models by 8.2% and 9.5% in P@10 (Precision) and 9.9% and 11.2% in MRR@10 (Mean Reciprocal Rank) on average across different benchmarks. Additionally, our analysis elucidates the improvement in terms of alignment and uniformity of representations, as well as the effectiveness of SCL with a low computational cost.
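Note: the abstract says the SCL objective promotes a uniform distribution among item representations but does not give its formula. As an illustration of what a uniformity objective can look like, the sketch below uses a common pairwise Gaussian-potential form from the contrastive learning literature. It is a stand-in under that assumption, not the paper's SCL loss; the function name and the temperature `t` are hypothetical.

```python
import math


def uniformity_loss(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the embeddings are spread further apart. This is a
    generic uniformity measure used as an illustrative stand-in.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(embeddings[i], embeddings[j]))
            total += math.exp(-t * sq_dist)
            pairs += 1
    return math.log(total / pairs)


# Two identical item vectors (collapsed) versus two opposite unit vectors.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
spread = [[1.0, 0.0], [-1.0, 0.0]]
print(uniformity_loss(collapsed), uniformity_loss(spread))
```

Minimizing a quantity like this needs no positive/negative sample construction or augmentation, which matches the simplification the abstract claims.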
Efficient volumetric mapping of multi-scale environments using wavelet-based compression
Authors: Victor Reijgwart, Cesar Cadena, Roland Siegwart, Lionel Ott
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01279
Pdf link: https://arxiv.org/pdf/2306.01279
Abstract Volumetric maps are widely used in robotics due to their desirable properties in applications such as path planning, exploration, and manipulation. Constant advances in mapping technologies are needed to keep up with the improvements in sensor technology, generating increasingly vast amounts of precise measurements. Handling this data in a computationally and memory-efficient manner is paramount to representing the environment at the desired scales and resolutions. In this work, we express the desirable properties of a volumetric mapping framework through the lens of multi-resolution analysis. This shows that wavelets are a natural foundation for hierarchical and multi-resolution volumetric mapping. Based on this insight we design an efficient mapping system that uses wavelet decomposition. The efficiency of the system enables the use of uncertainty-aware sensor models, improving the quality of the maps. Experiments on both synthetic and real-world data provide mapping accuracy and runtime performance comparisons with state-of-the-art methods on both RGB-D and 3D LiDAR data. The framework is open-sourced to allow the robotics community at large to explore this approach.
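Note: to make the wavelet idea concrete, the sketch below applies a full Haar decomposition to a one-dimensional row of occupancy values. It is a toy illustration of why piecewise-constant maps compress well under wavelets, assuming a power-of-two length. It is not the paper's mapping system, and the function names are hypothetical.

```python
def haar_forward(signal):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    details = []
    while len(coeffs) > 1:
        pairs = range(0, len(coeffs), 2)
        avg = [(coeffs[i] + coeffs[i + 1]) / 2 for i in pairs]
        det = [(coeffs[i] - coeffs[i + 1]) / 2 for i in pairs]
        details = det + details  # coarser levels go in front
        coeffs = avg
    return coeffs + details


def haar_inverse(coeffs):
    """Invert haar_forward, reconstructing the original values exactly."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        det = coeffs[pos:pos + len(values)]
        pos += len(values)
        refined = []
        for a, d in zip(values, det):
            refined += [a + d, a - d]
        values = refined
    return values


# A row of cells: left half occupied (1), right half free (0).
row = [1, 1, 1, 1, 0, 0, 0, 0]
coeffs = haar_forward(row)
nonzero = sum(1 for c in coeffs if c != 0)
print(coeffs, nonzero)
```

Here eight cells reduce to two nonzero coefficients, and truncating the coefficient list at any level yields a valid lower-resolution map, which is the hierarchical property the abstract refers to.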
Nonholonomic Motion Planning as Efficient as Piano Mover's
Authors: David Nister, Jaikrishna Soundararajan, Yizhou Wang, Harshad Sane
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01301
Pdf link: https://arxiv.org/pdf/2306.01301
Abstract We present an algorithm for non-holonomic motion planning (or 'parking a car') that is as computationally efficient as a simple approach to solving the famous Piano-mover's problem, where the non-holonomic constraints are ignored. The core of the approach is a graph-discretization of the problem. The graph-discretization is provably accurate in modeling the non-holonomic constraints, and yet is nearly as small as the straightforward regular grid discretization of the Piano-mover's problem into a 3D volume of 2D position plus angular orientation. Where the Piano mover's graph has one vertex and edges to six neighbors each, we have three vertices with a total of ten edges, increasing the graph size by less than a factor of two, and this factor does not depend on spatial or angular resolution. The local edge connections are organized so that they represent globally consistent turn and straight segments. The graph can be used with Dijkstra's algorithm, A*, value iteration or any other graph algorithm. Furthermore, the graph has a structure that lends itself to processing with deterministic massive parallelism. The turn and straight curves divide the configuration space into many parallel groups. We use this to develop a customized 'kernel-style' graph processing method. It results in an N-turn planner that requires no heuristics or load balancing and is as efficient as a simple solution to the Piano mover's problem even in sequential form. In parallel form it is many times faster than the sequential processing of the graph, and can run many times a second on a consumer grade GPU while exploring a configuration space pose grid with very high spatial and angular resolution. We prove approximation quality and computational complexity and demonstrate that it is a flexible, practical, reliable, and efficient component for a production solution.
Energy-efficient Rate Splitting for MIMO STAR-RIS-assisted Broadcast Channels with I/Q Imbalance
Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.01309
Pdf link: https://arxiv.org/pdf/2306.01309
Abstract This paper proposes an energy-efficient scheme for multicell multiple-input, multiple-output (MIMO) simultaneous transmit and reflect (STAR) reconfigurable intelligent surfaces (RIS)-assisted broadcast channels by employing rate splitting (RS) and improper Gaussian signaling (IGS). Regular RISs can only reflect signals. Thus, a regular RIS can assist only when the transmitter and receiver are in the reflection space of the RIS. However, a STAR-RIS can simultaneously transmit and reflect, thus providing 360-degree coverage. In this paper, we assume that transceivers may suffer from I/Q imbalance (IQI). To compensate for IQI, we employ IGS. Moreover, we employ RS to manage intracell interference. We show that RIS can significantly improve the energy efficiency (EE) of the system when RIS components are carefully optimized. Additionally, we show that STAR-RIS can significantly outperform a regular RIS when the regular RIS cannot cover all the users. We also show that RS can substantially increase the EE compared to treating interference as noise.
Resource-Efficient Federated Hyperdimensional Computing
Authors: Nikita Zeulin, Olga Galinina, Nageen Himayat, Sergey Andreev
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01339
Pdf link: https://arxiv.org/pdf/2306.01339
Abstract In conventional federated hyperdimensional computing (HDC), training larger models usually results in higher predictive performance but also requires more computational, communication, and energy resources. If the system resources are limited, one may have to sacrifice the predictive performance by reducing the size of the HDC model. The proposed resource-efficient federated hyperdimensional computing (RE-FHDC) framework alleviates such constraints by training multiple smaller independent HDC sub-models and refining the concatenated HDC model using the proposed dropout-inspired procedure. Our numerical comparison demonstrates that the proposed framework achieves a comparable or higher predictive performance while consuming less computational and wireless resources than the baseline federated HDC implementation.
The Maximum Matrix Contraction Problem
Authors: Dimitri Watel (ENSIIE, CEDRIC - OC), Pierre-Louis Poirion (CEDRIC - OC)
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.01349
Pdf link: https://arxiv.org/pdf/2306.01349
Abstract In this paper, we introduce the Maximum Matrix Contraction problem, where we aim to contract a binary matrix as much as possible in order to maximize its density. We study the complexity and the polynomial approximability of the problem. In particular, we prove this problem to be NP-Complete and that every algorithm solving this problem is at most a $2\sqrt{n}$-approximation algorithm, where $n$ is the number of ones in the matrix. We then focus on efficient algorithms to solve the problem: an integer linear program and three heuristics.
Energy-Efficient UAV-Assisted IoT Data Collection via TSP-Based Solution Space Reduction
Authors: Sivaram Krishnan, Mahyar Nemati, Seng W. Loke, Jihong Park, Jinho Choi
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.01355
Pdf link: https://arxiv.org/pdf/2306.01355
Abstract This paper presents a wireless data collection framework that employs an unmanned aerial vehicle (UAV) to efficiently gather data from distributed IoT sensors deployed in a large area. Our approach takes into account the non-zero communication ranges of the sensors to optimize the flight path of the UAV, resulting in a variation of the Traveling Salesman Problem (TSP). We prove mathematically that the optimal waypoints for this TSP-variant problem are restricted to the boundaries of the sensor communication ranges, greatly reducing the solution space. Building on this finding, we develop a low-complexity UAV-assisted sensor data collection algorithm, and demonstrate its effectiveness in a selected use case where we minimize the total energy consumption of the UAV and sensors by jointly optimizing the UAV's travel distance and the sensors' communication ranges.
DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG 2000 Compressed Documents
Authors: Tejasvee Bisen, Mohammed Javed, Shashank Kirtania, P. Nagabhushan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2306.01359
Pdf link: https://arxiv.org/pdf/2306.01359
Abstract For any digital application with document images, such as retrieval, the classification of document images becomes an essential stage. Conventionally, the full versions of the documents, that is, the uncompressed document images, make up the input dataset, which poses a problem because of the large volume required to accommodate them. Therefore, it would be novel if the same classification task could be accomplished directly (with some partial decompression) on the compressed representation of documents in order to make the whole process computationally more efficient. In this research work, a novel deep learning model, DWT-CompCNN, is proposed for the classification of documents that are compressed using the High Throughput JPEG 2000 (HTJ2K) algorithm. The proposed DWT-CompCNN comprises five convolutional layers with filter sizes of 16, 32, 64, 128, and 256 consecutively for each increasing layer to improve learning from the wavelet coefficients extracted from the compressed images. Experiments are performed on two benchmark datasets, Tobacco-3482 and RVL-CDIP, which demonstrate that the proposed model is time and space efficient, and also achieves a better classification accuracy in the compressed domain.
Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials
Authors: David Millard, Daniel Pastor, Joseph Bowkett, Paul Backes, Gaurav S. Sukhatme
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01369
Pdf link: https://arxiv.org/pdf/2306.01369
Abstract Granular materials are of critical interest to many robotic tasks in planetary science, construction, and manufacturing. However, the dynamics of granular materials are complex and often computationally very expensive to simulate. We propose a set of methodologies and a system for the fast simulation of granular materials on Graphics Processing Units (GPUs), and show that this simulation is fast enough for basic training with Reinforcement Learning algorithms, which currently require many dynamics samples to achieve acceptable performance. Our method models granular material dynamics using implicit timestepping methods for multibody rigid contacts, as well as algorithmic techniques for efficient parallel collision detection between pairs of particles and between particle and arbitrarily shaped rigid bodies, and programming techniques for minimizing warp divergence on Single-Instruction, Multiple-Thread (SIMT) chip architectures. We showcase our simulation system on several environments targeted toward robotic tasks, and release our simulator as an open-source tool.
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Authors: Borui Wan, Juntao Zhao, Chuan Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01381
Pdf link: https://arxiv.org/pdf/2306.01381
Abstract Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers for communication traffic reduction and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of O(T^{-1}) with T being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves distributed full-graph training's throughput (up to 3.01 X) with negligible accuracy drop (at most 0.30%) or even accuracy improvement (up to 0.19%) in most cases, showing significant advantages over the state-of-the-art works.
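The stochastic quantization of messages to lower-precision integers mentioned above can be illustrated with a small sketch. It uses plain min-max scaling with stochastic rounding, which is an assumption for illustration; the abstract does not specify AdaQP's actual scheme, and the function names here are hypothetical.

```python
import math
import random


def stochastic_quantize(values, bits, rng):
    """Map floats to integer codes in [0, 2**bits - 1] with stochastic rounding.

    Each value is rounded up with probability equal to its fractional
    position between two adjacent levels, so the dequantized value is
    unbiased in expectation.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale
        base = math.floor(x)
        code = base + (1 if rng.random() < x - base else 0)
        codes.append(min(code, levels))  # guard against float overshoot
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


rng = random.Random(0)
message = [0.0, 0.3, 0.55, 1.0]
codes, lo, scale = stochastic_quantize(message, bits=2, rng=rng)
restored = dequantize(codes, lo, scale)
print(codes, restored)
```

Fewer bits shrink the payload sent between devices at the price of a larger per-value error, bounded here by one quantization step (`scale`).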
Chemical Property-Guided Neural Networks for Naphtha Composition Prediction
Authors: Chonghyo Joo, Jeongdong Kim, Hyungtae Cho, Jaewon Lee, Sungho Suh, Junghwan Kim
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2306.01391
Pdf link: https://arxiv.org/pdf/2306.01391
Abstract The naphtha cracking process heavily relies on the composition of naphtha, which is a complex blend of different hydrocarbons. Predicting the naphtha composition accurately is crucial for efficiently controlling the cracking process and achieving maximum performance. Traditional methods, such as gas chromatography and true boiling curve, are not feasible due to the need for pilot-plant-scale experiments or cost constraints. In this paper, we propose a neural network framework that utilizes chemical property information to improve the performance of naphtha composition prediction. Our proposed framework comprises two parts: a Watson K factor estimation network and a naphtha composition prediction network. Both networks share a feature extraction network based on Convolutional Neural Network (CNN) architecture, while the output layers use Multi-Layer Perceptron (MLP) based networks to generate two different outputs - Watson K factor and naphtha composition. The naphtha composition is expressed in percentages, and its sum should be 100%. To enhance the naphtha composition prediction, we utilize a distillation simulator to obtain the distillation curve from the naphtha composition, which is dependent on its chemical properties. By designing a loss function between the estimated and simulated Watson K factors, we improve the performance of both Watson K estimation and naphtha composition prediction. The experimental results show that our proposed framework can predict the naphtha composition accurately while reflecting real naphtha chemical properties.
Matrix Inference in Growing Rank Regimes
Authors: Farzad Pourkamali, Jean Barbier, Nicolas Macris
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2306.01412
Pdf link: https://arxiv.org/pdf/2306.01412
Abstract The inference of a large symmetric signal-matrix $\mathbf{S} \in \mathbb{R}^{N\times N}$ corrupted by additive Gaussian noise is considered for two regimes of growth of the rank $M$ as a function of $N$. For sub-linear ranks $M=\Theta(N^\alpha)$ with $\alpha\in(0,1)$ the mutual information and minimum mean-square error (MMSE) are derived for two classes of signal-matrices: (a) $\mathbf{S}=\mathbf{X}\mathbf{X}^\intercal$ with entries of $\mathbf{X}\in\mathbb{R}^{N\times M}$ independent identically distributed; (b) $\mathbf{S}$ sampled from a rotationally invariant distribution. Surprisingly, the formulas match the rank-one case. Two efficient algorithms are explored and conjectured to saturate the MMSE when no statistical-to-computational gap is present: (1) Decimation Approximate Message Passing; (2) a spectral algorithm based on a Rotation Invariant Estimator. For linear ranks $M=\Theta(N)$ the mutual information is rigorously derived for signal-matrices from a rotationally invariant distribution. Close connections with scalar inference in free probability are uncovered, which allow us to deduce a simple formula for the MMSE as an integral involving the limiting spectral measure of the data matrix only. An interesting issue is whether the known information theoretic phase transitions for rank-one, and hence also sub-linear-rank, still persist in linear-rank. Our analysis suggests that only a smoothed-out trace of the transitions persists. Furthermore, the change of behavior between low and truly high-rank regimes only happens at the linear scale $\alpha=1$.
Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation
Authors: Federico Nocentini, Claudio Ferrari, Stefano Berretti
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01415
Pdf link: https://arxiv.org/pdf/2306.01415
Abstract This paper presents a novel approach for generating 3D talking heads from raw audio inputs. Our method is grounded on the idea that speech-related movements can be comprehensively and efficiently described by the motion of a few control points located on the movable parts of the face, i.e., landmarks. The underlying musculoskeletal structure then allows us to learn how their motion influences the geometrical deformations of the whole face. The proposed method employs two distinct models to this aim: the first one learns to generate the motion of a sparse set of landmarks from the given audio. The second model expands such landmark motion to a dense motion field, which is utilized to animate a given 3D mesh in neutral state. Additionally, we introduce a novel loss function, named Cosine Loss, which minimizes the angle between the generated motion vectors and the ground truth ones. Using landmarks in 3D talking head generation offers various advantages such as consistency, reliability, and obviating the need for manual annotation. Our approach is designed to be identity-agnostic, enabling high-quality facial animations for any user without additional data or training.
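A loss that minimizes the angle between generated and ground-truth motion vectors can be sketched as one minus the cosine similarity, averaged over landmarks. The exact definition of the paper's Cosine Loss is not given in the abstract, so the form below, the function name, and the `eps` stabiliser are assumptions for illustration.

```python
import math


def cosine_loss(pred, target, eps=1e-8):
    """Mean of (1 - cosine similarity) over paired motion vectors.

    The value is 0 when every predicted vector points in the same
    direction as its ground truth and grows as the angle widens.
    """
    total = 0.0
    for p, g in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, g))
        norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        total += 1.0 - dot / (norms + eps)
    return total / len(pred)


# Hypothetical 3D displacement vectors for two landmarks.
ground_truth = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
aligned = [[0.0, 2.0, 0.0], [0.5, 0.0, 0.0]]      # same directions, other lengths
orthogonal = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # 90 degrees off

print(cosine_loss(aligned, ground_truth), cosine_loss(orthogonal, ground_truth))
```

Because only direction matters, such a term is insensitive to vector magnitude and would normally be paired with a distance-based loss.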
Multi-Objective Population Based Training
Authors: Arkadiy Dushatskiy, Alexander Chebykin, Tanja Alderliesten, Peter A.N. Bosman
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2306.01436
Pdf link: https://arxiv.org/pdf/2306.01436
Abstract Population Based Training (PBT) is an efficient hyperparameter optimization algorithm. PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT. Our experiments on diverse multi-objective hyperparameter optimization problems (Precision/Recall, Accuracy/Fairness, Accuracy/Adversarial Robustness) show that MO-PBT outperforms random search, single-objective PBT, and the state-of-the-art multi-objective hyperparameter optimization algorithm MO-ASHA.
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
Authors: Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01438
Pdf link: https://arxiv.org/pdf/2306.01438
Abstract LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from the fact that Radar data are extremely sparse and lack height information. Therefore, directly integrating Radar features into LiDAR-centric detection networks is not optimal. In this work, we introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects. Technically, Bi-LRFusion involves two steps: first, it enriches Radar's local features by learning important details from the LiDAR branch to alleviate the problems caused by the absence of height information and extreme sparsity; second, it combines LiDAR features with the enhanced Radar features in a unified bird's-eye-view representation. We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects. Notably, Radar data in these two datasets have different formats, which demonstrates the generalizability of our method. Codes are available at https://github.com/JessieW0806/BiLRFusion.
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Authors: Fabian Kögel, Bac Nguyen, Fabien Cardinaux
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.01442
Pdf link: https://arxiv.org/pdf/2306.01442
Abstract State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech. For expressive speech datasets, however, we observe characteristic audio distortions. We demonstrate that such artefacts are introduced to the vocoder reconstruction by over-smooth mel-spectrogram predictions, which are induced by the choice of mean-squared-error (MSE) loss for training the mel-spectrogram decoder. With MSE loss, FastSpeech 2 is limited to learning conditional averages of the training distribution, which might not lie close to a natural sample if the distribution still appears multimodal after all conditioning signals. To alleviate this problem, we introduce TVC-GMM, a mixture model of Trivariate-Chain Gaussian distributions, to model the residual multimodality. TVC-GMM reduces spectrogram smoothness and improves perceptual audio quality, in particular for expressive datasets, as shown by both objective and subjective evaluation.
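The claim that an MSE-trained predictor learns the conditional average, which may sit far from any natural sample when the target is multimodal, can be checked on a toy example. The data below are synthetic and hypothetical; the sketch only illustrates the general property of MSE and is unrelated to the TVC-GMM implementation.

```python
def mse(prediction, samples):
    """Mean squared error of a single constant prediction against samples."""
    return sum((s - prediction) ** 2 for s in samples) / len(samples)


# Synthetic bimodal target: half the samples at -1, half at +1.
samples = [-1.0] * 50 + [1.0] * 50

# Brute-force search for the constant that minimises MSE.
candidates = [i / 100 for i in range(-150, 151)]
best = min(candidates, key=lambda c: mse(c, samples))

print(best, mse(best, samples), mse(1.0, samples))
```

The minimiser is the mean of the two modes, a value that never occurs in the data, which mirrors the over-smoothing effect the abstract attributes to MSE training.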
dugMatting: Decomposed-Uncertainty-Guided Matting
Authors: Jiawei Wu, Changqing Zhang, Zuoyong Li, Huazhu Fu, Xi Peng, Joey Tianyi Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01452
Pdf link: https://arxiv.org/pdf/2306.01452
Abstract Cutting out an object and estimating its opacity mask, known as image matting, is a key task in image and video editing. Because the problem is highly ill-posed, additional inputs, typically user-defined trimaps or scribbles, are usually needed to reduce the uncertainty. Although effective, providing them is either time consuming or only suitable for experienced users who know where to place the strokes. In this work, we propose a decomposed-uncertainty-guided matting (dugMatting) algorithm, which explores the explicitly decomposed uncertainties to efficiently and effectively improve the results. Based on the characteristics of these uncertainties, the epistemic uncertainty is reduced in the process of guiding interaction (which introduces prior knowledge), while the aleatoric uncertainty is reduced in modeling data distribution (which introduces statistics for both data and possible noise). The proposed matting framework relieves the requirement for users to determine the interaction areas by using simple and efficient labeling. Extensive quantitative and qualitative results validate that the proposed method significantly improves the original matting algorithms in terms of both efficiency and efficacy.
One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms
Authors: Shaoyuan Huang, Zheng Wang, Heng Zhang, Xiaofei Wang, Cheng Zhang, Wenyu Wang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01507
Pdf link: https://arxiv.org/pdf/2306.01507
Abstract Workload prediction in multi-tenant edge cloud platforms (MT-ECP) is vital for efficient application deployment and resource provisioning. However, the heterogeneous application patterns, variable infrastructure performance, and frequent deployments in MT-ECP pose significant challenges for accurate and efficient workload prediction. Clustering-based methods for dynamic MT-ECP modeling often incur excessive costs due to the need to maintain numerous data clusters and models. Existing end-to-end time series prediction methods struggle to provide consistent prediction performance in dynamic MT-ECP. In this paper, we propose an end-to-end framework with global pooling and static content awareness, DynEformer, to provide a unified workload prediction scheme for dynamic MT-ECP. Meticulously designed global pooling and information merging mechanisms can effectively identify and utilize global application patterns to drive local workload predictions. The integration of static content-aware mechanisms enhances model robustness in real-world scenarios. Through experiments on five real-world datasets, DynEformer achieved state-of-the-art performance in the dynamic scene of MT-ECP and provided a unified end-to-end prediction scheme for MT-ECP.
Does it pay to optimize AUC?
Authors: Baojian Zhou, Steven Skiena
Subjects: Computational Geometry (cs.CG); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01528
Pdf link: https://arxiv.org/pdf/2306.01528
Abstract The Area Under the ROC Curve (AUC) is an important model metric for evaluating binary classifiers, and many algorithms have been proposed to optimize AUC approximately. It raises the question of whether the generally insignificant gains observed by previous studies are due to inherent limitations of the metric or the inadequate quality of optimization. To better understand the value of optimizing for AUC, we present an efficient algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ where $n_+$ and $n_-$ are the numbers of positive and negative samples respectively. Furthermore, it can be naturally extended to $\mathbb{R}^d$ in $\mathcal{O}((n_+ n_-)^{d-1}\log (n_+ n_-))$ by calling AUC-opt in lower-dimensional spaces recursively. We prove the problem is NP-complete when $d$ is not fixed, reducing from the \textit{open hemisphere problem}. Experiments show that compared with other methods, AUC-opt achieves statistically significant improvements on between 17 and 40 in $\mathbb{R}^2$ and between 4 and 42 in $\mathbb{R}^3$ of 50 t-SNE training datasets. However, generally the gain proves insignificant on most testing datasets compared to the best standard classifiers. Similar observations are found for nonlinear AUC methods under real-world datasets.
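For readers unfamiliar with the quantity being optimized, the AUC of a fixed linear classifier is the fraction of positive-negative pairs that it ranks correctly, with ties counted as half. The sketch below evaluates it by brute force over all pairs. It is not the AUC-opt algorithm from the paper, which searches for the best direction; the function name and toy data are hypothetical.

```python
def auc_of_linear_classifier(w, positives, negatives):
    """AUC of the scoring function x -> w . x, counted over all pos/neg pairs."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    wins = 0.0
    for p in positives:
        sp = score(p)
        for n in negatives:
            sn = score(n)
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Toy 2D data (made up for illustration).
positives = [(2.0, 0.0), (3.0, 1.0)]
negatives = [(0.0, 0.0), (1.0, 2.0)]

for w in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]:
    print(w, auc_of_linear_classifier(w, positives, negatives))
```

Different directions `w` give very different AUC values on the same data, which is why finding the optimal direction is a non-trivial search problem.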
CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities
Authors: Ayush Agrawal, Raghav Arora, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01540
Pdf link: https://arxiv.org/pdf/2306.01540
Abstract This paper introduces a novel method for determining the best room to place an object in, for embodied scene rearrangement. While state-of-the-art approaches rely on large language models (LLMs) or reinforcement learned (RL) policies for this task, our approach, CLIPGraphs, efficiently combines commonsense domain knowledge, data-driven methods, and recent advances in multimodal learning. Specifically, it (a) encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities based on embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate location of objects from a benchmark set of object categories in comparison with state-of-the-art baselines.
Strong tractability for multivariate integration in a subspace of the Wiener algebra
Authors: Takashi Goda
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01541
Pdf link: https://arxiv.org/pdf/2306.01541
Abstract Building upon recent work by the author, we prove that multivariate integration in the following subspace of the Wiener algebra over $[0,1)^d$ is strongly polynomially tractable: \[ F_d:=\left\{ f\in C([0,1)^d)\:\middle| \: \|f\|:=\sum_{\boldsymbol{k}\in \mathbb{Z}^{d}}|\hat{f}(\boldsymbol{k})|\max\left(\mathrm{width}(\mathrm{supp}(\boldsymbol{k})),\min_{j\in \mathrm{supp}(\boldsymbol{k})}\log |k_j|\right)<\infty \right\},\] with $\hat{f}(\boldsymbol{k})$ being the $\boldsymbol{k}$-th Fourier coefficient of $f$, $\mathrm{supp}(\boldsymbol{k}):=\{j\in \{1,\ldots,d\}\mid k_j\neq 0\}$, and $\mathrm{width}: 2^{\{1,\ldots,d\}}\to \{1,\ldots,d\}$ being defined by \[ \mathrm{width}(u):=\max_{j\in u}j-\min_{j\in u}j+1,\] for a non-empty subset $u\subseteq \{1,\ldots,d\}$ and $\mathrm{width}(\emptyset):=1$. Strong polynomial tractability is achieved by an explicit quasi-Monte Carlo rule using a multiset union of Korobov's $p$-sets. We also show that, if we replace $\mathrm{width}(\mathrm{supp}(\boldsymbol{k}))$ with 1 for all $\boldsymbol{k}\in \mathbb{Z}^d$ in the above definition of norm, multivariate integration is polynomially tractable but not strongly polynomially tractable.
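The support and width maps defined in the abstract are simple to compute, and a small sketch may help in reading the norm. The code follows the stated definitions (1-based coordinate indices, width of the empty set equal to 1); the function names and the example frequency vector are illustrative choices.

```python
def supp(k):
    """Indices j in {1, ..., d} (1-based) where the frequency k_j is non-zero."""
    return {j for j, kj in enumerate(k, start=1) if kj != 0}


def width(u):
    """max(u) - min(u) + 1 for a non-empty index set, and 1 for the empty set."""
    return max(u) - min(u) + 1 if u else 1


# Example frequency vector in Z^5 with non-zero entries at positions 2 and 4.
k = (0, 3, 0, -1, 0)
print(supp(k), width(supp(k)))
```

The width depends only on how far apart the active coordinates are, not on how many there are, so frequency vectors whose non-zero entries are close together receive a small weight in the norm regardless of the dimension $d$.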
Blockchain Model for Environment/Infrastructure Monitoring in Cloud-Enabled High-Altitude Platform Systems
Authors: Khaleel Mershad, Hayssam Dahrouj
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.01616
Pdf link: https://arxiv.org/pdf/2306.01616
Abstract The recently accentuated features of augmenting conventional wireless networks with high altitude platform systems (HAPS) have fueled a plethora of applications, which promise to offer new services to ground users, as well as to enhance the efficiency and pervasion of existing applications. Cloud-enabled HAPS, which aims to create HAPS-based datacenters that offer cloud services to users, has particularly emerged as a promising key enabler to provide large-scale equitable services from the sky. Although offering cloud services from the HAPS proves to be efficient, its practical deployment at the stratosphere level still faces many challenges, such as high energy requirements and physical maintenance, and is particularly prone to security concerns. Safeguarding the cloud-enabled HAPS against various cyberattacks is a necessity to guarantee its safe operation. This paper proposes a blockchain model to secure cloud-enabled HAPS networks that contain a large number of HAPS stations from recurring cyberattacks within the context of the environment and infrastructure monitoring (EIM) application. To this end, the paper first presents a detailed blockchain framework, and describes the ways of integrating the developed framework into the various system components. We then discuss the details of the system implementation, including the storing and consuming of cloud transactions, the generation of new blocks, and the blockchain consensus protocol that is tailored to the EIM requirements. Finally, we present numerical simulations that illustrate the performance of the system in terms of throughput, latency, and resilience to attacks.
HomE: Homography-Equivariant Video Representation Learning
Authors: Anirudh Sriram, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles, Li Fei-Fei, Ehsan Adeli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01623
Pdf link: https://arxiv.org/pdf/2306.01623
Abstract Recent advances in self-supervised representation learning have enabled more efficient and robust model performance without relying on extensive labeled data. However, most works are still focused on images, with few working on videos and even fewer on multi-view videos, where more powerful inductive biases can be leveraged for self-supervision. In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE). Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views. We evaluate our HomE representation via action recognition and pedestrian intent prediction as downstream tasks. On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101 dataset, better than most state-of-the-art self-supervised learning methods. Similarly, on the STIP dataset, we outperform the state-of-the-art by 6% for pedestrian intent prediction one second into the future while also obtaining an accuracy of 91.2% for pedestrian action (cross vs. not-cross) classification. Code is available at https://github.com/anirudhs123/HomE.
Quantifying synergy and redundancy in multiplex networks
Authors: Andrea I. Luppi, Eckehard Olbrich, Conor Finn, Laura E. Suárez, Fernando E. Rosas, Pedro A.M. Mediano, Jürgen Jost
Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2306.01645
Pdf link: https://arxiv.org/pdf/2306.01645
Abstract Understanding how different networks relate to each other is key for obtaining a greater insight into complex systems. Here, we introduce an intuitive yet powerful framework to characterise the relationship between two networks, comprising the same nodes. We showcase our framework by decomposing the shortest paths between nodes as being contributed uniquely by one or the other source network, or redundantly by either, or synergistically by the two together. Our approach takes into account the networks' full topology, but it also provides insights at multiple levels of resolution: from global statistics, to individual paths of different length. We show that this approach is widely applicable, from brains to the London transport system. In humans and across $123$ other species, we demonstrate that reliance on unique contributions by long-range white matter fibers is a conserved feature of mammalian structural connectomes. Across species, we also find that efficient communication relies on significantly greater synergy between long-range and short-range fibers than expected by chance, and significantly less redundancy. Our framework may find applications to help decide how to trade-off different desiderata when designing network systems, or to evaluate their relative presence in existing systems, whether biological or artificial.
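The shortest-path decomposition described in this abstract can be illustrated with a small sketch. It is a simplified reading that compares path lengths only, not necessarily the paper's exact definition: it assumes unweighted, undirected layers over the same node set, and the function names and the four labels are hypothetical.

```python
from collections import deque

def bfs_distances(nodes, edges, source):
    """Hop distances from `source` in an unweighted, undirected graph."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def classify_pair(nodes, edges_a, edges_b, s, t):
    """Label how layers A and B contribute to the shortest s-t path (assumed rule)."""
    inf = float("inf")
    d_a = bfs_distances(nodes, edges_a, s).get(t, inf)
    d_b = bfs_distances(nodes, edges_b, s).get(t, inf)
    # The combined network may mix edges from both layers along one path.
    d_ab = bfs_distances(nodes, list(edges_a) + list(edges_b), s).get(t, inf)
    if d_ab == inf:
        return "unreachable"
    if d_ab < min(d_a, d_b):
        return "synergistic"   # only the two layers together achieve this length
    if d_a == d_b:
        return "redundant"     # either layer alone achieves it
    return "unique_a" if d_a < d_b else "unique_b"
```

For example, with layer A = {0-1, 1-2} and layer B = {0-1, 2-3}, the pair (0, 3) is reachable only by combining edges from both layers, so this rule labels it synergistic.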
Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation
Authors: Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01648
Pdf link: https://arxiv.org/pdf/2306.01648
Abstract Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods). However, designing provably-efficient federated algorithms for MSA has been an elusive question even for the special case of double sequence approximation (DSA). Towards this goal, we develop FedMSA which is the first federated algorithm for MSA, and establish its near-optimal communication complexity. As core novelties, (i) FedMSA enables the provable estimation of hypergradients in BLO and MCO via local client updates, which has been a notable bottleneck in prior theory, and (ii) our convergence guarantees are sensitive to the heterogeneity-level of the problem. We also incorporate momentum and variance reduction techniques to achieve further acceleration leading to near-optimal rates. Finally, we provide experiments that support our theory and demonstrate the empirical benefits of FedMSA. As an example, FedMSA enables order-of-magnitude savings in communication rounds compared to prior federated BLO schemes.
Fair multilingual vandalism detection system for Wikipedia
Authors: Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, Ricardo Baeza-Yates, Diego Saez-Trumper
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01650
Pdf link: https://arxiv.org/pdf/2306.01650
Abstract This paper presents a novel design of the system aimed at supporting the Wikipedia community in addressing vandalism on the platform. To achieve this, we collected a massive dataset of 47 languages, and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling to build the training dataset from human-generated data. The performance of the system was evaluated through comparison with the one used in production in Wikipedia, known as ORES. Our research results in a significant increase in the number of languages covered, making Wikipedia patrolling more efficient to a wider range of communities. Furthermore, our model outperforms ORES, ensuring that the results provided are not only more accurate but also less biased against certain groups of contributors.
On the Coverage of Cognitive mmWave Networks with Directional Sensing and Communication
Authors: Shuchi Tripathi, Abhishek K. Gupta, SaiDhiraj Amuru
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.01652
Pdf link: https://arxiv.org/pdf/2306.01652
Abstract Millimeter-waves' propagation characteristics create prospects for spatial and temporal spectrum sharing in a variety of contexts, including cognitive spectrum sharing (CSS). However, CSS with omnidirectional sensing is not efficient at mmWave frequencies due to the directional nature of transmission, as this limits secondary networks' ability to access the spectrum. This inspired us to create an analytical approach using stochastic geometry to examine the implications of directional cognitive sensing in mmWave networks. We explore a scenario where multiple secondary transmitter-receiver pairs coexist with a primary transmitter-receiver pair, forming a cognitive network. The positions of the secondary transmitters are modelled using a homogeneous Poisson point process (PPP) with corresponding secondary receivers located around them. A threshold on directional transmission is imposed on each secondary transmitter in order to limit its interference at the primary receiver. We derive the medium-access-probability of a secondary user along with the fraction of the secondary transmitters active at a time-instant. To understand cognition's feasibility, we derive the coverage probabilities of primary and secondary links. We provide various design insights via numerical results. For example, we investigate the interference-threshold's optimal value while ensuring coverage for both links and its dependence on various parameters. We find that directionality improves both links' performance as a key factor. Further, allowing location-aware secondary directionality can help achieve similar coverage for all secondary links.
Towards In-context Scene Understanding
Authors: Ivana Balažević, David Steiner, Nikhil Parthasarathy, Relja Arandjelović, Olivier J. Hénaff
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01667
Pdf link: https://arxiv.org/pdf/2306.01667
Abstract In-context learning–the ability to configure a model's behavior with different prompts–has revolutionized the field of natural language processing, alleviating the need for task-specific models and paving the way for generalist models capable of assisting with any query. Computer vision, in contrast, has largely stayed in the former regime: specialized decoders and finetuning protocols are generally required to perform dense tasks such as semantic segmentation and depth estimation. In this work we explore a simple mechanism for in-context learning of such scene understanding tasks: nearest neighbor retrieval from a prompt of annotated features. We propose a new pretraining protocol–leveraging attention within and across images–which yields representations particularly useful in this regime. The resulting Hummingbird model, suitably prompted, performs various scene understanding tasks without modification while approaching the performance of specialists that have been finetuned for each task. Moreover, Hummingbird can be configured to perform new tasks much more efficiently than finetuned models, raising the possibility of scene understanding in the interactive assistant regime.
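The retrieval mechanism mentioned in this abstract, nearest neighbor lookup in a prompt of annotated features, can be sketched as follows. This is a generic k-nearest-neighbor majority vote and not the Hummingbird implementation: the Euclidean distance, the voting rule, and the name `knn_label` are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_label(query, prompt_features, prompt_labels, k=3):
    """Predict a label for `query` from the k closest annotated prompt features."""
    # Rank prompt entries by Euclidean distance to the query feature.
    order = sorted(
        range(len(prompt_features)),
        key=lambda i: math.dist(query, prompt_features[i]),
    )
    # Majority vote among the k nearest annotations (assumed aggregation rule).
    votes = Counter(prompt_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Changing the task then amounts to swapping the annotated prompt (for instance, segmentation classes versus depth bins) while the feature extractor stays fixed, which is the property the abstract emphasizes.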
Domain Decomposition Methods for the Monge-Ampère equation
Authors: Yassine Boubendir, Jake Brusca, Brittany Froese Hamfeldt, Tadanaga Takahashi
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01677
Pdf link: https://arxiv.org/pdf/2306.01677
Abstract We introduce a new overlapping Domain Decomposition Method (DDM) to solve the fully nonlinear Monge-Ampère equation. While DDMs have been extensively studied for linear problems, their application to fully nonlinear partial differential equations (PDE) remains limited in the literature. To address this gap, we establish a proof of global convergence of these new iterative algorithms using a discrete comparison principle argument. Several numerical tests are performed to validate the convergence theorem. These numerical experiments involve examples of varying regularity. Computational experiments show that the method is efficient, robust, and requires relatively few iterations to converge. The results reveal great potential for DDMs to lead to highly efficient and parallelizable solvers for large-scale problems that are computationally intractable using existing solution methods.
Balancing Exploration and Exploitation: Disentangled $β$-CVAE in De Novo Drug Design
Authors: Guang Jun Nicholas Ang, De Tao Irwin Chin, Bingquan Shen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2306.01683
Pdf link: https://arxiv.org/pdf/2306.01683
Abstract Deep generative models have recently emerged as a promising de novo drug design method. In this respect, deep generative conditional variational autoencoder (CVAE) models are a powerful approach for generating novel molecules with desired drug-like properties. However, molecular graph-based models with disentanglement and multivariate explicit latent conditioning have not been fully elucidated. To address this, we proposed a molecular-graph $\beta$-CVAE model for de novo drug design. Here, we empirically tuned the value of disentanglement and assessed its ability to generate molecules with optimised univariate or multivariate properties. In particular, we optimised the octanol-water partition coefficient (ClogP), molar refractivity (CMR), quantitative estimate of drug-likeness (QED), and synthetic accessibility score (SAS). Results suggest that a lower $\beta$ value increases the uniqueness of generated molecules (exploration). Univariate optimisation results showed our model generated molecular property averages of ClogP = 41.07% $\pm$ 0.01% and CMR = 66.76% $\pm$ 0.01% by the Ghose filter. Multivariate property optimisation results showed that our model generated an average of 30.07% $\pm$ 0.01% molecules for both desired properties. Furthermore, our model improved the QED and SAS (exploitation) of molecules generated. Together, these results suggest that the $\beta$-CVAE could balance exploration and exploitation through disentanglement and is a promising model for de novo drug design, thus providing a basis for future studies.
Keyword: faster

Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials
Authors: Ali Kamali, Kaveh Laksari
Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft)
Arxiv link: https://arxiv.org/abs/2306.01204
Pdf link: https://arxiv.org/pdf/2306.01204
Abstract Soft biological tissues often have complex mechanical properties due to variation in structural components. In this paper, we develop a novel UNet-based neural network model for inversion in elasticity (El-UNet) to infer the spatial distributions of mechanical parameters from strain maps as input images, normal stress boundary conditions, and domain physics information. We show superior performance, both in terms of accuracy and computational cost, by El-UNet compared to fully-connected physics-informed neural networks in estimating unknown parameters and stress distributions for isotropic linear elasticity. We characterize different variations of El-UNet and propose a self-adaptive spatial loss weighting approach. To validate our inversion models, we performed various finite-element simulations of isotropic domains with heterogeneous distributions of material parameters to generate synthetic data. El-UNet is faster and more accurate than the fully-connected physics-informed implementation in resolving the distribution of unknown fields. Among the tested models, the self-adaptive spatially weighted models had the most accurate reconstructions in equal computation times. The learned spatial weighting distribution visibly corresponded to regions that the unweighted models were resolving inaccurately. Our work demonstrates a computationally efficient inversion algorithm for elasticity imaging using convolutional neural networks and presents a potential fast framework for three-dimensional inverse elasticity problems that have proven unachievable through previously proposed methods.
Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
Authors: Yu Yang, Hao Kang, Baharan Mirzasoleiman
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01244
Pdf link: https://arxiv.org/pdf/2306.01244
Abstract To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks. To guarantee convergence to a stationary point of a non-convex function, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region. In addition, to ensure faster convergence of stochastic gradient methods such as (mini-batch) SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of training data, to ensure nearly-unbiased gradients with small variances. Finally, to further improve scalability and efficiency, CREST identifies and excludes the examples that are learned from the coreset selection pipeline. Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets, by 1.7x to 2.5x with minimum loss in the performance. By analyzing the learning difficulty of the subsets selected by CREST, we show that deep models benefit the most by learning from subsets of increasing difficulty levels.
Nonholonomic Motion Planning as Efficient as Piano Mover's
Authors: David Nister, Jaikrishna Soundararajan, Yizhou Wang, Harshad Sane
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01301
Pdf link: https://arxiv.org/pdf/2306.01301
Abstract We present an algorithm for non-holonomic motion planning (or 'parking a car') that is as computationally efficient as a simple approach to solving the famous Piano-mover's problem, where the non-holonomic constraints are ignored. The core of the approach is a graph-discretization of the problem. The graph-discretization is provably accurate in modeling the non-holonomic constraints, and yet is nearly as small as the straightforward regular grid discretization of the Piano-mover's problem into a 3D volume of 2D position plus angular orientation. Where the Piano mover's graph has one vertex and edges to six neighbors each, we have three vertices with a total of ten edges, increasing the graph size by less than a factor of two, and this factor does not depend on spatial or angular resolution. The local edge connections are organized so that they represent globally consistent turn and straight segments. The graph can be used with Dijkstra's algorithm, A*, value iteration or any other graph algorithm. Furthermore, the graph has a structure that lends itself to processing with deterministic massive parallelism. The turn and straight curves divide the configuration space into many parallel groups. We use this to develop a customized 'kernel-style' graph processing method. It results in an N-turn planner that requires no heuristics or load balancing and is as efficient as a simple solution to the Piano mover's problem even in sequential form. In parallel form it is many times faster than the sequential processing of the graph, and can run many times a second on a consumer grade GPU while exploring a configuration space pose grid with very high spatial and angular resolution. We prove approximation quality and computational complexity and demonstrate that it is a flexible, practical, reliable, and efficient component for a production solution.
Hyperparameters in Reinforcement Learning and How To Tune Them
Authors: Theresa Eimer, Marius Lindauer, Roberta Raileanu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01324
Pdf link: https://arxiv.org/pdf/2306.01324
Abstract In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies widely across papers, which makes it challenging to compare RL algorithms fairly. In this paper, we show that hyperparameter choices in RL can significantly affect the agent's final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting. We therefore propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds, as well as principled hyperparameter optimization (HPO) across a broad search space. We support this by comparing multiple state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts, demonstrating that HPO approaches often have higher performance and lower compute overhead. As a result of our findings, we recommend a set of best practices for the RL community, which should result in stronger empirical results with fewer computational costs, better reproducibility, and thus faster progress. In order to encourage the adoption of these practices, we provide plug-and-play implementations of the tuning algorithms used in this paper at https://github.com/facebookresearch/how-to-autorl.
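The practice of separating tuning and testing seeds, recommended in this abstract, can be sketched as below. This is a minimal illustration and not the code from the linked repository: `tune_then_test` and the `evaluate(config, seed)` callback are hypothetical names, and exhaustive search over a candidate list stands in for a real HPO tool.

```python
def tune_then_test(candidates, tuning_seeds, test_seeds, evaluate):
    """Pick a configuration on tuning seeds, then report it on held-out test seeds.

    `evaluate(config, seed)` is a user-supplied function (an assumption of this
    sketch) that trains an agent and returns its final score.
    """
    if set(tuning_seeds) & set(test_seeds):
        raise ValueError("tuning and test seeds must be disjoint")

    def mean_score(config, seeds):
        return sum(evaluate(config, seed) for seed in seeds) / len(seeds)

    # Selection only ever sees the tuning seeds.
    best = max(candidates, key=lambda config: mean_score(config, tuning_seeds))
    # The reported number comes from seeds the selection never saw.
    return best, mean_score(best, test_seeds)
```

Reporting the score on seeds that were never used for selection guards against a configuration that merely overfits the tuning seeds.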
Resolving Interference When Merging Models
Authors: Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01708
Pdf link: https://arxiv.org/pdf/2306.01708
Abstract Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, number of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging
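The three steps listed in this abstract can be sketched on flat parameter lists. This is an illustrative reading of the abstract and not the authors' released code: the top-fraction trimming rule, electing the sign from the summed trimmed changes, plain averaging of the agreeing entries, and the names `ties_merge` and `keep_fraction` are assumptions.

```python
def ties_merge(base, finetuned_models, keep_fraction=0.2):
    """Merge fine-tuned parameter vectors that share one pre-trained `base`."""
    n = len(base)
    # Task vectors: how far each model moved away from the pre-trained weights.
    task_vectors = [[w - b for w, b in zip(model, base)] for model in finetuned_models]

    # Step 1 (trim): reset all but the largest-magnitude changes to zero.
    k = max(1, round(keep_fraction * n))
    trimmed = []
    for tv in task_vectors:
        cutoff = sorted((abs(v) for v in tv), reverse=True)[k - 1]
        trimmed.append([v if abs(v) >= cutoff and v != 0 else 0.0 for v in tv])

    merged = []
    for i in range(n):
        column = [tv[i] for tv in trimmed]
        # Step 2 (elect sign): the direction with the larger total magnitude wins.
        total = sum(column)
        sign = (total > 0) - (total < 0)
        # Step 3 (merge): average only the changes that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + delta)
    return merged
```

For instance, if one model changes a parameter by +2.0 and another by -1.0, the positive direction is elected and the merged change is +2.0, whereas a plain average would have shrunk it to +0.5.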
Distilling Efficient Language-Specific Models for Cross-Lingual Transfer
Authors: Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.01709
Pdf link: https://arxiv.org/pdf/2306.01709
Abstract Massively multilingual Transformers (MMTs), such as mBERT and XLM-R, are widely used for cross-lingual transfer learning. While these are pretrained to represent hundreds of languages, end users of NLP systems are often interested only in individual languages. For such purposes, the MMTs' language coverage makes them unnecessarily expensive to deploy in terms of model size, inference time, energy, and hardware cost. We thus propose to extract compressed, language-specific models from MMTs which retain the capacity of the original MMTs for cross-lingual transfer. This is achieved by distilling the MMT bilingually, i.e., using data from only the source and target language of interest. Specifically, we use a two-phase distillation approach, termed BiStil: (i) the first phase distils a general bilingual model from the MMT, while (ii) the second, task-specific phase sparsely fine-tunes the bilingual "student" model using a task-tuned variant of the original MMT as its "teacher". We evaluate this distillation technique in zero-shot cross-lingual transfer across a number of standard cross-lingual benchmarks. The key results indicate that the distilled models exhibit minimal degradation in target language performance relative to the base MMT despite being significantly smaller and faster. Furthermore, we find that they outperform multilingually distilled models such as DistilmBERT and MiniLMv2 while having a very modest training budget in comparison, even on a per-language basis. We also show that bilingual models distilled from MMTs greatly outperform bilingual models trained from scratch. Our code and models are available at https://github.com/AlanAnsell/bistil.
OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
Authors: Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01738
Pdf link: https://arxiv.org/pdf/2306.01738
Abstract Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost. Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm, which benefits from both BEV's strong perception power and end-to-end pipeline. Despite achieving substantial progress, existing works model objects via globally leveraging temporal and spatial information of BEV features, resulting in problems when handling the challenging complex and dynamic autonomous driving scenarios. In this paper, we propose an Object-Centric query-BEV detector OCBEV, which can carve the temporal and spatial cues of moving targets more effectively. OCBEV comprises three designs: Object Aligned Temporal Fusion aligns the BEV feature based on ego-motion and estimated current locations of moving objects, leading to a precise instance-level feature fusion. Object Focused Multi-View Sampling samples more 3D features from adaptive local height ranges of objects for each scene to enrich foreground information. Object Informed Query Enhancement replaces part of pre-defined decoder queries in common DETR-style decoders with positional features of objects on high-confidence locations, introducing more direct object positional priors. Extensive experimental evaluations are conducted on the challenging nuScenes dataset. Our approach achieves a state-of-the-art result, surpassing the traditional BEVFormer by 1.5 NDS points. Moreover, we have a faster convergence speed and only need half of the training iterations to get comparable performance, which further demonstrates its effectiveness.
Keyword: mobile

How Should We Support Designing Privacy-Friendly Apps for Children? Using a Research through Design Process to Understand Developers' Needs and Challenges
Authors: Anirudh Ekambaranathan, Jun Zhao, Max Van Kleek
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.01152
Pdf link: https://arxiv.org/pdf/2306.01152
Abstract Mobile apps used by children often make use of harmful techniques, such as data tracking and targeted advertising. Previous research has suggested that developers face several systemic challenges in designing apps that prioritise children's best interests. To understand how developers can be better supported, we used a Research through Design (RtD) method to explore what the future of privacy-friendly app development could look like. We performed an elicitation study with 20 children's app developers to understand their needs and requirements. We found a number of specific technical requirements from the participants about how they would like to be supported, such as having actionable transnational design guidelines and easy-to-use development libraries. However, participants were reluctant to adopt these design ideas in their development practices due to perceived financial risks associated with increased privacy in apps. To overcome this critical gap, participants formulated socio-technical requirements that extend to other stakeholders in the mobile industry, including parents and marketplaces. Our findings provide important immediate and long-term design opportunities for the HCI community, and indicate that support for changing app developers' practices must be designed in the context of their relationship with other stakeholders.
Optimal Path Planning in Distinct Topo-Geometric Classes using Neighborhood-augmented Graph and its Application to Path Planning for a Tethered Robot in 3D
Authors: Alp Sahin, Subhrajit Bhattacharya
Subjects: Robotics (cs.RO); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2306.01203
Pdf link: https://arxiv.org/pdf/2306.01203
Abstract Many robotics applications benefit from being able to compute multiple locally optimal paths in a given configuration space. Examples include path planning for tethered robots with cable-length constraints, systems involving cables, multi-robot topological exploration & coverage, and congestion reduction for mobile robot navigation without inter-robot coordination. The existing paradigm is to use topological path planning methods that can provide optimal paths from distinct topological classes available in the underlying configuration space. However, these methods usually require non-trivial and non-universal geometrical constructions, which are prohibitively complex or expensive in 3 or higher dimensional configuration spaces with complex topology. Furthermore, topological methods are unable to distinguish between locally optimal paths that belong to the same topological class but are distinct because of genus-zero obstacles in 3D or due to high-cost or high-curvature regions. In this paper we propose a universal and generalized approach to multi-class path planning using the concept of a novel neighborhood-augmented graph, in which search-based planning can compute paths in distinct topo-geometric classes. This approach can find a desired number of locally optimal paths in a wider variety of configuration spaces without requiring any complex pre-processing or geometric constructions. Unlike the existing topological methods, the resulting optimal paths are not restricted to distinct topological classes, thus making the algorithm applicable to many other problems where locally optimal and geometrically distinct paths are of interest. To demonstrate an application of the proposed approach, we apply our algorithm to planning shortest traversable paths for a tethered robot with a cable-length constraint navigating in 3D and validate it in simulations & experiments.
SelFLoc: Selective Feature Fusion for Large-scale Point Cloud-based Place Recognition
Authors: Qibo Qiu, Haiming Gao, Wenxiao Wang, Zhiyi Su, Tian Xie, Wei Hua, Xiaofei He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01205
Pdf link: https://arxiv.org/pdf/2306.01205
Abstract Point cloud-based place recognition is crucial for mobile robots and autonomous vehicles, especially when the global positioning sensor is not accessible. LiDAR points are scattered on the surface of objects and buildings, which have strong shape priors along different axes. To enhance message passing along particular axes, the Stacked Asymmetric Convolution Block (SACB) is designed, which is one of the main contributions of this paper. Comprehensive experiments demonstrate that asymmetric convolution and its corresponding strategies employed by SACB can contribute to a more effective representation of point cloud features. On this basis, the Selective Feature Fusion Block (SFFB), which is formed by stacking point- and channel-wise gating layers in a predefined sequence, is proposed to selectively boost salient local features in certain key regions, as well as to align the features before the fusion phase. SACBs and SFFBs are combined to construct a robust and accurate architecture for point cloud-based place recognition, which is termed SelFLoc. Comparative experimental results show that SelFLoc achieves state-of-the-art (SOTA) performance on the Oxford benchmark and three other in-house benchmarks, with an improvement of 1.6 absolute percentage points on mean average recall@1.
Keyword: pruning

PV2TEA: Patching Visual Modality to Textual-Established Information Extraction
Authors: Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, Xian Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.01016
Pdf link: https://arxiv.org/pdf/2306.01016
Abstract Information extraction, e.g., attribute value extraction, has been extensively studied and formulated based only on text. However, many attributes can benefit from image-based extraction, like color, shape, pattern, among others. The visual modality has long been underutilized, mainly due to multimodal annotation difficulty. In this paper, we aim to patch the visual modality to the textual-established attribute information extractor. The cross-modality integration faces several unique challenges: (C1) images and textual descriptions are loosely paired intra-sample and inter-samples; (C2) images usually contain rich backgrounds that can mislead the prediction; (C3) weakly supervised labels from textual-established extractors are biased for multimodal training. We present PV2TEA, an encoder-decoder architecture equipped with three bias reduction schemes: (S1) Augmented label-smoothed contrast to improve the cross-modality alignment for loosely-paired image and text; (S2) Attention-pruning that adaptively distinguishes the visual foreground; (S3) Two-level neighborhood regularization that mitigates the label textual bias via reliability estimation. Empirical results on real-world e-Commerce datasets demonstrate up to 11.74% absolute (20.97% relative) F1 increase over unimodal baselines.
Robust low-rank training via approximate orthonormal constraints
Authors: Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01485
Pdf link: https://arxiv.org/pdf/2306.01485
Abstract With the growth of model and data sizes, a broad effort has been made to design pruning techniques that reduce the resource demand of deep learning pipelines, while retaining model performance. In order to reduce both inference and training costs, a prominent line of work uses low-rank matrix factorizations to represent the network weights. Although able to retain accuracy, we observe that low-rank methods tend to compromise model robustness against adversarial perturbations. By modeling robustness in terms of the condition number of the neural network, we argue that this loss of robustness is due to the exploding singular values of the low-rank weight matrices. Thus, we introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold while simultaneously enforcing approximate orthonormal constraints. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy. This is shown by extensive numerical evidence and by our main approximation theorem that shows the computed robust low-rank network well-approximates the ideal full model, provided a highly performing low-rank sub-network exists.
Group channel pruning and spatial attention distilling for object detection
Authors: Yun Chu, Pu Li, Yong Bai, Zhuhua Hu, Yongqing Chen, Jiafeng Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01526
Pdf link: https://arxiv.org/pdf/2306.01526
Abstract Due to the over-parameterization of neural networks, many model compression methods based on pruning and quantization have emerged. They are remarkable in reducing the size, parameter number, and computational complexity of the model. However, most of the models compressed by such methods need the support of special hardware and software, which increases the deployment cost. Moreover, these methods are mainly used in classification tasks, and rarely directly used in detection tasks. To address these issues, for the object detection network we introduce a three-stage model compression method: dynamic sparse training, group channel pruning, and spatial attention distilling. Firstly, to select the unimportant channels in the network and maintain a good balance between sparsity and accuracy, we put forward a dynamic sparse training method, which introduces a variable sparse rate that changes with the training process of the network. Secondly, to reduce the effect of pruning on network accuracy, we propose a novel pruning method called group channel pruning. In particular, we divide the network into multiple groups according to the scales of the feature layer and the similarity of module structure in the network, and then we use different pruning thresholds to prune the channels in each group. Finally, to recover the accuracy of the pruned network, we use an improved knowledge distillation method for the pruned network. Specifically, we extract spatial attention information from the feature maps of specific scales in each group as knowledge for distillation. In the experiments, we use YOLOv4 as the object detection network and PASCAL VOC as the training dataset. Our method reduces the parameters of the model by 64.7% and the computation by 34.9%.
Keyword: diffusion

DiffLoad: Uncertainty Quantification in Load Forecasting with Diffusion Model
Authors: Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Yi Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01001
Pdf link: https://arxiv.org/pdf/2306.01001
Abstract Electrical load forecasting is of great significance for decision-making in power systems, such as unit commitment and energy management. In recent years, various self-supervised neural network-based methods have been applied to electrical load forecasting to improve forecasting accuracy and capture uncertainties. However, most current methods are based on Gaussian likelihood methods, which aim to accurately estimate the distribution expectation under a given covariate. This kind of approach is difficult to adapt to situations where temporal data has a distribution shift and outliers. In this paper, we propose a diffusion-based Seq2seq structure to estimate epistemic uncertainty and use the robust additive Cauchy distribution to estimate aleatoric uncertainty. Rather than accurately forecasting conditional expectations, we demonstrate our method's ability to separate the two types of uncertainty and to deal with the mutant scenarios.
Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks
Authors: Natalie Abreu, Nathan Vaska, Victoria Helus
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01148
Pdf link: https://arxiv.org/pdf/2306.01148
Abstract For the task of image classification, neural networks primarily rely on visual patterns. In robust networks, we would expect visually similar classes to be represented similarly. We consider the problem of when semantically similar classes are visually dissimilar, and when visual similarity is present among non-similar classes. We propose a data augmentation technique with the goal of better aligning semantically similar classes with arbitrary (non-visual) semantic relationships. We leverage recent work in diffusion-based semantic mixing to generate semantic hybrids of two classes, and these hybrids are added to the training set as augmented data. We evaluate whether the method increases semantic alignment by evaluating model performance on adversarially perturbed data, with the idea that it should be easier for an adversary to switch one class to a similarly represented class. Results demonstrate that there is an increase in alignment of semantically similar classes when using our proposed data augmentation method.
Generative AI for Product Design: Getting the Right Design and the Design Right
Authors: Matthew K. Hong, Shabnam Hakimi, Yan-Ying Chen, Heishiro Toyoda, Charlene Wu, Matt Klenk
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.01217
Pdf link: https://arxiv.org/pdf/2306.01217
Abstract Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practical application for real-world use in industry settings. In this position paper, we articulate and situate these barriers within two phases of the product design process, namely "getting the right design" and "getting the design right," and propose a research agenda to stimulate discussions around opportunities for realizing the full potential of GenAI tools in product design.
Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models
Authors: Virginia Fernandez, Pedro Sanchez, Walter Hugo Lopez Pinaya, Grzegorz Jacenków, Sotirios A. Tsaftaris, Jorge Cardoso
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01322
Pdf link: https://arxiv.org/pdf/2306.01322
Abstract Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is "How can a data provider ensure that the generative model is not leaking identifiable information about a patient?". Our solution consists of (1) training a first diffusion model on real data; (2) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk; and (3) training a second diffusion model on the filtered synthetic data only. We showcase that datasets sampled from models trained with privacy distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting
Authors: Mischa Dombrowski, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01363
Pdf link: https://arxiv.org/pdf/2306.01363
Abstract Recent advances in score-based generative models have led to a huge spike in the development of downstream applications using generative models ranging from data augmentation over image and video generation to anomaly detection. Despite publicly available trained models, their potential to be used for privacy preserving data sharing has not been fully explored yet. Training diffusion models on private data and disseminating the models and weights rather than the raw dataset paves the way for innovative large-scale data-sharing strategies, particularly in healthcare, where safeguarding patients' personal health information is paramount. However, publishing such models without individual consent of, e.g., the patients from whom the data was acquired, necessitates guarantees that identifiable training samples will never be reproduced, thus protecting personal health data and satisfying the requirements of policymakers and regulatory bodies. This paper introduces a method for estimating the upper bound of the probability of reproducing identifiable training images during the sampling process. This is achieved by designing an adversarial approach that searches for anatomic fingerprints, such as medical devices or dermal art, which could potentially be employed to re-identify training images. Our method harnesses the learned score-based model to estimate the probability of the entire subspace of the score function that may be utilized for one-to-one reproduction of training samples. To validate our estimates, we generate anomalies containing a fingerprint and investigate whether generated samples from trained generative models can be uniquely mapped to the original training samples. Overall our results show that privacy-breaching images are reproduced at sampling time if the models were trained without care.
PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models
Authors: Jiacheng Chen, Ruizhi Deng, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01461
Pdf link: https://arxiv.org/pdf/2306.01461
Abstract This paper presents PolyDiffuse, a novel structured reconstruction algorithm that transforms visual sensor data into polygonal shapes with Diffusion Models (DM), an emerging machinery amid exploding generative AI, while formulating reconstruction as a generation process conditioned on sensor data. The task of structured reconstruction poses two fundamental challenges to DM: 1) A structured geometry is a "set" (e.g., a set of polygons for a floorplan geometry), where a sample of $N$ elements has $N!$ different but equivalent representations, making the denoising highly ambiguous; and 2) A "reconstruction" task has a single solution, where an initial noise needs to be chosen carefully, while any initial noise works for a generation task. Our technical contribution is the introduction of a Guided Set Diffusion Model where 1) the forward diffusion process learns guidance networks to control noise injection so that one representation of a sample remains distinct from its other permutation variants, thus resolving denoising ambiguity; and 2) the reverse denoising process reconstructs polygonal shapes, initialized and directed by the guidance networks, as a conditional generation process subject to the sensor data. We have evaluated our approach for reconstructing two types of polygonal shapes: floorplan as a set of polygons and HD map for autonomous cars as a set of polylines. Through extensive experiments on standard benchmarks, we demonstrate that PolyDiffuse significantly advances the current state of the art and enables broader practical applications.
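The permutation ambiguity mentioned in the abstract can be illustrated with a small sketch. This is not code from the paper: the `rooms` data and the helper names `orderings` and `canonical` are hypothetical, and the example only shows that a set of N polygons admits N! ordered representations that all describe the same geometry.

```python
from itertools import permutations
from math import factorial

# Hypothetical toy floorplan: three rooms, each a tuple of (x, y) vertices.
rooms = (
    ((0, 0), (4, 0), (4, 3), (0, 3)),
    ((4, 0), (7, 0), (7, 3), (4, 3)),
    ((0, 3), (7, 3), (7, 6), (0, 6)),
)

def orderings(polygons):
    """All distinct ordered representations of the same collection of polygons."""
    return set(permutations(polygons))

def canonical(polygons):
    """Order-free view of a representation: the underlying set of polygons."""
    return frozenset(polygons)

all_orderings = orderings(rooms)
# Every ordering differs as a sequence, yet all share one canonical set.
distinct_sets = {canonical(o) for o in all_orderings}
print(len(all_orderings), factorial(len(rooms)), len(distinct_sets))
```

A denoiser that regresses toward an ordered target therefore sees many equally valid targets for one sample, which is the ambiguity the abstract attributes to set-structured outputs.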
Influence Maximization with Fairness at Scale (Extended Version)
Authors: Yuting Feng, Ankitkumar Patel, Bogdan Cautis, Hossein Vahabi
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.01587
Pdf link: https://arxiv.org/pdf/2306.01587
Abstract In this paper, we revisit the problem of influence maximization with fairness, which aims to select k influential nodes to maximise the spread of information in a network, while ensuring that selected sensitive user attributes are fairly affected, i.e., are proportionally similar between the original network and the affected users. Recent studies on this problem focused only on extremely small networks, hence the challenge remains on how to achieve a scalable solution, applicable to networks with millions or billions of nodes. We propose an approach that is based on learning node representations for fair spread from diffusion cascades, instead of the social connectivity, so that we can deal with very large graphs. We propose two data-driven approaches: (a) fairness-based participant sampling (FPS), and (b) fairness as context (FAC). Spread-related user features, such as the probability of diffusing information to others, are derived from the historical information cascades, using a deep neural network. The extracted features are then used in selecting influencers that maximize the influence spread, while also being fair with respect to the chosen sensitive attributes. In FPS, fairness and cascade length information are considered independently in the decision-making process, while FAC considers these information facets jointly and considers correlations between them. The proposed algorithms are generic and represent the first policy-driven solutions that can be applied to arbitrary sets of sensitive attributes at scale. We evaluate the performance of our solutions on a real-world public dataset (Sina Weibo) and on a hybrid real-synthetic dataset (Digg), which exhibit all the facets that we exploit, namely diffusion network, diffusion traces, and user profiles. These experiments show that our methods outperform the state-of-the-art solutions in terms of spread, fairness, and scalability.
DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation
Authors: Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin, Xiaodong He
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01657
Pdf link: https://arxiv.org/pdf/2306.01657
Abstract Empathy is a crucial factor in open-domain conversations, which naturally shows one's caring and understanding to others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy that refers to generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework, DiffusEmp, based on a conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.
Denoising Diffusion Semantic Segmentation with Mask Prior Modeling
Authors: Zeqiang Lai, Yuchen Duan, Jifeng Dai, Ziheng Li, Ying Fu, Hongsheng Li, Yu Qiao, Wenhai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01721
Pdf link: https://arxiv.org/pdf/2306.01721
Abstract The evolution of semantic segmentation has long been dominated by learning more discriminative image representations for classifying each pixel. Despite the prominent advancements, the priors of segmentation masks themselves, e.g., geometric and semantic constraints, are still under-explored. In this paper, we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently-developed denoising diffusion generative model. Beginning with a unified architecture that adapts diffusion models for mask prior modeling, we focus this work on a specific instantiation with discrete diffusion and identify a variety of key design choices for its successful application. Our exploratory analysis revealed several important findings, including: (1) a simple integration of diffusion models into semantic segmentation is not sufficient, and a poorly-designed diffusion process might lead to degradation in segmentation performance; (2) during the training, the object to which noise is added is more important than the type of noise; (3) during the inference, the strict diffusion denoising scheme may not be essential and can be relaxed to a simpler scheme that even works better. We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes demonstrate that our approach could achieve competitive quantitative performance and more appealing visual quality.
Video Colorization with Pre-trained Text-to-Image Diffusion Models
Authors: Hanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, Tien-Tsin Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.01732
Pdf link: https://arxiv.org/pdf/2306.01732
Abstract Video colorization is a challenging task that involves inferring plausible and temporally consistent colors for grayscale frames. In this paper, we present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization. With the proposed adapter-based approach, we repurpose the pre-trained text-to-image model to accept input grayscale video frames, with the optional text description, for video colorization. To enhance the temporal coherence and maintain the vividness of colorization across frames, we propose two novel techniques: the Color Propagation Attention and Alternated Sampling Strategy. Color Propagation Attention enables the model to refine its colorization decision based on a reference latent frame, while Alternated Sampling Strategy captures spatiotemporal dependencies by using the next and previous adjacent latent frames alternately as reference during the generative diffusion sampling steps. This encourages bidirectional color information propagation between adjacent video frames, leading to improved color consistency across frames. We conduct extensive experiments on benchmark datasets, and the results demonstrate the effectiveness of our proposed framework. The evaluations show that ColorDiffuser achieves state-of-the-art performance in video colorization, surpassing existing methods in terms of color fidelity, temporal consistency, and visual quality.
Keyword: dynamic

Towards Fair Disentangled Online Learning for Changing Environments
Authors: Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Christan Grant, Feng Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01007
Pdf link: https://arxiv.org/pdf/2306.01007
Abstract In the problem of online learning for changing environments, data are sequentially received one after another over time, and their distribution assumptions may vary frequently. Although existing methods demonstrate the effectiveness of their learning algorithms by providing a tight bound on either dynamic regret or adaptive regret, most of them completely ignore learning with model fairness, defined as the statistical parity across different sub-population (e.g., race and gender). Another drawback is that when adapting to a new environment, an online learner needs to update model parameters with a global change, which is costly and inefficient. Inspired by the sparse mechanism shift hypothesis, we claim that changing environments in online learning can be attributed to partial changes in learned parameters that are specific to environments and the rest remain invariant to changing environments. To this end, in this paper, we propose a novel algorithm under the assumption that data collected at each time can be disentangled with two representations, an environment-invariant semantic factor and an environment-specific variation factor. The semantic factor is further used for fair prediction under a group fairness constraint. To evaluate the sequence of model parameters generated by the learner, a novel regret is proposed in which it takes a mixed form of dynamic and static regret metrics followed by a fairness-aware long-term constraint. The detailed analysis provides theoretical guarantees for loss regret and violation of cumulative fairness constraints. Empirical evaluations on real-world datasets demonstrate our proposed method sequentially outperforms baseline methods in model accuracy and fairness.
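The fairness notion named in the abstract, statistical parity across sub-populations, can be sketched for the two-group case as follows. This is a minimal illustration of the standard definition, not the paper's constraint or implementation; the function name `statistical_parity_gap` and its inputs are assumptions made for the example.

```python
def statistical_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between two groups.

    predictions: iterable of 0/1 model outputs.
    groups: iterable of group labels, aligned with predictions.
    A gap of 0 means both groups receive positive predictions at the same rate.
    """
    predictions = list(predictions)
    groups = list(groups)
    rates = {}
    for g in set(groups):
        members = [p for p, gi in zip(predictions, groups) if gi == g]
        rates[g] = sum(members) / len(members)
    if len(rates) != 2:
        raise ValueError("this sketch expects exactly two groups")
    low, high = sorted(rates.values())
    return high - low


# Group 0 receives positives at rate 0.5, group 1 at rate 1.0.
print(statistical_parity_gap([1, 0, 1, 1], [0, 0, 1, 1]))
```

In an online setting such a quantity would be tracked per round and accumulated over time, which is the role the abstract assigns to its long-term fairness constraint.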
Physics-informed machine learning of redox flow battery based on a two-dimensional unit cell model
Authors: Wenqian Chen, Yucheng Fu, Panos Stinis
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Arxiv link: https://arxiv.org/abs/2306.01010
Pdf link: https://arxiv.org/pdf/2306.01010
Abstract In this paper, we present a physics-informed neural network (PINN) approach for predicting the performance of an all-vanadium redox flow battery, with its physics constraints enforced by a two-dimensional (2D) mathematical model. The 2D model, which includes 6 governing equations and 24 boundary conditions, provides a detailed representation of the electrochemical reactions, mass transport and hydrodynamics occurring inside the redox flow battery. To solve the 2D model with the PINN approach, a composite neural network is employed to approximate species concentration and potentials; the input and output are normalized according to prior knowledge of the battery system; the governing equations and boundary conditions are first scaled to an order of magnitude around 1, and then further balanced with a self-weighting method. Our numerical results show that the PINN is able to predict cell voltage correctly, but the prediction of potentials shows a constant-like shift. To fix the shift, the PINN is enhanced by further constraints derived from the current collector boundary. Finally, we show that the enhanced PINN can be even further improved if a small amount of labeled data is available.
Data-driven modeling and parameter estimation of Nonlinear systems
Authors: Kaushal Kumar
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.01011
Pdf link: https://arxiv.org/pdf/2306.01011
Abstract Nonlinear systems are prevalent in many fields of science and engineering, and understanding their behavior is essential for developing effective control and prediction strategies. In this paper, we present a novel data-driven approach for accurately modeling and estimating parameters of nonlinear systems using trust region optimization. Our method is applied to three classic systems: the Van der Pol oscillator, the Damped oscillator, and the Lorenz system, which have broad applications in various fields, including engineering, physics, and biology. Our results demonstrate that our approach can accurately identify the parameters of these nonlinear systems, providing a reliable characterization of their behavior. We show that the ability to capture the dynamics on the attractor is crucial for these systems, especially in chaotic systems like the Lorenz system. Overall, this article presents a robust data-driven approach for parameter estimation of nonlinear dynamical systems, with promising potential for real-world applications.
Graph-Level Embedding for Time-Evolving Graphs
Authors: Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2306.01012
Pdf link: https://arxiv.org/pdf/2306.01012
Abstract Graph representation learning (also known as network embedding) has been extensively researched with varying levels of granularity, ranging from nodes to graphs. While most prior work in this area focuses on node-level representation, limited research has been conducted on graph-level embedding, particularly for dynamic or temporal networks. However, learning low-dimensional graph-level representations for dynamic networks is critical for various downstream graph retrieval tasks such as temporal graph similarity ranking, temporal graph isomorphism, and anomaly detection. In this paper, we present a novel method for temporal graph-level embedding that addresses this gap. Our approach involves constructing a multilayer graph and using a modified random walk with temporal backtracking to generate temporal contexts for the graph's nodes. We then train a "document-level" language model on these contexts to generate graph-level embeddings. We evaluate our proposed model on five publicly available datasets for the task of temporal graph similarity ranking, and our model outperforms baseline methods. Our experimental results demonstrate the effectiveness of our method in generating graph-level embeddings for dynamic networks.
A Vitual-Force Based Swarm Algorithm for Balanced Circular Bin Packing Problems
Authors: Juliette Gamot, Mathieu Balesdent, Romain Wuilbercq, Arnault Tremolet, Nouredine Melab, El-Ghazali Talbi
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2306.01021
Pdf link: https://arxiv.org/pdf/2306.01021
Abstract Balanced circular bin packing problems consist in positioning a given number of weighted circles in order to minimize the radius of a circular container while satisfying equilibrium constraints. These problems are NP-hard, highly constrained, and high-dimensional. This paper describes a swarm algorithm based on a virtual-force system in order to solve balanced circular bin packing problems. In the proposed approach, a system of forces is applied to each component, which accounts for the constraints and minimizes the objective function using the fundamental principle of dynamics. The proposed algorithm is tested and validated on benchmarks of various balanced circular bin packing problems with up to 300 circles. The reported results make it possible to assess the effectiveness of the proposed approach compared to existing results from the literature.
Chaos persists in large-scale multi-agent learning despite adaptive learning rates
Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.01032
Pdf link: https://arxiv.org/pdf/2306.01032
Abstract Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard as recent work has established that learning with fixed rates will become chaotic given large enough populations. In this work, we show that chaos persists in large population congestion games despite using adaptive learning rates even for the ubiquitous Multiplicative Weight Updates algorithm, even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional Li-Yorke period-three techniques by studying fundamental properties of the dynamics including invariant sets, volume expansion and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors.
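To make the setting concrete, here is a minimal sketch of a Multiplicative Weights Update step in a two-strategy congestion game with a fixed learning rate. It is a toy model under stated assumptions, not the paper's construction: the population is summarized by the fraction `x` playing the first strategy, costs are assumed linear in the load, and the names `mwu_step`, `trajectory`, `eps`, `a`, `b`, and `n_agents` are hypothetical. The paper studies adaptive rates, which this sketch does not implement.

```python
import math

def mwu_step(x, eps, n_agents, a=1.0, b=1.0):
    """One exponential-weights update of the fraction x on strategy 1.

    Assumed linear congestion costs: strategy 1 costs a * load_1 and
    strategy 2 costs b * load_2, where load_i is the number of agents on it.
    """
    cost_1 = a * n_agents * x
    cost_2 = b * n_agents * (1.0 - x)
    # Subtract the smaller cost so that one exponent is exactly zero;
    # this leaves the ratio unchanged and avoids both weights underflowing.
    base = min(cost_1, cost_2)
    w1 = x * math.exp(-eps * (cost_1 - base))
    w2 = (1.0 - x) * math.exp(-eps * (cost_2 - base))
    return w1 / (w1 + w2)

def trajectory(x0, eps, n_agents, steps):
    """Iterate mwu_step from x0 and return every visited state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(mwu_step(xs[-1], eps, n_agents))
    return xs

small = trajectory(0.2, eps=0.1, n_agents=1, steps=500)
large = trajectory(0.2, eps=0.1, n_agents=100, steps=500)
print(small[-1], large[-1])
```

In this toy model the balanced split `x = 0.5` is the equilibrium when `a == b`. With one agent's worth of load the iteration settles there, while with the same step size and a population of 100 it keeps oscillating far from equilibrium, which gives a rough feel for why fixed rates become problematic as populations grow.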
Investigating Navigation Strategies in the Morris Water Maze through Deep Reinforcement Learning
Authors: Andrew Liu, Alla Borisyuk
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2306.01066
Pdf link: https://arxiv.org/pdf/2306.01066
Abstract Navigation is a complex skill with a long history of research in animals and humans. In this work, we simulate the Morris Water Maze in 2D to train deep reinforcement learning agents. We perform automatic classification of navigation strategies, analyze the distribution of strategies used by artificial agents, and compare them with experimental data to show learning dynamics similar to those seen in humans and rodents. We develop environment-specific auxiliary tasks and examine factors affecting their usefulness. We suggest that the most beneficial tasks are potentially more biologically feasible for real agents to use. Lastly, we explore the development of internal representations in the activations of artificial agent neural networks. These representations resemble place cells and head-direction cells found in mouse brains, and their presence correlates with the navigation strategies that artificial agents employ.
Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints
Authors: Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene Ie, Congcong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01075
Pdf link: https://arxiv.org/pdf/2306.01075
Abstract Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim at identifying crossing pedestrians and predicting their future trajectories. To achieve these goals, we not only need the context information of road geometry and other traffic participants but also need fine-grained information of the human pose, motion and activity, which can be inferred from human keypoints. In this paper, we propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction, which utilizes 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity. Moreover, we propose to apply two auxiliary tasks and contrastive learning to enable auxiliary supervisions to improve the learned keypoints representation, which further enhances the performance of major tasks. We validate our approach on a large-scale in-house dataset, as well as a public benchmark dataset, and show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics. The effectiveness of each model component is validated in a detailed ablation study.
4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks
Authors: Lorenzo Berlincioni, Stefano Berretti, Marco Bertini, Alberto Del Bimbo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.01081
Pdf link: https://arxiv.org/pdf/2306.01081
Abstract Time-varying sequences of 3D point clouds, or 4D point clouds, are now being acquired at an increasing pace in several applications (e.g., LiDAR in autonomous or assisted driving). In many cases, such volumes of data are transmitted, thus requiring that proper compression tools are applied to reduce either the resolution or the bandwidth. In this paper, we propose a new solution for upscaling and restoration of time-varying 3D video point clouds after they have been heavily compressed. In consideration of the recent growing relevance of 3D applications, we focused on a model allowing user-side upscaling and artifact removal for 3D video point clouds. Our model consists of a specifically designed Graph Convolutional Network (GCN) that combines Dynamic Edge Convolution and Graph Attention Networks for feature aggregation in a Generative Adversarial setting. Taking inspiration from PointNet++, we present a different way to sample dense point clouds with the intent to make these modules work in synergy to provide each node enough features about its neighbourhood in order to later on generate new vertices. Compared to other solutions in the literature that address the same task, our proposed model is capable of obtaining comparable results in terms of quality of the reconstruction, while using a substantially lower number of parameters (about 300KB), making our solution deployable in edge computing devices such as LiDAR.
Extended-XRI Body Interfaces for Hyper-Connected Metaverse Environments
Authors: Jie Guan, Alexis Morris
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.01096
Pdf link: https://arxiv.org/pdf/2306.01096
Abstract Hybrid mixed-reality (XR) internet-of-things (IoT) research, here called XRI, aims at a strong integration between physical and virtual objects, environments, and agents wherein IoT-enabled edge devices are deployed for sensing, context understanding, networked communication and control of device actuators. Likewise, as augmented reality systems provide an immersive overlay on the environments, and virtual reality provides fully immersive environments, the merger of these domains leads to immersive smart spaces that are hyper-connected, adaptive and dynamic components that anchor the metaverse to real-world constructs. Enabling the human-in-the-loop to remain engaged and connected across these virtual-physical hybrid environments requires advances in user interaction that are multi-dimensional. This work investigates the potential to transition the user interface to the human body as an extended-reality avatar with hybrid extended-body interfaces that can interact both with the physical and virtual sides of the metaverse. It contributes: i) an overview of metaverses, XRI, and avatarization concepts, ii) a taxonomy landscape for extended XRI body interfaces, iii) an architecture and potential interactions for XRI body designs, iv) a prototype XRI body implementation based on the architecture, v) a design-science evaluation, toward enabling future design research directions.
A Neural RDE-based model for solving path-dependent PDEs
Authors: Bowen Fang, Hao Ni, Yue Wu
Subjects: Machine Learning (cs.LG); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2306.01123
Pdf link: https://arxiv.org/pdf/2306.01123
Abstract The concept of the path-dependent partial differential equation (PPDE) was first introduced in the context of path-dependent derivatives in financial markets. Its semilinear form was later identified as a non-Markovian backward stochastic differential equation (BSDE). Compared to the classical PDE, the solution of a PPDE involves an infinite-dimensional spatial variable, making it challenging to approximate, if not impossible. In this paper, we propose a neural rough differential equation (NRDE)-based model to learn PPDEs, which effectively encodes the path information through the log-signature feature while capturing the fundamental dynamics. The proposed continuous-time model for the PPDE solution offers the benefits of efficient memory usage and the ability to scale with dimensionality. Several numerical experiments, provided to validate the performance of the proposed model in comparison to the strong baseline in the literature, are used to demonstrate its effectiveness.
Numerical Investigation of the Fractional Oscillation Equations under the Context of Variable Order Caputo Fractional Derivative via Fractional Order Bernstein Wavelets
Authors: Ashish Rayal, Bhagawati Prasad Joshi, Mukesh Pandey, Delfim F. M. Torres
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01124
Pdf link: https://arxiv.org/pdf/2306.01124
Abstract This article describes an approximation technique based on fractional order Bernstein wavelets for the numerical simulations of fractional oscillation equations under variable order, and the fractional order Bernstein wavelets are derived by means of fractional Bernstein polynomials. The oscillation equation describes electrical circuits and exhibits a wide range of nonlinear dynamical behaviors. The proposed variable order model is of current interest in a lot of application areas in engineering and applied sciences. The purpose of this study is to analyze the behavior of the fractional force-free and forced oscillation equations under the variable-order fractional operator. The basic idea behind using the approximation technique is that it converts the proposed model into non-linear algebraic equations with the help of collocation nodes for easy computation. Different cases of the proposed model are examined under the selected variable order parameters for the first time in order to show the precision and performance of the mentioned scheme. The dynamic behavior and results are presented via tables and graphs to ensure the validity of the mentioned scheme. Further, the behavior of the obtained solutions for the variable order is also depicted. From the calculated results, it is observed that the mentioned scheme is extremely simple and efficient for examining the behavior of nonlinear random (constant or variable) order fractional models occurring in engineering and science.
A space-time discontinuous Galerkin method for coupled poroelasticity-elasticity problems
Authors: Paola F. Antonietti, Michele Botti, Ilario Mazzieri
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01140
Pdf link: https://arxiv.org/pdf/2306.01140
Abstract This work is concerned with the analysis of a space-time finite element discontinuous Galerkin method on polytopal meshes (XT-PolydG) for the numerical discretization of wave propagation in coupled poroelastic-elastic media. The mathematical model consists of the low-frequency Biot's equations in the poroelastic medium and the elastodynamics equation for the elastic one. To realize the coupling, suitable transmission conditions on the interface between the two domains are (weakly) embedded in the formulation. The proposed PolydG discretization in space is then coupled with a dG time integration scheme, resulting in a full space-time dG discretization. We present the stability analysis for both the continuous and the semidiscrete formulations, and we derive error estimates for the semidiscrete formulation in a suitable energy norm. The method is applied to a wide set of numerical test cases to verify the theoretical bounds. Examples of physical interest are also presented to investigate the capability of the proposed method in relevant geophysical scenarios.
AI Liability Insurance With an Example in AI-Powered E-diagnosis System
Authors: Yunfei Ge, Quanyan Zhu
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2306.01149
Pdf link: https://arxiv.org/pdf/2306.01149
Abstract Artificial Intelligence (AI) has received an increasing amount of attention in multiple areas. The uncertainties and risks in AI-powered systems have created reluctance in their wide adoption. As an economic solution to compensate for potential damages, AI liability insurance is a promising market to enhance the integration of AI into daily life. In this work, we use an AI-powered E-diagnosis system as an example to study AI liability insurance. We provide a quantitative risk assessment model with evidence-based numerical analysis. We discuss the insurability criteria for AI technologies and suggest necessary adjustments to accommodate the features of AI products. We show that AI liability insurance can act as a regulatory mechanism to incentivize compliant behaviors and serve as a certificate of high-quality AI systems. Furthermore, we suggest premium adjustment to reflect the dynamic evolution of the inherent uncertainty in AI. Moral hazard problems are discussed and suggestions for AI liability insurance are provided.
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Authors: Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01154
Pdf link: https://arxiv.org/pdf/2306.01154
Abstract Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures. Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in learning dynamics. We can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with the wider counterparts. Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers. We also conduct numerical experiments to support our theoretical results. The code for our experiments can be found at https://github.com/cjyaras/lawofparsimony.
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
Authors: Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.01160
Pdf link: https://arxiv.org/pdf/2306.01160
Abstract Transformer-based language models have found many diverse applications requiring them to process sequences of increasing length. For these applications, the causal self-attention -- which is the only component scaling quadratically w.r.t. the sequence length -- becomes a central concern. While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead of self-attention, those are often limited by implementation concerns and end up imposing a simple and static structure over the attention matrix. Conversely, implementing more dynamic sparse attentions often results in runtimes significantly slower than computing the full attention using the Flash implementation from Dao et al. (2022). We extend FlashAttention to accommodate a large class of attention sparsity patterns that, in particular, encompass key/query dropping and hashing-based attention. This leads to implementations with no computational complexity overhead and a multi-fold runtime speedup on top of FlashAttention. Even with relatively low degrees of sparsity, our method improves visibly upon FlashAttention as the sequence length increases. Without sacrificing perplexity, we increase the training speed of a transformer language model by $2.0\times$ and $3.3\times$ for sequences of respectively $8k$ and $16k$ tokens.
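As a rough illustration of the terms used in this abstract, the following is a minimal pure-Python reference for single-head causal attention with optional key dropping. It is not FlashAttention and not the authors' kernel, and it gives no speedup; the function name `causal_attention` and the `keep_key` mask are assumptions introduced here only to show what "causal" and "key dropping" mean, and why the dense computation costs on the order of n^2 score evaluations.

```python
import math

def causal_attention(queries, keys, values, keep_key=None):
    """Naive single-head causal attention over lists of float vectors.

    keep_key is a hypothetical boolean mask: keys marked False are
    dropped for every query. Positions with no visible key return zeros.
    """
    n = len(queries)
    d = len(queries[0])
    if keep_key is None:
        keep_key = [True] * n
    outputs = []
    for i in range(n):
        # Causal mask: query i may only attend to kept keys j <= i.
        visible = [j for j in range(i + 1) if keep_key[j]]
        if not visible:
            outputs.append([0.0] * len(values[0]))
            continue
        # Scaled dot-product scores; this inner loop is the quadratic part.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in visible
        ]
        # Numerically stable softmax over the visible keys.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, visible)) / norm
            for c in range(len(values[0]))
        ])
    return outputs
```

With two tokens, the first output row equals `values[0]`, since the first query can only see itself; dropping key 0 via `keep_key=[False, True]` makes the second query attend to key 1 alone.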
Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations
Authors: Anudhyan Boral, Zhong Yi Wan, Leonardo Zepeda-Núñez, James Lottes, Qing Wang, Yi-fan Chen, John Roberts Anderson, Fei Sha
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01174
Pdf link: https://arxiv.org/pdf/2306.01174
Abstract We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics; as such, the effect of small scales is marginalized to obtain the deterministic evolution of the LES state. However, ideal LES is analytically intractable. In our work, we use a latent neural SDE to model the evolution of the stochastic process and an encoder-decoder pair for transforming between the latent space and the desired ideal flow field. This stands in sharp contrast to other types of neural parameterization of closure models where each trajectory is treated as a deterministic realization of the dynamics. We show the effectiveness of our approach (niLES - neural ideal LES) on a challenging chaotic dynamical system: Kolmogorov flow at a Reynolds number of 20,000. Compared to competing methods, our method can handle non-uniform geometries using unstructured meshes seamlessly. In particular, niLES leads to trajectories with more accurate statistics and enhances stability, particularly for long-horizon rollouts.
The Benefits of Interaction Constraints in Distributed Autonomous Systems
Authors: Michael Crosscombe, Jonathan Lawry
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2306.01179
Pdf link: https://arxiv.org/pdf/2306.01179
Abstract The design of distributed autonomous systems often omits consideration of the underlying network dynamics. Recent works in multi-agent systems and swarm robotics alike have highlighted the impact that the interactions between agents have on the collective behaviours exhibited by the system. In this paper, we seek to highlight the role that the underlying interaction network plays in determining the performance of the collective behaviour of a system, comparing its impact with that of the physical network. We contextualise this by defining a collective learning problem in which agents must reach a consensus about their environment in the presence of noisy information. We show that the physical connectivity of the agents plays a less important role than when an interaction network of limited connectivity is imposed on the system to constrain agent communication. Constraining agent interactions in this way drastically improves the performance of the system in a collective learning context. Additionally, we provide further evidence for the idea that `less is more' when it comes to propagating information in distributed autonomous systems for the purpose of collective learning.
Training neural operators to preserve invariant measures of chaotic attractors
Authors: Ruoxi Jiang, Peter Y. Lu, Elena Orlova, Rebecca Willett
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2306.01187
Pdf link: https://arxiv.org/pdf/2306.01187
Abstract Chaotic systems make long-horizon forecasts difficult because small perturbations in initial conditions cause trajectories to diverge at an exponential rate. In this setting, neural operators trained to minimize squared error losses, while capable of accurate short-term forecasts, often fail to reproduce statistical or structural properties of the dynamics over longer time horizons and can yield degenerate results. In this paper, we propose an alternative framework designed to preserve invariant measures of chaotic attractors that characterize the time-invariant statistical properties of the dynamics. Specifically, in the multi-environment setting (where each sample trajectory is governed by slightly different dynamics), we consider two novel approaches to training with noisy data. First, we propose a loss based on the optimal transport distance between the observed dynamics and the neural operator outputs. This approach requires expert knowledge of the underlying physics to determine what statistical features should be included in the optimal transport loss. Second, we show that a contrastive learning framework, which does not require any specialized prior knowledge, can preserve statistical properties of the dynamics nearly as well as the optimal transport approach. On a variety of chaotic systems, our method is shown empirically to preserve invariant measures of chaotic attractors.
Event-based Visual Odometry with Full Temporal Resolution via Continuous-time Gaussian Process Regression
Authors: Jianeng Wang, Jonathan D. Gammell
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01188
Pdf link: https://arxiv.org/pdf/2306.01188
Abstract Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motions and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9e-3 and 5.9e-3 RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by two and four times, respectively.
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
Authors: Liam Dugan, Anshul Wadhawan, Kyle Spence, Chris Callison-Burch, Morgan McGuire, Victor Zordan
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.01201
Pdf link: https://arxiv.org/pdf/2306.01201
Abstract Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this work, we introduce a system for simultaneous S2ST targeting real-world use cases. Our system supports translation from 57 languages to English with tunable parameters for dynamically adjusting the latency of the output -- including four policies for determining when to speak an output sequence. We show that these policies achieve offline-level accuracy with minimal increases in latency over a Greedy (wait-$k$) baseline. We open-source our evaluation code and interactive test script to aid future SimulS2ST research and application development.
Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning
Authors: Shaoming Peng
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01270
Pdf link: https://arxiv.org/pdf/2306.01270
Abstract Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing their travel distance. Previous methods for this problem either continuously replan paths using heuristic search methods to avoid conflicts or choose appropriate collision avoidance strategies based on learning approaches. The former may result in long travel distances due to frequent replanning, while the latter may have low learning efficiency due to low sample exploration and utilization, which causes high training costs for the model. To address these issues, we propose a path planning method, MAPPOHR, which combines heuristic search, empirical rules, and multi-agent reinforcement learning. The method consists of two layers: a real-time planner based on the multi-agent reinforcement learning algorithm, MAPPO, which embeds empirical rules in the action output layer and reward functions, and a heuristic search planner used to create a global guiding path. During movement, the heuristic search planner replans new paths based on the instructions of the real-time planner. We tested our method in 10 different conflict scenarios. The experiments show that the planning performance of MAPPOHR is better than that of existing learning and heuristic methods. Due to the utilization of empirical knowledge and heuristic search, the learning efficiency of MAPPOHR is higher than that of existing learning methods.
Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training
Authors: Binghui Li, Yuanzhi Li
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01271
Pdf link: https://arxiv.org/pdf/2306.01271
Abstract Adversarial training is a standard method to train deep neural networks to be robust to adversarial perturbation. Similar to the surprising $\textit{clean generalization}$ ability in the standard deep learning setting, neural networks trained by adversarial training also generalize well for $\textit{unseen clean data}$. However, in contrast with clean generalization, while the adversarial training method is able to achieve low $\textit{robust training error}$, there still exists a significant $\textit{robust generalization gap}$, which prompts us to explore what mechanism leads to both $\textit{clean generalization and robust overfitting (CGRO)}$ during the learning process. In this paper, we provide a theoretical understanding of this CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training, where we analyze the $\textit{feature learning process}$ to explain how adversarial training leads the network learner to the CGRO regime. Specifically, we prove that, under our patch-structured dataset, the CNN model provably partially learns the true feature but exactly memorizes the spurious features from training-adversarial examples, which thus results in clean generalization and robust overfitting. For a more general data assumption, we then show the efficiency of the CGRO classifier from the perspective of $\textit{representation complexity}$. On the empirical side, to verify our theoretical analysis on a real-world vision dataset, we investigate the $\textit{dynamics of loss landscape}$ during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on $\textit{global flatness}$ of the loss landscape, which may be of independent interest.
KL-Divergence Guided Temperature Sampling
Authors: Chung-Ching Chang, David Reitter, Renat Aksitov, Yun-Hsuan Sung
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01286
Pdf link: https://arxiv.org/pdf/2306.01286
Abstract Temperature sampling is a conventional approach to diversify large language model predictions. As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations -- generating tokens that are sensible but not factual. One common approach to mitigate hallucinations is to provide source/grounding documents and to train the model to produce predictions that bind to and are attributable to the provided source. It appears that there is a trade-off between diversity and attribution. To mitigate any such trade-off, we propose to relax the constraint of having a fixed temperature over decoding steps, and a mechanism to guide the dynamic temperature according to its relevance to the source through KL-divergence. Our experiments justify the trade-off, and show that our sampling algorithm outperforms the conventional top-k and top-p algorithms in conversational question-answering and summarization tasks.
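The two ingredients named in this abstract, temperature-scaled softmax and KL-divergence between token distributions, can be sketched in a few lines of standard-library Python. The schedule `guided_temperature` below is purely hypothetical: the abstract does not state how the KL value maps to a temperature, so the formula and the `base_temperature` and `sensitivity` parameters are assumptions chosen only to illustrate a per-step temperature that shrinks when the source strongly shifts the prediction.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 where p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def guided_temperature(kl, base_temperature=1.0, sensitivity=1.0):
    """Hypothetical schedule (an assumption, not the paper's rule):
    the larger the KL, the lower the temperature for this decoding step."""
    return base_temperature / (1.0 + sensitivity * kl)

# Illustrative next-token logits with and without the source document.
logits_with_source = [4.0, 1.0, 0.5]
logits_without_source = [1.5, 1.4, 1.3]

p_src = softmax_with_temperature(logits_with_source, 1.0)
p_free = softmax_with_temperature(logits_without_source, 1.0)
step_temperature = guided_temperature(kl_divergence(p_src, p_free))
step_probs = softmax_with_temperature(logits_with_source, step_temperature)
```

Here the source changes the distribution substantially, so the KL term is positive, the step temperature drops below the base value, and the sampled distribution becomes more peaked than at temperature 1.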
Nonlinear Boundary Conditions for Initial Boundary Value Problems with Applications in Computational Fluid Dynamics
Authors: Jan Nordström
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01297
Pdf link: https://arxiv.org/pdf/2306.01297
Abstract We derive new boundary conditions and implementation procedures for nonlinear initial boundary value problems (IBVPs) with non-zero boundary data that lead to bounded solutions. The new boundary procedure is applied to nonlinear IBVPs on skew-symmetric form, including dissipative terms. The complete procedure has two main ingredients. In the first part (published in [1, 2]), the energy and entropy rate in terms of a surface integral with boundary terms was produced for problems with first derivatives. In this second part we complement it by adding second derivative dissipative terms and bound the boundary terms. We develop a new nonlinear boundary procedure which generalises the characteristic boundary procedure for linear problems. Both strong and weak imposition of the nonlinear boundary conditions with non-zero boundary data are considered, and we prove that the solution is bounded. The boundary procedure is applied to four important IBVPs in computational fluid dynamics: the incompressible Euler and Navier-Stokes, the shallow water and the compressible Euler equations. Finally we show that stable discrete approximations follow by using summation-by-parts operators combined with weak boundary conditions.
Federated Learning Games for Reconfigurable Intelligent Surfaces via Causal Representations
Authors: Charbel Bou Chaaya, Sumudu Samarakoon, Mehdi Bennis
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.01306
Pdf link: https://arxiv.org/pdf/2306.01306
Abstract In this paper, we investigate the problem of robust Reconfigurable Intelligent Surface (RIS) phase-shifts configuration over heterogeneous communication environments. The problem is formulated as a distributed learning problem over different environments in a Federated Learning (FL) setting. Equivalently, this corresponds to a game played between multiple RISs, as learning agents, in heterogeneous environments. Using Invariant Risk Minimization (IRM) and its FL equivalent, dubbed FL Games, we solve the RIS configuration problem by learning invariant causal representations across multiple environments and then predicting the phases. The solution corresponds to playing according to Best Response Dynamics (BRD) which yields the Nash Equilibrium of the FL game. The representation learner and the phase predictor are modeled by two neural networks, and their performance is validated via simulations against other benchmarks from the literature. Our results show that causality-based learning yields a predictor that is 15% more accurate in unseen Out-of-Distribution (OoD) environments.
Q-learning for distributed routing in LEO satellite constellations
Authors: Beatriz Soret, Israel Leyva-Mayorga, Federico Lozano-Cuadra, Mathias D. Thorsager
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2306.01346
Pdf link: https://arxiv.org/pdf/2306.01346
Abstract End-to-end routing in Low Earth Orbit (LEO) satellite constellations (LSatCs) is a complex and dynamic problem. The topology, of finite size, is dynamic and predictable, the traffic from/to Earth and transiting the space segment is highly imbalanced, and the delay is dominated by the propagation time in non-congested routes and by the queueing time at Inter-Satellite Links (ISLs) in congested routes. Traditional routing algorithms depend on excessive communication with ground or other satellites, and oversimplify the characterization of the path links towards the destination. We model the problem as a multi-agent Partially Observable Markov Decision Problem (POMDP) where the nodes (i.e., the satellites) interact only with nearby nodes. We propose a distributed Q-learning solution that leverages the knowledge of the neighbours and the correlation of the routing decisions of each node. We compare our results to two centralized algorithms based on the shortest path: one aiming at using the highest data rate links and a second genie algorithm that knows the instantaneous queueing delays at all satellites. The results of our proposal are positive on every front: (1) it experiences delays that are comparable to the benchmarks in steady-state conditions; (2) it increases the supported traffic load without congestion; and (3) it can be easily implemented in an LSatC as it does not depend on the ground segment and minimizes the signaling overhead among satellites.
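For readers unfamiliar with Q-learning, the standard tabular update applied to next-hop selection might look like the sketch below. This is the textbook rule, not the distributed multi-agent scheme of the paper; the state and action labels (satellite names), the reward of minus one per hop, and the values of `alpha`, `gamma` and `epsilon` are all assumptions made for illustration.

```python
import random

def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q_table maps (state, action) pairs to floats; missing entries count as 0.
    """
    best_next = max(
        (q_table.get((next_state, a), 0.0) for a in next_actions),
        default=0.0,  # no onward hops, e.g. the packet reached its destination
    )
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]

def choose_next_hop(q_table, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among neighbouring nodes."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore a random neighbour
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

# Hypothetical example: forwarding from sat_a to sat_b costs one unit of delay.
q = {}
new_value = q_update(q, "sat_a", "sat_b", reward=-1.0,
                     next_state="sat_b", next_actions=["sat_c"])
```

In a routing setting the reward would typically be the negative of the observed per-hop delay, so that maximizing return corresponds to minimizing end-to-end latency; that choice is an assumption here, not a detail taken from the abstract.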
Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli
Authors: Liwei Huang, ZhengYu Ma, Huihui Zhou, Yonghong Tian
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2306.01354
Pdf link: https://arxiv.org/pdf/2306.01354
Abstract In the real world, visual stimuli received by the biological visual system are predominantly dynamic rather than static. A better understanding of how the visual cortex represents movie stimuli could provide deeper insight into the information processing mechanisms of the visual system. Although some progress has been made in modeling neural responses to natural movies with deep neural networks, the visual representations of static and dynamic information under such time-series visual stimuli remain to be further explored. In this work, considering abundant recurrent connections in the mouse visual system, we design a recurrent module based on the hierarchy of the mouse cortex and add it into Deep Spiking Neural Networks, which have been demonstrated to be a more compelling computational model for the visual cortex. Using Time-Series Representational Similarity Analysis, we measure the representational similarity between networks and mouse cortical regions under natural movie stimuli. Subsequently, we conduct a comparison of the representational similarity across recurrent/feedforward networks and image/video training tasks. Trained on the video action recognition task, recurrent SNN achieves the highest representational similarity and significantly outperforms feedforward SNN trained on the same task by 15% and the recurrent SNN trained on the image classification task by 8%. We investigate how static and dynamic representations of SNNs influence the similarity, as a way to explain the importance of these two forms of representations in biological neural coding. Taken together, our work is the first to apply deep recurrent SNNs to model the mouse visual cortex under movie stimuli and we establish that these networks are competent to capture both static and dynamic representations and make contributions to understanding the movie information processing mechanisms of the visual cortex.
Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials
Authors: David Millard, Daniel Pastor, Joseph Bowkett, Paul Backes, Gaurav S. Sukhatme
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.01369
Pdf link: https://arxiv.org/pdf/2306.01369
Abstract Granular materials are of critical interest to many robotic tasks in planetary science, construction, and manufacturing. However, the dynamics of granular materials are complex and often computationally very expensive to simulate. We propose a set of methodologies and a system for the fast simulation of granular materials on Graphics Processing Units (GPUs), and show that this simulation is fast enough for basic training with Reinforcement Learning algorithms, which currently require many dynamics samples to achieve acceptable performance. Our method models granular material dynamics using implicit timestepping methods for multibody rigid contacts, as well as algorithmic techniques for efficient parallel collision detection between pairs of particles and between particle and arbitrarily shaped rigid bodies, and programming techniques for minimizing warp divergence on Single-Instruction, Multiple-Thread (SIMT) chip architectures. We showcase our simulation system on several environments targeted toward robotic tasks, and release our simulator as an open-source tool.
ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?
Authors: Michael Heck, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Shutong Feng, Christian Geishauser, Hsien-Chin Lin, Carel van Niekerk, Milica Gašić
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01386
Pdf link: https://arxiv.org/pdf/2306.01386
Abstract Recent research on dialogue state tracking (DST) focuses on methods that allow few- and zero-shot transfer to new domains or schemas. However, performance gains heavily depend on aggressive data augmentation and fine-tuning of ever larger language model based architectures. In contrast, general purpose language models, trained on large amounts of diverse data, hold the promise of solving any kind of task without task-specific training. We present preliminary experimental results on the ChatGPT research preview, showing that ChatGPT achieves state-of-the-art performance in zero-shot DST. Despite our findings, we argue that properties inherent to general purpose models limit their ability to replace specialized systems. We further theorize that the in-context learning capabilities of such models will likely become powerful tools to support the development of dedicated and dynamic dialogue state trackers.
Physics-Augmented Data-EnablEd Predictive Control for Eco-driving of Mixed Traffic Considering Diverse Human Behaviors
Authors: Dongjun Li, Kaixiang Zhang, Haoxuan Dong, Qun Wang, Zhaojian Li, Ziyou Song
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.01387
Pdf link: https://arxiv.org/pdf/2306.01387
Abstract Data-driven cooperative control of connected and automated vehicles (CAVs) has gained extensive research interest as it can utilize collected data to generate control actions without relying on parametric system models that are generally challenging to obtain. Existing methods mainly focused on improving traffic safety and stability, while less emphasis has been placed on energy efficiency in the presence of uncertainties and diversities of human-driven vehicles (HDVs). In this paper, we employ a data-enabled predictive control (DeePC) scheme to address the eco-driving of mixed traffic flows with diverse behaviors of human drivers. Specifically, by incorporating the physical relationship of the studied system and the Hankel matrix update from the generalized behavior representation to a particular one, we develop a new Physics-Augmented Data-EnablEd Predictive Control (PA-DeePC) approach to handle human driver diversities. In particular, a power consumption term is added to the DeePC cost function to reduce the holistic energy consumption of both CAVs and HDVs. Simulation results demonstrate the effectiveness of our approach in accurately capturing random human driver behaviors and addressing the complex dynamics of mixed traffic flows, while ensuring driving safety and traffic efficiency. Furthermore, the proposed optimization framework achieves substantial reductions in energy consumption, i.e., average reductions of 4.83% and 9.16% when compared to the benchmark algorithms.
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors
Authors: Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, Michael R. Lyu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01394
Pdf link: https://arxiv.org/pdf/2306.01394
Abstract Although the dynamic type system of Python facilitates the developers in writing Python programs, it also brings type errors at run-time. There exist rule-based approaches for automatically repairing Python type errors. The approaches can generate accurate patches but they require domain experts to design patch synthesis rules and suffer from low template coverage of real-world type errors. Learning-based approaches alleviate the manual efforts in designing patch synthesis rules. Among the learning-based approaches, the prompt-based approach which leverages the knowledge base of code pre-trained models via pre-defined prompts, obtains state-of-the-art performance in general program repair tasks. However, such prompts are manually defined and do not involve any specific clues for repairing Python type errors, resulting in limited effectiveness. How to automatically improve prompts with the domain knowledge for type error repair is challenging yet under-explored. In this paper, we present TypeFix, a novel prompt-based approach with fix templates incorporated for repairing Python type errors. TypeFix first mines generalized fix templates via a novel hierarchical clustering algorithm. The identified fix templates indicate the common edit patterns and contexts of existing type error fixes. TypeFix then generates code prompts for code pre-trained models by employing the generalized fix templates as domain knowledge, in which the masks are adaptively located for each type error instead of being pre-determined. Experiments on two benchmarks, including BugsInPy and TypeBugs, show that TypeFix successfully repairs 26 and 55 type errors, outperforming the best baseline approach by 9 and 14, respectively. Besides, the proposed fix template mining approach can cover 75% of developers' patches in both benchmarks, increasing the best rule-based approach PyTER by more than 30%.
Improving Adversarial Robustness of DEQs with Explicit Regulations Along the Neural Dynamics
Authors: Zonghan Yang, Peng Li, Tianyu Pang, Yang Liu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01435
Pdf link: https://arxiv.org/pdf/2306.01435
Abstract Deep equilibrium (DEQ) models replace the multiple-layer stacking of conventional deep networks with a fixed-point iteration of a single-layer transformation. Having been demonstrated to be competitive in a variety of real-world scenarios, the adversarial robustness of general DEQs becomes increasingly crucial for their reliable deployment. Existing works improve the robustness of general DEQ models with the widely-used adversarial training (AT) framework, but they fail to exploit the structural uniquenesses of DEQ models. To this end, we interpret DEQs through the lens of neural dynamics and find that AT under-regulates intermediate states. Besides, the intermediate states typically provide predictions with a high prediction entropy. Informed by the correlation between the entropy of dynamical systems and their stability properties, we propose reducing prediction entropy by progressively updating inputs along the neural dynamics. During AT, we also utilize random intermediate states to compute the loss function. Our methods regulate the neural dynamics of DEQ models in this manner. Extensive experiments demonstrate that our methods substantially increase the robustness of DEQ models and even outperform the strong deep network baselines.
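The fixed-point view described at the start of this abstract can be illustrated with a toy example. The sketch below is not the authors' model: the layer `f(z, x) = tanh(W z + x)`, the matrix `W`, the input `x`, and the helper names `layer` and `deq_forward` are all assumptions, chosen so that the map is a contraction and the iteration converges.

```python
import math

def layer(W, z, x):
    """One application of a hypothetical single-layer map f(z, x) = tanh(W z + x)."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z)) + xi)
            for row, xi in zip(W, x)]

def deq_forward(W, x, tol=1e-10, max_iter=500):
    """Iterate z <- f(z, x) from z = 0 until successive states differ by less than tol.

    Returns the final state and the list of all intermediate states visited.
    """
    z = [0.0] * len(x)
    states = [z]
    for _ in range(max_iter):
        z_next = layer(W, z, x)
        states.append(z_next)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            break
        z = z_next
    return states[-1], states

# Small weights keep the map contractive, so the iteration converges (assumption).
W = [[0.2, -0.1], [0.05, 0.3]]
x = [1.0, -0.5]
z_star, states = deq_forward(W, x)
# How far z_star is from being an exact fixed point of the layer.
residual = max(abs(a - b) for a, b in zip(layer(W, z_star, x), z_star))
```

In this toy setting, the entries of `states` before the last one play the role of the intermediate states along the neural dynamics that the abstract refers to.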
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
Authors: Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01438
Pdf link: https://arxiv.org/pdf/2306.01438
Abstract LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from the fact that Radar data are extremely sparse and lack height information. Therefore, directly integrating Radar features into LiDAR-centric detection networks is not optimal. In this work, we introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects. Technically, Bi-LRFusion involves two steps: first, it enriches Radar's local features by learning important details from the LiDAR branch to alleviate the problems caused by the absence of height information and extreme sparsity; second, it combines LiDAR features with the enhanced Radar features in a unified bird's-eye-view representation. We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects. Notably, Radar data in these two datasets have different formats, which demonstrates the generalizability of our method. Codes are available at https://github.com/JessieW0806/BiLRFusion.
Hierarchical Reinforcement Learning for Modeling User Novelty-Seeking Intent in Recommender Systems
Authors: Pan Li, Yuyan Wang, Ed H. Chi, Minmin Chen
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01476
Pdf link: https://arxiv.org/pdf/2306.01476
Abstract Recommending novel content, which expands user horizons by introducing them to new interests, has been shown to improve users' long-term experience on recommendation platforms \cite{chen2021values}. Users, however, are not constantly looking to explore novel content. It is therefore crucial to understand their novelty-seeking intent and adjust the recommendation policy accordingly. Most existing literature models a user's propensity to choose novel content or to prefer a more diverse set of recommendations at individual interactions. Hierarchical structure, on the other hand, exists in a user's novelty-seeking intent, which is manifested as a static and intrinsic user preference for seeking novelty along with a dynamic session-based propensity. To this end, we propose a novel hierarchical reinforcement learning-based method to model the hierarchical user novelty-seeking intent, and to adapt the recommendation policy accordingly based on the extracted user novelty-seeking propensity. We further incorporate diversity and novelty-related measurement in the reward function of the hierarchical RL (HRL) agent to encourage user exploration \cite{chen2021values}. We demonstrate the benefits of explicitly modeling hierarchical user novelty-seeking intent in recommendations through extensive experiments on simulated and real-world datasets. In particular, we demonstrate that the effectiveness of our proposed hierarchical RL-based method lies in its ability to capture such hierarchically-structured intent. As a result, the proposed HRL model achieves superior performance on several public datasets, compared with state-of-the-art baselines.
A Feature Reuse Framework with Texture-adaptive Aggregation for Reference-based Super-Resolution
Authors: Xiaoyong Mei, Yi Yang, Ming Li, Changqin Huang, Kai Zhang, Pietro Lió
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01500
Pdf link: https://arxiv.org/pdf/2306.01500
Abstract Reference-based super-resolution (RefSR) has gained considerable success in the field of super-resolution with the addition of high-resolution reference images to reconstruct low-resolution (LR) inputs with more high-frequency details, thereby overcoming some limitations of single image super-resolution (SISR). Previous research in the field of RefSR has mostly focused on two crucial aspects. The first is accurate correspondence matching between the LR and the reference (Ref) image. The second is the effective transfer and aggregation of similar texture information from the Ref images. Nonetheless, an important detail of perceptual loss and adversarial loss has been underestimated, which has a certain adverse effect on texture transfer and reconstruction. In this study, we propose a feature reuse framework that guides the step-by-step texture reconstruction process through different stages, reducing the negative impacts of perceptual and adversarial loss. The feature reuse framework can be used for any RefSR model, and several RefSR approaches have improved their performance after being retrained using our framework. Additionally, we introduce a single image feature embedding module and a texture-adaptive aggregation module. The single image feature embedding module assists in reconstructing the features of the LR inputs themselves and effectively lowers the possibility of including irrelevant textures. The texture-adaptive aggregation module dynamically perceives and aggregates texture information between the LR inputs and the Ref images using dynamic filters. This enhances the utilization of the reference texture while reducing reference misuse. The source code is available at https://github.com/Yi-Yang355/FRFSR.
One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms
Authors: Shaoyuan Huang, Zheng Wang, Heng Zhang, Xiaofei Wang, Cheng Zhang, Wenyu Wang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01507
Pdf link: https://arxiv.org/pdf/2306.01507
Abstract Workload prediction in multi-tenant edge cloud platforms (MT-ECP) is vital for efficient application deployment and resource provisioning. However, the heterogeneous application patterns, variable infrastructure performance, and frequent deployments in MT-ECP pose significant challenges for accurate and efficient workload prediction. Clustering-based methods for dynamic MT-ECP modeling often incur excessive costs due to the need to maintain numerous data clusters and models. Existing end-to-end time series prediction methods struggle to provide consistent prediction performance in dynamic MT-ECP. In this paper, we propose an end-to-end framework with global pooling and static content awareness, DynEformer, to provide a unified workload prediction scheme for dynamic MT-ECP. Meticulously designed global pooling and information merging mechanisms can effectively identify and utilize global application patterns to drive local workload predictions. The integration of static content-aware mechanisms enhances model robustness in real-world scenarios. Through experiments on five real-world datasets, DynEformer achieved state-of-the-art performance in the dynamic scene of MT-ECP and provided a unified end-to-end prediction scheme for MT-ECP.
Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions
Authors: Cameron Jakub, Mihai Nica
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.01513
Pdf link: https://arxiv.org/pdf/2306.01513
Abstract Neural networks are powerful functions with widespread use, but the theoretical behaviour of these functions is not fully understood. Creating deep neural networks by stacking many layers has achieved exceptional performance in many applications and contributed to the recent explosion of these methods. Previous works have shown that depth can exponentially increase the expressibility of the network. However, as networks get deeper and deeper, they are more susceptible to becoming degenerate. We observe this degeneracy in the sense that on initialization, inputs tend to become more and more correlated as they travel through the layers of the network. If a network has too many layers, it tends to approximate a (random) constant function, making it effectively incapable of distinguishing between inputs. This seems to affect the training of the network and cause it to perform poorly, as we empirically investigate in this paper. We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture, and demonstrate how the predicted degeneracy relates to training dynamics of the network. We also compare this prediction to predictions derived using infinite width networks.
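The degeneracy described in this abstract (inputs becoming more correlated layer by layer at initialization) can be observed in a small experiment. The sketch below is not the paper's algorithm: the function names, the He-style weight scaling, and the choice of two orthogonal inputs are assumptions. `infinite_width_cosines` uses the standard infinite-width ReLU angle map, cos θ' = (sin θ + (π − θ) cos θ) / π, as a point of comparison for the finite-width simulation.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def relu_layer(W, h):
    """Fully connected layer followed by ReLU (no bias)."""
    return [max(0.0, sum(w * hj for w, hj in zip(row, h))) for row in W]

def finite_width_cosines(width, depth, seed=0):
    """Cosine similarity of two inputs after each layer of a random ReLU network."""
    rng = random.Random(seed)
    # Two orthogonal unit inputs, so the cosine is 0 before the first layer.
    u = [1.0] + [0.0] * (width - 1)
    v = [0.0, 1.0] + [0.0] * (width - 2)
    cosines = [cosine(u, v)]
    std = math.sqrt(2.0 / width)  # He-style scaling (assumption)
    for _ in range(depth):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        u, v = relu_layer(W, u), relu_layer(W, v)
        cosines.append(cosine(u, v))
    return cosines

def infinite_width_cosines(depth, cos0=0.0):
    """Layer-by-layer cosine predicted by the infinite-width ReLU angle map."""
    cosines = [cos0]
    c = cos0
    for _ in range(depth):
        theta = math.acos(max(-1.0, min(1.0, c)))
        c = (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi
        cosines.append(c)
    return cosines

finite = finite_width_cosines(width=64, depth=20)
infinite = infinite_width_cosines(depth=30)
```

Printing `finite` and `infinite` side by side gives a rough feel for how a finite-width network compares with the infinite-width prediction; the numbers from a single random seed are only illustrative.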
Group channel pruning and spatial attention distilling for object detection
Authors: Yun Chu, Pu Li, Yong Bai, Zhuhua Hu, Yongqing Chen, Jiafeng Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01526
Pdf link: https://arxiv.org/pdf/2306.01526
Abstract Due to the over-parameterization of neural networks, many model compression methods based on pruning and quantization have emerged. They are remarkable in reducing the size, parameter number, and computational complexity of the model. However, most of the models compressed by such methods need the support of special hardware and software, which increases the deployment cost. Moreover, these methods are mainly used in classification tasks, and rarely directly used in detection tasks. To address these issues, for the object detection network we introduce a three-stage model compression method: dynamic sparse training, group channel pruning, and spatial attention distilling. Firstly, to select the unimportant channels in the network and maintain a good balance between sparsity and accuracy, we put forward a dynamic sparse training method, which introduces a variable sparse rate that changes with the training process of the network. Secondly, to reduce the effect of pruning on network accuracy, we propose a novel pruning method called group channel pruning. In particular, we divide the network into multiple groups according to the scales of the feature layer and the similarity of module structure in the network, and then we use different pruning thresholds to prune the channels in each group. Finally, to recover the accuracy of the pruned network, we use an improved knowledge distillation method for the pruned network. Specifically, we extract spatial attention information from the feature maps of specific scales in each group as knowledge for distillation. In the experiments, we use YOLOv4 as the object detection network and PASCAL VOC as the training dataset. Our method reduces the parameters of the model by 64.7% and the calculation by 34.9%.
Constraint-Guided Test Execution Scheduling: An Experience Report at ABB Robotics
Authors: Arnaud Gotlieb, Morten Mossige, Helge Spieker
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01529
Pdf link: https://arxiv.org/pdf/2306.01529
Abstract Automated test execution scheduling is crucial in modern software development environments, where components are frequently updated with changes that impact their integration with hardware systems. Building test schedules, which focus on the right tests and make optimal use of the available resources, both time and hardware, under consideration of vast requirements on the selection of test cases and their assignment to certain test execution machines, is a complex optimization task. Manual solutions are time-consuming and often error-prone. Furthermore, when software and hardware components and test scripts are frequently added, removed or updated, static test execution scheduling is no longer feasible and the motivation for automation taking care of dynamic changes grows. Since 2012, our work has focused on transferring technology based on constraint programming for automating the testing of industrial robotic systems at ABB Robotics. After having successfully transferred constraint satisfaction models dedicated to test case generation, we present the results of a project called DynTest whose goal is to automate the scheduling of test execution from a large test repository, on distinct industrial robots. This paper reports on our experience and lessons learned for successfully transferring constraint-based optimization models for test execution scheduling at ABB Robotics. Our experience underlines the benefits of a close collaboration between industry and academia for both parties.
FPIM: Field-Programmable Ising Machines for Solving SAT
Authors: Thomas Jagielski (1), Rajit Manohar (1), Jaijeet Roychowdhury (2) ((1) Yale University, (2) University of California, Berkeley.)
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2306.01569
Pdf link: https://arxiv.org/pdf/2306.01569
Abstract On-chip analog Ising Machines (IMs) are a promising means to solve difficult combinatorial optimization problems. For scalable on-chip realizations to be practical, 1) the problem should map scalably to Ising form, 2) interconnectivity between spins should be sparse, 3) the number of bits of coupling resolution (BCR) needed for programming interconnection weights should be small, and 4) the chip should be capable of solving problems with different connection topologies. We explore these issues for the SATisfiability problem and devise FPIM, a reconfigurable on-chip analog Ising machine scheme well suited for SAT. To map SAT problems onto FPIMs, we leverage Boolean logic synthesis as a first step, but replace synthesized logic gates with Ising equivalent circuits whose analog dynamics solve SAT by minimizing the Ising Hamiltonian. We apply our approach to 2000 benchmark problems from SATLIB, demonstrating excellent scaling, together with low sparsity and low BCR that are independent of problem scale. Placement/routing reveals a very feasible requirement of less than 10 routing tracks to implement all the benchmarks, translating to an area requirement of about 10mm^2 for a programmable 1000-spin FPIM in 65nm technology.
SuperFlow: Performance Testing for Serverless Computing
Authors: Jinfeng Wen, Zhenpeng Chen, Federica Sarro, Xuanzhe Liu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01620
Pdf link: https://arxiv.org/pdf/2306.01620
Abstract Serverless computing is an emerging cloud computing paradigm that allows software engineers to develop cloud applications as a set of functions (called serverless functions). However, accurately obtaining the performance (i.e., response latency) of serverless functions is challenging due to the highly dynamic nature of the environment in which they run. To tackle this problem, a possible solution is to use performance testing to determine how many repetitions of a serverless function with a given input are needed to cater to the performance fluctuation. To this end, we conduct an empirical study of state-of-the-art performance testing techniques for traditional cloud applications on 65 serverless functions collected from top-tier research venues. We find that these techniques exhibit low accuracy. Therefore, we propose SuperFlow, the first performance testing approach tailored specifically for serverless computing. SuperFlow incorporates an accuracy check and a stability check to obtain accurate and reliable performance results. The evaluation demonstrates that SuperFlow provides testing results with 97.22% accuracy, 39.91 percentage points higher than the best currently available technique. We have publicly released the code and data from this study to facilitate future replication and extension.
An Adaptive Method for Weak Supervision with Drifting Data
Authors: Alessio Mazzetto, Reza Esfandiarpoor, Eli Upfal, Stephen H. Bach
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01658
Pdf link: https://arxiv.org/pdf/2306.01658
Abstract We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. Comparatively, our algorithm does not require any assumptions on the drift, and it adapts based on the input. In particular, at each step, our algorithm guarantees an estimation of the current accuracies of the weak supervision sources over a window of past observations that minimizes a trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach indeed adapts to the drift. Unlike fixed-window-size strategies, it dynamically chooses a window size that allows it to consistently maintain good performance.
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Authors: Md Shamim Hussain, Mohammed J. Zaki, Dharmashankar Subramanian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01705
Pdf link: https://arxiv.org/pdf/2306.01705
Abstract Transformers use the dense self-attention mechanism which gives a lot of flexibility for long-range connectivity. Over multiple layers of a deep transformer, the number of possible connectivity patterns increases exponentially. However, very few of these contribute to the performance of the network, and even fewer are essential. We hypothesize that there are sparsely connected sub-networks within a transformer, called information pathways which can be trained independently. However, the dynamic (i.e., input-dependent) nature of these pathways makes it difficult to prune dense self-attention during training. But the overall distribution of these pathways is often predictable. We take advantage of this fact to propose Stochastically Subsampled self-Attention (SSA) - a general-purpose training strategy for transformers that can reduce both the memory and computational cost of self-attention by 4 to 8 times during training while also serving as a regularization method - improving generalization over dense training. We show that an ensemble of sub-models can be formed from the subsampled pathways within a network, which can achieve better performance than its densely attended counterpart. We perform experiments on a variety of NLP, computer vision and graph learning tasks in both generative and discriminative settings to provide empirical evidence for our claims and show the effectiveness of the proposed method.
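To make the idea of subsampling self-attention concrete, here is a minimal sketch in which each query attends to a random subset of the keys instead of all of them. The abstract does not specify the sampling scheme, so the uniform per-query sampling, the function names, and the `keep` parameter are all assumptions and should not be read as the authors' SSA implementation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, keep=None, rng=None):
    """Scaled dot-product attention.

    If keep is given and smaller than the number of keys, each query attends
    to `keep` keys drawn uniformly at random (hypothetical sampling scheme).
    """
    d = len(Q[0])
    n_keys = len(K)
    out = []
    for q in Q:
        if keep is None or keep >= n_keys:
            idx = list(range(n_keys))
        else:
            idx = sorted(rng.sample(range(n_keys), keep))
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        out.append([sum(w * V[j][c] for w, j in zip(weights, idx))
                    for c in range(len(V[0]))])
    return out

rng = random.Random(0)
n, d = 8, 4
Q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

dense = attention(Q, K, V)
sub = attention(Q, K, V, keep=2, rng=rng)  # each query scores 2 keys instead of 8
```

In this sketch the number of query-key scores per query drops from `n` to `keep`, which is the kind of saving the abstract attributes to subsampling during training.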
OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
Authors: Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01738
Pdf link: https://arxiv.org/pdf/2306.01738
Abstract Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost. Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm, which benefits from both BEV's strong perception power and end-to-end pipeline. Despite achieving substantial progress, existing works model objects via globally leveraging temporal and spatial information of BEV features, resulting in problems when handling the challenging complex and dynamic autonomous driving scenarios. In this paper, we propose an Object-Centric query-BEV detector OCBEV, which can carve the temporal and spatial cues of moving targets more effectively. OCBEV comprises three designs: Object Aligned Temporal Fusion aligns the BEV feature based on ego-motion and estimated current locations of moving objects, leading to a precise instance-level feature fusion. Object Focused Multi-View Sampling samples more 3D features from adaptive local height ranges of objects for each scene to enrich foreground information. Object Informed Query Enhancement replaces part of pre-defined decoder queries in common DETR-style decoders with positional features of objects on high-confidence locations, introducing more direct object positional priors. Extensive experimental evaluations are conducted on the challenging nuScenes dataset. Our approach achieves a state-of-the-art result, surpassing the traditional BEVFormer by 1.5 NDS points. Moreover, we have a faster convergence speed and only need half of the training iterations to get comparable performance, which further demonstrates its effectiveness.
Keyword: adaptive

AbODE: Ab Initio Antibody Design using Conjoined ODEs
Authors: Yogesh Verma, Markus Heinonen, Vikas Garg
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2306.01005
Pdf link: https://arxiv.org/pdf/2306.01005
Abstract Antibodies are Y-shaped proteins that neutralize pathogens and constitute the core of our adaptive immune system. De novo generation of new antibodies that target specific antigens holds the key to accelerating vaccine discovery. However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks, including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions. Unlike existing approaches, AbODE uses a single round of full-shot decoding and elicits continuous differential attention that encapsulates and evolves with latent interactions within the antibody as well as those involving the antigen. We unravel fundamental connections between AbODE and temporal networks as well as graph-matching networks. The proposed model significantly outperforms existing methods on standard metrics across benchmarks.
Towards Fair Disentangled Online Learning for Changing Environments
Authors: Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Christan Grant, Feng Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01007
Pdf link: https://arxiv.org/pdf/2306.01007
Abstract In the problem of online learning for changing environments, data are sequentially received one after another over time, and their distribution assumptions may vary frequently. Although existing methods demonstrate the effectiveness of their learning algorithms by providing a tight bound on either dynamic regret or adaptive regret, most of them completely ignore learning with model fairness, defined as the statistical parity across different sub-population (e.g., race and gender). Another drawback is that when adapting to a new environment, an online learner needs to update model parameters with a global change, which is costly and inefficient. Inspired by the sparse mechanism shift hypothesis, we claim that changing environments in online learning can be attributed to partial changes in learned parameters that are specific to environments and the rest remain invariant to changing environments. To this end, in this paper, we propose a novel algorithm under the assumption that data collected at each time can be disentangled with two representations, an environment-invariant semantic factor and an environment-specific variation factor. The semantic factor is further used for fair prediction under a group fairness constraint. To evaluate the sequence of model parameters generated by the learner, a novel regret is proposed in which it takes a mixed form of dynamic and static regret metrics followed by a fairness-aware long-term constraint. The detailed analysis provides theoretical guarantees for loss regret and violation of cumulative fairness constraints. Empirical evaluations on real-world datasets demonstrate our proposed method sequentially outperforms baseline methods in model accuracy and fairness.
PV2TEA: Patching Visual Modality to Textual-Established Information Extraction
Authors: Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, Xian Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.01016
Pdf link: https://arxiv.org/pdf/2306.01016
Abstract Information extraction, e.g., attribute value extraction, has been extensively studied and formulated based only on text. However, many attributes can benefit from image-based extraction, like color, shape, pattern, among others. The visual modality has long been underutilized, mainly due to multimodal annotation difficulty. In this paper, we aim to patch the visual modality to the textual-established attribute information extractor. The cross-modality integration faces several unique challenges: (C1) images and textual descriptions are loosely paired intra-sample and inter-samples; (C2) images usually contain rich backgrounds that can mislead the prediction; (C3) weakly supervised labels from textual-established extractors are biased for multimodal training. We present PV2TEA, an encoder-decoder architecture equipped with three bias reduction schemes: (S1) Augmented label-smoothed contrast to improve the cross-modality alignment for loosely-paired image and text; (S2) Attention-pruning that adaptively distinguishes the visual foreground; (S3) Two-level neighborhood regularization that mitigates the label textual bias via reliability estimation. Empirical results on real-world e-Commerce datasets demonstrate up to 11.74% absolute (20.97% relative) F1 increase over unimodal baselines.
Chaos persists in large-scale multi-agent learning despite adaptive learning rates
Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.01032
Pdf link: https://arxiv.org/pdf/2306.01032
Abstract Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard as recent work has established that learning with fixed rates will become chaotic given large enough populations. In this work, we show that chaos persists in large population congestion games despite using adaptive learning rates even for the ubiquitous Multiplicative Weight Updates algorithm, even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional Li-Yorke period-three techniques by studying fundamental properties of the dynamics including invariant sets, volume expansion and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors.
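To make the setting concrete, the following is a minimal sketch of one Multiplicative Weight Updates step in its exponential-weights form, applied to a toy two-strategy congestion game. The cost model (the cost of a strategy equals the fraction of the population using it), the fixed step size `eta`, and the function names are assumptions made for illustration; the paper's adaptive learning-rate schedule and game model are not given in the abstract.

```python
import math


def mwu_step(probs, costs, eta):
    """One exponential-weights MWU update: down-weight each strategy by exp(-eta * cost)."""
    weights = [p * math.exp(-eta * c) for p, c in zip(probs, costs)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(x0, eta, steps):
    """Probability of strategy 0 over time when the whole population shares one mixed strategy.

    Assumed toy cost model: the cost of a strategy is the fraction of agents using it.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        costs = [x, 1.0 - x]  # hypothetical linear congestion costs
        x = mwu_step([x, 1.0 - x], costs, eta)[0]
        trajectory.append(x)
    return trajectory


# With a small step size the toy dynamics settle near the balanced split x = 0.5.
print(simulate(x0=0.9, eta=0.1, steps=200)[-1])
```

In this toy map the balanced split is locally stable only for `eta` below 4 (the derivative of the update at x = 0.5 is 1 - eta/2), which gives a small-scale picture of why step size matters for stability.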
Extended-XRI Body Interfaces for Hyper-Connected Metaverse Environments
Authors: Jie Guan, Alexis Morris
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.01096
Pdf link: https://arxiv.org/pdf/2306.01096
Abstract Hybrid mixed-reality (XR) internet-of-things (IoT) research, here called XRI, aims at a strong integration between physical and virtual objects, environments, and agents wherein IoT-enabled edge devices are deployed for sensing, context understanding, networked communication and control of device actuators. Likewise, as augmented reality systems provide an immersive overlay on the environments, and virtual reality provides fully immersive environments, the merger of these domains leads to immersive smart spaces that are hyper-connected, adaptive and dynamic components that anchor the metaverse to real-world constructs. Enabling the human-in-the-loop to remain engaged and connected across these virtual-physical hybrid environments requires advances in user interaction that are multi-dimensional. This work investigates the potential to transition the user interface to the human body as an extended-reality avatar with hybrid extended-body interfaces that can interact both with the physical and virtual sides of the metaverse. It contributes: i) an overview of metaverses, XRI, and avatarization concepts, ii) a taxonomy landscape for extended XRI body interfaces, iii) an architecture and potential interactions for XRI body designs, iv) a prototype XRI body implementation based on the architecture, v) a design-science evaluation, toward enabling future design research directions.
Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification
Authors: Cara Van Uden, Christian Bluethgen, Maayane Attias, Malgorzata Polacin, Haiwei Henry Guo, Neha Simha, Rishi Raj, Curtis Langlotz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01111
Pdf link: https://arxiv.org/pdf/2306.01111
Abstract Interstitial lung diseases (ILD) present diagnostic challenges due to their varied manifestations and overlapping imaging features. To address this, we propose a machine learning approach that utilizes CLIP, a multimodal (image and text) self-supervised model, for ILD classification. We extensively integrate zero-shot CLIP throughout our workflow, starting from the initial extraction of image patches from volumetric CT scans and proceeding to ILD classification using "patch montages". Furthermore, we investigate how domain adaptive pretraining (DAPT) CLIP with task-specific images (CT "patch montages" extracted with ILD-specific prompts for CLIP) and/or text (lung-specific sections of radiology reports) affects downstream ILD classification performance. By leveraging CLIP-extracted "patch montages" and DAPT, we achieve strong zero-shot ILD classification results, including an AUROC of 0.893, without the need for any labeled training data. This work highlights the versatility and potential of multimodal models like CLIP for medical image classification tasks where labeled data is scarce.
Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials
Authors: Ali Kamali, Kaveh Laksari
Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft)
Arxiv link: https://arxiv.org/abs/2306.01204
Pdf link: https://arxiv.org/pdf/2306.01204
Abstract Soft biological tissues often have complex mechanical properties due to variation in structural components. In this paper, we develop a novel UNet-based neural network model for inversion in elasticity (El-UNet) to infer the spatial distributions of mechanical parameters from strain maps as input images, normal stress boundary conditions, and domain physics information. We show superior performance, both in terms of accuracy and computational cost, by El-UNet compared to fully-connected physics-informed neural networks in estimating unknown parameters and stress distributions for isotropic linear elasticity. We characterize different variations of El-UNet and propose a self-adaptive spatial loss weighting approach. To validate our inversion models, we performed various finite-element simulations of isotropic domains with heterogeneous distributions of material parameters to generate synthetic data. El-UNet is faster and more accurate than the fully-connected physics-informed implementation in resolving the distribution of unknown fields. Among the tested models, the self-adaptive spatially weighted models had the most accurate reconstructions in equal computation times. The learned spatial weighting distribution visibly corresponded to regions that the unweighted models were resolving inaccurately. Our work demonstrates a computationally efficient inversion algorithm for elasticity imaging using convolutional neural networks and presents a potential fast framework for three-dimensional inverse elasticity problems that have proven unachievable through previously proposed methods.
Counting Crowds in Bad Weather
Authors: Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2306.01209
Pdf link: https://arxiv.org/pdf/2306.01209
Abstract Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding. Numerous methods have been proposed and achieved state-of-the-art performance for real-world tasks. However, existing approaches do not perform well under adverse weather such as haze, rain, and snow since the visual appearances of crowds in such scenes are drastically different from those images in clear weather of typical datasets. In this paper, we propose a method for robust crowd counting in adverse weather scenarios. Instead of using a two-stage approach that involves image restoration and crowd counting modules, our model learns effective features and adaptive queries to account for large appearance variations. With these weather queries, the proposed model can learn the weather information according to the degradation of the input image and optimize with the crowd counting module simultaneously. Experimental results show that the proposed algorithm is effective in counting crowds under different weather types on benchmark datasets. The source code and trained models will be made available to the public.
Beyond Active Learning: Leveraging the Full Potential of Human Interaction via Auto-Labeling, Human Correction, and Human Verification
Authors: Nathan Beck, Krishnateja Killamsetty, Suraj Kothawade, Rishabh Iyer
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.01277
Pdf link: https://arxiv.org/pdf/2306.01277
Abstract Active Learning (AL) is a human-in-the-loop framework to interactively and adaptively label data instances, thereby enabling significant gains in model performance compared to random sampling. AL approaches function by selecting the hardest instances to label, often relying on notions of diversity and uncertainty. However, we believe that these current paradigms of AL do not leverage the full potential of human interaction granted by automated label suggestions. Indeed, we show that for many classification tasks and datasets, most people verifying if an automatically suggested label is correct take $3\times$ to $4\times$ less time than they do changing an incorrect suggestion to the correct label (or labeling from scratch without any suggestion). Utilizing this result, we propose CLARIFIER (aCtive LeARnIng From tIEred haRdness), an Interactive Learning framework that admits more effective use of human interaction by leveraging the reduced cost of verification. By targeting the hard (uncertain) instances with existing AL methods, the intermediate instances with a novel label suggestion scheme using submodular mutual information functions on a per-class basis, and the easy (confident) instances with highest-confidence auto-labeling, CLARIFIER can improve over the performance of existing AL approaches on multiple datasets -- particularly on those that have a large number of classes -- by almost 1.5$\times$ to 2$\times$ in terms of relative labeling cost.
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement
Authors: Long Ma, Dian Jin, Nan An, Jinyuan Liu, Xin Fan, Risheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01343
Pdf link: https://arxiv.org/pdf/2306.01343
Abstract Enhancing images in low-light scenes is a challenging but widely studied task in computer vision. The mainstream learning-based methods mainly acquire the enhanced model by learning the data distribution from the specific scenes, causing poor adaptability (even failure) when meeting real-world scenarios that have never been encountered before. The main obstacle lies in the modeling conundrum from distribution discrepancy across different scenes. To remedy this, we first explore relationships between diverse low-light scenes based on statistical analysis, i.e., the network parameters of the encoder trained in different data distributions are close. We introduce the bilevel paradigm to model the above latent correspondence from the perspective of hyperparameter optimization. A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes (i.e., freezing the encoder in the adaptation and testing phases). Further, we define a reinforced bilevel learning framework to provide a meta-initialization for scene-specific decoder to further ameliorate visual quality. Moreover, to improve the practicability, we establish a Retinex-induced architecture with adaptive denoising and apply our built learning framework to acquire its parameters by using two training losses including supervised and unsupervised forms. Extensive experimental evaluations on multiple datasets verify our adaptability and competitive performance against existing state-of-the-art works. The code and datasets will be available at https://github.com/vis-opt-group/BL.
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Authors: Borui Wan, Juntao Zhao, Chuan Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01381
Pdf link: https://arxiv.org/pdf/2306.01381
Abstract Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers for communication traffic reduction and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of $O(T^{-1})$ with $T$ being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves distributed full-graph training's throughput (up to 3.01$\times$) with negligible accuracy drop (at most 0.30%) or even accuracy improvement (up to 0.19%) in most cases, showing significant advantages over the state-of-the-art works.
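The stochastic quantization of messages to lower-precision integers mentioned above can be illustrated with generic unbiased stochastic rounding. This is a sketch under assumptions, not AdaQP's implementation: the min/max scaling, the function names, and the fixed `bits` argument are hypothetical, and the paper's adaptive per-message bit-width assignment is not reproduced here.

```python
import math
import random


def stochastic_quantize(values, bits, rng=random):
    """Map floats to integer codes in [0, 2**bits - 1] using stochastic rounding.

    Each value is rounded up with probability equal to its fractional position
    between two adjacent levels, so the dequantized value is unbiased in expectation.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale
        base = math.floor(x)
        code = base + (1 if rng.random() < x - base else 0)
        codes.append(min(code, levels))  # guard against floating-point overshoot
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


message = [0.12, -0.53, 0.98, 0.40]  # hypothetical embedding values
codes, lo, scale = stochastic_quantize(message, bits=4, rng=random.Random(0))
print(codes, dequantize(codes, lo, scale))
```

Fewer bits shrink the transferred message but widen the gap between levels, which is the trade-off the abstract's bit-width assignment scheme is said to target.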
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors
Authors: Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, Michael R. Lyu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01394
Pdf link: https://arxiv.org/pdf/2306.01394
Abstract Although the dynamic type system of Python facilitates the developers in writing Python programs, it also brings type errors at run-time. There exist rule-based approaches for automatically repairing Python type errors. The approaches can generate accurate patches but they require domain experts to design patch synthesis rules and suffer from low template coverage of real-world type errors. Learning-based approaches alleviate the manual efforts in designing patch synthesis rules. Among the learning-based approaches, the prompt-based approach which leverages the knowledge base of code pre-trained models via pre-defined prompts, obtains state-of-the-art performance in general program repair tasks. However, such prompts are manually defined and do not involve any specific clues for repairing Python type errors, resulting in limited effectiveness. How to automatically improve prompts with the domain knowledge for type error repair is challenging yet under-explored. In this paper, we present TypeFix, a novel prompt-based approach with fix templates incorporated for repairing Python type errors. TypeFix first mines generalized fix templates via a novel hierarchical clustering algorithm. The identified fix templates indicate the common edit patterns and contexts of existing type error fixes. TypeFix then generates code prompts for code pre-trained models by employing the generalized fix templates as domain knowledge, in which the masks are adaptively located for each type error instead of being pre-determined. Experiments on two benchmarks, including BugsInPy and TypeBugs, show that TypeFix successfully repairs 26 and 55 type errors, outperforming the best baseline approach by 9 and 14, respectively. Besides, the proposed fix template mining approach can cover 75% of developers' patches in both benchmarks, increasing the best rule-based approach PyTER by more than 30%.
Adaptive Attractors: A Defense Strategy against ML Adversarial Collusion Attacks
Authors: Jiyi Zhang, Han Fang, Ee-Chien Chang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.01400
Pdf link: https://arxiv.org/pdf/2306.01400
Abstract In the seller-buyer setting on machine learning models, the seller generates different copies based on the original model and distributes them to different buyers, such that adversarial samples generated on one buyer's copy would likely not work on other copies. A known approach achieves this using attractor-based rewriter which injects different attractors to different copies. This induces different adversarial regions in different copies, making adversarial samples generated on one copy not replicable on others. In this paper, we focus on a scenario where multiple malicious buyers collude to attack. We first give two formulations and conduct empirical studies to analyze effectiveness of collusion attack under different assumptions on the attacker's capabilities and properties of the attractors. We observe that existing attractor-based methods do not effectively mislead the colluders in the sense that adversarial samples found are influenced more by the original model instead of the attractors as number of colluders increases. Based on this observation, we propose using adaptive attractors whose weight is guided by a U-shape curve to cover the shortfalls. Experimentation results show that when using our approach, the attack success rate of a collusion attack converges to around 15% even when lots of copies are applied for collusion. In contrast, when using the existing attractor-based rewriter with fixed weight, the attack success rate increases linearly with the number of copies used for collusion.
Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning
Authors: Federico Quin, Danny Weyns
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.01407
Pdf link: https://arxiv.org/pdf/2306.01407
Abstract A/B testing is a common approach used in industry to facilitate innovation through the introduction of new features or the modification of existing software. Traditionally, A/B tests are conducted sequentially, with each experiment targeting the entire population of the corresponding application. This approach can be time-consuming and costly, particularly when the experiments are not relevant to the entire population. To tackle these problems, we introduce a new self-adaptive approach called AutoPABS, short for Automated Pipelines of A/B tests using Self-adaptation, that (1) automates the execution of pipelines of A/B tests, and (2) supports a split of the population in the pipeline to divide the population into multiple A/B tests according to user-based criteria, leveraging machine learning. We started the evaluation with a small survey to probe the appraisal of the notation and infrastructure of AutoPABS. Then we performed a series of tests to measure the gains obtained by applying a population split in an automated A/B testing pipeline, using an extension of the SEAByTE artifact. The survey results show that the participants express the usefulness of automating A/B testing pipelines and population split. The tests show that automatically executing pipelines of A/B tests with a population split accelerates the identification of statistically significant results of the parallel executed experiments of A/B tests compared to a traditional approach that performs the experiments sequentially.
Algorithmic realization of the solution to the sign conflict problem for hanging nodes on hp-hexahedral Nédélec elements
Authors: Sebastian Kinnewig, Thomas Wick, Sven Beuchler
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.01416
Pdf link: https://arxiv.org/pdf/2306.01416
Abstract While working with Nédélec elements on adaptively refined meshes with hanging nodes, the orientation of the hanging edges and faces must be taken into account. Indeed, for non-orientable meshes, there was no solution and implementation available to date. The problem statement and corresponding algorithms are described in great detail. As a model problem, the time-harmonic Maxwell's equations are adopted because Nédélec elements constitute their natural discretization. The implementation is performed within the finite element library deal.II. The algorithms and implementation are demonstrated through four numerical examples on different uniformly and adaptively refined meshes.
Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation
Authors: Roi Peleg, Roi Weiss, Assaf Hoogi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01423
Pdf link: https://arxiv.org/pdf/2306.01423
Abstract Network optimization is a crucial step in the field of deep learning, as it directly affects the performance of models in various domains such as computer vision. Despite the numerous optimizers that have been developed over the years, the current methods are still limited in their ability to accurately and quickly identify gradient trends, which can lead to sub-optimal network performance. In this paper, we propose a novel deep optimizer called Fast-Adaptive Moment Estimation (FAME), which for the first time estimates gradient moments using a Triple Exponential Moving Average (TEMA). Incorporating TEMA into the optimization process provides richer and more accurate information on data changes and trends, as compared to the standard Exponential Moving Average used in essentially all current leading adaptive optimization methods. Our proposed FAME optimizer has been extensively validated through a wide range of benchmarks, including CIFAR-10, CIFAR-100, PASCAL-VOC, MS-COCO, and Cityscapes, using 14 different learning architectures, six optimizers, and various vision tasks, including detection, classification and semantic understanding. The results demonstrate that our FAME optimizer outperforms other leading optimizers in terms of both robustness and accuracy.
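For readers unfamiliar with the Triple Exponential Moving Average, the sketch below uses its standard definition, TEMA = 3*EMA1 - 3*EMA2 + EMA3, where each EMA smooths the previous one. It only illustrates why TEMA tracks trends with less lag than a plain EMA; how FAME plugs TEMA into its moment estimates is not described in the abstract, and the function names and smoothing factor `alpha` here are hypothetical.

```python
def ema(series, alpha):
    """Plain exponential moving average, initialized at the first sample."""
    smoothed = series[0]
    out = []
    for x in series:
        smoothed = alpha * x + (1 - alpha) * smoothed
        out.append(smoothed)
    return out


def tema(series, alpha):
    """Triple exponential moving average: 3*EMA1 - 3*EMA2 + EMA3."""
    e1 = e2 = e3 = series[0]
    out = []
    for x in series:
        e1 = alpha * x + (1 - alpha) * e1   # EMA of the signal
        e2 = alpha * e1 + (1 - alpha) * e2  # EMA of EMA1
        e3 = alpha * e2 + (1 - alpha) * e3  # EMA of EMA2
        out.append(3 * e1 - 3 * e2 + e3)
    return out


ramp = [float(t) for t in range(200)]  # a steadily increasing "gradient trend"
print(ema(ramp, 0.2)[-1], tema(ramp, 0.2)[-1])
```

On a linear ramp the plain EMA trails the signal by a constant (1 - alpha) / alpha, while the three lag terms in TEMA cancel, so it follows the ramp closely once the start-up transient has decayed.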
A Feature Reuse Framework with Texture-adaptive Aggregation for Reference-based Super-Resolution
Authors: Xiaoyong Mei, Yi Yang, Ming Li, Changqin Huang, Kai Zhang, Pietro Lió
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01500
Pdf link: https://arxiv.org/pdf/2306.01500
Abstract Reference-based super-resolution (RefSR) has gained considerable success in the field of super-resolution with the addition of high-resolution reference images to reconstruct low-resolution (LR) inputs with more high-frequency details, thereby overcoming some limitations of single image super-resolution (SISR). Previous research in the field of RefSR has mostly focused on two crucial aspects. The first is accurate correspondence matching between the LR and the reference (Ref) image. The second is the effective transfer and aggregation of similar texture information from the Ref images. Nonetheless, an important detail of perceptual loss and adversarial loss has been underestimated, which has a certain adverse effect on texture transfer and reconstruction. In this study, we propose a feature reuse framework that guides the step-by-step texture reconstruction process through different stages, reducing the negative impacts of perceptual and adversarial loss. The feature reuse framework can be used for any RefSR model, and several RefSR approaches have improved their performance after being retrained using our framework. Additionally, we introduce a single image feature embedding module and a texture-adaptive aggregation module. The single image feature embedding module assists in reconstructing the features of the LR inputs themselves and effectively lowers the possibility of including irrelevant textures. The texture-adaptive aggregation module dynamically perceives and aggregates texture information between the LR inputs and the Ref images using dynamic filters. This enhances the utilization of the reference texture while reducing reference misuse. The source code is available at https://github.com/Yi-Yang355/FRFSR.
Towards Source-free Domain Adaptive Semantic Segmentation via Importance-aware and Prototype-contrast Learning
Authors: Yihong Cao, Hui Zhang, Xiao Lu, Zheng Xiao, Kailun Yang, Yaonan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2306.01598
Pdf link: https://arxiv.org/pdf/2306.01598
Abstract Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and storage limitations in typical unsupervised domain adaptation methods. It utilizes a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift and fully leverage the information from the target data. In this paper, we propose an end-to-end source-free domain adaptation semantic segmentation method via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, considering the problem of domain shift in the prediction of the target domain by the source model, we put forward an importance-aware mechanism for the biased target prediction probability distribution to extract domain-invariant knowledge from the source model. We further introduce a prototype-contrast strategy, which includes a prototype-symmetric cross-entropy loss and a prototype-enhanced cross-entropy loss, to learn target intra-domain knowledge without relying on labels. A comprehensive variety of experiments on two domain adaptive semantic segmentation benchmarks demonstrates that the proposed end-to-end IAPC solution outperforms existing state-of-the-art methods. Code will be made publicly available at https://github.com/yihong-97/Source-free_IAPC.
An Adaptive Method for Weak Supervision with Drifting Data
Authors: Alessio Mazzetto, Reza Esfandiarpoor, Eli Upfal, Stephen H. Bach
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.01658
Pdf link: https://arxiv.org/pdf/2306.01658
Abstract We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. Comparatively, our algorithm does not require any assumptions on the drift, and it adapts based on the input. In particular, at each step, our algorithm guarantees an estimation of the current accuracies of the weak supervision sources over a window of past observations that minimizes a trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach indeed adapts to the drift. Unlike fixed-window-size strategies, it dynamically chooses a window size that allows it to consistently maintain good performance.
OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
Authors: Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.01738
Pdf link: https://arxiv.org/pdf/2306.01738
Abstract Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost. Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm, which benefits from both BEV's strong perception power and end-to-end pipeline. Despite achieving substantial progress, existing works model objects via globally leveraging temporal and spatial information of BEV features, resulting in problems when handling the challenging complex and dynamic autonomous driving scenarios. In this paper, we propose an Object-Centric query-BEV detector OCBEV, which can carve the temporal and spatial cues of moving targets more effectively. OCBEV comprises three designs: Object Aligned Temporal Fusion aligns the BEV feature based on ego-motion and estimated current locations of moving objects, leading to a precise instance-level feature fusion. Object Focused Multi-View Sampling samples more 3D features from adaptive local height ranges of objects for each scene to enrich foreground information. Object Informed Query Enhancement replaces part of pre-defined decoder queries in common DETR-style decoders with positional features of objects on high-confidence locations, introducing more direct object positional priors. Extensive experimental evaluations are conducted on the challenging nuScenes dataset. Our approach achieves a state-of-the-art result, surpassing the traditional BEVFormer by 1.5 NDS points. Moreover, we have a faster convergence speed and only need half of the training iterations to get comparable performance, which further demonstrates its effectiveness.
Keyword: quantization

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding
Authors: Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01076
Pdf link: https://arxiv.org/pdf/2306.01076
Abstract Fine-tuned transformer models have shown superior performances in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultimately runtime latency of transformer-based models. We compress the embedding and linear layers of transformers into small low-rank tensor cores, which significantly reduces model parameters. A quantization-aware training with learnable scale factors is used to further obtain low-precision representations of the tensor-compressed models. The developed approach can be used for both end-to-end training and distillation-based training. To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated in two natural language understanding tasks, showing up to $63\times$ compression ratio, little accuracy loss and remarkable inference and training speedup.
Towards Learning Discrete Representations via Self-Supervision for Wearables-Based Human Activity Recognition
Authors: Harish Haresamudram, Irfan Essa, Thomas Ploetz
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.01108
Pdf link: https://arxiv.org/pdf/2306.01108
Abstract Human activity recognition (HAR) in wearable computing is typically based on direct processing of sensor data. Sensor readings are translated into representations, either derived through dedicated preprocessing or integrated into end-to-end learning. Independent of their origin, for the vast majority of contemporary HAR, those representations are typically continuous in nature. That has not always been the case. In the early days of HAR, discretization approaches were explored - primarily motivated by the desire to minimize computational requirements, but also with a view to applications beyond mere recognition, such as activity discovery, fingerprinting, or large-scale search. Those traditional discretization approaches, however, suffer from substantial loss in precision and resolution in the resulting representations, with detrimental effects on downstream tasks. Times have changed, and in this paper we propose a return to discretized representations. We adopt and apply recent advancements in Vector Quantization (VQ) to wearables applications, which enables us to directly learn a mapping between short spans of sensor data and a codebook of vectors, resulting in recognition performance that is generally on par with that of contemporary, continuous counterparts - sometimes surpassing them. Therefore, this work presents a proof-of-concept demonstrating how effective discrete representations can be derived, not only enabling applications beyond mere activity classification but also opening up the field to advanced tools for the analysis of symbolic sequences, as they are known, for example, from domains such as natural language processing. Based on an extensive experimental evaluation on a suite of wearables-based benchmark HAR tasks, we demonstrate the potential of our learned discretization scheme and discuss how discretized sensor data analysis can lead to substantial changes in HAR.
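The core lookup behind vector quantization, as described in the abstract above, can be sketched as a nearest-neighbor search against a codebook. This is only an illustration under simplifying assumptions: the codebook is fixed rather than learned, raw signal windows are compared directly (a learned encoder would normally come first), and the names `nearest_code` and `discretize` are hypothetical.

```python
def nearest_code(window, codebook):
    """Return the index of the codebook vector closest to `window`
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(window, codebook[i]))


def discretize(signal, codebook, span):
    """Split a 1-D signal into non-overlapping windows of length `span`
    and map each window to a code index. A trailing partial window is dropped."""
    usable = len(signal) - len(signal) % span
    return [nearest_code(signal[i:i + span], codebook)
            for i in range(0, usable, span)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # toy, hand-picked codes
    signal = [0.1, -0.1, 0.9, 1.2, 0.0, 0.8, 0.5]
    print(discretize(signal, codebook, span=2))
```

The output is a sequence of integer symbols, which is what makes symbolic-sequence tools applicable to the sensor stream.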
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Authors: Borui Wan, Juntao Zhao, Chuan Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.01381
Pdf link: https://arxiv.org/pdf/2306.01381
Abstract Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers to reduce communication traffic, and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of $O(T^{-1})$, with $T$ being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves the throughput of distributed full-graph training (up to $3.01\times$) with a negligible accuracy drop (at most 0.30%) or even an accuracy improvement (up to 0.19%) in most cases, showing significant advantages over state-of-the-art works.
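To make the idea of stochastically quantizing messages to lower-precision integers concrete, here is a minimal sketch of stochastic rounding onto a uniform grid. It is a generic textbook scheme, not AdaQP's actual code: the min/max range choice, the function names, and the `rng` parameter are assumptions for the example.

```python
import math
import random


def stochastic_quantize(values, num_bits, rng=random):
    """Map floats to integer codes in [0, 2**num_bits - 1].

    Each value is rounded up with probability equal to its fractional
    position between two grid levels, so the dequantized value is
    unbiased in expectation.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale
        floor = math.floor(x)
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, levels))  # guard against float round-off
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    message = [0.0, 0.3, 0.7, 1.0]
    codes, lo, scale = stochastic_quantize(message, num_bits=2)
    print(codes, dequantize(codes, lo, scale))
```

Only the integer codes plus `lo` and `scale` would need to cross the network; fewer bits mean less traffic but a larger per-message rounding variance, which is the trade-off an adaptive bit-width scheme targets.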
Extremely large-scale Array Systems: Near-Filed Codebook Design and Performance Analysis
Authors: Feng Zheng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.01458
Pdf link: https://arxiv.org/pdf/2306.01458
Abstract Extremely large-scale Array (ELAA) promises to deliver ultra-high data rates with more antenna elements. Meanwhile, the increase in antenna elements leads to a wider near-field region, which challenges the traditional design of codebooks. In this paper, we propose novel codebook design schemes which provide better quantized correlation with limited overhead. First, we analyze the correlation between codewords and channel vectors for the uniform linear array (ULA) and the uniform planar array (UPA). The correlation formula for the ULA channel can be expressed as an elliptic function, and the correlation formula for the UPA channel can be represented as an ellipsoid formula. Based on the analysis, we design a uniform sampling codebook to maximize the minimum quantized correlation and a dislocation ULA codebook to further reduce the number of quantized bits. In addition, we give a better sampling interval for the codebook of the UPA channel. Numerical results demonstrate the appealing advantages of the proposed codebook over existing methods in quantization bit number and quantization accuracy.
Group channel pruning and spatial attention distilling for object detection
Authors: Yun Chu, Pu Li, Yong Bai, Zhuhua Hu, Yongqing Chen, Jiafeng Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.01526
Pdf link: https://arxiv.org/pdf/2306.01526
Abstract Due to the over-parameterization of neural networks, many model compression methods based on pruning and quantization have emerged. They are remarkably effective in reducing the size, parameter count, and computational complexity of the model. However, most of the models compressed by such methods need the support of special hardware and software, which increases the deployment cost. Moreover, these methods are mainly used in classification tasks and are rarely applied directly to detection tasks. To address these issues, we introduce a three-stage model compression method for object detection networks: dynamic sparse training, group channel pruning, and spatial attention distilling. First, to select the unimportant channels in the network while maintaining a good balance between sparsity and accuracy, we put forward a dynamic sparse training method, which introduces a variable sparse rate that changes over the course of training. Second, to reduce the effect of pruning on network accuracy, we propose a novel pruning method called group channel pruning. In particular, we divide the network into multiple groups according to the scales of the feature layer and the similarity of module structure in the network, and then we use different pruning thresholds to prune the channels in each group. Finally, to recover the accuracy of the pruned network, we use an improved knowledge distillation method for the pruned network. Specifically, we extract spatial attention information from the feature maps of specific scales in each group as knowledge for distillation. In the experiments, we use YOLOv4 as the object detection network and PASCAL VOC as the training dataset. Our method reduces the parameters of the model by 64.7% and the computation by 34.9%.
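The per-group thresholding step described in the abstract above might look roughly like the following. This is a hedged sketch, not the paper's procedure: the per-channel importance scores, the group names, the function `prune_by_group`, and the safeguard that keeps at least one channel per group are all assumptions introduced for illustration.

```python
def prune_by_group(importance, groups, thresholds):
    """Keep channels whose importance meets their group's threshold.

    importance: dict mapping channel name -> importance score (assumed given)
    groups:     dict mapping group name -> list of channel names
    thresholds: dict mapping group name -> pruning threshold for that group
    Returns a dict mapping group name -> list of surviving channel names.
    """
    kept = {}
    for group, channels in groups.items():
        limit = thresholds[group]
        survivors = [c for c in channels if importance[c] >= limit]
        if not survivors:
            # Assumed safeguard: never prune a group down to zero channels.
            survivors = [max(channels, key=lambda c: importance[c])]
        kept[group] = survivors
    return kept


if __name__ == "__main__":
    importance = {"a0": 0.9, "a1": 0.05, "b0": 0.2, "b1": 0.1}
    groups = {"large_scale": ["a0", "a1"], "small_scale": ["b0", "b1"]}
    thresholds = {"large_scale": 0.1, "small_scale": 0.5}
    print(prune_by_group(importance, groups, thresholds))
```

Using a separate threshold per group lets sensitive parts of the network be pruned more gently than a single global cutoff would allow.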

New submissions for Mon, 5 Jun 23 #72

Keyword: efficient

Towards Fair Disentangled Online Learning for Changing Environments

How to Estimate Model Transferability of Pre-Trained Speech Models?

ITR: A grammar-based graph compressor supporting fast neighborhood queries

Improving the Robustness of Summarization Systems with Dual Augmentation

Large-Batch, Neural Multi-Objective Bayesian Optimization

A Neural RDE-based model for solving path-dependent PDEs

Numerical Investigation of the Fractional Oscillation Equations under the Context of Variable Order Caputo Fractional Derivative via Fractional Order Bernstein Wavelets

The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks

BAMF-SLAM: Bundle Adjusted Multi-Fisheye Visual-Inertial SLAM Using Recurrent Field Transforms

A Yee-like finite-element scheme for Maxwell's equations on unstructured grids

Labeled Interleaving Distance for Reeb Graphs

Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials

CSMAAFL: Client Scheduling and Model Aggregation in Asynchronous Federated Learning

A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits

Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations

Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

A stable imaging functional for anisotropic periodic media in electromagnetic inverse scattering

Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes

Self Contrastive Learning for Session-based Recommendation

Efficient volumetric mapping of multi-scale environments using wavelet-based compression

Nonholonomic Motion Planning as Efficient as Piano Mover's

Energy-efficient Rate Splitting for MIMO STAR-RIS-assisted Broadcast Channels with I/Q Imbalance

Resource-Efficient Federated Hyperdimensional Computing

The Maximum Matrix Contraction Problem

Energy-Efficient UAV-Assisted IoT Data Collection via TSP-Based Solution Space Reduction

DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG 2000 Compressed Documents

Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Chemical Property-Guided Neural Networks for Naphtha Composition Prediction

Matrix Inference in Growing Rank Regimes

Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

Multi-Objective Population Based Training

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

dugMatting: Decomposed-Uncertainty-Guided Matting

One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms

Does it pay to optimize AUC?

CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities

Strong tractability for multivariate integration in a subspace of the Wiener algebra

Blockchain Model for Environment/Infrastructure Monitoring in Cloud-Enabled High-Altitude Platform Systems

HomE: Homography-Equivariant Video Representation Learning

Quantifying synergy and redundancy in multiplex networks

Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation

Fair multilingual vandalism detection system for Wikipedia

On the Coverage of Cognitive mmWave Networks with Directional Sensing and Communication

Towards In-context Scene Understanding

Domain Decomposition Methods for the Monge-Ampère equation

Balancing Exploration and Exploitation: Disentangled $β$-CVAE in De Novo Drug Design

Keyword: faster

Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials

Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

Nonholonomic Motion Planning as Efficient as Piano Mover's

Hyperparameters in Reinforcement Learning and How To Tune Them

Resolving Interference When Merging Models

Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

Keyword: mobile

How Should We Support Designing Privacy-Friendly Apps for Children? Using a Research through Design Process to Understand Developers' Needs and Challenges

Optimal Path Planning in Distinct Topo-Geometric Classes using Neighborhood-augmented Graph and its Application to Path Planning for a Tethered Robot in 3D

SelFLoc: Selective Feature Fusion for Large-scale Point Cloud-based Place Recognition

Keyword: pruning

PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

Robust low-rank training via approximate orthonormal constraints

Group channel pruning and spatial attention distilling for object detection

Keyword: diffusion

DiffLoad: Uncertainty Quantification in Load Forecasting with Diffusion Model

Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks

Generative AI for Product Design: Getting the Right Design and the Design Right

Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models

Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting

PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models

Influence Maximization with Fairness at Scale (Extended Version)

DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

Video Colorization with Pre-trained Text-to-Image Diffusion Models

Keyword: dynamic

Towards Fair Disentangled Online Learning for Changing Environments