Abstract
The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Moreover, its inference efficiency is comparable to that of token-level autoregressive models thanks to the reduction in decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific text collection without extra training. Finally, we observe that our approach attains additional performance gains by simply scaling up to larger text collections, again without further training.\footnote{Our source code is publicly available at \url{https://github.com/gmftbyGMFTBY/Copyisallyouneed}.}
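As a toy illustration of this retrieve-instead-of-generate loop, the sketch below performs nearest-neighbor segment lookup in Python. The random vectors, the `phrases` list, and the `next_segment` helper are hypothetical stand-ins for the paper's contextualized phrase encoder and vector-search index (e.g., FAISS).

```python
import numpy as np

# Hypothetical toy index: in the real system each phrase vector would come
# from a contextualized encoder; random vectors are used here for shape only.
rng = np.random.default_rng(0)
phrases = ["the cat", "sat on", "the mat", "and purred"]
index = rng.normal(size=(len(phrases), 64))
index /= np.linalg.norm(index, axis=1, keepdims=True)   # cosine search

def next_segment(query: np.ndarray) -> str:
    """Copy the phrase whose embedding best matches the decoder state."""
    scores = index @ (query / np.linalg.norm(query))
    return phrases[int(np.argmax(scores))]

# One decoding step: instead of picking a vocabulary token, copy a segment.
state = rng.normal(size=64)
print(next_segment(state))
```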
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
Authors: Hongyi Zheng, Yixin Zhu, Lavender Yao Jiang, Kyunghyun Cho, Eric Karl Oermann
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which part of clinical notes should we choose as the input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) the predictive power distribution differs between nursing notes and discharge notes, and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.
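As a concrete sketch of this design choice, the function below greedily packs a limited context window with the highest-value sections; `scorer` is a hypothetical stand-in for the per-section predictive-power estimates such a framework would supply, and tokens are approximated by characters.

```python
def pack_sections(sections, budget, scorer, n_tokens=len):
    """Greedily fill a context budget with the highest-scoring sections."""
    chosen, used = [], 0
    for text in sorted(sections, key=scorer, reverse=True):
        cost = n_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen

notes = ["HPI: chest pain ...", "Meds: aspirin ...", "Social history ..."]
# Here longer sections score higher, purely for demonstration.
print(pack_sections(notes, budget=40, scorer=len))
```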
An Exploration of the Impact of Mapping Style and Device Roadmap on Simulated ReRAM Architectures for Neuromorphic Computing
Abstract
This paper investigates the relationship between mapping style and device roadmap in Resistive Random Access Memory (ReRAM) architectures for neuromorphic computing. The study leverages simulations using DNN+NeuroSim to evaluate the impact of different parameters on chip performance, including latency, energy consumption, and overall system efficiency. The results demonstrate that novel mapping techniques and a high-performance (HP) device roadmap are optimal if energy and speed considerations are weighted equally. This is because, as the study demonstrates, HP devices provide a latency reduction that outweighs the added energy cost. Additionally, adopting novel mapping cuts latency by nearly 30% while being slightly more energy efficient. The findings highlight the importance of considering mapping style and device roadmap when optimizing ReRAM architectures for neuromorphic computing, which may contribute to advancing the practical implementation of ReRAM in computational systems.
Vertex-based Networks to Accelerate Path Planning Algorithms
Abstract
Path planning plays a crucial role in various autonomy applications, and RRT is one of the leading solutions in this field. In this paper, we propose the utilization of vertex-based networks to enhance the sampling process of RRT, leading to more efficient path planning. Our approach focuses on critical vertices along the optimal paths, which provide essential yet sparser abstractions of the paths. We employ focal loss to address the associated data imbalance issue, and explore different masking configurations to determine practical tradeoffs in system performance. Through experiments conducted on randomly generated floor maps, our solutions demonstrate significant speed improvements, achieving over a 400% enhancement compared to the baseline model.
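For reference, the focal loss used to handle the imbalance between critical and non-critical vertices is the standard form of Lin et al. (2017); a minimal binary version (the paper's exact hyperparameters may differ) is:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: down-weights easy examples so rare positives
    (e.g., critical vertices) dominate the gradient."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)           # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)    # class-balance weight
    return float(np.mean(-w * (1 - pt) ** gamma * np.log(pt)))

# A confident correct prediction contributes almost nothing to the loss.
print(focal_loss(np.array([0.9, 0.1]), np.array([1, 0])))
```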
Rician likelihood loss for quantitative MRI using self-supervised deep learning
Authors: Christopher S. Parker, Anna Schroder, Sean C. Epstein, James Cole, Daniel C. Alexander, Hui Zhang
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Abstract
Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.
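A minimal sketch of a negative log Rician likelihood is given below; it follows the standard Rician density and uses the exponentially scaled Bessel function `scipy.special.ive` so that $\log I_0$ stays finite at high SNR. The paper's numerically stable implementation may differ in detail.

```python
import numpy as np
from scipy.special import ive

def rician_nll(signal, prediction, sigma):
    """Negative log Rician likelihood of measured magnitudes `signal`
    given the noise-free model `prediction` and noise level `sigma`.
    Stability trick: log I0(z) = z + log(ive(0, z))."""
    z = signal * prediction / sigma**2
    log_i0 = z + np.log(ive(0, z))
    return float(np.sum(-np.log(signal / sigma**2)
                        + (signal**2 + prediction**2) / (2 * sigma**2)
                        - log_i0))

# Toy check: the loss should be smaller near the true amplitude (1.0).
s = np.array([1.1, 0.9, 1.05])
print(rician_nll(s, np.full(3, 1.0), 0.1), rician_nll(s, np.full(3, 2.0), 0.1))
```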
State-Robust Observability Measures for Sensor Selection in Nonlinear Dynamic Systems
Authors: Mohamad H. Kazma, Sebastian A. Nugroho, Aleksandar Haber, Ahmad F. Taha
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract
This paper explores the problem of selecting sensor nodes for a general class of nonlinear dynamical networks. In particular, we study the problem by utilizing altered definitions of observability and open-loop lifted observers. The approach is performed by discretizing the system's dynamics using the implicit Runge-Kutta method and by introducing a state-averaged observability measure. The observability measure is computed for a number of perturbed initial states in the vicinity of the system's true initial state. The sensor node selection problem is revealed to retain the submodular and modular properties of the original problem. This allows the problem to be solved efficiently using a greedy algorithm with a guaranteed performance bound, while showing improved robustness to unknown or uncertain initial conditions. The validity of this approach is numerically demonstrated on a $H_2/O_2$ combustion reaction network.
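The general recipe of empirical observability Gramians followed by greedy selection can be sketched as below; for brevity this toy version uses forward Euler integration, a single nominal initial state, and a log-det objective, whereas the paper uses implicit Runge-Kutta discretization and averages the measure over perturbed initial states.

```python
import numpy as np

def empirical_gramians(f, sensors, x0, eps=1e-3, steps=200, dt=0.01):
    """Per-sensor empirical observability Gramians about x0."""
    n = len(x0)
    # dy[i, t, j]: sensitivity of sensor i at time t to perturbing state j.
    dy = np.zeros((len(sensors), steps, n))
    for j in range(n):
        xp, xm = x0.copy(), x0.copy()
        xp[j] += eps; xm[j] -= eps
        for t in range(steps):
            xp = xp + dt * f(xp)          # forward Euler, for brevity
            xm = xm + dt * f(xm)
            for i, h in enumerate(sensors):
                dy[i, t, j] = (h(xp) - h(xm)) / (2 * eps)
    return [dt * dy[i].T @ dy[i] for i in range(len(sensors))]

def greedy_select(grams, budget, reg=1e-9):
    """Greedily add the sensor that most increases log det of the Gramian."""
    chosen, total = [], reg * np.eye(grams[0].shape[0])
    for _ in range(budget):
        best = max((i for i in range(len(grams)) if i not in chosen),
                   key=lambda i: np.linalg.slogdet(total + grams[i])[1])
        chosen.append(best)
        total = total + grams[best]
    return chosen

# Toy example: a damped oscillator with one candidate sensor per coordinate.
A = np.array([[-0.5, 1.0], [-1.0, -0.5]])
grams = empirical_gramians(lambda x: A @ x,
                           [lambda x: x[0], lambda x: x[1]],
                           np.array([1.0, 0.0]))
print(greedy_select(grams, budget=1))
```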
A Scenario-Based Functional Testing Approach to Improving DNN Performance
Authors: Hong Zhu, Thi Minh Tam Tran, Aduen Benjumea, Andrew Bradley
Abstract
This paper proposes a scenario-based functional testing approach for enhancing the performance of machine learning (ML) applications. The proposed method is an iterative process that starts with testing the ML model on various scenarios to identify areas of weakness. This is followed by further testing on the suspected weak scenarios and a statistical evaluation of the model's performance on those scenarios to confirm the diagnosis. Once the diagnosis of weak scenarios is confirmed by test results, the model is treated by retraining it with a transfer learning technique, using the original model as the base and applying a set of training data specifically targeting the treated scenarios plus a subset of training data selected at random from the original training dataset to prevent the so-called catastrophic forgetting effect. Finally, after the treatment, the model is assessed and evaluated again by testing on the treated scenarios as well as other scenarios to check whether the treatment is effective and causes no side effects. The paper reports a case study with a real ML deep neural network (DNN) model, the perception system of an autonomous racing car. It demonstrates that the method is effective in the sense that the DNN model's performance can be improved, and that it provides an efficient way of enhancing an ML model's performance with far less human and compute resources than retraining from scratch.
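The data-mixing step of the treatment can be sketched in a few lines; the replay fraction below is illustrative, not the paper's setting.

```python
import random

def build_treatment_set(weak_scenario_data, original_train,
                        replay_frac=0.2, seed=0):
    """Mix scenario-targeted data with a random replay subset of the
    original training data to mitigate catastrophic forgetting."""
    rng = random.Random(seed)
    replay = rng.sample(original_train, int(replay_frac * len(original_train)))
    return list(weak_scenario_data) + replay

print(len(build_treatment_set(range(100), list(range(1000)))))  # -> 300
```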
More Than React: Investigating The Role of Emoji Reaction in GitHub Pull Requests
Abstract
Open source software development has become more social and collaborative, as evidenced by GitHub. In 2016, GitHub began to support more informal methods such as emoji reactions, with the goal of reducing commenting noise when reviewing code changes to a repository. From a code review context, the extent to which emoji reactions facilitate a more efficient review process is unknown. We conduct an empirical study that mines 1,850 active repositories across seven popular languages to analyze 365,811 Pull Requests (PRs) for their emoji reactions against review time, first-time contributors, comment intentions, and the consistency of sentiments. Answering these four research perspectives, we first find that the number of emoji reactions has a significant correlation with review time. Second, our results show that a PR submitted by a first-time contributor is less likely to receive emoji reactions. Third, the results reveal that comments with an intention of information giving are more likely to receive an emoji reaction. Fourth, we observe that only a small proportion of sentiments are inconsistent between comments and emoji reactions, with 11.8% of instances identified. In these cases, the prevalent reason is reviewers cheering up authors who acknowledge a mistake. Apart from reducing commenting noise, our work suggests that emoji reactions play a positive role in facilitating collaborative communication during the review process.
Risk-Constrained Control of Mean-Field Linear Quadratic Systems
Authors: Masoud Roudneshin, Saba Sanami, Amir G. Aghdam
Abstract
The risk-neutral LQR controller is optimal for stochastic linear dynamical systems. However, the classical optimal controller performs inefficiently in the presence of low-probability yet statistically significant (risky) events. The present research focuses on infinite-horizon risk-constrained linear quadratic regulators in a mean-field setting. We address the risk constraint by bounding the cumulative one-stage variance of the state penalty of all players. It is shown that the optimal controller is affine in the state of each player with an additive term that controls the risk constraint. In addition, we propose a solution independent of the number of players. Finally, simulations are presented to verify the theoretical findings.
Parallelising Glauber dynamics
Authors: Holden Lee
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)
Abstract
For distributions over discrete product spaces $\prod_{i=1}^n \Omega_i$, Glauber dynamics is a Markov chain that at each step resamples a random coordinate conditioned on the other coordinates. We show that $k$-Glauber dynamics, which resamples a random subset of $k$ coordinates, mixes $k$ times faster in $\chi^2$-divergence, and assuming approximate tensorization of entropy, mixes $k$ times faster in KL-divergence. We apply this to Ising models $\mu_{J,h}(x)\propto \exp(\frac{1}{2}\langle x,Jx\rangle + \langle h,x\rangle)$ with $\|J\|<1-c$ (the regime where fast mixing is known), where we show that we can implement each step of $\widetilde O(n/\|J\|_F)$-Glauber dynamics efficiently with a parallel algorithm, resulting in a parallel algorithm with running time $\widetilde O(\|J\|_F) = \widetilde O(\sqrt n)$.
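For intuition, a single-site Glauber update for this Ising model is sketched below; the paper's $k$-Glauber dynamics resamples $k$ coordinates jointly with a parallel algorithm, which this baseline does not attempt.

```python
import numpy as np

def glauber_step(x, J, h, rng):
    """One single-site Glauber update for mu(x) ~ exp(0.5<x,Jx> + <h,x>),
    with x in {-1,+1}^n and symmetric J."""
    i = rng.integers(len(x))
    a = J[i] @ x - J[i, i] * x[i] + h[i]    # local field at coordinate i
    x[i] = 1.0 if rng.random() < 1 / (1 + np.exp(-2 * a)) else -1.0
    return x

rng = np.random.default_rng(0)
n = 50
J = 0.5 / n * np.ones((n, n)); np.fill_diagonal(J, 0)  # ||J|| < 1: fast mixing
x = rng.choice([-1.0, 1.0], size=n)
for _ in range(2000):
    x = glauber_step(x, J, np.zeros(n), rng)
print(x.mean())
```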
SLSSNN: High energy efficiency spike-train level spiking neural networks with spatio-temporal conversion
Authors: Changqing Xu, Yi Liu, Yintang Yang
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Brain-inspired spiking neural networks (SNNs) have attracted widespread research interest due to their low power consumption, high biological plausibility, and strong spatio-temporal information processing capability. Although adopting a surrogate gradient (SG) makes non-differentiable SNNs trainable, achieving accuracy comparable to ANNs while retaining the low-power property remains difficult. In this paper, we propose an energy-efficient spike-train level spiking neural network (SLSSNN) with low computational cost and high accuracy. In the SLSSNN, spatio-temporal conversion blocks (STCBs) replace the convolutional and ReLU layers to preserve the low-power property of SNNs and improve accuracy. However, the SLSSNN cannot adopt backpropagation algorithms directly due to the non-differentiable nature of spike trains. We propose a suitable learning rule for SLSSNNs by deriving the equivalent gradient of the STCB. We evaluate the proposed SLSSNN on static and neuromorphic datasets, including Fashion-MNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, and DVS-CIFAR10. The experimental results show that our proposed SLSSNN achieves state-of-the-art accuracy on nearly all datasets, using fewer time steps and being highly energy-efficient.
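To illustrate the surrogate-gradient idea mentioned above, here is a generic leaky integrate-and-fire step with a rectangular surrogate in PyTorch; this is the standard SG baseline, not the paper's STCB-equivalent gradient.

```python
import torch

class SpikeSG(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()  # box window near threshold

def lif_step(v, inp, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire step with hard reset."""
    v = v / tau + inp                  # leak, then integrate input
    spike = SpikeSG.apply(v - v_th)    # fire if membrane exceeds threshold
    return v * (1 - spike), spike

v0 = torch.zeros(4, requires_grad=True)
v1, s = lif_step(v0, torch.tensor([0.5, 1.2, 2.0, 0.9]))
s.sum().backward()                     # gradients flow through the surrogate
print(v0.grad)                         # nonzero only near the threshold
```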
A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited Area
Authors: Wei Kang, Liang Xu, Hong Zhou
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.
A $(3/2 + \varepsilon)$-Approximation for Multiple TSP with a Variable Number of Depots
Authors: Max Deppert, Matthias Kaul, Matthias Mnich
Abstract
One of the most studied extensions of the famous Traveling Salesperson Problem (TSP) is the {\sc Multiple TSP}: a set of $m\geq 1$ salespersons collectively traverses a set of $n$ cities by $m$ non-trivial tours, to minimize the total length of their tours. This problem can also be considered to be a variant of {\sc Uncapacitated Vehicle Routing} where the objective function is the sum of all tour lengths. When all $m$ tours start from a single common \emph{depot} $v_0$, then the metric {\sc Multiple TSP} can be approximated equally well as the standard metric TSP, as shown by Frieze (1983). The {\sc Multiple TSP} becomes significantly harder to approximate when there is a \emph{set} $D$ of $d \geq 1$ depots that form the starting and end points of the $m$ tours. For this case only a $(2-1/d)$-approximation in polynomial time is known, as well as a $3/2$-approximation for \emph{constant} $d$ which requires a prohibitive run time of $n^{\Theta(d)}$ (Xu and Rodrigues, \emph{INFORMS J. Comput.}, 2015). A recent work of Traub, Vygen and Zenklusen (STOC 2020) gives another approximation algorithm for {\sc Multiple TSP} running in time $n^{\Theta(d)}$ and reducing the problem to approximating TSP. In this paper we overcome the $n^{\Theta(d)}$ time barrier: we give the first efficient approximation algorithm for {\sc Multiple TSP} with a \emph{variable} number $d$ of depots that yields a better-than-2 approximation. Our algorithm runs in time $(1/\varepsilon)^{\mathcal O(d\log d)}\cdot n^{\mathcal O(1)}$, and produces a $(3/2+\varepsilon)$-approximation with constant probability. For the graphic case, we obtain a deterministic $3/2$-approximation in time $2^d\cdot n^{\mathcal O(1)}$.
Analytical Investigation of Two Benchmark Resource Allocation Algorithms for LTE-V2V
Authors: A. Bazzi, A. Zanella, G. Cecchini, B. M. Masini
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Short-range wireless technologies will enable vehicles to communicate and coordinate their actions, thus improving people's safety and traffic efficiency. Whereas IEEE 802.11p (and related standards) had been the only practical solution for years, in 2016 a new option was introduced with Release 14 of long term evolution (LTE), which includes new features to enable direct vehicle-to-vehicle (V2V) communications. LTE-V2V promises a more efficient use of the channel compared to IEEE 802.11p thanks to an improved PHY layer and the use of orthogonal resources at the MAC layer. In LTE-V2V, a key role is played by the resource allocation algorithm, and increasing efforts are being made to design new solutions to optimize the spatial reuse. In this context, an important yet little-studied aspect is the identification of reference benchmarks that allow one: 1) to gauge the space in which resource allocation algorithms operate; and 2) to verify the performance of new proposals. In this work, we focus on a highway scenario and identify two algorithms to be used as a minimum and maximum reference in terms of the packet reception probability (PRP). The PRP is derived as a function of various parameters that describe the scenario and settings, from the application to the physical layer. Results, obtained both in a simplified Poisson point process scenario and with realistic traffic traces, show that the PRP varies considerably with different algorithms and that there is room for improvement of current solutions.
MaxSR: Image Super-Resolution Using Improved MaxViT
Authors: Bincheng Yang, Gangshan Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
While transformer models have been demonstrated to be effective for natural language processing tasks and high-level vision tasks, only a few attempts have been made to use them for single image super-resolution. Because transformer models have powerful representation capacity, and their built-in self-attention mechanisms can leverage the self-similarity prior in the input low-resolution image to improve performance, we present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR. MaxSR consists of four parts: a shallow feature extraction block, multiple cascaded adaptive MaxViT blocks that extract deep hierarchical features and model global self-similarity from low-level features efficiently, a hierarchical feature fusion block, and finally a reconstruction block. The key component of MaxSR, the adaptive MaxViT block, is based on the MaxViT block, which mixes MBConv with squeeze-and-excitation, block attention, and grid attention. To achieve better global modelling of self-similarity in the input low-resolution image, we improve block attention and grid attention to adaptive block attention and adaptive grid attention, which perform self-attention inside each window across all grids and inside each grid across all windows, respectively, in an efficient way. We instantiate the proposed model for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light). Experiments show that our MaxSR and MaxSR-light establish new state-of-the-art performance efficiently.
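The difference between block and grid attention lies purely in how tokens are partitioned before self-attention; a sketch of the two MaxViT-style partitioning schemes (without the adaptive modifications proposed in the paper) is:

```python
import numpy as np

def block_partition(x, p):
    """Split an (H, W, C) feature map into non-overlapping p x p windows;
    'block' attention then runs within each window (local mixing)."""
    H, W, C = x.shape
    return (x.reshape(H // p, p, W // p, p, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, p * p, C))

def grid_partition(x, g):
    """Partition into a g x g grid of dilated tokens; 'grid' attention mixes
    one token from every window, giving sparse global interaction."""
    H, W, C = x.shape
    return (x.reshape(g, H // g, g, W // g, C)
             .transpose(1, 3, 0, 2, 4)
             .reshape(-1, g * g, C))

x = np.arange(8 * 8).reshape(8, 8, 1)
print(block_partition(x, 4).shape, grid_partition(x, 4).shape)
# Same token count per group, but very different mixing patterns.
```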
Numerical evaluation of oscillatory integrals via automated steepest descent contour deformation
Abstract
Steepest descent methods combining complex contour deformation with numerical quadrature provide an efficient and accurate approach for the evaluation of highly oscillatory integrals. However, unless the phase function governing the oscillation is particularly simple, their application requires a significant amount of a priori analysis and expert user input, to determine the appropriate contour deformation, and to deal with the non-uniformity in the accuracy of standard quadrature techniques associated with the coalescence of stationary points (saddle points) with each other, or with the endpoints of the original integration contour. In this paper we present a novel algorithm for the numerical evaluation of oscillatory integrals with general polynomial phase functions, which automates the contour deformation process and avoids the difficulties typically encountered with coalescing stationary points and endpoints. The inputs to the algorithm are simply the phase and amplitude functions, the endpoints and orientation of the original integration contour, and a small number of numerical parameters. By a series of numerical experiments we demonstrate that the algorithm is accurate and efficient over a large range of frequencies, even for examples with a large number of coalescing stationary points and with endpoints at infinity. As a particular application, we use our algorithm to evaluate cuspoid canonical integrals from scattering theory. A Matlab implementation of the algorithm is made available and is called PathFinder.
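As a minimal illustration of the underlying idea, the sketch below evaluates $\int_{-\infty}^{\infty} f(x)\,e^{i\omega x^2}\,dx$ by rotating onto the steepest descent contour $x = e^{i\pi/4}t$, along which the oscillatory factor becomes a decaying Gaussian; PathFinder automates such deformations for general polynomial phases, including the coalescing-saddle cases this toy example avoids.

```python
import numpy as np

def steepest_descent(f, w, n=60):
    """Approximate I(w) = int f(x) exp(i*w*x^2) dx over the real line by
    substituting x = exp(i*pi/4) * t, so exp(i*w*x^2) = exp(-w*t^2)."""
    u, wts = np.polynomial.hermite.hermgauss(n)   # weight exp(-u^2)
    rot = np.exp(1j * np.pi / 4)
    t = u / np.sqrt(w)                            # exp(-w*t^2) -> exp(-u^2)
    return rot / np.sqrt(w) * np.sum(wts * f(rot * t))

# Sanity check with f = 1: the exact value is exp(i*pi/4) * sqrt(pi/w).
w = 50.0
print(steepest_descent(lambda x: np.ones_like(x), w))
print(np.exp(1j * np.pi / 4) * np.sqrt(np.pi / w))
```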
Abstract
Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the co-lexicographic order of the strings labeling the automaton's paths; this property makes it possible to represent the automaton's topology in a constant number of bits per transition, as well as to efficiently solve pattern matching queries on its accepted regular language. After their introduction, Wheeler automata have been the subject of a prolific line of research, both from the algorithmic and language-theoretic points of view. A recurring issue faced in these studies is the lack of large datasets of Wheeler automata on which the developed algorithms and theories could be tested. One possible way to overcome this issue is to generate random Wheeler automata. Motivated by this observation, in this paper we initiate the theoretical study of random Wheeler automata, focusing on the deterministic case (Wheeler DFAs -- WDFAs). We start by extending the Erd\H{o}s-R\'enyi random graph model to WDFAs, and proceed by providing an algorithm generating uniform WDFAs according to this model. Our algorithm generates a uniform WDFA with $n$ states, $m$ transitions, and alphabet's cardinality $\sigma$ in $O(m)$ expected time ($O(m\log m)$ worst-case time w.h.p.) and constant working space for all alphabets of size $\sigma \le m/\ln m$. As a by-product, we also give formulas for the number of distinct WDFAs and obtain that $n\sigma + (n - \sigma) \log \sigma$ bits are necessary and sufficient to encode a WDFA with $n$ states and alphabet of size $\sigma$, up to an additive $\Theta(n)$ term. We present an implementation of our algorithm and show that it is extremely fast in practice, with a throughput of over 8 million transitions per second.
Sampling-Priors-Augmented Deep Unfolding Network for Robust Video Compressive Sensing
Authors: Yuhao Huang, Gangrong Qu, Youran Ge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Video Compressed Sensing (VCS) aims to reconstruct multiple frames from one single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, those state-of-the-art (SOTA) methods also significantly increase model complexity and suffer from poor generality and robustness, which means that those networks need to be retrained to accommodate the new system. Such limitations hinder the real-time imaging and practical deployment of models. In this work, we propose a Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) for efficient and robust VCS reconstruction. Under the optimization-inspired deep unfolding framework, a lightweight and efficient U-net is exploited to downsize the model while improving overall performance. Moreover, the prior knowledge from the sampling model is utilized to dynamically modulate the network features to enable single SPA-DUN to handle arbitrary sampling settings, augmenting interpretability and generality. Extensive experiments on both simulation and real datasets demonstrate that SPA-DUN is not only applicable for various sampling settings with one single model but also achieves SOTA performance with incredible efficiency.
3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks
Abstract
Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.
HEAL-SWIN: A Vision Transformer On The Sphere
Authors: Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.
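The property that makes this cheap can be demonstrated in a few lines: in the nested HEALPix ordering, every run of $4^k$ consecutive pixels forms one hierarchical super-pixel, so patching reduces to a reshape. The sketch below shows only that grouping, not the full HEAL-SWIN windowing and shifting logic.

```python
import numpy as np

def nested_patches(hpx_map, patch_order):
    """Group a HEALPix map in NESTED ordering into patches: each run of
    4**patch_order consecutive pixels is one hierarchical super-pixel."""
    patch_size = 4 ** patch_order
    assert hpx_map.size % patch_size == 0
    return hpx_map.reshape(-1, patch_size)

nside = 8                                       # 12 * nside**2 = 768 pixels
m = np.arange(12 * nside**2)
print(nested_patches(m, patch_order=2).shape)   # -> (48, 16)
```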
Distributed Planning for Rigid Robot Formations using Consensus on the Transformation of a Base Configuration
Authors: Jeppe Heini Mikkelsen, Matteo Fumagalli
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
This paper presents a novel planning method that achieves navigation of multi-robot formations in cluttered environments while maintaining the formation throughout the robots' motion. The method utilises a decentralised approach to find feasible formation parameters that guarantee formation constraints for rigid formations. The method proves to be computationally efficient, making it relevant for reactive planning and control of multi-robot formations. It has been tested in a simulation environment to prove feasibility and run-time efficiency.
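A generic sketch of the consensus mechanism is shown below: standard Laplacian averaging on (translation, rotation) parameters, with the rotation embedded as a cos/sin pair so averaging stays on the circle. This illustrates consensus on a transformation only, not the paper's full planner or its formation-constraint checks.

```python
import numpy as np

def consensus_step(params, adjacency, alpha=0.2):
    """One synchronous consensus iteration on formation parameters
    (tx, ty, cos(theta), sin(theta)) per robot."""
    deg = adjacency.sum(axis=1, keepdims=True)
    new = params + alpha * (adjacency @ params - deg * params)
    new[:, 2:] /= np.linalg.norm(new[:, 2:], axis=1, keepdims=True)
    return new

# Three robots on a line graph, disagreeing about the formation pose.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
p = np.array([[0.0, 0.0, 1.0, 0.0],
              [1.0, 0.5, 0.9, 0.436],
              [2.0, 1.0, 0.8, 0.6]])
for _ in range(50):
    p = consensus_step(p, A)
print(p.round(3))   # rows converge toward a common transformation
```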
Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy
Abstract
Data-poisoning based backdoor attacks aim to insert backdoors into models by manipulating training datasets without controlling the training process of the target model. Existing attack methods mainly focus on designing triggers or fusion strategies between triggers and benign samples. However, they often randomly select samples to be poisoned, disregarding the varying importance of each poisoning sample in terms of backdoor injection. A recent selection strategy filters a fixed-size poisoning sample pool by recording forgetting events, but it fails to consider the remaining samples outside the pool from a global perspective. Moreover, computing forgetting events requires significant additional computing resources. Therefore, how to efficiently and effectively select poisoning samples from the entire dataset is an urgent problem in backdoor attacks. To address it, we first introduce a poisoning mask into the regular backdoor training loss. We suppose that a backdoored model trained on hard poisoning samples has a stronger backdoor effect on easy ones, which can be implemented by hindering the normal training process (i.e., maximizing the loss w.r.t. the mask). To further integrate this with the normal training process, we then propose a learnable poisoning sample selection strategy that learns the mask together with the model parameters through a min-max optimization. Specifically, the outer loop aims to achieve the backdoor attack goal by minimizing the loss based on the selected samples, while the inner loop selects hard poisoning samples that impede this goal by maximizing the loss. After several rounds of adversarial training, we finally select effective poisoning samples with high contribution. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our approach in boosting backdoor attack performance.
Fully Coupled Forced Response Analysis of Nonlinear Turbine Blade Vibrations in the Frequency Domain
Authors: Christian Berthold, Johann Gross, Christian Frey, Malte Krack
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
For the first time, a fully-coupled Harmonic Balance method is developed for the forced response of turbomachinery blades. The method is applied to a state-of-the-art model of a turbine bladed disk with interlocked shrouds subjected to wake-induced loading. The recurrent opening and closing of the pre-loaded shroud contact causes a softening effect, leading to turning points in the amplitude-frequency curve near resonance. Therefore, the coupled solver is embedded into a numerical path continuation framework. Two variants are developed: the coupled continuation of the solution path, and the coupled re-iteration of selected solution points. While the re-iteration variant is slightly more costly per solution point, it has the important advantage that it can be run completely in parallel, which substantially reduces the wall clock time. It is shown that wake- and vibration-induced flow fields do not linearly superimpose, leading to a severe underestimation of the resonant vibration level by the influence-coefficient-based state-of-the-art methods (which rely on this linearity assumption).
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Authors: Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q\&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16\% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after the release of ChatGPT receive voting scores similar to those made before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and that they are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but its widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
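For readers unfamiliar with the estimation strategy, a difference-in-differences regression of this form can be sketched on synthetic data (the data below are fabricated for illustration and are not the paper's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for group, treated in [("stackoverflow", 1), ("ru_counterpart", 0)]:
    base = 10.0 if treated else 8.0
    for week in range(104):
        post = int(week >= 52)                  # hypothetical release week
        y = base - 0.16 * treated * post + rng.normal(0, 0.05)
        rows.append({"log_posts": y, "treated": treated, "post": post})
df = pd.DataFrame(rows)

# The interaction coefficient is the difference-in-differences estimate.
m = smf.ols("log_posts ~ treated * post", data=df).fit()
print(m.params["treated:post"])                 # recovers roughly -0.16
```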
Strategic Budget Selection in a Competitive Autobidding World
Authors: Yiding Feng, Brendan Lucier, Aleksandrs Slivkins
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Abstract
We study a game played between advertisers in an online ad platform. The platform sells ad impressions by first-price auction and provides autobidding algorithms that optimize bids on each advertiser's behalf. Each advertiser strategically declares a budget constraint (and possibly a maximum bid) to their autobidder. The chosen constraints define an "inner" budget-pacing game for the autobidders, who compete to maximize the total value received subject to the constraints. Advertiser payoffs in the constraint-choosing "metagame" are determined by the equilibrium reached by the autobidders. Advertisers only specify budgets and linear values to their autobidders, but their true preferences can be more general: we assume only that they have weakly decreasing marginal value for clicks and weakly increasing marginal disutility for spending money. Our main result is that despite this gap between general preferences and simple autobidder constraints, the allocations at equilibrium are approximately efficient. Specifically, at any pure Nash equilibrium of the metagame, the resulting allocation obtains at least half of the liquid welfare of any allocation and this bound is tight. We also obtain a 4-approximation for any mixed Nash equilibrium, and this result extends also to Bayes-Nash equilibria. These results rely on the power to declare budgets: if advertisers can specify only a (linear) value per click but not a budget constraint, the approximation factor at equilibrium can be as bad as linear in the number of advertisers.
Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach
Authors: G. M. Shahariar, Tonmoy Talukder, Rafin Alam Khan Sotez, Md. Tanvir Rouf Shawon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
With the increasing need for text summarization techniques that are both efficient and accurate, it becomes crucial to explore avenues that enhance the quality and precision of pre-trained models specifically tailored for summarizing Bengali texts. When it comes to text summarization tasks, there are numerous pre-trained transformer models at one's disposal. Consequently, it becomes quite a challenge to discern the most informative and relevant summary for a given text among the various options generated by these pre-trained summarization models. This paper aims to identify the most accurate and informative summary for a given text by utilizing a simple but effective ranking-based approach that compares the outputs of four different pre-trained Bengali text summarization models. The process begins with preprocessing of the input text, which involves eliminating unnecessary elements such as special characters and punctuation marks. Next, we utilize four pre-trained summarization models to generate summaries, and then apply a text ranking algorithm to identify the most suitable summary. Ultimately, the summary with the highest ranking score is chosen as the final one. To evaluate the effectiveness of this approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that by leveraging the strengths of each pre-trained transformer model and combining them using a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of Bengali text summarization.
SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation
Authors: Dohyun Kim, Yeseung Kim, Jaehwi Jang, Minjae Song, Woojin Choi, Daehyung Park
Abstract
Spoken language serves as an accessible and efficient interface, enabling non-experts and disabled users to interact with complex assistant robots. However, accurately grounding language utterances poses a significant challenge due to the acoustic variability in speakers' voices and environmental noise. In this work, we propose a novel speech-scene graph grounding network (SGGNet$^2$) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. To incorporate the acoustic similarity, we extend our previous grounding model, the scene-graph-based grounding network (SGGNet), with the ASR model from NVIDIA NeMo. We accomplish this by feeding the latent vector of speech pronunciations into the BERT-based grounding network within SGGNet. We evaluate the effectiveness of using latent vectors of speech commands in grounding through qualitative and quantitative studies. We also demonstrate the capability of SGGNet$^2$ in a speech-based navigation task using a real quadruped robot, RBQ-3, from Rainbow Robotics.
Global sensitivity analysis in the limited data setting with application to char combustion
Authors: Dongjin Lee, Elle Lavichant, Boris Kramer
Abstract
In uncertainty quantification, variance-based global sensitivity analysis quantitatively determines the effect of each input random variable on the output by partitioning the total output variance into contributions from each input. However, computing conditional expectations can be prohibitively costly when working with expensive-to-evaluate models. Surrogate models can accelerate this, yet their accuracy depends on the quality and quantity of training data, which is expensive to generate (experimentally or computationally) for complex engineering systems. Thus, methods that work with limited data are desirable. We propose a diffeomorphic modulation under observable response preserving homotopy (D-MORPH) regression to train a polynomial dimensional decomposition surrogate of the output that minimizes the required amount of training data. The new method first computes a sparse Lasso solution and uses it to define the cost function. A subsequent D-MORPH regression minimizes the difference between the D-MORPH and Lasso solutions. The resulting D-MORPH surrogate is more robust to input variations and more accurate with limited training data. We illustrate the accuracy and computational efficiency of the new surrogate for global sensitivity analysis using mathematical functions and an expensive-to-simulate model of char combustion. The new method is highly efficient, requiring only 15% of the training data compared to conventional regression.
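For context, first-order Sobol indices are commonly estimated with a pick-freeze Monte Carlo scheme such as the sketch below; this shows the quantity being computed, not the paper's D-MORPH surrogate approach.

```python
import numpy as np

def sobol_first_order(model, d, n=2**14, seed=0):
    """Monte Carlo estimate of first-order Sobol indices via the
    pick-freeze (Saltelli) scheme, for d independent U(0,1) inputs."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))
    yA, yB = model(A), model(B)
    var = np.concatenate([yA, yB]).var()
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]            # replace ("pick") only input i
        S[i] = np.mean(yB * (model(ABi) - yA)) / var
    return S

# Toy model with one dominant input; S is roughly [0.96, 0.04] analytically.
f = lambda X: np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1]
print(sobol_first_order(f, d=2).round(2))
```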
A novel family of finite automata for recognizing and learning $ω$-regular languages
Authors: Yong Li, Sven Schewe, Qiyi Tang
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
Families of DFAs (FDFAs) have recently been introduced as a new representation of $\omega$-regular languages. They target ultimately periodic words, with acceptors revolving around accepting some representation $u\cdot v^\omega$. Three canonical FDFAs have been suggested, called periodic, syntactic, and recurrent. We propose a fourth one, limit FDFAs, which can be exponentially coarser than periodic FDFAs and are more succinct than syntactic FDFAs, while they are incomparable with (and dual to) recurrent FDFAs. We show that limit FDFAs can be easily used to check not only whether $\omega$-languages are regular, but also whether they are accepted by deterministic B\"uchi automata. We also show that canonical forms can be left behind in applications: the limit and recurrent FDFAs can complement each other nicely, and it may be a good way forward to use a combination of both. Using this observation as a starting point, we explore making more efficient use of Myhill-Nerode's right congruences in aggressively increasing the number of don't-care cases in order to obtain smaller progress automata. In pursuit of this goal, we gain succinctness, but pay a high price by losing constructiveness.
BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems
Abstract
Testing Autonomous Driving Systems (ADSs) is a critical task for ensuring the reliability and safety of autonomous vehicles. Existing methods mainly focus on searching for safety violations while the diversity of the generated test cases is ignored, which may generate many redundant test cases and failures. Such redundant failures can reduce testing performance and increase failure analysis costs. In this paper, we present a novel behavior-guided fuzzing technique (BehAVExplor) to explore the different behaviors of the ego vehicle (i.e., the vehicle controlled by the ADS under test) and detect diverse violations. Specifically, we design an efficient unsupervised model, called BehaviorMiner, to characterize the behavior of the ego vehicle. BehaviorMiner extracts the temporal features from the given scenarios and performs a clustering-based abstraction to group behaviors with similar features into abstract states. A new test case will be added to the seed corpus if it triggers new behaviors (e.g., covers new abstract states). Due to the potential conflict between the behavior diversity and the general violation feedback, we further propose an energy mechanism to guide the seed selection and the mutation. The energy of a seed quantifies how good it is. We evaluated BehAVExplor on Apollo, an industrial-level ADS, and the LGSVL simulation environment. Empirical evaluation results show that BehAVExplor can effectively find more diverse violations than the state-of-the-art.
TALL: Thumbnail Layout for Deepfake Video Detection
Authors: Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The growing threats of deepfakes to society and cybersecurity have raised enormous public concerns, and increasing efforts have been devoted to this critical topic of deepfake video detection. Existing video methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies. Specifically, consecutive frames are masked in a fixed position in each frame to improve generalization, then resized to sub-images and rearranged into a pre-defined layout as the thumbnail. TALL is model-agnostic and extremely simple, requiring the modification of only a few lines of code. Inspired by the success of vision transformers, we incorporate TALL into Swin Transformer, forming an efficient and effective method, TALL-Swin. Extensive experiments on intra-dataset and cross-dataset tasks confirm the effectiveness and superiority of TALL and TALL-Swin. TALL-Swin achieves 90.79$\%$ AUC on the challenging cross-dataset task, FaceForensics++ $\to$ Celeb-DF. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
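The core layout transform is simple enough to sketch directly; the version below omits the fixed-position masking step described above.

```python
import numpy as np

def thumbnail_layout(clip, rows=2, cols=2):
    """Arrange consecutive frames of a clip (T, H, W, C) into a single
    thumbnail image, preserving temporal order in a fixed spatial layout."""
    T, H, W, C = clip.shape
    assert T == rows * cols
    return (clip.reshape(rows, cols, H, W, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(rows * H, cols * W, C))

clip = np.random.rand(4, 56, 56, 3)     # 4 consecutive (resized) frames
print(thumbnail_layout(clip).shape)     # -> (112, 112, 3)
```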
MGit: A Model Versioning and Management System
Authors: Wei Hao, Daniel Mendoza, Rafael da Silva, Deepak Narayanan, Amar Phanishayee
Abstract
Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating and collaboration functionality. MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
Keyword: faster
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Authors: Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Eng Gee Lim, Jeremy Smith, Yong Yue, Yutao Yue
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel on edge devices, causing asynchrony between perception results and USV position and leading to erroneous decisions in autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), robust perception for USVs has developed relatively slowly. Moreover, most current multi-task perception models are huge in parameters, slow in inference and not scalable. Motivated by this, we propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar. Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation and radar point cloud segmentation. Besides, models in the Achelous family, with fewer than around 5 million parameters, achieve about 18 FPS on an NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny and Segformer-B0 on our collected dataset by about 5 mAP$_{\text{50-95}}$ and 0.7 mIoU, especially under situations of adverse weather, dark environments and camera failure. To our knowledge, Achelous is the first comprehensive panoptic perception framework combining vision-level and point-cloud-level tasks for water-surface perception. To promote the development of the intelligent transportation community, we release our codes at \url{https://github.com/GuanRunwei/Achelous}.
Improving BERT with Hybrid Pooling Network and Drop Mask
Abstract
Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers. However, the vanilla BERT uses the same self-attention mechanism in each layer to model different contextual features. In this paper, we propose a HybridBERT model which combines self-attention and pooling networks to encode different contextual features in each layer. Additionally, we propose a simple DropMask method to address the mismatch between pre-training and fine-tuning caused by the excessive use of special mask tokens during Masked Language Modeling pre-training. Experiments show that HybridBERT outperforms BERT in pre-training, with lower loss, faster training speed (8% relative) and lower memory cost (13% relative), and in transfer learning, with 1.5% relatively higher accuracy on downstream tasks. Additionally, DropMask improves the accuracy of BERT on downstream tasks across various masking rates.
Keyword: mobile
Adaptive Coding and Modulation Aided Mobile Relaying for Millimeter-Wave Flying Ad-Hoc Networks
Authors: Jiankang Zhang, Sheng Chen, Wei Koong Chai, Lajos Hanzo
Abstract
The emerging drone swarms are capable of carrying out sophisticated tasks in support of demanding Internet-of-Things (IoT) applications by synergistically working together. However, the target area may be out of the coverage of the ground station and it may be impractical to deploy a large number of drones in the target area due to cost, electromagnetic interference and flight-safety regulations. By exploiting the innate \emph{agility} and \emph{mobility} of unmanned aerial vehicles (UAVs), we conceive a mobile relaying-assisted drone swarm network architecture, which is capable of extending the coverage of the ground station and enhancing the effective end-to-end throughput. Explicitly, a swarm of drones forms a data-collecting drone swarm (DCDS) designed for sensing and collecting data with the aid of their mounted cameras and/or sensors, and a powerful relay-UAV (RUAV) acts as a mobile relay for conveying data between the DCDS and a ground station (GS). Given a time period, in order to maximize the data delivered whilst minimizing the delay imposed, we harness an $\epsilon$-multiple objective genetic algorithm ($\epsilon$-MOGA) assisted Pareto-optimization scheme. Our simulation results demonstrate that the proposed mobile relaying is capable of delivering more data. As specific examples investigated in our simulations, our mobile relaying-assisted drone swarm network is capable of delivering $45.38\%$ more data than the benchmark solutions, when a stationary relay is available, and it is capable of delivering $26.86\%$ more data than the benchmark solutions when no stationary relay is available.
Reconfigurable Intelligent Surface Assisted Free Space Optical Information and Power Transfer
Authors: Wen Fang, Wen Chen, Qingqing Wu, Kunlun Wang, Shunqing Zhang, Qingwen Liu, Jun Li
Abstract
Free space optical (FSO) transmission has emerged as a key candidate technology for 6G to expand new spectrum and improve network capacity due to its advantages of large bandwidth, low electromagnetic interference, and high energy efficiency. A resonant beam operating in the infrared band utilizes spatially separated laser cavities to enable safe and mobile high-power energy and high-rate information transmission, but is limited to line-of-sight (LOS) channels. In this paper, we propose a reconfigurable intelligent surface (RIS) assisted resonant beam simultaneous wireless information and power transfer (SWIPT) system and establish an optical field propagation model to analyze the channel state information (CSI), in which LOS obstruction can be detected sensitively and non-line-of-sight (NLOS) transmission can be realized by changing the phase of the resonant beam at the RIS. Numerical results demonstrate that, apart from the transmission distance, the NLOS performance depends on both the horizontal and vertical positions of the RIS. The maximum NLOS energy efficiency reaches 55% within a transfer distance of 10 m, a translation distance of $\pm$4 mm, and a rotation angle of $\pm 50^\circ$.
Secure Short-Packet Communications via UAV-Enabled Mobile Relaying: Joint Resource Optimization and 3D Trajectory Design
Authors: Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are anticipated to play crucial roles in the development of 5G-and-beyond wireless networks and the Internet of Things (IoT). In this paper, we propose a secure SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay, periodically receiving and relaying small data packets from a remote IoT device to its receiver in two hops with strict latency requirements, in the presence of an eavesdropper. This system requires careful optimization of important design parameters, such as the coding blocklengths of both hops, transmit powers, and UAV's trajectory. While the overall optimization problem is nonconvex, we tackle it by applying a block successive convex approximation (BSCA) approach to divide the original problem into three subproblems and solve them separately. Then, an overall iterative algorithm is proposed to obtain the final design with guaranteed convergence. Our proposed low-complexity algorithm incorporates 3D trajectory design and resource management to optimize the effective average secrecy throughput of the communication system over the course of UAV-relay's mission. Simulation results demonstrate significant performance improvements compared to various benchmark schemes and provide useful design insights on the coding blocklengths and transmit powers along the trajectory of the UAV.
A Tutorial on Extremely Large-Scale MIMO for 6G: Fundamentals, Signal Processing, and Applications
Authors: Zhe Wang, Jiayi Zhang, Hongyang Du, Dusit Niyato, Shuguang Cui, Bo Ai, Mérouane Debbah, Khaled B. Letaief, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Extremely large-scale multiple-input-multiple-output (XL-MIMO), which offers vast spatial degrees of freedom, has emerged as a potentially pivotal enabling technology for the sixth generation (6G) of wireless mobile networks. With its growing significance, both opportunities and challenges are concurrently manifesting. This paper presents a comprehensive survey of research on XL-MIMO wireless systems. In particular, we introduce four XL-MIMO hardware architectures: uniform linear array (ULA)-based XL-MIMO, uniform planar array (UPA)-based XL-MIMO utilizing either patch antennas or point antennas, and continuous aperture (CAP)-based XL-MIMO. We comprehensively analyze and discuss their characteristics and interrelationships. Following this, we examine exact and approximate near-field channel models for XL-MIMO. Given the distinct electromagnetic properties of near-field communications, we present a range of channel models to demonstrate the benefits of XL-MIMO. We further motivate and discuss low-complexity signal processing schemes to promote the practical implementation of XL-MIMO. Furthermore, we explore the interplay between XL-MIMO and other emergent 6G technologies. Finally, we outline several compelling research directions for future XL-MIMO wireless communication systems.
TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling
Authors: Ke Deng, Zhiyuan He, Hao Zhang, Haohan Lin, Desheng Wang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
In future 6G Mobile Edge Computing (MEC), autopilot systems require the capability to process multimodal data with strong interdependencies. However, traditional heuristic algorithms are inadequate for real-time scheduling because they require multiple iterations to derive the optimal scheme. We propose TSNet-SAC, a novel Transformer-based network that uses heuristic algorithms solely to guide the training of TSNet. Additionally, a Sliding Augment Component (SAC) is introduced to enhance robustness and resolve algorithm defects. Furthermore, an Extender component is designed to handle multi-scale training data and provide network scalability, enabling TSNet to adapt to different access scenarios. Simulations demonstrate that TSNet-SAC outperforms existing networks in accuracy and robustness, and achieves lower scheduling latency than heuristic algorithms.
Keyword: pruning
Local elimination in the traveling salesman problem
Authors: William Cook, Keld Helsgaun, Stefan Hougardy, Rasmus T. Schroeder
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Optimization and Control (math.OC)
Abstract
Hougardy and Schroeder (WG 2014) proposed a combinatorial technique for pruning the search space in the traveling salesman problem, establishing that, for a given instance, certain edges cannot be present in any optimal tour. We describe an implementation of their technique, employing an exact TSP solver to locate k-opt moves in the elimination process. In our computational study, we combine LP reduced-cost elimination with the new combinatorial algorithm. We report results on a set of geometric instances, with the number of points n ranging from 3,038 up to 115,475. The test set includes all TSPLIB instances having at least 3,000 points, together with 250 randomly generated instances, each with 10,000 points, and three currently unsolved instances having 100,000 or more points. In all but two of the test instances, the complete-graph edge sets were reduced to under 3n edges. For the three large unsolved instances, repeated runs of the elimination process reduced the graphs to under 2.5n edges.
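As a reference point for the LP side of this combination, classical reduced-cost elimination discards an edge whenever forcing it into the tour would already exceed the best known tour length; a minimal sketch (illustrative, not the authors' code), with rc holding the reduced costs from the LP relaxation:

```python
def reduced_cost_eliminate(edges, rc, lp_lower_bound, best_tour_length):
    """Keep only edges that could still appear in an optimal tour.

    An edge e with lp_lower_bound + rc[e] > best_tour_length cannot
    belong to any optimal tour and is eliminated from the graph.
    """
    return [e for e in edges if lp_lower_bound + rc[e] <= best_tour_length]
```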
Learning Sparse Neural Networks with Identity Layers
Abstract
The sparsity of Deep Neural Networks has been extensively investigated as a way to maximize performance while reducing the size of overparameterized networks. Existing methods focus on pruning parameters during training using thresholds and metrics. Meanwhile, feature similarity between different layers has received insufficient attention, even though, as we rigorously prove in this paper, it is highly correlated with network sparsity. Inspired by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, using information bottleneck theory, we prove that reducing interlayer feature similarity measured by Centered Kernel Alignment (CKA) improves the sparsity of the network. Applying this theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which uses CKA to reduce feature similarity between layers and increase network sparsity. In other words, the layers of our sparse network tend to have their own identity relative to each other. Experimentally, we plug the proposed CKA-SR into the training process of sparse network training methods and find that CKA-SR consistently improves the performance of several state-of-the-art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
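For readers wanting the regularizer in code, a minimal sketch of linear CKA and the resulting pairwise penalty is given below (PyTorch; the weighting coefficient and the use of the linear variant of CKA are assumptions on our part, not necessarily the paper's configuration):

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)
    computed on the same batch of n examples (Kornblith et al., 2019)."""
    X = X - X.mean(dim=0, keepdim=True)  # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    return (Y.t() @ X).norm(p="fro") ** 2 / (
        (X.t() @ X).norm(p="fro") * (Y.t() @ Y).norm(p="fro")
    )

def cka_sr_penalty(layer_feats, weight=1e-3):
    """Sum of pairwise interlayer CKA values, added to the training loss;
    minimizing it pushes layers toward mutually dissimilar features."""
    penalty = sum(
        linear_cka(layer_feats[i].flatten(1), layer_feats[j].flatten(1))
        for i in range(len(layer_feats))
        for j in range(i + 1, len(layer_feats))
    )
    return weight * penalty
```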
Structured Pruning of Neural Networks for Constraints Learning
Authors: Matteo Cacciola, Antonio Frangioni, Andrea Lodi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
In recent years, the integration of Machine Learning (ML) models with Operations Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to the significant interest they attract in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications as it significantly impacts computational effort during training and necessitates substantial memory resources for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context than other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.
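To see why structured pruning shrinks the MIP, note that every removed neuron removes one binary activation variable and its big-M constraints from the layer's encoding. A minimal sketch that drops whole neurons by L1 norm follows (the keep ratio and the L1 criterion are illustrative choices, not necessarily the strategies identified in the paper):

```python
import torch

def prune_layer_neurons(layer: torch.nn.Linear, keep_ratio: float = 0.5):
    """Drop the output neurons (weight rows) with the smallest L1 norms."""
    norms = layer.weight.abs().sum(dim=1)
    n_keep = max(1, int(keep_ratio * norms.numel()))
    keep = norms.topk(n_keep).indices.sort().values  # preserve row order
    pruned = torch.nn.Linear(layer.in_features, n_keep,
                             bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned, keep  # 'keep' tells the next layer which inputs survive
```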
Keyword: diffusion
Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0
Authors: Luigi Capogrosso, Alessio Mascolini, Federico Girella, Geri Skenderi, Sebastiano Gaiardelli, Nicola Dall'Ora, Francesco Ponzio, Enrico Fraccaroli, Santa Di Cataldo, Sara Vinco, Enrico Macii, Franco Fummi, Marco Cristani
Abstract
Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, making it difficult to identify and stop anomalies that may disturb the manufacturing process. This paper proposes a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes. Using a neuro-symbolic approach, we integrate industrial ontologies into the model, thereby adding formal knowledge about smart manufacturing. Finally, we propose a simple yet effective way of distilling diffusion models through Random Fourier Features for deployment on an embedded system, for direct integration into the manufacturing process. To the best of our knowledge, this approach has never been explored before.
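Random Fourier Features themselves are a standard kernel approximation (Rahimi and Recht, 2007); a minimal NumPy version of the feature map that such a distillation step could build on is shown below, with the bandwidth gamma as an assumed free parameter:

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Approximate the RBF kernel exp(-gamma * ||x - y||^2) so that
    k(x, y) ~= z(x) @ z(y), with z(x) = sqrt(2/D) * cos(W x + b)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```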
Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Abstract
We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward value, where the optimality gap aligns with the off-policy bandit regret in the feature subspace. The improvement in rewards obtained is influenced by the interplay between the strength of the reward signal, the distribution shift, and the cost of off-support extrapolation. We provide empirical results to validate our theory and highlight the relationship between the strength of extrapolation and the quality of generated samples.
Rician likelihood loss for quantitative MRI using self-supervised deep learning
Authors: Christopher S. Parker, Anna Schroder, Sean C. Epstein, James Cole, Daniel C. Alexander, Hui Zhang
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Abstract
Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.
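A sketch of the loss is below (PyTorch). It follows the Rician density directly and uses the exponentially scaled Bessel function i0e for stability, since log I0(z) = log i0e(z) + z for z >= 0; the paper's numerically stable implementation and its treatment of the noise level sigma may differ.

```python
import torch

def rician_nll(pred_signal, meas_signal, sigma):
    """Negative log Rician likelihood of measured magnitudes meas_signal
    given noise-free predictions pred_signal and noise level sigma."""
    z = meas_signal * pred_signal / sigma**2
    log_i0 = torch.log(torch.special.i0e(z)) + z  # stable log I0(z)
    nll = (-torch.log(meas_signal / sigma**2)
           + (meas_signal**2 + pred_signal**2) / (2 * sigma**2)
           - log_i0)
    return nll.mean()
```

For the ADC model, pred_signal would be S0 * exp(-b * ADC) evaluated at the acquired b-values, with S0 and ADC produced by the network.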
Improved Flood Insights: Diffusion-Based SAR to EO Image Translation
Authors: Minseok Seo, Youngtack Oh, Doyi Kim, Dongmin Kang, Yeji Choi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utility in flood situations is hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing Synthetic Aperture Radar (SAR) data have been proposed. Despite the advantages of SAR over EO in the aforementioned situations, SAR presents a distinct drawback: human analysts often struggle with data interpretation. To tackle this issue, this paper introduces a novel framework, Diffusion-Based SAR to EO Image Translation (DSE). The DSE framework converts SAR images into EO images, thereby enhancing the interpretability of flood insights for humans. Experimental results on the Sen1Floods11 and SEN12-FLOOD datasets confirm that the DSE framework not only delivers enhanced visual information but also improves performance across all tested flood segmentation baselines.
Federated Learning-Empowered AI-Generated Content in Wireless Networks
Authors: Xumin Huang, Peichun Li, Hongyang Du, Jiawen Kang, Dusit Niyato, Dong In Kim, Yuan Wu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
Artificial intelligence generated content (AIGC) has emerged as a promising technology to improve the efficiency, quality, diversity, and flexibility of the content creation process by adopting a variety of generative AI models. Deploying AIGC services in wireless networks is expected to enhance the user experience. However, existing AIGC service provision suffers from several limitations, e.g., centralized training in the pre-training, fine-tuning, and inference processes, especially their implementation in wireless networks with privacy preservation. Federated learning (FL), as a collaborative learning framework where model training is distributed to cooperative data owners without the need for data sharing, can be leveraged to simultaneously improve learning efficiency and achieve privacy protection for AIGC. To this end, we present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content. Furthermore, we conduct a case study of FL-aided AIGC fine-tuning using the state-of-the-art AIGC model, i.e., the stable diffusion model. Numerical results show that our scheme effectively reduces communication cost and training latency while preserving privacy. Finally, we highlight several major research directions and open issues for the convergence of FL and AIGC.
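The FL side of such a case study typically builds on standard FedAvg-style aggregation; as a reference point (not the paper's specific scheme), a minimal weighted average over client state dicts:

```python
def fedavg_aggregate(client_states, client_weights):
    """Weighted average of client model state_dicts; client_weights sum
    to 1 (e.g., proportional to local dataset sizes)."""
    keys = client_states[0].keys()
    return {
        k: sum(w * s[k] for w, s in zip(client_weights, client_states))
        for k in keys
    }
```

In a fine-tuning setting, client_states would hold only the trainable subset of parameters (e.g., the fine-tuned layers of the diffusion model), keeping uplink traffic small.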
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Abstract
Anomalies are rare, and anomaly detection is therefore often framed as One-Class Classification (OCC), i.e., trained solely on normalcy. Leading OCC techniques constrain the latent representations of normal motions to limited volumes and detect as abnormal anything outside, which accounts satisfactorily for the open-set nature of anomalies. But normalcy shares the same open-set property, since humans can perform the same action in several ways, which the leading techniques neglect. We propose a novel generative model for video anomaly detection (VAD), which assumes that both normality and abnormality are multimodal. We consider skeletal representations and leverage state-of-the-art diffusion probabilistic models to generate multimodal future human poses. We contribute a novel conditioning on the past motion of people and exploit the improved mode coverage of diffusion processes to generate different-but-plausible future motions. Upon statistical aggregation of the future modes, an anomaly is detected when the generated set of motions is not pertinent to the actual future. We validate our model on 4 established benchmarks: UBnormal, HR-UBnormal, HR-STC, and HR-Avenue, with extensive experiments surpassing state-of-the-art results.
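The aggregation step can be made concrete with a smallest-distance score over the generated modes (one natural instantiation; the paper's statistical aggregation may differ):

```python
import torch

def multimodal_anomaly_score(generated_futures, actual_future):
    """Anomaly score via smallest-distance aggregation.

    generated_futures: (K, T, J, D) sampled future skeleton motions;
    actual_future:     (T, J, D) observed future motion.
    The clip is anomalous when even the closest generated mode is far
    from what actually happened.
    """
    diffs = generated_futures - actual_future.unsqueeze(0)
    return diffs.flatten(1).norm(dim=1).min()
```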
High-order splitting finite element methods for the subdiffusion equation with limited smoothing property
Abstract
In contrast with the diffusion equation which smoothens the initial data to $C^\infty$ for $t>0$ (away from the corners/edges of the domain), the subdiffusion equation only exhibits limited spatial regularity. As a result, one generally cannot expect high-order accuracy in space in solving the subdiffusion equation with nonsmooth initial data. In this paper, a new splitting of the solution is constructed for high-order finite element approximations to the subdiffusion equation with nonsmooth initial data. The method is constructed by splitting the solution into two parts, i.e., a time-dependent smooth part and a time-independent nonsmooth part, and then approximating the two parts via different strategies. The time-dependent smooth part is approximated by using high-order finite element method in space and convolution quadrature in time, while the steady nonsmooth part could be approximated by using smaller mesh size or other methods that could yield high-order accuracy. Several examples are presented to show how to accurately approximate the steady nonsmooth part, including piecewise smooth initial data, Dirac--Delta point initial data, and Dirac measure concentrated on an interface. The argument could be directly extended to subdiffusion equations with nonsmooth source data. Extensive numerical experiments are presented to support the theoretical analysis and to illustrate the performance of the proposed high-order splitting finite element methods.
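In symbols, the splitting described above takes the schematic form

\[
u(x,t) \;=\; \underbrace{v(x,t)}_{\text{time-dependent, smooth}} \;+\; \underbrace{w(x)}_{\text{time-independent, nonsmooth}},
\]

where $v$ is approximated by high-order finite elements in space and convolution quadrature in time, while $w$ absorbs the nonsmooth data (e.g., piecewise smooth or Dirac-type) and is resolved separately, for instance on a finer mesh.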
Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks
Abstract
This paper proposes a novel approach to integrating partial differential equation (PDE)-based evolution models into neural networks through a new type of regularization. Specifically, we propose inverse evolution layers (IELs) based on evolution equations. These layers can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the evolution models. Moreover, IELs are straightforward to construct and implement, and can be easily designed for various physical evolutions and neural networks. Additionally, the design process for these layers can provide neural networks with intuitive and mathematical interpretability, thus enhancing the transparency and explainability of the approach. To demonstrate the effectiveness, efficiency, and simplicity of our approach, we present an example of endowing semantic segmentation models with the smoothness property based on the heat diffusion model. To achieve this goal, we design heat-diffusion IELs and apply them to address the challenge of semantic segmentation with noisy labels. The experimental results demonstrate that the heat-diffusion IELs can effectively mitigate the overfitting problem caused by noisy labels.
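To anchor the heat-diffusion example, one explicit step of the 2D heat equation on a feature map is just a fixed convolution; an IEL runs such an evolution in reverse to penalize non-smooth outputs. The stencil and step size below are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn.functional as F

def heat_step(u, tau=0.2):
    """One explicit Euler step of u_t = laplace(u), applied channel-wise
    with a 5-point Laplacian stencil (tau < 0.25 keeps the step stable)."""
    c = u.shape[1]
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]], device=u.device)
    weight = lap.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    u_pad = F.pad(u, (1, 1, 1, 1), mode="replicate")
    return u + tau * F.conv2d(u_pad, weight, groups=c)
```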
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
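The first distillation variant can be summarized in a few lines: match the backbone's multi-level features to the frozen generative features through small learnable regressor heads (the per-level heads and the MSE objective here are our assumed simplification, not the paper's exact losses):

```python
import torch.nn.functional as F

def feature_distill_loss(student_feats, teacher_feats, regressors):
    """Per-level MSE between regressed student (backbone) features and
    frozen teacher (generative) features."""
    loss = 0.0
    for f_s, f_t, reg in zip(student_feats, teacher_feats, regressors):
        loss = loss + F.mse_loss(reg(f_s), f_t.detach())
    return loss
```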
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Authors: Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plausible contacts and affordance semantics. To support interactions with scarcely available data, we propose an automated synthetic data pipeline. For this, we seed a pre-trained motion model, which has priors for the basics of human movement, with interaction-specific anchor poses extracted from limited motion capture data. Using our guided diffusion model trained on generated synthetic data, we synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion. We call our framework NIFTY: Neural Interaction Fields for Trajectory sYnthesis.
Keyword: adaptive
Bridging the Gap: Heterogeneous Face Recognition with Conditional Adaptive Instance Modulation
Authors: Anjith George, Sebastien Marcel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Heterogeneous Face Recognition (HFR) aims to match face images across different domains, such as thermal and visible spectra, expanding the applicability of Face Recognition (FR) systems to challenging scenarios. However, the domain gap and limited availability of large-scale datasets in the target domain make it difficult to train robust and invariant HFR models from scratch. In this work, we treat different modalities as distinct styles and propose a framework to adapt feature maps, bridging the domain gap. We introduce a novel Conditional Adaptive Instance Modulation (CAIM) module that can be integrated into pre-trained FR networks, transforming them into HFR networks. The CAIM block modulates intermediate feature maps to adapt the style of the target modality, effectively bridging the domain gap. Our proposed method allows for end-to-end training with a minimal number of paired samples. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.
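Our reading of such a modulation block is sketched in PyTorch below (an AdaIN-style layer conditioned on the source modality; an illustrative sketch, not the authors' CAIM implementation):

```python
import torch
import torch.nn as nn

class ConditionalInstanceModulation(nn.Module):
    """AdaIN-style modulation of an intermediate feature map, conditioned
    on the input modality (e.g., 0 = visible, 1 = thermal)."""

    def __init__(self, channels, num_modalities=2):
        super().__init__()
        self.gamma = nn.Embedding(num_modalities, channels)
        self.beta = nn.Embedding(num_modalities, channels)

    def forward(self, x, modality):  # x: (B, C, H, W); modality: (B,) long
        mu = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-5
        g = self.gamma(modality)[:, :, None, None]
        b = self.beta(modality)[:, :, None, None]
        return g * (x - mu) / std + b
```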
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
Authors: Yanran Wang, David Boyle
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Reinforcement Learning and optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and the corresponding optimal policy. Consequently, formalizing sequential decision-making problems as inference is of considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics while suggesting a probabilistic interpretation of reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) method to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and we empirically verify a reasonable tradeoff between high performance and conservative interpretability.
A Hybrid Genetic Algorithm for the min-max Multiple Traveling Salesman Problem
Authors: Sasan Mahmoudinazlou, Changhyun Kwon
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Abstract
This paper proposes a hybrid genetic algorithm for solving the Multiple Traveling Salesman Problem (mTSP) to minimize the length of the longest tour. The genetic algorithm utilizes a TSP sequence as the representation of each individual, and a dynamic programming algorithm is employed to evaluate the individual and find the optimal mTSP solution for the given sequence of cities. A novel crossover operator is designed to combine similar tours from two parents and offers great diversity for the population. For some of the generated offspring, we detect and remove intersections between tours to obtain a solution with no intersections. This is particularly useful for the min-max mTSP. The generated offspring are also improved by a self-adaptive random local search and a thorough neighborhood search. Our algorithm outperforms all existing algorithms on average, with similar cutoff time thresholds, when tested against multiple benchmark sets found in the literature. Additionally, we improve the best-known solutions for 21 out of 89 instances on four benchmark sets.
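The evaluation step admits a compact dynamic program: given the giant TSP sequence, split it into m depot-anchored tours minimizing the longest one. A sketch follows (standard min-max splitting under an assumed dense distance matrix; the paper's DP may differ in details):

```python
import math

def tour_length(dist, depot, seq, i, j):
    """Length of the closed tour depot -> seq[i..j-1] -> depot."""
    length = dist[depot][seq[i]] + dist[seq[j - 1]][depot]
    for k in range(i, j - 1):
        length += dist[seq[k]][seq[k + 1]]
    return length

def split_min_max(dist, depot, seq, m):
    """f[k][i] = best achievable longest-tour length covering the first i
    cities with exactly k tours. The naive length evaluation makes this
    O(m n^3); prefix sums bring it down to O(m n^2)."""
    n = len(seq)
    f = [[math.inf] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 0.0
    for k in range(1, m + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                f[k][i] = min(f[k][i],
                              max(f[k - 1][j],
                                  tour_length(dist, depot, seq, j, i)))
    return f[m][n]
```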
Adaptive Region Selection for Active Learning in Whole Slide Image Semantic Segmentation
Authors: Jingna Qiu, Frauke Wilm, Mathias Öttl, Maja Schlereth, Chang Liu, Tobias Heimann, Marc Aubreville, Katharina Breininger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The process of annotating histological gigapixel-sized whole slide images (WSIs) at the pixel level for the purpose of training a supervised segmentation model is time-consuming. Region-based active learning (AL) involves training the model on a limited number of annotated image regions instead of requesting annotations of the entire images. These annotation regions are iteratively selected, with the goal of optimizing model performance while minimizing the annotated area. The standard method for region selection evaluates the informativeness of all square regions of a specified size and then selects a specific quantity of the most informative regions. We find that the efficiency of this method highly depends on the choice of AL step size (i.e., the combination of region size and the number of selected regions per WSI), and a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter. Specifically, we dynamically determine each region by first identifying an informative area and then detecting its optimal bounding box, as opposed to selecting regions of a uniform predefined shape and size as in the standard method. We evaluate our method using the task of breast cancer metastases segmentation on the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6% of tissue area annotated, we achieve full annotation performance and thereby substantially reduce the costs of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.
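The two-step selection can be sketched compactly: threshold the informativeness map, keep the connected component with the highest total informativeness, and return its tight bounding box (the threshold and the component scoring are our assumed simplifications of the paper's procedure):

```python
import numpy as np
from scipy import ndimage

def select_region(info_map, thresh):
    """Return (x0, y0, x1, y1) for the most informative connected area,
    or None if nothing exceeds the threshold."""
    labels, n = ndimage.label(info_map > thresh)
    if n == 0:
        return None
    best = max(range(1, n + 1),
               key=lambda lab: info_map[labels == lab].sum())
    ys, xs = np.nonzero(labels == best)
    return xs.min(), ys.min(), xs.max(), ys.max()
```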
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Abstract
Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction for alleviating this problem is federated dropout, which drops fractional weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of the local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits the parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that, compared with status quo approaches, FedBIAD provides a 2x uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which brings up to a 72% decrease in training time.
Multiplicative update rules for accelerating deep learning training and increasing robustness
Abstract
Even today, when Deep Learning (DL) has achieved state-of-the-art performance across a wide range of research domains, accelerating training and building robust DL models remain challenging tasks. To this end, generations of researchers have sought to develop robust training methods for DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, without investigating the fundamental parameter update rule itself. Although multiplicative updates contributed significantly to the early development of machine learning and enjoy strong theoretical guarantees, to the best of our knowledge this is the first work to investigate them in the context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits a wide range of optimization algorithms and enables the application of alternative update rules. To this end, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term in a novel hybrid update method. We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule, and we experimentally demonstrate its effectiveness across a wide range of tasks and optimization methods, ranging from convex and non-convex optimization to difficult image classification benchmarks, using a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
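One common multiplicative form, combined with an additive term as the abstract describes, looks as follows (an illustrative exponentiated-gradient variant, not the paper's exact rule; both learning rates are assumptions):

```python
import torch

@torch.no_grad()
def hybrid_step(params, lr_mul=1e-3, lr_add=1e-4):
    """One hybrid update: a sign-preserving exponentiated-gradient
    multiplicative factor plus a plain additive (SGD-style) term."""
    for p in params:
        if p.grad is None:
            continue
        p.mul_(torch.exp(-lr_mul * p.grad * p.sign()))  # multiplicative part
        p.add_(p.grad, alpha=-lr_add)                   # additive part
```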
Numerical cubature on scattered data by adaptive interpolation
Authors: R. Cavoretto, F. Dell'Accio, A. De Rossi, F. Di Tommaso, N. Siar, A. Sommariva, M. Vianello
Abstract
We construct cubature methods on scattered data via resampling on the support of known algebraic cubature formulas, by different kinds of adaptive interpolation (polynomial, RBF, PUM). This approach gives a promising alternative to other recent methods, such as direct meshless cubature by RBF or least-squares cubature formulas.
MaxSR: Image Super-Resolution Using Improved MaxViT
Authors: Bincheng Yang, Gangshan Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
While transformer models have been shown to be effective for natural language processing tasks and high-level vision tasks, few attempts have been made to apply powerful transformer models to single image super-resolution. Because transformer models have strong representation capacity, and their built-in self-attention mechanisms can leverage the self-similarity prior in the input low-resolution image to improve performance, we present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR. MaxSR consists of four parts: a shallow feature extraction block; multiple cascaded adaptive MaxViT blocks that extract deep hierarchical features and efficiently model global self-similarity from low-level features; a hierarchical feature fusion block; and a reconstruction block. The key component of MaxSR, the adaptive MaxViT block, is based on the MaxViT block, which mixes MBConv with squeeze-and-excitation, block attention, and grid attention. To better model the global self-similarity of the input low-resolution image, we improve the block attention and grid attention in the MaxViT block to adaptive block attention and adaptive grid attention, which perform self-attention inside each window across all grids and inside each grid across all windows, respectively, in the most efficient way. We instantiate the proposed model for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light). Experiments show that our MaxSR and MaxSR-light efficiently establish new state-of-the-art performance.
An Online Learning Analysis of Minimax Adaptive Control
Authors: Venkatraman Renganathan, Andrea Iannelli, Anders Rantzer
Abstract
We present an online learning analysis of minimax adaptive control for the case where the uncertainty includes a finite set of linear dynamical systems. Precisely, for each system inside the uncertainty set, we define the model-based regret by comparing the state and input trajectories from the minimax adaptive controller against that of an optimal controller in hindsight that knows the true dynamics. We then define the total regret as the worst case model-based regret with respect to all models in the considered uncertainty set. We study how the total regret accumulates over time and its effect on the adaptation mechanism employed by the controller. Moreover, we investigate the effect of the disturbance on the growth of the regret over time and draw connections between robustness of the controller and the associated regret rate.
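Rendered schematically (our notation, following the definitions in the abstract), with $\mathcal{I}$ the finite uncertainty set and $c_t$ the per-stage costs,

\[
\mathrm{Regret}(T) \;=\; \max_{i\in\mathcal{I}} \left( \sum_{t=0}^{T} c_t\!\left(x_t, u_t \,\middle|\, i\right) \;-\; \sum_{t=0}^{T} c_t\!\left(x_t^{\star,i}, u_t^{\star,i} \,\middle|\, i\right) \right),
\]

where $(x_t,u_t)$ are the state and input trajectories of the minimax adaptive controller and $(x_t^{\star,i},u_t^{\star,i})$ those of the hindsight-optimal controller that knows model $i$.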
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Abstract
Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines a spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AIC-AB NET on the MS COCO dataset and a newly proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MS COCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.
Keyword: efficient
Copy Is All You Need
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
An Exploration of the Impact of Mapping Style and Device Roadmap on Simulated ReRAM Architectures for Neuromorphic Computing
Vertex-based Networks to Accelerate Path Planning Algorithms
Rician likelihood loss for quantitative MRI using self-supervised deep learning
State-Robust Observability Measures for Sensor Selection in Nonlinear Dynamic Systems
A Scenario-Based Functional Testing Approach to Improving DNN Performance
More Than React: Investigating The Role of Emoji Reaction in GitHub Pull Requests
Risk-Constrained Control of Mean-Field Linear Quadratic Systems
Parallelising Glauber dynamics
SLSSNN: High energy efficiency spike-train level spiking neural networks with spatio-temporal conversion
A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited Area
A $(3/2 + \varepsilon)$-Approximation for Multiple TSP with a Variable Number of Depots
Analytical Investigation of Two Benchmark Resource Allocation Algorithms for LTE-V2V
MaxSR: Image Super-Resolution Using Improved MaxViT
Numerical evaluation of oscillatory integrals via automated steepest descent contour deformation
Random Wheeler Automata
Sampling-Priors-Augmented Deep Unfolding Network for Robust Video Compressive Sensing
3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks
HEAL-SWIN: A Vision Transformer On The Sphere
Distributed Planning for Rigid Robot Formations using Consensus on the Transformation of a Base Configuration
Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy
Fully Coupled Forced Response Analysis of Nonlinear Turbine Blade Vibrations in the Frequency Domain
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Strategic Budget Selection in a Competitive Autobidding World
Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach
SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation
Global sensitivity analysis in the limited data setting with application to char combustion
A novel family of finite automata for recognizing and learning $ω$-regular languages
BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems
TALL: Thumbnail Layout for Deepfake Video Detection
MGit: A Model Versioning and Management System
Keyword: faster
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Parallelising Glauber dynamics
Improving BERT with Hybrid Pooling Network and Drop Mask
Keyword: mobile
Adaptive Coding and Modulation Aided Mobile Relaying for Millimeter-Wave Flying Ad-Hoc Networks
Reconfigurable Intelligent Surface Assisted Free Space Optical Information and Power Transfer
Secure Short-Packet Communications via UAV-Enabled Mobile Relaying: Joint Resource Optimization and 3D Trajectory Design
A Tutorial on Extremely Large-Scale MIMO for 6G: Fundamentals, Signal Processing, and Applications
TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling
Keyword: pruning
Local elimination in the traveling salesman problem
Learning Sparse Neural Networks with Identity Layers
Structured Pruning of Neural Networks for Constraints Learning
Keyword: diffusion
Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0
Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Rician likelihood loss for quantitative MRI using self-supervised deep learning
Improved Flood Insights: Diffusion-Based SAR to EO Image Translation
Federated Learning-Empowered AI-Generated Content in Wireless Networks
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
High-order splitting finite element methods for the subdiffusion equation with limited smoothing property
Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Keyword: adaptive
Bridging the Gap: Heterogeneous Face Recognition with Conditional Adaptive Instance Modulation
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
A Hybrid Genetic Algorithm for the min-max Multiple Traveling Salesman Problem
Adaptive Region Selection for Active Learning in Whole Slide Image Semantic Segmentation
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Multiplicative update rules for accelerating deep learning training and increasing robustness
Numerical cubature on scattered data by adaptive interpolation
MaxSR: Image Super-Resolution Using Improved MaxViT
An Online Learning Analysis of Minimax Adaptive Control
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Keyword: quantization
There is no result