Abstract
Perimeter control maintains high traffic efficiency within protected regions by controlling transfer flows among regions to ensure that their traffic densities are below critical values. Existing approaches can be categorized as either model-based or model-free, depending on whether they rely on network transmission models (NTMs) and macroscopic fundamental diagrams (MFDs). Although model-based approaches are more data efficient and have performance guarantees, they are inherently prone to model bias and inaccuracy. For example, NTMs often become imprecise for a large number of protected regions, and MFDs can exhibit scatter and hysteresis that are not captured in existing model-based works. Moreover, no existing studies have employed reinforcement learning for homogeneous flow rate optimization in microscopic simulation, where spatial characteristics, vehicle-level information, and metering realizations -- often overlooked in macroscopic simulations -- are taken into account. To circumvent issues of model-based approaches and macroscopic simulation, we propose a model-free deep reinforcement learning approach that optimizes the flow rate homogeneously at the perimeter at the microscopic level. Results demonstrate that our model-free reinforcement learning approach without any knowledge of NTMs or MFDs can compete and match the performance of a model-based approach, and exhibits enhanced generalizability and scalability.
Cones 2: Customizable Image Synthesis with Multiple Subjects
Authors: Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Synthesizing images with user-specified subjects has received growing attention due to its practical applications. Despite the recent success in single subject customization, existing algorithms suffer from high training cost and low success rate along with increased number of subjects. Towards controllable image synthesis with multiple subjects as the constraints, this work studies how to efficiently represent a particular subject as well as how to appropriately compose different subjects. We find that the text embedding regarding the subject token already serves as a simple yet effective representation that supports arbitrary combinations without any model tuning. Through learning a residual on top of the base embedding, we manage to robustly shift the raw subject to the customized subject given various text conditions. We then propose to employ layout, a very abstract and easy-to-obtain prior, as the spatial guidance for subject arrangement. By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image, significantly alleviating the interference across them. Both qualitative and quantitative experimental results demonstrate our superiority over state-of-the-art alternatives under a variety of settings for multi-subject customization.
Abstract
The projection operation is a critical component in a wide range of optimization algorithms, such as online gradient descent (OGD), for enforcing constraints and achieving optimal regret bounds. However, it suffers from computational complexity limitations in high-dimensional settings or when dealing with ill-conditioned constraint sets. Projection-free algorithms address this issue by replacing the projection oracle with more efficient optimization subroutines. But to date, these methods have been developed primarily in the Euclidean setting, and while there has been growing interest in optimization on Riemannian manifolds, there has been essentially no work in trying to utilize projection-free tools here. An apparent issue is that non-trivial affine functions are generally non-convex in such domains. In this paper, we present methods for obtaining sub-linear regret guarantees in online geodesically convex optimization on curved spaces for two scenarios: when we have access to (a) a separation oracle or (b) a linear optimization oracle. For geodesically convex losses, and when a separation oracle is available, our algorithms achieve $O(T^{1/2}\:)$ and $O(T^{3/4}\;)$ adaptive regret guarantees in the full information setting and the bandit setting, respectively. When a linear optimization oracle is available, we obtain regret rates of $O(T^{3/4}\;)$ for geodesically convex losses and $O(T^{2/3}\; log T )$ for strongly geodesically convex losses
An absolutely convergent fixed-point fast sweeping WENO method on triangular meshes for steady state of hyperbolic conservation laws
Abstract
High order fast sweeping methods for efficiently solving steady state solutions of hyperbolic PDEs were not available yet on unstructured meshes. In this paper, we extend high order fast sweeping methods to unstructured triangular meshes by applying fixed-point iterative sweeping techniques to a fifth-order finite volume unstructured WENO scheme, for solving steady state solutions of hyperbolic conservation laws. An advantage of fixed-point fast sweeping methods which distinguishes them from other fast sweeping methods is that they are explicit and do not require inverse operation of nonlinear local systems. As in the first order fast sweeping methods on triangular meshes, we introduce multiple reference points to determine alternating sweeping directions on unstructured meshes. All the cells on the mesh are ordered according to their centroids' distances to those reference points, and the resulted orderings provide sweeping directions for iterations. To make the residue of the fast sweeping iterations converge to machine zero / round off errors, we follow the approach in our early work of developing the absolutely convergent fixed-point fast sweeping WENO methods on rectangular meshes, and adopt high order WENO scheme with unequal-sized sub-stencils for spatial discretization. Extensive numerical experiments are performed to show the accuracy, computational efficiency, and absolute convergence of the presented fifth-order fast sweeping scheme on triangular meshes. Furthermore, the proposed method is compared with the forward Euler method and the popular third order total variation diminishing Rung-Kutta method for steady state computations. Numerical examples show that the developed fixed-point fast sweeping WENO method is the most efficient scheme among them, and especially it can save up to 70% CPU time costs than TVD-RK3 to converge to steady state solutions.
Blockwise Parallel Transformer for Long Context Large Models
Authors: Hao Liu, Pieter Abbeel
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences up to 32 times longer than vanilla Transformers and 2 to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance.
Compositional diversity in visual concept learning
Authors: Yanli Zhou, Reuben Feinman, Brenden M. Lake
Abstract
Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine together to form novel objects. In contrast, popular computer vision models struggle to make the same types of inferences, requiring more data and generalizing less flexibly than people do. Here, we study these distinctively human abilities across a range of different types of visual composition, examining how people classify and generate ``alien figures'' with rich relational structure. We also develop a Bayesian program induction model which searches for the best programs for generating the candidate visual figures, utilizing a large program space containing different compositional mechanisms and abstractions. In few shot classification tasks, we find that people and the program induction model can make a range of meaningful compositional generalizations, with the model providing a strong account of the experimental data as well as interpretable parameters that reveal human assumptions about the factors invariant to category membership (here, to rotation and changing part attachment). In few shot generation tasks, both people and the models are able to construct compelling novel examples, with people behaving in additional structured ways beyond the model capabilities, e.g. making choices that complete a set or reconfiguring existing parts in highly novel ways. To capture these additional behavioral patterns, we develop an alternative model based on neuro-symbolic program induction: this model also composes new concepts from existing parts yet, distinctively, it utilizes neural network modules to successfully capture residual statistical structure. Together, our behavioral and computational findings show how people and models can produce a rich variety of compositional behavior when classifying and generating visual objects.
Generating Finite Element Codes combining Adaptive Octrees with Complex Geometries
Authors: Eric Heisler, Cheng-Hau Yang, Aadesh Deshmukh, Baskar Ganapathysubramanian, Hari Sundar
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We present a high-level domain-specific language (DSL) interface to drive an adaptive incomplete $k$-d tree-based framework for finite element (FEM) solutions to PDEs. This DSL provides three key advances: (a) it abstracts out the complexity of implementing non-trivial FEM formulations, (b) it simplifies deploying these formulations on arbitrarily complicated and adaptively refined meshes, and (c) it exhibits good parallel performance. Taken together, the DSL interface allows end-users to rapidly and efficiently prototype new mathematical approaches, and deploy them on large clusters for solving practical problems. We illustrate this DSL by implementing a workflow for solving PDEs using the recently developed shifted boundary method (SBM). The SBM requires approximating relatively complicated integrals over boundary surfaces. Using a high-level DSL greatly simplifies this process and allows rapid exploration of variations. We demonstrate these tools on a variety of 2-D and 3-D configurations. With fewer than 20 lines of input, we can produce a parallel code that scales well to thousands of processes. This generated code is made accessible and readable to be easily modified and hand-tuned, making this tool useful even to experts with the target software.
FRAMM: Fair Ranking with Missing Modalities for Clinical Trial Site Selection
Authors: Brandon Theodorou, Lucas Glass, Cao Xiao, Jimeng Sun
Abstract
Despite many efforts to address the disparities, the underrepresentation of gender, racial, and ethnic minorities in clinical trials remains a problem and undermines the efficacy of treatments on minorities. This paper focuses on the trial site selection task and proposes FRAMM, a deep reinforcement learning framework for fair trial site selection. We focus on addressing two real-world challenges that affect fair trial sites selection: the data modalities are often not complete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity since the problem is necessarily a trade-off between the two with the only possible way to increase diversity post-selection being through limiting enrollment via caps. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for handling missing data, bypassing data imputation and the need for complete data in training. To handle the need for making efficient trade-offs, FRAMM uses deep reinforcement learning with a specifically designed reward function that simultaneously optimizes for both enrollment and fairness. We evaluate FRAMM using 4,392 real-world clinical trials ranging from 2016 to 2021 and show that FRAMM outperforms the leading baseline in enrollment-only settings while also achieving large gains in diversity. Specifically, it is able to produce a 9% improvement in diversity with similar enrollment levels over the leading baselines. That improved diversity is further manifested in achieving up to a 14% increase in Hispanic enrollment, 27% increase in Black enrollment, and 60% increase in Asian enrollment compared to selecting sites with an enrollment-only model.
Are Large Kernels Better Teachers than Transformers for ConvNets?
Authors: Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. While Transformers have led state-of-the-art (SOTA) performance in various fields with ever-larger models and labeled data, small-kernel ConvNets are considered more suitable for resource-limited applications due to the efficient convolution operation and compact weight sharing. KD is widely used to boost the performance of small-kernel ConvNets. However, previous research shows that it is not quite effective to distill knowledge (e.g., global information) from Transformers to small-kernel ConvNets, presumably due to their disparate architectures. We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures. Our findings are backed up by extensive experiments on both logit-level and feature-level KD ``out of the box", with no dedicated architectural nor training recipe modifications. Notably, we obtain the \textbf{best-ever pure ConvNet} under 30M parameters with \textbf{83.1\%} top-1 accuracy on ImageNet, outperforming current SOTA methods including ConvNeXt V2 and Swin V2. We also find that beneficial characteristics of large-kernel ConvNets, e.g., larger effective receptive fields, can be seamlessly transferred to students through this large-to-small kernel distillation. Code is available at: \url{https://github.com/VITA-Group/SLaK}.
Efficient Training of Energy-Based Models Using Jarzynski Equality
Authors: Davide Carbone, Mengjian Hua, Simon Coste, Eric Vanden-Eijnden
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Numerical Analysis (math.NA); Probability (math.PR)
Abstract
Energy-based models (EBMs) are generative models inspired by statistical physics with a wide range of applications in unsupervised learning. Their performance is best measured by the cross-entropy (CE) of the model distribution relative to the data distribution. Using the CE as the objective for training is however challenging because the computation of its gradient with respect to the model parameters requires sampling the model distribution. Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality together with tools from sequential Monte-Carlo sampling can be used to perform this computation efficiently and avoid the uncontrolled approximations made using the standard contrastive divergence algorithm. Specifically, we introduce a modification of the unadjusted Langevin algorithm (ULA) in which each walker acquires a weight that enables the estimation of the gradient of the cross-entropy at any step during GD, thereby bypassing sampling biases induced by slow mixing of ULA. We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST dataset. We show that the proposed approach outperforms methods based on the contrastive divergence algorithm in all the considered situations.
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro
Abstract
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable Spiking Neural Network on FPGA
Authors: Ali Mehrabi, Yeshwanth Bethi, André van Schaik, Andrew Wabnitz, Saeed Afshar
Abstract
This paper presents an efficient hardware implementation of the recently proposed Optimized Deep Event-driven Spiking Neural Network Architecture (ODESA). ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients and has the combined adaptation of weights and thresholds in an efficient hierarchical structure. This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware. The implementation consists of a multi-layer Spiking Neural Network (SNN) and individual training modules for each layer that enable online self-learning without using back-propagation. By using simple local adaptive selection thresholds, a Winner-Takes-All (WTA) constraint on each layer, and a modified weight update rule that is more amenable to hardware, the trainer module allocates neuronal resources optimally at each layer without having to pass high-precision error measurements across layers. All elements in the system, including the training module, interact using event-based binary spikes. The hardware-optimized implementation is shown to preserve the performance of the original algorithm across multiple spatial-temporal classification problems with significantly reduced hardware requirements.
Exploring Lottery Prompts for Pre-trained Language Models
Abstract
Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning. Given the advantage of prompting in the zero-shot setting and the observed performance fluctuation among different prompts, we explore the instance-level prompt and their generalizability. By searching through the prompt space, we first validate the assumption that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM, and such prompt can be obtained at a low cost thanks to the inherent ability of PLMs. Meanwhile, we find that some strong lottery prompts have high performance over the whole training set, and they are equipped with distinguishable linguistic features. Lastly, we attempt to generalize the searched strong lottery prompts to unseen data with prompt ensembling method without any parameter tuning. Experiments are conducted on various types of NLP classification tasks and demonstrate that the proposed method can achieve comparable results with other gradient-free and optimization-free baselines.
Kaczmarz-Type Methods for Solving Matrix Equations
Abstract
In this paper, several Kaczmarz-type numerical methods for solving the matrix equation $AX=B$ and $XA=C$ are proposed, where the coefficient matrix $A$ may be full rank or rank deficient. These methods are iterative methods without matrix multiplication. Theoretically, the convergence of these methods is proved. The numerical results show that these methods are more efficient than iterative methods involving matrix multiplication for high-dimensional matrices.
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
Authors: Hanqing Zhu, Jiaqi Gu, Hanrui Wang, Zixuan Jiang, Zhekai Zhang, Rongxin Tang, Chenghao Feng, Song Han, Ray T. Chen, David Z. Pan
Abstract
The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results for convolutional neural network (CNN) workloads that only require weight-static linear operations. However, they fail to efficiently support Transformer architectures with attention operations due to the lack of ability to process dynamic full-range tensor multiplication. In this work, we propose a customized high-performance and energy-efficient photonic Transformer accelerator, DOTA. To overcome the fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly-parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a >10x latency reduction compared to prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads.
Replicability in Reinforcement Learning
Authors: Amin Karbasi, Grigoris Velegkas, Lin F. Yang, Felix Zhou
Abstract
We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions on i.i.d. samples drawn from the generator when its internal randomness is the same. We first provide an efficient $\rho$-replicable algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and time complexity $\widetilde O\left(\frac{N^3\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$, where $N$ is the number of state-action pairs. Next, for the subclass of deterministic algorithms, we provide a lower bound of order $\Omega\left(\frac{N^3}{(1-\gamma)^3\cdot\varepsilon^2\cdot\rho^2}\right)$. Then, we study a relaxed version of replicability proposed by Kalavasis et al. [2023] called TV indistinguishability. We design a computationally efficient TV indistinguishable algorithm for policy estimation whose sample complexity is $\widetilde O\left(\frac{N^2\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$. At the cost of $\exp(N)$ running time, we transform these TV indistinguishable algorithms to $\rho$-replicable ones without increasing their sample complexity. Finally, we introduce the notion of approximate-replicability where we only require that two outputted policies are close under an appropriate statistical divergence (e.g., Renyi) and show an improved sample complexity of $\widetilde O\left(\frac{N\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$.
Zero-Shot Automatic Pronunciation Assessment
Authors: Hongfu Liu, Mingqian Shi, Ye Wang
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract
Automatic Pronunciation Assessment (APA) is vital for computer-assisted language learning. Prior methods rely on annotated speech-text data to train Automatic Speech Recognition (ASR) models or speech-score data to train regression models. In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT. Our method involves encoding speech input and corrupting them via a masking module. We then employ the Transformer encoder and apply k-means clustering to obtain token sequences. Finally, a scoring module is designed to measure the number of wrongly recovered tokens. Experimental results on speechocean762 demonstrate that the proposed method achieves comparable performance to supervised regression baselines and outperforms non-regression baselines in terms of Pearson Correlation Coefficient (PCC). Additionally, we analyze how masking strategies affect the performance of APA.
Confidential Signal Cancellation Phenomenon in Interference Alignment Networks: Cause and Cure
Authors: Lin Hu, Jiabing Fan, Hong Wen, Jie Tang, Qianbin Chen
Abstract
This paper investigates physical layer security (PLS) in wireless interference networks. Specifically, we consider confidential transmission from a legitimate transmitter (Alice) to a legitimate receiver (Bob), in the presence of non-colluding passive eavesdroppers (Eves), as well as multiple legitimate transceivers. To mitigate interference at legitimate receivers and enhance PLS, artificial noise (AN) aided interference alignment (IA) is explored. However, the conventional leakage minimization (LM) based IA may exhibit confidential signal cancellation phenomenon. We theoretically analyze the cause and then establish a condition under which this phenomenon will occur almost surely. Moreover, we propose a means of avoiding this phenomenon by integrating the max-eigenmode beamforming (MEB) into the traditional LM based IA. By assuming that only statistical channel state informations (CSIs) of Eves and local CSIs of legitimate users are available, we derive a closed form expression for the secrecy outage probability (SOP), and establish a condition under which positive secrecy rate is achievable. To enhance security performance, an SOP constrained secrecy rate maximization (SRM) problem is formulated and an efficient numerical method is developed for the optimal solution. Numerical results confirm the effectiveness and the usefulness of the proposed approach.
Towards Omni-generalizable Neural Methods for Vehicle Routing Problems
Abstract
Learning heuristics for vehicle routing problems (VRPs) has gained much attention due to the less reliance on hand-crafted rules. However, existing methods are typically trained and tested on the same task with a fixed size and distribution (of nodes), and hence suffer from limited generalization performance. This paper studies a challenging yet realistic setting, which considers generalization across both size and distribution in VRPs. We propose a generic meta-learning framework, which enables effective training of an initialized model with the capability of fast adaptation to new tasks during inference. We further develop a simple yet efficient approximation method to reduce the training overhead. Extensive experiments on both synthetic and benchmark instances of the traveling salesman problem (TSP) and capacitated vehicle routing problem (CVRP) demonstrate the effectiveness of our method. The code is available at: https://github.com/RoyalSkye/Omni-VRP.
Neural Kernel Surface Reconstruction
Authors: Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, Francis Williams
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud. Our approach builds upon the recently introduced Neural Kernel Fields (NKF) representation. It enjoys similar generalization capabilities to NKF, while simultaneously addressing its main limitations: (a) We can scale to large scenes through compactly supported kernel functions, which enable the use of memory-efficient sparse linear solvers. (b) We are robust to noise, through a gradient fitting solve. (c) We minimize training requirements, enabling us to learn from any dataset of dense oriented points, and even mix training data consisting of objects and scenes at different scales. Our method is capable of reconstructing millions of points in a few seconds, and handling very large scenes in an out-of-core fashion. We achieve state-of-the-art results on reconstruction benchmarks consisting of single objects, indoor scenes, and outdoor scenes.
Monitoring the evolution of dimensional accuracy and product properties in property-controlled forming processes
Authors: Sophie Charlotte Stebner, Juri Martschin, Bahman Arian, Stefan Dietrich, Martin Feistle, Sebastian Hütter, Rémi Lafarge, Robert Laue, Xinyang Li, Christopher Schulte, Daniel Spies, Ferdinand Thein, Frank Wendler, Malte Wrobel, Julian Rozo Vasquez, Michael Dölz, Sebastian Münstermann
Abstract
As recent trends in manufacturing engineering disciplines show a clear development in the sustainable as well as economically efficient design of forming processes, monitoring techniques have been gaining in relevance. In terms of monitoring of product properties, most processes are currently open-loop controlled, entailing that the microstructure evolution, which determines the final product properties, is not considered. However, a closed-loop control that can adjust and manipulate the process actuators according to the required product properties of the component will lead to a considerable increase in efficiency of the processes regarding resources and will decrease postproduction of the component. For most forming processes, one set of component dimensions will result in a certain set of product properties. However, to successfully establish closed-loop property controls for the processes, a systematic understanding of the reciprocity of the dimensions after forming and final product properties must be established. This work investigates the evolution of dimensional accuracy as well as product properties for a series of forming processes that utilize different degrees of freedom for process control.
Measuring and Predicting the Quality of a Join for Data Discovery
Authors: Sergi Nadal, Raquel Panadero, Javier Flores, Oscar Romero
Abstract
We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the schemata and data values of datasets, which can be efficiently extracted in a distributed and parallel fashion. Profiles are then compared, to predict the quality of a join operation among a pair of attributes from different datasets. In contrast to the state-of-the-art, we define a novel notion of join quality that relies on a metric considering both the containment and cardinality proportion between join candidate attributes. We implement our approach in a system called NextiaJD, and present experiments to show the predictive performance and computational efficiency of our method. Our experiments show that NextiaJD obtains greater predictive performance to that of hash-based methods while we are able to scale-up to larger volumes of data.
Improving Expressivity of Graph Neural Networks using Localization
Abstract
In this paper, we propose localized versions of Weisfeiler-Leman (WL) algorithms in an effort to both increase the expressivity, as well as decrease the computational overhead. We focus on the specific problem of subgraph counting and give localized versions of $k-$WL for any $k$. We analyze the power of Local $k-$WL and prove that it is more expressive than $k-$WL and at most as expressive as $(k+1)-$WL. We give a characterization of patterns whose count as a subgraph and induced subgraph are invariant if two graphs are Local $k-$WL equivalent. We also introduce two variants of $k-$WL: Layer $k-$WL and recursive $k-$WL. These methods are more time and space efficient than applying $k-$WL on the whole graph. We also propose a fragmentation technique that guarantees the exact count of all induced subgraphs of size at most 4 using just $1-$WL. The same idea can be extended further for larger patterns using $k>1$. We also compare the expressive power of Local $k-$WL with other GNN hierarchies and show that given a bound on the time-complexity, our methods are more expressive than the ones mentioned in Papp and Wattenhofer[2022a].
Vandermonde Neural Operators
Authors: Levi Lingsch, Mike Michelis, Sirani M. Perera, Robert K. Katzschmann, Siddartha Mishra
Abstract
Fourier Neural Operators (FNOs) have emerged as very popular machine learning architectures for learning operators, particularly those arising in PDEs. However, as FNOs rely on the fast Fourier transform for computational efficiency, the architecture can be limited to input data on equispaced Cartesian grids. Here, we generalize FNOs to handle input data on non-equispaced point distributions. Our proposed model, termed as Vandermonde Neural Operator (VNO), utilizes Vandermonde-structured matrices to efficiently compute forward and inverse Fourier transforms, even on arbitrarily distributed points. We present numerical experiments to demonstrate that VNOs can be significantly faster than FNOs, while retaining comparable accuracy, and improve upon accuracy of comparable non-equispaced methods such as the Geo-FNO.
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Authors: Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract
To increase the generalization capability of VQA systems, many recent studies have tried to de-bias spurious language or vision associations that shortcut the question or image to the answer. Despite these efforts, the literature fails to address the confounding effect of vision and language simultaneously. As a result, when they reduce bias learned from one modality, they usually increase bias from the other. In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect. The model trained in this strategy can concurrently and efficiently reduce vision and language bias. To the best of our knowledge, this is the first work to reduce biases resulting from confounding effects of vision and language in VQA, leveraging causal explain-away relations. We accompany our method with an explain-away strategy, pushing the accuracy of the questions with numerical answers results compared to existing methods that have been an open problem. The proposed method outperforms the state-of-the-art methods in VQA-CP v2 datasets.
Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
Authors: Joonhyuk Yang, Dongpil Shin, Hye Won Chung
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Abstract
We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social networks. Compared to the correlated Erdos-Renyi (ER) model, where various efficient algorithms have been developed, among which a few algorithms have been proven to achieve the exact matching with constant edge correlation, no low-order polynomial algorithm has been known to achieve exact matching for the correlated SBMs with constant correlation. In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. The partition tree divides the large neighborhoods of each vertex into disjoint subsets using their edge statistics to different communities. Our algorithm is the first low-order polynomial-time algorithm achieving exact matching between two correlated SBMs with high probability in dense graphs.
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
Authors: Robert-Jan Bruintjes, Attila Lengyel, Marcos Baptista Rios, Osman Semih Kayhan, Davide Zambrano, Nergis Tomen, Jan van Gemert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The third edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop featured four data-impaired challenges, focusing on addressing the limitations of data availability in training deep learning models for computer vision tasks. The challenges comprised of four distinct data-impaired tasks, where participants were required to train models from scratch using a reduced number of training samples. The primary objective was to encourage novel approaches that incorporate relevant inductive biases to enhance the data efficiency of deep learning models. To foster creativity and exploration, participants were strictly prohibited from utilizing pre-trained checkpoints and other transfer learning techniques. Significant advancements were made compared to the provided baselines, where winning solutions surpassed the baselines by a considerable margin in all four tasks. These achievements were primarily attributed to the effective utilization of extensive data augmentation policies, model ensembling techniques, and the implementation of data-efficient training methods, including self-supervised representation learning. This report highlights the key aspects of the challenges and their outcomes.
An Efficient Machine Learning-based Channel Prediction Technique for OFDM Sub-Bands
Authors: Pedro E. G. Silva, Jules M. Moualeu, Pedro H. Nardelli, Rausley A. A. de Souza
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
The acquisition of accurate channel state information (CSI) is of utmost importance since it provides performance improvement of wireless communication systems. However, acquiring accurate CSI, which can be done through channel estimation or channel prediction, is an intricate task due to the complexity of the time-varying and frequency selectivity of the wireless environment. To this end, we propose an efficient machine learning (ML)-based technique for channel prediction in orthogonal frequency-division multiplexing (OFDM) sub-bands. The novelty of the proposed approach lies in the training of channel fading samples used to estimate future channel behaviour in selective fading.
Isogeometric Multi-Resolution Full Waveform Inversion based on the Finite Cell Method
Authors: Tim Bürchner, Philipp Kopp, Stefan Kollmannsberger, Ernst Rank
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Full waveform inversion (FWI) is an iterative identification process that serves to minimize the misfit of model-based simulated and experimentally measured wave field data, with the goal of identifying a field of parameters for a given physical object. The inverse optimization process of FWI is based on forward and backward solutions of the (elastic or acoustic) eave equation. In a previous paper [1], we explored opportunities of using the finite cell method (FCM) as the wave field solver to incorporate highly complex geometric models. Furthermore, we demonstrated that the identification of the model's density outperforms that of the velocity -- particularly in cases where unknown voids characterized by homogeneous Neumann boundary conditions need to be detected. The paper at hand extends this previous study: The isogeometric finite cell analysis (IGA-FCM) -- a combination of isogeometric analysis (IGA) and FCM -- is applied for the wave field solver, with the advantage that the polynomial degree and subsequently also the sampling frequency of the wave field can be increased quite easily. Since the inversion efficiency strongly depends on the accuracy of the forward and backward wave field solution and of the gradient of the functional, consistent and lumped mass matrix discretization are compared. The resolution of the grid describing the unknown material density is the decouple from the knot span grid. Finally, we propose an adaptive multi-resolution algorithm that refines the material grid only locally using an image processing-based refinement indicator. The developed inversion framework allows fast and memory-efficient wave simulation and object identification. While we study the general behavior of the proposed approach on 2D benchmark problems, a final 3D problem shows that it can also be used to identify voids in geometrically complex spatial structures.
Low-Complexity Dynamic Directional Modulation: Vulnerability and Information Leakage
Authors: Pedro E. Gória Silva, Adam Narbudowicz, Nicola Marchetti, Pedro H. J. Nardelli, Rausley A. A. de Souza, Jules M. Moualeu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, the privacy of wireless transmissions is improved through the use of an efficient technique termed dynamic directional modulation (DDM), and is subsequently assessed in terms of the measure of information leakage. Recently, a variation of DDM termed low-power dynamic directional modulation (LPDDM) has attracted significant attention as a prominent secure transmission method due to its ability to further improve the privacy of wireless communications. Roughly speaking, this modulation operates by randomly selecting the transmitting antenna from an antenna array whose radiation pattern is well known. Thereafter, the modulator adjusts the constellation phase so as to ensure that only the legitimate receiver recovers the information. To begin with, we highlight some privacy boundaries inherent to the underlying system. In addition, we propose features that the antenna array must meet in order to increase the privacy of a wireless communication system. Last, we adopt a uniform circular monopole antenna array with equiprobable transmitting antennas in order to assess the impact of DDM on the information leakage. It is shown that the bit error rate, while being a useful metric in the evaluation of wireless communication systems, does not provide the full information about the vulnerability of the underlying system.
Abstract
Nowadays, the extensive exploitation of Deep Neural Networks (DNNs) in safety-critical applications raises new reliability concerns. In practice, methods for fault injection by emulation in hardware are efficient and widely used to study the resilience of DNN architectures for mitigating reliability issues already at the early design stages. However, the state-of-the-art methods for fault injection by emulation incur a spectrum of time-, design- and control-complexity problems. To overcome these issues, a novel resiliency assessment method called APPRAISER is proposed that applies functional approximation for a non-conventional purpose and employs approximate computing errors for its interest. By adopting this concept in the resiliency assessment domain, APPRAISER provides thousands of times speed-up in the assessment process, while keeping high accuracy of the analysis. In this paper, APPRAISER is validated by comparing it with state-of-the-art approaches for fault injection by emulation in FPGA. By this, the feasibility of the idea is demonstrated, and a new perspective in resiliency evaluation for DNNs is opened.
Off-By-One Implementation Error in J-UNIWARD
Authors: Benedikt Lorch
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract
J-UNIWARD is a popular steganography method for hiding secret messages in JPEG cover images. As a content-adaptive method, J-UNIWARD aims to embed into textured image regions where changes are difficult to detect. To this end, J-UNIWARD first assigns to each DCT coefficient an embedding cost calculated based on the image's Wavelet residual, and then uses a coding method that minimizes the cost while embedding the desired payload. Changing one DCT coefficient affects a 23x23 window of Wavelet coefficients. To speed up the costmap computation, the original implementation pre-computes the Wavelet residual and then considers per changed DCT coefficient a 23x23 window of the Wavelet residual. However, the implementation accesses a window accidentally shifted by one pixel to the bottom right. In this report, we evaluate the effect of this off-by-one error on the resulting costmaps. Some image blocks are over-priced while other image blocks are under-priced, but the difference is relatively small. The off-by-one error seems to make little difference for learning-based steganalysis.
IDAS: Intent Discovery with Abstractive Summarization
Authors: Maarten De Raedt, Fréderic Godin, Thomas Demeester, Chris Develder
Abstract
Intent discovery is the task of inferring latent intents from a set of unlabeled utterances, and is a useful step towards the efficient creation of new conversational agents. We show that recent competitive methods in intent discovery can be outperformed by clustering utterances based on abstractive summaries, i.e., "labels", that retain the core elements while removing non-essential information. We contribute the IDAS approach, which collects a set of descriptive utterance labels by prompting a Large Language Model, starting from a well-chosen seed set of prototypical utterances, to bootstrap an In-Context Learning procedure to generate labels for non-prototypical utterances. The utterances and their resulting noisy labels are then encoded by a frozen pre-trained encoder, and subsequently clustered to recover the latent intents. For the unsupervised task (without any intent labels) IDAS outperforms the state-of-the-art by up to +7.42% in standard cluster metrics for the Banking, StackOverflow, and Transport datasets. For the semi-supervised task (with labels for a subset of intents) IDAS surpasses 2 recent methods on the CLINC benchmark without even using labeled data.
A Survey of Label-Efficient Deep Learning for 3D Point Clouds
Authors: Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In the past decade, deep neural networks have achieved significant progress in point cloud learning. However, collecting large-scale precisely-annotated training data is extremely laborious and expensive, which hinders the scalability of existing point cloud datasets and poses a bottleneck for efficient exploration of point cloud data in various tasks and applications. Label-efficient learning offers a promising solution by enabling effective deep network training with much-reduced annotation efforts. This paper presents the first comprehensive survey of label-efficient learning of point clouds. We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area. To achieve this, we propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels. We categorize four typical label-efficient learning approaches that significantly reduce point cloud annotation efforts: data augmentation, domain transfer learning, weakly-supervised learning, and pretrained foundation models. For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges. Finally, we share insights into current research challenges and potential future directions. A project associated with this survey has been built at \url{https://github.com/xiaoaoran/3D_label_efficient_learning}.
Homogenization of nondivergence-form elliptic equations with discontinuous coefficients and finite element approximation of the homogenized problem
Authors: Timo Sprekeler
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We study the homogenization of the equation $-A(\frac{\cdot}{\varepsilon}):D^2 u_{\varepsilon} = f$ posed in a bounded convex domain $\Omega\subset \mathbb{R}^n$ subject to a Dirichlet boundary condition and the numerical approximation of the corresponding homogenized problem, where the measurable, uniformly elliptic, periodic and symmetric diffusion matrix $A$ is merely assumed to be essentially bounded and (if $n>2$) to satisfy the Cordes condition. In the first part, we show existence and uniqueness of an invariant measure by reducing to a Lax--Milgram-type problem, we obtain $L^2$-bounds for periodic problems in double-divergence-form, we prove homogenization under minimal regularity assumptions, and we generalize known corrector bounds and results on optimal convergence rates from the classical case of H\"{o}lder continuous coefficients to the present case. In the second part, we suggest and rigorously analyze an approximation scheme for the effective coefficient matrix and the solution to the homogenized problem based on a finite element method for the approximation of the invariant measure, and we demonstrate the performance of the scheme through numerical experiments.
Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization
Authors: Anthony Bardou, Patrick Thiran, Thomas Begin
Abstract
Bayesian Optimization (BO) is typically used to optimize an unknown function $f$ that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Although provably asymptotically optimal BO algorithms are efficient at optimizing low-dimensional functions, scaling them to high-dimensional spaces remains an open research problem, often tackled by assuming an additive structure for $f$. However, such algorithms introduce additional restrictive assumptions on the additive structure that reduce their applicability domain. In this paper, we relax the restrictive assumptions on the additive structure of $f$, at the expense of weakening the maximization guarantees of the acquisition function, and we address the over-exploration problem for decentralized BO algorithms. To these ends, we propose DuMBO, an asymptotically optimal decentralized BO algorithm that achieves very competitive performance against state-of-the-art BO algorithms, especially when the additive structure of $f$ does not exist or comprises high-dimensional factors.
Parametric Shape Holomorphy of Boundary Integral Operators with Applications
Abstract
We consider a family of boundary integral operators supported on a collection of parametrically defined bounded Lipschitz boundaries. Thus, the boundary integral operators themselves also depend on the parametric variables, leading to a parameter-to-operator map. One of the main results of this article is to establish the analytic or holomorphic dependence of said boundary integral operators upon the parametric variables, i.e., of the parameter-to-operator map. As a direct consequence we also establish holomorphic dependence of solutions to boundary integral equations, i.e., holomorphy of the parameter-to-solution map. To this end, we construct a holomorphic extension to complex-valued boundary deformations and investigate the complex Fr\'echet differentiability of boundary integral operators with respect to each parametric variable. The established parametric holomorphy results have been identified as a key property to derive best $N$-term approximation rates to overcome the so-called curse of dimensionality in the approximation of parametric maps with distributed, high-dimensional inputs. To demonstrate the applicability of the derived results, we consider as a concrete example the sound-soft Helmholtz acoustic scattering problem and its frequency-robust boundary integral formulations. For this particular application, we explore the consequences of our results in reduced order modelling, Bayesian shape inversion, and the construction of efficient surrogates using artificial neural networks.
Multi-Channel Operation for the Release 2 of ETSI Cooperative Intelligent Transport Systems
Authors: Alessandro Bazzi, Miguel Sepulcre, Quentin Delooz, Andreas Festag, Jonas Vogt, Horst Wieker, Friedbert Berens, Paul Spaanderman
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Vehicles and road infrastructure are starting to be equipped with vehicle-to-everything (V2X) communication solutions to increase road safety and provide new services to drivers and passengers. In Europe, the deployment is based on a set of Release 1 standards developed by ETSI to support basic use cases for cooperative intelligent transport systems (C-ITS). For them, the capacity of a single 10 MHz channel in the ITS band at 5.9 GHz is considered sufficient. At the same time, the ITS stakeholders are working towards several advanced use cases, which imply a significant increment of data traffic and the need for multiple channels. To address this issue, ETSI has recently standardized a new multi-channel operation (MCO) concept for flexible, efficient, and future-proof use of multiple channels. This new concept is defined in a set of new specifications that represent the foundation for the future releases of C-ITS standards. The present paper provides a comprehensive review of the new set of specifications, describing the main entities extending the C-ITS architecture at the different layers of the protocol stack, In addition, the paper provides representative examples that describe how these MCO standards will be used in the future and discusses some of the main open issues arising. The review and analysis of this paper facilitate the understanding and motivation of the new set of Release 2 ETSI specifications for MCO and the identification of new research opportunities.
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN
Authors: Yangfan Hu, Qian Zheng, Xudong Jiang, Gang Pan
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace weight multiplications in ANNs with additions, which are more energy-efficient and less computationally intensive. However, it remains a challenge to train deep SNNs due to the discrete spike function. A popular approach to circumvent this challenge is ANN-to-SNN conversion. However, due to the quantization error and accumulating error, it often requires lots of time steps (high inference latency) to achieve high performance, which negates SNN's advantages. To this end, this paper proposes Fast-SNN that achieves high performance with low latency. We demonstrate the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the minimization of the quantization error, we show that the sequential error is the primary cause of the accumulating error, which is addressed by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Codes are available at: https://github.com/yangfan-hu/Fast-SNN.
Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction
Authors: Fabian Akkerman, Julius Luy, Wouter van Heeswijk, Maximilian Schiffer
Abstract
Large discrete action spaces remain a central challenge for reinforcement learning methods. Such spaces are encountered in many real-world applications, e.g., recommender systems, multi-step planning, and inventory replenishment. The mapping of continuous proxies to discrete actions is a promising paradigm for handling large discrete action spaces. Existing continuous-to-discrete mapping approaches involve searching for discrete neighboring actions in a static pre-defined neighborhood, which requires discrete neighbor lookups across the entire action space. Hence, scalability issues persist. To mitigate this drawback, we propose a novel Dynamic Neighborhood Construction (DNC) method, which dynamically constructs a discrete neighborhood to map the continuous proxy, thus efficiently exploiting the underlying action space. We demonstrate the robustness of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across three different environments. Our results show that DNC matches or outperforms state-of-the-art approaches while being more computationally efficient. Furthermore, our method scales to action spaces that so far remained computationally intractable for existing methodologies.
Hidden Stabilizers, the Isogeny To Endomorphism Ring Problem and the Cryptanalysis of pSIDH
Authors: Mingjie Chen, Muhammad Imran, Gábor Ivanyos, Péter Kutas, Antonin Leroux, Christophe Petit
Abstract
The Isogeny to Endomorphism Ring Problem (IsERP) asks to compute the endomorphism ring of the codomain of an isogeny between supersingular curves in characteristic $p$ given only a representation for this isogeny, i.e. some data and an algorithm to evaluate this isogeny on any torsion point. This problem plays a central role in isogeny-based cryptography; it underlies the security of pSIDH protocol (ASIACRYPT 2022) and it is at the heart of the recent attacks that broke the SIDH key exchange. Prior to this work, no efficient algorithm was known to solve IsERP for a generic isogeny degree, the hardest case seemingly when the degree is prime. In this paper, we introduce a new quantum polynomial-time algorithm to solve IsERP for isogenies whose degrees are odd and have $O(\log\log p)$ many prime factors. As main technical tools, our algorithm uses a quantum algorithm for computing hidden Borel subgroups, a group action on supersingular isogenies from EUROCRYPT 2021, various algorithms for the Deuring correspondence and a new algorithm to lift arbitrary quaternion order elements modulo an odd integer $N$ with $O(\log\log p)$ many prime factors to powersmooth elements. As a main consequence for cryptography, we obtain a quantum polynomial-time key recovery attack on pSIDH. The technical tools we use may also be of independent interest.
Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations
Authors: Raphael Trumpp, Martin Büchner, Abhinav Valada, Marco Caccamo
Abstract
Autonomous driving involves complex decision-making in highly interactive environments, requiring thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interaction behavior, efficient learning critically depends on scalable state representations. Contrary to imitation learning methods, high-dimensional state representations still constitute a major bottleneck for deep reinforcement learning methods in autonomous driving. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while only requiring one million experiences for training. RecurrDriveNet causes less than one infraction per driven kilometer by interacting safely with other road users.
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Abstract
Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear features - when performing tasks after fine-tuning. We test what aspects of pre-training are important for endowing encoder-decoder Transformers with an inductive bias that favors hierarchical syntactic generalizations. We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus, diagnosing inductive biases using two syntactic transformation tasks: question formation and passivization, both in English. We find that the number of parameters alone does not explain hierarchical generalization: model depth plays greater role than model width. We also find that pre-training on simpler language, such as child-directed speech, induces a hierarchical bias using an order-of-magnitude less data than pre-training on more typical datasets based on web text or Wikipedia; this suggests that in cognitively plausible language acquisition settings, neural language models may be more data-efficient than previously thought.
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues
Abstract
Reconstructing deformable tissues from endoscopic stereo videos in robotic surgery is crucial for various clinical applications. However, existing methods relying only on implicit representations are computationally expensive and require dozens of hours, which limits further practical applications. To address this challenge, we introduce LerPlane, a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields, leading to a compact memory footprint and significantly accelerated optimization. The efficient factorization is accomplished by fusing features obtained through linear interpolation of each plane and enables using lightweight neural networks to model surgical scenes. Besides, LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling. We also propose a novel sample scheme to boost optimization and improve performance in regions with tool occlusion and large motions. Experiments on DaVinci robotic surgery videos demonstrate that LerPlane accelerates optimization by over 100$\times$ while maintaining high quality across various non-rigid deformations, showing significant promise for future intraoperative surgery applications.
Fully Dynamic Submodular Maximization over Matroids
Abstract
Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this classic problem in the fully dynamic setting, where elements can be both inserted and deleted in real-time. Our main result is a randomized algorithm that maintains an efficient data structure with an $\tilde{O}(k^2)$ amortized update time (in the number of additions and deletions) and yields a $4$-approximate solution, where $k$ is the rank of the matroid.
MSKdeX: Musculoskeletal (MSK) decomposition from an X-ray image for fine-grained estimation of lean muscle mass and muscle volume
Authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Yuta Hiasa, Hugues Talbot, Seiji Okata, Nobuhiko Sugano, Yoshinobu Sato
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Musculoskeletal diseases such as sarcopenia and osteoporosis are major obstacles to health during aging. Although dual-energy X-ray absorptiometry (DXA) and computed tomography (CT) can be used to evaluate musculoskeletal conditions, frequent monitoring is difficult due to the cost and accessibility (as well as high radiation exposure in the case of CT). We propose a method (named MSKdeX) to estimate fine-grained muscle properties from a plain X-ray image, a low-cost, low-radiation, and highly accessible imaging modality, through musculoskeletal decomposition leveraging fine-grained segmentation in CT. We train a multi-channel quantitative image translation model to decompose an X-ray image into projections of CT of individual muscles to infer the lean muscle mass and muscle volume. We propose the object-wise intensity-sum loss, a simple yet surprisingly effective metric invariant to muscle deformation and projection direction, utilizing information in CT and X-ray images collected from the same patient. While our method is basically an unpaired image-to-image translation, we also exploit the nature of the bone's rigidity, which provides the paired data through 2D-3D rigid registration, adding strong pixel-wise supervision in unpaired training. Through the evaluation using a 539-patient dataset, we showed that the proposed method significantly outperformed conventional methods. The average Pearson correlation coefficient between the predicted and CT-derived ground truth metrics was increased from 0.460 to 0.863. We believe our method opened up a new musculoskeletal diagnosis method and has the potential to be extended to broader applications in multi-channel quantitative image translation tasks. Our source code will be released soon.
Image Registration of In Vivo Micro-Ultrasound and Ex Vivo Pseudo-Whole Mount Histopathology Images of the Prostate: A Proof-of-Concept Study
Authors: Muhammad Imran, Brianna Nguyen, Jake Pensa, Sara M. Falzarano, Anthony E. Sisk, Muxua Liang, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Early diagnosis of prostate cancer significantly improves a patient's 5-year survival rate. Biopsy of small prostate cancers is improved with image-guided biopsy. MRI-ultrasound fusion-guided biopsy is sensitive to smaller tumors but is underutilized due to the high cost of MRI and fusion equipment. Micro-ultrasound (micro-US), a novel high-resolution ultrasound technology, provides a cost-effective alternative to MRI while delivering comparable diagnostic accuracy. However, the interpretation of micro-US is challenging due to subtle gray scale changes indicating cancer vs normal tissue. This challenge can be addressed by training urologists with a large dataset of micro-US images containing the ground truth cancer outlines. Such a dataset can be mapped from surgical specimens (histopathology) onto micro-US images via image registration. In this paper, we present a semi-automated pipeline for registering in vivo micro-US images with ex vivo whole-mount histopathology images. Our pipeline begins with the reconstruction of pseudo-whole-mount histopathology images and a 3D micro-US volume. Each pseudo-whole-mount histopathology image is then registered with the corresponding axial micro-US slice using a two-stage approach that estimates an affine transformation followed by a deformable transformation. We evaluated our registration pipeline using micro-US and histopathology images from 18 patients who underwent radical prostatectomy. The results showed a Dice coefficient of 0.94 and a landmark error of 2.7 mm, indicating the accuracy of our registration pipeline. This proof-of-concept study demonstrates the feasibility of accurately aligning micro-US and histopathology images. To promote transparency and collaboration in research, we will make our code and dataset publicly available.
A Geometric Perspective on Diffusion Models
Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
Abstract
Recent years have witnessed significant progress in developing efficient training and fast sampling approaches for diffusion models. A recent remarkable advancement is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation to their sampling dynamics. Through carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected with an explicit, quasi-linear sampling trajectory, and another implicit denoising trajectory, which even converges faster in terms of visual quality. We also establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, as well as re-explain the working principles of distillation-based fast sampling techniques.
MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images
Authors: Hongxu Jiang, Muhammad Imran, Preethika Muralidharan, Anjali Patel, Jake Pensa, Muxuan Liang, Tarik Benidir, Joseph R. Grajo, Jason P. Joseph, Russell Terry, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, delivering comparable accuracy for diagnosing prostate cancer to MRI but at a lower cost. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. This paper proposes a deep learning approach for automated, fast, and accurate prostate segmentation on micro-US images. Prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. We introduce MicroSegNet, a multi-scale annotation-guided Transformer UNet model to address this challenge. During the training process, MicroSegNet focuses more on regions that are hard to segment (challenging regions), where expert and non-expert annotations show discrepancies. We achieve this by proposing an annotation-guided cross entropy loss that assigns larger weight to pixels in hard regions and lower weight to pixels in easy regions. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.942 and a Hausdorff distance of 2.11 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. We will make our code and dataset publicly available to promote transparency and collaboration in research.
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Authors: Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection and recognition simultaneously and efficiently. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing a single decoder, the point queries have encoded requisite text semantics and locations. Furthermore, we show the surprisingly good extensibility of our method, in terms of character class, language type, and task. On the one hand, DeepSolo not only performs well in English scenes but also masters the Chinese transcription with complex font structure and a thousand-level character classes. On the other hand, based on the extensibility of DeepSolo, we launch DeepSolo++ for multilingual text spotting, making a further step to let Transformer decoder with explicit points solo for multilingual text detection, recognition, and script identification all at once. Extensive experiments on public benchmarks demonstrate that our simple approach achieves better training efficiency compared with Transformer-based models and outperforms the previous state-of-the-art. In addition, DeepSolo and DeepSolo++ are also compatible with line annotations, which require much less annotation cost than polygons. The code is available at \url{https://github.com/ViTAE-Transformer/DeepSolo}.
A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
Authors: Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson
Abstract
Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) infinite-width BNNs are particularly promising, especially in high dimensions.
Simulation and Retargeting of Complex Multi-Character Interactions
Authors: Yunbo Zhang, Deepak Gopinath, Yuting Ye, Jessica Hodgins, Greg Turk, Jungdam Won
Abstract
We present a method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning. Our method learns control policies for characters that imitate not only individual motions, but also the interactions between characters, while maintaining balance and matching the complexity of reference data. Our approach uses a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion. We evaluate our method on a variety of activities, from simple interactions such as a high-five greeting to more complex interactions such as gymnastic exercises, Salsa dancing, and box carrying and throwing. This approach can be used to ``clean-up'' existing motion capture data to produce physically plausible interactions or to retarget motion to new characters with different sizes, kinematics or morphologies while maintaining the interactions in the original data.
Deception by Omission: Using Adversarial Missingness to Poison Causal Structure Learning
Authors: Deniz Koyuncu, Alex Gittens, Bülent Yener, Moti Yung
Abstract
Inference of causal structures from observational data is a key component of causal machine learning; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate causal structural models (SCMs). However, when the data can be audited for correctness (e.g., it is crytographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner. Theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given for Gaussian SCMs. Experimental validation of these approaches on real and synthetic data sets demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.
Latent Exploration for Reinforcement Learning
Authors: Alberto Silvio Chiappa, Alessandro Marin Vargas, Ann Zixiang Huang, Alexander Mathis
Abstract
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment. Due to the curse of dimensionality, learning policies that map high-dimensional sensory input to motor output is particularly challenging. During training, state of the art methods (SAC, PPO, etc.) explore the environment by perturbing the actuation with independent Gaussian noise. While this unstructured exploration has proven successful in numerous tasks, it ought to be suboptimal for overactuated systems. When multiple actuators, such as motors or muscles, drive behavior, uncorrelated perturbations risk diminishing each other's effect, or modifying the behavior in a task-irrelevant way. While solutions to introduce time correlation across action perturbations exist, introducing correlation across actuators has been largely ignored. Here, we propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network, which can be seamlessly integrated with on- and off-policy algorithms. We demonstrate that the noisy actions generated by perturbing the network's activations can be modeled as a multivariate Gaussian distribution with a full covariance matrix. In the PyBullet locomotion tasks, Lattice-SAC achieves state of the art results, and reaches 18% higher reward than unstructured exploration in the Humanoid environment. In the musculoskeletal control environments of MyoSuite, Lattice-PPO achieves higher reward in most reaching and object manipulation tasks, while also finding more energy-efficient policies with reductions of 20-60%. Overall, we demonstrate the effectiveness of structured action noise in time and actuator space for complex motor control tasks.
Decision-Oriented Dialogue for Human-AI Collaboration
Authors: Jessy Lin, Nicholas Tomlin, Jacob Andreas, Jason Eisner
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We describe a class of tasks called dialogue decision problems, in which AI assistants must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: assistants can access and process large amounts of information, while users have preferences and constraints external to the system. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach. Using these environments, we collect human-human dialogues with humans playing the role of assistant. To compare how current AI assistants communicate in these settings, we present baselines using large language models in self-play. Finally, we highlight a number of challenges models face in decision-oriented dialogues, ranging from efficient communication to reasoning and optimization, and release our environments as a testbed for future modeling work.
Efficient Diffusion Policies for Offline Reinforcement Learning
Authors: Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan
Abstract
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffsuion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov Chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training to avoid running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP can reduce the diffusion policy training time from 5 days to 5 hours on gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and achieves new state-of-the-art on D4RL by large margins over previous methods. Our code is available at https://github.com/sail-sg/edp.
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Abstract
Recent years have witnessed considerable achievements in editing images with text instructions. When applying these editors to dynamic scene editing, the new-style scene tends to be temporally inconsistent due to the frame-by-frame nature of these 2D editors. To tackle this issue, we propose Control4D, a novel approach for high-fidelity and temporally consistent 4D portrait editing. Control4D is built upon an efficient 4D representation with a 2D diffusion-based editor. Instead of using direct supervisions from the editor, our method learns a 4D GAN from it and avoids the inconsistent supervision signals. Specifically, we employ a discriminator to learn the generation distribution based on the edited images and then update the generator with the discrimination signals. For more stable training, multi-level information is extracted from the edited images and used to facilitate the learning of the generator. Experimental results show that Control4D surpasses previous approaches and achieves more photo-realistic and consistent 4D editing performances. The link to our project website is https://control4darxiv.github.io.
Too Large; Data Reduction for Vision-Language Pre-Training
Authors: Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions for selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As the result, TL;DR enables us to reduce the large dataset into a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pretraining process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reduce well-cleaned CC3M dataset from 2.82M to 0.67M ($\sim$24\%) and noisy YFCC15M from 15M to 2.5M ($\sim$16.7\%). Extensive experiments with three popular VLP models over seven downstream tasks show that VLP model trained on the compressed dataset provided by TL;DR can perform similar or even better results compared with training on the full-scale dataset. The code will be made available at \url{https://github.com/showlab/data-centric.vlp}.
Keyword: faster
AdANNS: A Framework for Adaptive Semantic Search
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro
Abstract
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Graph Entropy Minimization for Semi-supervised Node Classification
Authors: Yi Luo, Guangchun Luo, Ke Qin, Aiguo Chen
Abstract
Node classifiers are required to comprehensively reduce prediction errors, training resources, and inference latency in the industry. However, most graph neural networks (GNN) concentrate only on one or two of them. The compromised aspects thus are the shortest boards on the bucket, hindering their practical deployments for industrial-level tasks. This work proposes a novel semi-supervised learning method termed Graph Entropy Minimization (GEM) to resolve the three issues simultaneously. GEM benefits its one-hop aggregation from massive uncategorized nodes, making its prediction accuracy comparable to GNNs with two or more hops message passing. It can be decomposed to support stochastic training with mini-batches of independent edge samples, achieving extremely fast sampling and space-saving training. While its one-hop aggregation is faster in inference than deep GNNs, GEM can be further accelerated to an extreme by deriving a non-hop classifier via online knowledge distillation. Thus, GEM can be a handy choice for latency-restricted and error-sensitive services running on resource-constraint hardware. Code is available at https://github.com/cf020031308/GEM.
Recasting Self-Attention with Holographic Reduced Representations
Authors: Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a ``Hrrformer'' we obtain several benefits including $\mathcal{O}(T H \log H)$ time complexity, $\mathcal{O}(T H)$ space complexity, and convergence in $10\times$ fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to $280\times$ faster to train on the Long Range Arena benchmark. Code is available at \url{https://github.com/NeuromorphicComputationResearchProgram/Hrrformer}
Vandermonde Neural Operators
Authors: Levi Lingsch, Mike Michelis, Sirani M. Perera, Robert K. Katzschmann, Siddartha Mishra
Abstract
Fourier Neural Operators (FNOs) have emerged as very popular machine learning architectures for learning operators, particularly those arising in PDEs. However, as FNOs rely on the fast Fourier transform for computational efficiency, the architecture can be limited to input data on equispaced Cartesian grids. Here, we generalize FNOs to handle input data on non-equispaced point distributions. Our proposed model, termed as Vandermonde Neural Operator (VNO), utilizes Vandermonde-structured matrices to efficiently compute forward and inverse Fourier transforms, even on arbitrarily distributed points. We present numerical experiments to demonstrate that VNOs can be significantly faster than FNOs, while retaining comparable accuracy, and improve upon accuracy of comparable non-equispaced methods such as the Geo-FNO.
Deep learning and MCMC with aggVAE for shifting administrative boundaries: mapping malaria prevalence in Kenya
Authors: Elizaveta Semenova, Swapnil Mishra, Samir Bhatt, Seth Flaxman, H Juliette T Unwin
Abstract
Model-based disease mapping remains a fundamental policy-informing tool in public health and disease surveillance with hierarchical Bayesian models being the current state-of-the-art approach. When working with areal data, e.g. aggregates at the administrative unit level such as district or province, routinely used models rely on the adjacency structure of areal units to account for spatial correlations. The goal of disease surveillance systems is to track disease outcomes over time, but this provides challenging in situations of crises, such as political changes, leading to changes of administrative boundaries. Kenya is an example of such country. Moreover, adjacency-based approach ignores the continuous nature of spatial processes and cannot solve the change-of-support problem, i.e. when administrative boundaries change. We present a novel, practical, and easy to implement solution relying on a methodology combining deep generative modelling and fully Bayesian inference. We build on the recent work of PriorVAE able to encode spatial priors over small areas with variational autoencoders, to map malaria prevalence in Kenya. We solve the change-of-support problem arising from Kenya changing its district boundaries in 2010. We draw realisations of the Gaussian Process (GP) prior over a fine artificial spatial grid representing continuous space and then aggregate these realisations to the level of administrative boundaries. The aggregated values are then encoded using the PriorVAE technique. The trained priors (aggVAE) are then used at the inference stage instead of the GP priors within a Markov chain Monte Carlo (MCMC) scheme. We demonstrate that it is possible to use the flexible and appropriate model for areal data based on aggregation of continuous priors, and that inference is orders of magnitude faster when using aggVAE than combining the original GP priors and the aggregation step.
A technique to jointly estimate depth and depth uncertainty for unmanned aerial vehicles
Abstract
When used by autonomous vehicles for trajectory planning or obstacle avoidance, depth estimation methods need to be reliable. Therefore, estimating the quality of the depth outputs is critical. In this paper, we show how M4Depth, a state-of-the-art depth estimation method designed for unmanned aerial vehicle (UAV) applications, can be enhanced to perform joint depth and uncertainty estimation. For that, we present a solution to convert the uncertainty estimates related to parallax generated by M4Depth into uncertainty estimates related to depth, and show that it outperforms the standard probabilistic approach. Our experiments on various public datasets demonstrate that our method performs consistently, even in zero-shot transfer. Besides, our method offers a compelling value when compared to existing multi-view depth estimation methods as it performs similarly on a multi-view depth estimation benchmark despite being 2.5 times faster and causal, as opposed to other methods. The code of our method is publicly available at https://github.com/michael-fonder/M4DepthU .
A Geometric Perspective on Diffusion Models
Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
Abstract
Recent years have witnessed significant progress in developing efficient training and fast sampling approaches for diffusion models. A recent remarkable advancement is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation to their sampling dynamics. Through carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected with an explicit, quasi-linear sampling trajectory, and another implicit denoising trajectory, which even converges faster in terms of visual quality. We also establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, as well as re-explain the working principles of distillation-based fast sampling techniques.
Feature Learning in Image Hierarchies using Functional Maximal Correlation
Authors: Bo Hu, Yuheng Bu, José C. Príncipe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
This paper proposes the Hierarchical Functional Maximal Correlation Algorithm (HFMCA), a hierarchical methodology that characterizes dependencies across two hierarchical levels in multiview systems. By framing view similarities as dependencies and ensuring contrastivity by imposing orthonormality, HFMCA achieves faster convergence and increased stability in self-supervised learning. HFMCA defines and measures dependencies within image hierarchies, from pixels and patches to full images. We find that the network topology for approximating orthonormal basis functions aligns with a vanilla CNN, enabling the decomposition of density ratios between neighboring layers of feature maps. This approach provides powerful interpretability, revealing the resemblance between supervision and self-supervision through the lens of internal representations.
Keyword: mobile
LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Abstract
This paper presents a novel approach in autonomous robot control, named LLM-BRAIn, that makes possible robot behavior generation, based on operator's commands. LLM-BRAIn is a transformer-based Large Language Model (LLM) fine-tuned from Stanford Alpaca 7B model to generate robot behavior tree (BT) from the text description. We train the LLM-BRAIn on 8,5k instruction-following demonstrations, generated in the style of self-instruct using text-davinchi-003. The developed model accurately builds complex robot behavior while remaining small enough to be run on the robot's onboard microcomputer. The model gives structural and logical correct BTs and can successfully manage instructions that were not presented in training set. The experiment did not reveal any significant subjective differences between BTs generated by LLM-BRAIn and those created by humans (on average, participants were able to correctly distinguish between LLM-BRAIn generated BTs and human-created BTs in only 4.53 out of 10 cases, indicating that their performance was close to random chance). The proposed approach potentially can be applied to mobile robotics, drone operation, robot manipulator systems and Industry 4.0.
Vision Transformers for Mobile Applications: A Short Survey
Abstract
Vision Transformers (ViTs) have demonstrated state-of-the-art performance on many Computer Vision Tasks. Unfortunately, deploying these large-scale ViTs is resource-consuming and impossible for many mobile devices. While most in the community are building for larger and larger ViTs, we ask a completely opposite question: How small can a ViT be within the tradeoffs of accuracy and inference latency that make it suitable for mobile deployment? We look into a few ViTs specifically designed for mobile applications and observe that they modify the transformer's architecture or are built around the combination of CNN and transformer. Recent work has also attempted to create sparse ViT networks and proposed alternatives to the attention module. In this paper, we study these architectures, identify the challenges and analyze what really makes a vision transformer suitable for mobile applications. We aim to serve as a baseline for future research direction and hopefully lay the foundation to choose the exemplary vision transformer architecture for your application running on mobile devices.
User Driven Functionality Deletion for Mobile Apps
Authors: Maleknaz Nayebi, Konstantin Kuznetsov, Andreas Zeller, Guenther Ruhe
Abstract
Evolving software with an increasing number of features is harder to understand and thus harder to use. Software release planning has been concerned with planning these additions. Moreover, software of increasing size takes more effort to be maintained. In the domain of mobile apps, too much functionality can easily impact usability, maintainability, and resource consumption. Hence, it is important to understand the extent to which the law of continuous growth applies to mobile apps. Previous work showed that the deletion of functionality is common and sometimes driven by user reviews. However, it is not known if these deletions are visible or important to the app users. In this study, we performed a survey study with 297 mobile app users to understand the significance of functionality deletion for them. Our results showed that for the majority of users, the deletion of features corresponds with negative sentiments and change in usage and even churn. Motivated by these preliminary results, we propose RADIATION to input user reviews and recommend if any functionality should be deleted from an app's User Interface (UI). We evaluate RADIATION using historical data and surveying developers' opinions. From the analysis of 190,062 reviews from 115 randomly selected apps, we show that RADIATION can recommend functionality deletion with an average F-Score of 74% and if sufficiently many negative user reviews suggest so.
Rare Life Event Detection via Mobile Sensing Using Multi-Task Learning
Authors: Arvind Pillai, Subigya Nepal, Andrew Campbell
Abstract
Rare life events significantly impact mental health, and their detection in behavioral studies is a crucial step towards health-based interventions. We envision that mobile sensing data can be used to detect these anomalies. However, the human-centered nature of the problem, combined with the infrequency and uniqueness of these events makes it challenging for unsupervised machine learning methods. In this paper, we first investigate granger-causality between life events and human behavior using sensing data. Next, we propose a multi-task framework with an unsupervised autoencoder to capture irregular behavior, and an auxiliary sequence predictor that identifies transitions in workplace performance to contextualize events. We perform experiments using data from a mobile sensing study comprising N=126 information workers from multiple industries, spanning 10106 days with 198 rare events (<2%). Through personalized inference, we detect the exact day of a rare event with an F1 of 0.34, demonstrating that our method outperforms several baselines. Finally, we discuss the implications of our work from the context of real-world deployment.
Keyword: pruning
Budget-Aware Graph Convolutional Network Design using Probabilistic Magnitude Pruning
Authors: Hichem Sahbi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Graph convolutional networks (GCNs) are nowadays becoming mainstream in solving many image processing tasks including skeleton-based recognition. Their general recipe consists in learning convolutional and attention layers that maximize classification performances. With multi-head attention, GCNs are highly accurate but oversized, and their deployment on edge devices requires their pruning. Among existing methods, magnitude pruning (MP) is relatively effective but its design is clearly suboptimal as network topology selection and weight retraining are achieved independently. In this paper, we devise a novel lightweight GCN design dubbed as Probabilistic Magnitude Pruning (PMP) that jointly trains network topology and weights. Our method is variational and proceeds by aligning the weight distribution of the learned networks with an a priori distribution. This allows implementing any fixed pruning rate, and also enhancing the generalization performances of the designed lightweight GCNs. Extensive experiments conducted on the challenging task of skeleton-based recognition show a substantial gain of our lightweight GCNs particularly at very high pruning regimes.
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information
Authors: Jaehyung Kim, Yekyung Kim, Karin de Langis, Jinwoo Shin, Dongyeop Kang
Abstract
The success of NLP systems often relies on the availability of large, high-quality datasets. However, not all samples in these datasets are equally valuable for learning, as some may be redundant or noisy. Several methods for characterizing datasets based on model-driven meta-information (e.g., model's confidence) have been developed, but the relationship and complementary effects of these methods have received less attention. In this paper, we introduce infoVerse, a universal framework for dataset characterization, which provides a new feature space that effectively captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. infoVerse reveals distinctive regions of the dataset that are not apparent in the original semantic space, hence guiding users (or models) in identifying which samples to focus on for exploration, assessment, or annotation. Additionally, we propose a novel sampling method on infoVerse to select a set of data points that maximizes informativeness. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines in all applications. Our code and demo are publicly available.
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Abstract
Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy on resource-limited devices. In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance. Our approach utilizes a set of binary masks to indicate whether to retain or prune each Conformer module, and employs L0 regularization to learn the optimal mask values. To further enhance pruning performance, we use a layerwise distillation strategy to transfer knowledge from unpruned to pruned models. Our method outperforms all pruning baselines on the widely used LibriSpeech benchmark, achieving a 50% reduction in model size and a 28% reduction in inference cost with minimal performance loss.
Keyword: diffusion
Fine-grained Text Style Transfer with Diffusion-Based Language Models
Authors: Yiwei Lyu, Tiange Luo, Jiacheng Shi, Todd C. Hollon, Honglak Lee
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Diffusion probabilistic models have shown great success in generating high-quality images controllably, and researchers have tried to utilize this controllability into text generation domain. Previous works on diffusion-based language models have shown that they can be trained without external knowledge (such as pre-trained weights) and still achieve stable performance and controllability. In this paper, we trained a diffusion-based model on StylePTB dataset, the standard benchmark for fine-grained text style transfers. The tasks in StylePTB requires much more refined control over the output text compared to tasks evaluated in previous works, and our model was able to achieve state-of-the-art performance on StylePTB on both individual and compositional transfers. Moreover, our model, trained on limited data from StylePTB without external knowledge, outperforms previous works that utilized pretrained weights, embeddings, and external grammar parsers, and this may indicate that diffusion-based language models have great potential under low-resource settings.
Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels
Abstract
Learning from noisy labels is an important and long-standing problem in machine learning for real applications. One of the main research lines focuses on learning a label corrector to purify potential noisy labels. However, these methods typically rely on strict assumptions and are limited to certain types of label noise. In this paper, we reformulate the label-noise problem from a generative-model perspective, $\textit{i.e.}$, labels are generated by gradually refining an initial random guess. This new perspective immediately enables existing powerful diffusion models to seamlessly learn the stochastic generative process. Once the generative uncertainty is modeled, we can perform classification inference using maximum likelihood estimation of labels. To mitigate the impact of noisy labels, we propose the $\textbf{L}$abel-$\textbf{R}$etrieval-$\textbf{A}$ugmented (LRA) diffusion model, which leverages neighbor consistency to effectively construct pseudo-clean labels for diffusion training. Our model is flexible and general, allowing easy incorporation of different types of conditional information, $\textit{e.g.}$, use of pre-trained models, to further boost model performance. Extensive experiments are conducted for evaluation. Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets. Remarkably, by incorporating conditional information from the powerful CLIP model, our method can boost the current SOTA accuracy by 10-20 absolute points in many cases.
Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model
Authors: Haisong Ding, Bozhi Luan, Dongnan Gui, Kai Chen, Qiang Huo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Constructing a highly accurate handwritten OCR system requires large amounts of representative training data, which is both time-consuming and expensive to collect. To mitigate the issue, we propose a denoising diffusion probabilistic model (DDPM) to generate training samples. This model conditions on a printed glyph image and creates mappings between printed characters and handwritten images, thus enabling the generation of photo-realistic handwritten samples with diverse styles and unseen text contents. However, the text contents in synthetic images are not always consistent with the glyph conditional images, leading to unreliable labels of synthetic samples. To address this issue, we further propose a progressive data filtering strategy to add those samples with a high confidence of correctness to the training set. Experimental results on IAM benchmark task show that OCR model trained with augmented DDPM-synthesized training samples can achieve about 45% relative word error rate reduction compared with the one trained on real data only.
Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards
Abstract
Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from given text prompts. However, previous methods fail to perform accurate modality alignment between text concepts and generated images due to the lack of fine-level semantic guidance that successfully diagnoses the modality discrepancy. In this paper, we propose FineRewards to improve the alignment between text and images in text-to-image diffusion models by introducing two new fine-grained semantic rewards: the caption reward and the Semantic Segment Anything (SAM) reward. From the global semantic view, the caption reward generates a corresponding detailed caption that depicts all important contents in the synthetic image via a BLIP-2 model and then calculates the reward score by measuring the similarity between the generated caption and the given prompt. From the local semantic view, the SAM reward segments the generated images into local parts with category labels, and scores the segmented parts by measuring the likelihood of each category appearing in the prompted scene via a large language model, i.e., Vicuna-7B. Additionally, we adopt an assemble reward-ranked learning strategy to enable the integration of multiple reward functions to jointly guide the model training. Adapting results of text-to-image models on the MS-COCO benchmark show that the proposed semantic reward outperforms other baseline reward functions with a considerable margin on both visual quality and semantic similarity with the input prompt. Moreover, by adopting the assemble reward-ranked learning strategy, we further demonstrate that model performance is further improved when adapting under the unifying of the proposed semantic reward with the current image rewards.
Mask, Stitch, and Re-Sample: Enhancing Robustness and Generalizability in Anomaly Detection through Automatic Diffusion Models
Authors: Cosmin I. Bercea, Michael Neumayr, Daniel Rueckert, Julia A. Schnabel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract
The introduction of diffusion models in anomaly detection has paved the way for more effective and accurate image reconstruction in pathologies. However, the current limitations in controlling noise granularity hinder diffusion models' ability to generalize across diverse anomaly types and compromise the restoration of healthy tissues. To overcome these challenges, we propose AutoDDPM, a novel approach that enhances the robustness of diffusion models. AutoDDPM utilizes diffusion models to generate initial likelihood maps of potential anomalies and seamlessly integrates them with the original image. Through joint noised distribution re-sampling, AutoDDPM achieves harmonization and in-painting effects. Our study demonstrates the efficacy of AutoDDPM in replacing anomalous regions while preserving healthy tissues, considerably surpassing diffusion models' limitations. It also contributes valuable insights and analysis on the limitations of current diffusion models, promoting robust and interpretable anomaly detection in medical imaging - an essential aspect of building autonomous clinical decision systems with higher interpretability.
Deep Stochastic Mechanics
Authors: Elena Orlova, Aleksei Ustimenko, Ruoxi Jiang, Peter Y. Lu, Rebecca Willett
Abstract
This paper introduces a novel deep-learning-based approach for numerical simulation of a time-evolving Schr\"odinger equation inspired by stochastic mechanics and generative diffusion models. Unlike existing approaches, which exhibit computational complexity that scales exponentially in the problem dimension, our method allows us to adapt to the latent low-dimensional structure of the wave function by sampling from the Markovian diffusion. Depending on the latent dimension, our method may have far lower computational complexity in higher dimensions. Moreover, we propose novel equations for stochastic quantum mechanics, resulting in linear computational complexity with respect to the number of dimensions. Numerical simulations verify our theoretical findings and show a significant advantage of our method compared to other deep-learning-based approaches used for quantum mechanics.
Spontaneous symmetry breaking in generative diffusion models
Authors: Gabriel Raya, Luca Ambrogioni
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data. In this paper, we show that the dynamics of these models exhibit a spontaneous symmetry breaking that divides the generative dynamics into two distinct phases: 1) A linear steady-state dynamics around a central fixed-point and 2) an attractor dynamics directed towards the data manifold. These two "phases" are separated by the change in stability of the central fixed-point, with the resulting window of instability being responsible for the diversity of the generated samples. Using both theoretical and empirical evidence, we show that an accurate simulation of the early dynamics does not significantly contribute to the final generation, since early fluctuations are reverted to the central fixed point. To leverage this insight, we propose a Gaussian late initialization scheme, which significantly improves model performance, achieving up to 3x FID improvements on fast samplers, while also increasing sample diversity (e.g., racial composition of generated CelebA images). Our work offers a new way to understand the generative dynamics of diffusion models that has the potential to bring about higher performance and less biased fast-samplers.
A hybridizable discontinuous Galerkin method for magnetic advection-diffusion problems
Abstract
We propose and analyze a hybridizable discontinuous Galerkin (HDG) method for solving a mixed magnetic advection-diffusion problem within a more general Friedrichs system framework. With carefully constructed numerical traces, we introduce two distinct stabilization parameters: $\tau_t$ for the tangential trace and $\tau_n$ for the normal trace. These parameters are tailored to satisfy different requirements, ensuring the stability and convergence of the method. Furthermore, we incorporate a weight function to facilitate the establishment of stability conditions. We also investigate an elementwise postprocessing technique that proves to be effective for both two-dimensional and three-dimensional problems in terms of broken $H({\rm curl})$ semi-norm accuracy improvement. Extensive numerical examples are presented to showcase the performance and effectiveness of the HDG method and the postprocessing techniques.
Direct Diffusion Bridge using Data Consistency for Inverse Problems
Authors: Hyungjin Chung, Jeongsol Kim, Jong Chul Ye
Abstract
Diffusion model-based inverse problem solvers have shown impressive performance, but are limited in speed, mostly as they require reverse diffusion sampling starting from noise. Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters. Then, we highlight a critical limitation of the current DDB framework, namely that it does not ensure data consistency. To address this problem, we propose a modified inference procedure that imposes data consistency without the need for fine-tuning. We term the resulting method data Consistent DDB (CDDB), which outperforms its inconsistent counterpart in terms of both perception and distortion metrics, thereby effectively pushing the Pareto-frontier toward the optimum. Our proposed method achieves state-of-the-art results on both evaluation criteria, showcasing its superiority over existing methods.
Homogenization of nondivergence-form elliptic equations with discontinuous coefficients and finite element approximation of the homogenized problem
Authors: Timo Sprekeler
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We study the homogenization of the equation $-A(\frac{\cdot}{\varepsilon}):D^2 u_{\varepsilon} = f$ posed in a bounded convex domain $\Omega\subset \mathbb{R}^n$ subject to a Dirichlet boundary condition and the numerical approximation of the corresponding homogenized problem, where the measurable, uniformly elliptic, periodic and symmetric diffusion matrix $A$ is merely assumed to be essentially bounded and (if $n>2$) to satisfy the Cordes condition. In the first part, we show existence and uniqueness of an invariant measure by reducing to a Lax--Milgram-type problem, we obtain $L^2$-bounds for periodic problems in double-divergence-form, we prove homogenization under minimal regularity assumptions, and we generalize known corrector bounds and results on optimal convergence rates from the classical case of H\"{o}lder continuous coefficients to the present case. In the second part, we suggest and rigorously analyze an approximation scheme for the effective coefficient matrix and the solution to the homogenized problem based on a finite element method for the approximation of the invariant measure, and we demonstrate the performance of the scheme through numerical experiments.
Inverse-design of nonlinear mechanical metamaterials via video denoising diffusion models
Authors: Jan-Hendrik Bastek, Dennis M. Kochmann
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
The accelerated inverse design of complex material properties - such as identifying a material with a given stress-strain response over a nonlinear deformation path - holds great potential for addressing challenges from soft robotics to biomedical implants and impact mitigation. While machine learning models have provided such inverse mappings, they are typically restricted to linear target properties such as stiffness. To tailor the nonlinear response, we here show that video diffusion generative models trained on full-field data of periodic stochastic cellular structures can successfully predict and tune their nonlinear deformation and stress response under compression in the large-strain regime, including buckling and contact. Unlike commonly encountered black-box models, our framework intrinsically provides an estimate of the expected deformation path, including the full-field internal stress distribution closely agreeing with finite element simulations. This work has thus the potential to simplify and accelerate the identification of materials with complex target performance.
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
Authors: Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang
Abstract
Recently, diffusion model shines as a promising backbone for the sequence modeling paradigm in offline reinforcement learning(RL). However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL(MetaDiffuser), which considers the generalization problem as conditional trajectory generation task with contextual representation. The key is to learn a context conditioned diffusion model which can generate task-oriented trajectories for planning across diverse tasks. To enhance the dynamics consistency of the generated trajectories while encouraging trajectories to achieve high returns, we further design a dual-guided module in the sampling process of the diffusion model. The proposed framework enjoys the robustness to the quality of collected warm-start data from the testing task and the flexibility to incorporate with different task representation method. The experiment results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the outstanding conditional generation ability of diffusion architecture.
A Geometric Perspective on Diffusion Models
Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
Abstract
Recent years have witnessed significant progress in developing efficient training and fast sampling approaches for diffusion models. A recent remarkable advancement is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation to their sampling dynamics. Through carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected with an explicit, quasi-linear sampling trajectory, and another implicit denoising trajectory, which even converges faster in terms of visual quality. We also establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, as well as re-explain the working principles of distillation-based fast sampling techniques.
GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations
Authors: Pietro Melzi, Christian Rathgeb, Ruben Tolosana, Ruben Vera-Rodriguez, Dominik Lawatsch, Florian Domin, Maxim Schaubert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Face recognition systems have significantly advanced in recent years, driven by the availability of large-scale datasets. However, several issues have recently came up, including privacy concerns that have led to the discontinuation of well-established public datasets. Synthetic datasets have emerged as a solution, even though current synthesis methods present other drawbacks such as limited intra-class variations, lack of realism, and unfair representation of demographic groups. This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition that combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets. In GANDiffFace, we first propose the use of GANs to synthesize highly realistic identities and meet target demographic distributions. Subsequently, we fine-tune Diffusion models with the images generated with GANs, synthesizing multiple images of the same identity with a variety of accessories, poses, expressions, and contexts. We generate multiple synthetic datasets by changing GANDiffFace settings, and compare their mated and non-mated score distributions with the distributions provided by popular real-world datasets for face recognition, i.e. VGG2 and IJB-C. Our results show the feasibility of the proposed GANDiffFace, in particular the use of Diffusion models to enhance the (limited) intra-class variations provided by GANs towards the level of real-world datasets.
Protein Design with Guided Discrete Diffusion
Authors: Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson
Abstract
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to a therapeutic target under locality and liability constraints, with 97% expression rate and 25% binding rate in exploratory in vitro experiments.
Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
Authors: Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed. Code is available at github.com/YuxinWenRick/tree-ring-watermark.
A Unified Conditional Framework for Diffusion-based Image Restoration
Authors: Yi Zhang, Xiaoyu Shi, Dasong Li, Xiaogang Wang, Jian Wang, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion Probabilistic Models (DPMs) have recently shown remarkable performance in image generation tasks, which are capable of generating highly realistic images. When adopting DPMs for image restoration tasks, the crucial aspect lies in how to integrate the conditional information to guide the DPMs to generate accurate and natural output, which has been largely overlooked in existing works. In this paper, we present a unified conditional framework based on diffusion models for image restoration. We leverage a lightweight UNet to predict initial guidance and the diffusion model to learn the residual of the guidance. By carefully designing the basic module and integration module for the diffusion model block, we integrate the guidance and other auxiliary conditional information into every block of the diffusion model to achieve spatially-adaptive generation conditioning. To handle high-resolution images, we propose a simple yet effective inter-step patch-splitting strategy to produce arbitrary-resolution images without grid artifacts. We evaluate our conditional framework on three challenging tasks: extreme low-light denoising, deblurring, and JPEG restoration, demonstrating its significant improvements in perceptual quality and the generalization to restoration tasks.
Efficient Diffusion Policies for Offline Reinforcement Learning
Authors: Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan
Abstract
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffsuion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov Chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training to avoid running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP can reduce the diffusion policy training time from 5 days to 5 hours on gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and achieves new state-of-the-art on D4RL by large margins over previous methods. Our code is available at https://github.com/sail-sg/edp.
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Abstract
Recent years have witnessed considerable achievements in editing images with text instructions. When applying these editors to dynamic scene editing, the new-style scene tends to be temporally inconsistent due to the frame-by-frame nature of these 2D editors. To tackle this issue, we propose Control4D, a novel approach for high-fidelity and temporally consistent 4D portrait editing. Control4D is built upon an efficient 4D representation with a 2D diffusion-based editor. Instead of using direct supervisions from the editor, our method learns a 4D GAN from it and avoids the inconsistent supervision signals. Specifically, we employ a discriminator to learn the generation distribution based on the edited images and then update the generator with the discrimination signals. For more stable training, multi-level information is extracted from the edited images and used to facilitate the learning of the generator. Experimental results show that Control4D surpasses previous approaches and achieves more photo-realistic and consistent 4D editing performances. The link to our project website is https://control4darxiv.github.io.
Understanding and Mitigating Copying in Diffusion Models
Authors: Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set.
Keyword: dynamic
Memory as a Mass-based Graph: Towards a Conceptual Framework for the Simulation Model of Human Memory in AI
Abstract
There are two approaches for simulating memory as well as learning in artificial intelligence; the functionalistic approach and the cognitive approach. The necessary condition to put the second approach into account is to provide a model of brain activity that contains a quite good congruence with observational facts such as mistakes and forgotten experiences. Given that human memory has a solid core that includes the components of our identity, our family and our hometown, the major and determinative events of our lives, and the countless repeated and accepted facts of our culture, the more we go to the peripheral spots the data becomes flimsier and more easily exposed to oblivion. It was essential to propose a model in which the topographical differences are quite distinguishable. In our proposed model, we have translated this topographical situation into quantities, which are attributed to the nodes. The result is an edge-weighted graph with mass-based values on the nodes which demonstrates the importance of each atomic proposition, as a truth, for an intelligent being. Furthermore, it dynamically develops and modifies, and in successive phases, it changes the mass of the nodes and weight of the edges depending on gathered inputs from the environment.
Tracking the whole-body centre of mass while seated in a wheelchair using motion capture
Abstract
Estimating the position of the whole-body centre of mass (CoM) based on skin markers and anthropometric tables requires tracking the pelvis and lower body, which is impossible for wheelchair users due to occlusion. In this work, we present a method to track the user's whole-body CoM using visible markers affixed to the user and wheelchair where the user remains seated in their wheelchair, by expressing the pelvis and lower body segments in wheelchair coordinates. The accuracy of this method was evaluated on the anterior-posterior (AP) and medial-lateral (ML) axes by comparing the projected CoM to the centre of pressure measured by four force plates, for 11 able-bodied participants adopting 9 static postures that include extreme reaching postures. The estimation accuracy was within 33 mm (AP) and 9 mm (ML), with a precision within 23 mm (AP) and 12 mm (ML). Tracking the whole-body CoM during wheelchair propulsion will allow researchers to better understand the dynamics of propulsion, which may help devise new approaches to increase the energy transfer from the arms to the ground and reduce the risks of developing musculoskeletal disorders.
DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling
Abstract
Learning from noisy labels is a challenge that arises in many real-world applications where training data can contain incorrect or corrupted labels. When fine-tuning language models with noisy labels, models can easily overfit the label noise, leading to decreased performance. Most existing methods for learning from noisy labels use static input features for denoising, but these methods are limited by the information they can provide on true label distributions and can result in biased or incorrect predictions. In this work, we propose the Dynamics-Enhanced Generative Model (DyGen), which uses dynamic patterns in the embedding space during the fine-tuning process of language models to improve noisy label predictions. DyGen uses the variational auto-encoding framework to infer the posterior distributions of true labels from noisy labels and training dynamics. Additionally, a co-regularization mechanism is used to minimize the impact of potentially noisy labels and priors. DyGen demonstrates an average accuracy improvement of 3.10% on two synthetic noise datasets and 1.48% on three real-world noise datasets compared to the previous state-of-the-art. Extensive experiments and analyses show the effectiveness of each component in DyGen. Our code is available for reproducibility on GitHub.
Efficient Training of Energy-Based Models Using Jarzynski Equality
Authors: Davide Carbone, Mengjian Hua, Simon Coste, Eric Vanden-Eijnden
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Numerical Analysis (math.NA); Probability (math.PR)
Abstract
Energy-based models (EBMs) are generative models inspired by statistical physics with a wide range of applications in unsupervised learning. Their performance is best measured by the cross-entropy (CE) of the model distribution relative to the data distribution. Using the CE as the objective for training is however challenging because the computation of its gradient with respect to the model parameters requires sampling the model distribution. Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality together with tools from sequential Monte-Carlo sampling can be used to perform this computation efficiently and avoid the uncontrolled approximations made using the standard contrastive divergence algorithm. Specifically, we introduce a modification of the unadjusted Langevin algorithm (ULA) in which each walker acquires a weight that enables the estimation of the gradient of the cross-entropy at any step during GD, thereby bypassing sampling biases induced by slow mixing of ULA. We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST dataset. We show that the proposed approach outperforms methods based on the contrastive divergence algorithm in all the considered situations.
Dynamic Sparsity Is Channel-Level Sparsity Learner
Authors: Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
Abstract
Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most if not all DST prior arts demonstrate their effectiveness on unstructured sparsity with highly irregular sparse patterns, which receives limited support in common hardware. This limitation hinders the usage of DST in practice. In this paper, we propose Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity to GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) during one end-to-end training process, without any ad-hoc operations. The resulting small sparse networks can be directly accelerated by commodity hardware, without using any particularly sparsity-aware hardware accelerators. This appealing outcome is partially motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly involves biased parameter reallocation across channels, with a large fraction of channels (up to 60\%) being sparser than others. By progressively identifying and removing these channels during training, our approach translates unstructured sparsity to channel-wise sparsity. Our experimental results demonstrate that Chase achieves 1.7 X inference throughput speedup on common GPU devices without compromising accuracy with ResNet-50 on ImageNet. We release our codes in https://github.com/luuyin/chase.
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
Authors: Hanqing Zhu, Jiaqi Gu, Hanrui Wang, Zixuan Jiang, Zhekai Zhang, Rongxin Tang, Chenghao Feng, Song Han, Ray T. Chen, David Z. Pan
Abstract
The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results for convolutional neural network (CNN) workloads that only require weight-static linear operations. However, they fail to efficiently support Transformer architectures with attention operations due to the lack of ability to process dynamic full-range tensor multiplication. In this work, we propose a customized high-performance and energy-efficient photonic Transformer accelerator, DOTA. To overcome the fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly-parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a >10x latency reduction compared to prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads.
About Decisiveness of Dynamic Probabilistic Models
Authors: Alain Finkel, Serge Haddad, Lina Ye
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
Decisiveness of infinite Markov chains with respect to some (finite or infinite) target set of states is a key property that allows to compute the reachability probability of this set up to an arbitrary precision. Most of the existing works assume constant weights for defining the probability of a transition in the considered models. However numerous probabilistic modelings require (dynamic) weights that depend on both the current state and the transition. So we introduce a dynamic probabilistic version of counter machine (pCM). After establishing that decisiveness is undecidable for pCMs even with constant weights, we study the decidability of decisiveness for subclasses of pCM. We show that, without restrictions on dynamic weights, decisiveness is undecidable with a single state and single counter pCM. On the contrary with polynomial weights, decisiveness becomes decidable for single counter pCMs under mild conditions. Then we show that decisiveness of probabilistic Petri nets (pPNs) with polynomial weights is undecidable even when the target set is upward-closed unlike the case of constant weights. Finally we prove that the standard subclass of pPNs with a regular language is decisive with respect to a finite set whatever the kind of weights.
Economics of Spot Instance Service: A Two-stage Dynamic Game Apporach
Authors: Hyojung Lee, Lam Vu, Minsung Jang
Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This paper presents the economic impacts of spot instance service on the cloud service providers (CSPs) and the customers when the CSPs offer it along with the on-demand instance service to the customers. We model the interaction between CSPs and customers as a non-cooperative two-stage dynamic game. Our equilibrium analysis reveals (i) the techno-economic interrelationship between the customers' heterogeneity, resource availability, and CSPs' pricing policy, and (ii) the impacts of the customers' service selection (spot vs. on-demand) and the CSPs' pricing decision on the CSPs' market share and revenue, as well as the customers' utility. The key technical challenges lie in, first, how we capture the strategic interactions between CSPs and customers, and second, how we consider the various practical aspects of cloud services, such as heterogeneity of customers' willingness to pay for the quality of service (QoS) and the fluctuating resource availability. The main contribution of this paper is to provide CSPs and customers with a better understanding of the economic impact caused by a certain price policy for the spot service when the equilibrium price, which from our two-stage dynamic game analysis, is able to set as the baseline price for their spot service.
Medication Recommendation via Domain Knowledge Informed Deep Learning
Abstract
Medication recommendation is a fundamental yet crucial branch of healthcare, which provides opportunities to support clinical physicians with more accurate medication prescriptions for patients with complex health conditions. Learning from electronic health records (EHR) to recommend medications is the most common way in previous studies. However, most of them neglect incorporating domain knowledge according to the clinical manifestations in the EHR of the patient. To address these issues, we propose a novel \textbf{D}omain \textbf{K}nowledge \textbf{I}nformed \textbf{Net}work (DKINet) to integrate domain knowledge with observable clinical manifestations of the patient, which is the first dynamic domain knowledge informed framework toward medication recommendation. In particular, we first design a knowledge-driven encoder to capture the domain information and then develop a data-driven encoder to integrate domain knowledge into the observable EHR. To endow the model with the capability of temporal decision, we design an explicit medication encoder for learning the longitudinal dependence of the patient. Extensive experiments on three publicly available datasets verify the superiority of our method. The code will be public upon acceptance.
Adaptive Compatible Performance Control for Spacecraft Attitude Control under Motion Constraints with Guaranteed Accuracy
Authors: Jiakun Lei, Tao Meng, Yang Zhu, Kun Wang, Weijia Wang
Abstract
This paper focuses on the problem of spacecraft attitude control in the presence of time-varying parameter uncertainties and multiple constraints, accounting for angular velocity limitation, performance requirements, and input saturation. To tackle this problem, we propose a modified framework called Compatible Performance Control (CPC), which integrates the Prescribed Performance Control (PPC) scheme with a contradiction detection and alleviation strategy. Firstly, by introducing the Zeroing Barrier Function (ZBF) concept, we propose a detection strategy to yield judgment on the compatibility between the angular velocity constraint and the performance envelope constraint. Subsequently, we propose a projection operator-governed dynamical system with a varying upper bound to generate an appropriate bounded performance envelope-modification signal if a contradiction exists, thereby alleviating the contradiction and promoting compatibility within the system. Next, a dynamical filter technique is introduced to construct a bounded reference velocity signal to address the angular velocity limitation. Furthermore, we employ a time-varying gain technique to address the challenge posed by time-varying parameter uncertainties, further developing an adaptive strategy that exhibits robustness on disturbance rejection. By utilizing the proposed CPC scheme and time-varying gain adaptive strategy, we construct an adaptive CPC controller, which guarantees the ultimate boundedness of the system, and all constraints are satisfied simultaneously during the whole control process. Finally, numerical simulation results are presented to show the effectiveness of the proposed framework.
Composite Triggered Intermittent Control for Constrained Spacecraft Attitude Tracking
Authors: Jiakun Lei, Tao Meng, Kun Wang, Weijia Wang, Shujian Sun
Abstract
This paper focuses on the spacecraft attitude control problem with intermittent actuator activation, taking into account the attitude rotation rate limitation and input saturation issue simultaneously. To address this problem, we first propose a composite event-trigger mechanism, which composed of two state-dependent trigger that governing the activation and deactivation of actuators. Subsequently, by introducing the cascaded decomposition of Backstepping control philosophy, the designed trigger mechanism is then applied to the decomposed dynamical subsystem, providing a layered intermittent stabilization strategy. Further, the basic intermittent attitude controller is extended to a "constrained version" by introducing a strictly bounded virtual control law and an input saturation compensation auxiliary system. By analyzing the local boundedness of the system on each inter-event time interval, a uniformly, strictly decreasing upper boundary of the lumped system is further characterized, thereby completing the proof of the system's uniformly ultimately boundedness (UUB). Finally, numerical simulation results are illustrated to demonstrate the effectiveness of the proposed scheme.
Adaptive Reduced Attitude Control for Spacecraft Boresight Alignment under Multiple Constraints with Guaranteed Accuracy
Authors: Jiakun Lei, Tao Meng, Kun Wang, Weijia Wang, Shujian Sun, Lei Wang
Abstract
This paper investigates the boresight-alignment attitude control problem of spacecraft under time-varying parameter uncertainties in a manner that ensures both safety and performance, accounting for pointing-forbidden constraints, attitude rotation rate limitations, and pointing accuracy requirements. To tackle this problem, we propose a modified composite framework that combines the Artificial Potential Field (APF) methodology with the Prescribed Performance Control (PPC) scheme. The APF scheme is employed to ensure safety, while the PPC scheme is employed for performance consideration. Further, a switching prescribed performance function (S-PPF) is proposed to promote compatibility between safety and performance when the system encounters obstacle regions. To further address the concern regarding parameter uncertainties, we utilize the Immersion-and-Invariance (I\&I) adaptive control technique, and we further introduce the concept called "congelation of variables" to build a compensating auxiliary system, facilitating the stability by providing a feedforwardly compensating for time-varying unknown parameter dynamics. Based on the proposed scheme, an adaptive composite controller is derived, guaranteeing the asymptotic convergence of the closed-loop system. Finally, numerical simulation results are carried out to validate the effectiveness of the proposed scheme.
Equivalence between State-Space Stability Analysis and Generalized Nyquist Criterion with Frequency Dynamics Included
Authors: Pablo Rodriguez-Ortega, Javier Roldan-Perez, Milan Prodanovic
Abstract
Stability of power electronic converters connected to power grids is commonly assessed by using the impedance criterion. On the contrary, stability of power grids is typically analysed by using network state-space representation and system eigenvalues. It has been already reported in the literature that the impedance criterion may lead to erroneous results if the dynamics of the grid frequency is not taken into consideration. Meanwhile, the eigenvalue analysis is still considered as a reliable method for finding instabilities. However, the equivalence between these two methods (impedance criterion with frequency dynamics included and eigenvalue analysis) has not been thoroughly studied before. In this paper, the impedance method with frequency dynamics included is compared with the conventional eigenvalue analysis for a system fully based on electronic interfaces. Analytical expressions that link both methods are provided, showing that they are identical under certain conditions. Then, both methods are applied to a system based on a grid-forming and a grid-following converter. The results obtained from real-time simulations performed in OPAL-RT are used to validate the theoretical developments and main contributions of this work.
Spontaneous symmetry breaking in generative diffusion models
Authors: Gabriel Raya, Luca Ambrogioni
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data. In this paper, we show that the dynamics of these models exhibit a spontaneous symmetry breaking that divides the generative dynamics into two distinct phases: 1) A linear steady-state dynamics around a central fixed-point and 2) an attractor dynamics directed towards the data manifold. These two "phases" are separated by the change in stability of the central fixed-point, with the resulting window of instability being responsible for the diversity of the generated samples. Using both theoretical and empirical evidence, we show that an accurate simulation of the early dynamics does not significantly contribute to the final generation, since early fluctuations are reverted to the central fixed point. To leverage this insight, we propose a Gaussian late initialization scheme, which significantly improves model performance, achieving up to 3x FID improvements on fast samplers, while also increasing sample diversity (e.g., racial composition of generated CelebA images). Our work offers a new way to understand the generative dynamics of diffusion models that has the potential to bring about higher performance and less biased fast-samplers.
Optimal Decision Trees for Separable Objectives: Pushing the Limits of Dynamic Programming
Authors: Jacobus G. M. van der Linden, Mathijs M. de Weerdt, Emir Demirović
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract
Global optimization of decision trees has shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on four application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin.
Low-Complexity Dynamic Directional Modulation: Vulnerability and Information Leakage
Authors: Pedro E. Gória Silva, Adam Narbudowicz, Nicola Marchetti, Pedro H. J. Nardelli, Rausley A. A. de Souza, Jules M. Moualeu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, the privacy of wireless transmissions is improved through the use of an efficient technique termed dynamic directional modulation (DDM), and is subsequently assessed in terms of the measure of information leakage. Recently, a variation of DDM termed low-power dynamic directional modulation (LPDDM) has attracted significant attention as a prominent secure transmission method due to its ability to further improve the privacy of wireless communications. Roughly speaking, this modulation operates by randomly selecting the transmitting antenna from an antenna array whose radiation pattern is well known. Thereafter, the modulator adjusts the constellation phase so as to ensure that only the legitimate receiver recovers the information. To begin with, we highlight some privacy boundaries inherent to the underlying system. In addition, we propose features that the antenna array must meet in order to increase the privacy of a wireless communication system. Last, we adopt a uniform circular monopole antenna array with equiprobable transmitting antennas in order to assess the impact of DDM on the information leakage. It is shown that the bit error rate, while being a useful metric in the evaluation of wireless communication systems, does not provide the full information about the vulnerability of the underlying system.
Direct Learning-Based Deep Spiking Neural Networks: A Review
Authors: Yufei Guo, Xuhui Huang, Zhe Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The spiking neural network (SNN), as a promising brain-inspired computational model with binary spike information transmission mechanism, rich spatially-temporal dynamics, and event-driven characteristics, has received extensive attention. However, its intricately discontinuous spike mechanism brings difficulty to the optimization of the deep SNN. Since the surrogate gradient method can greatly mitigate the optimization difficulty and shows great potential in directly training deep SNNs, a variety of direct learning-based deep SNN works have been proposed and achieved satisfying progress in recent years. In this paper, we present a comprehensive survey of these direct learning-based deep SNN works, mainly categorized into accuracy improvement methods, efficiency improvement methods, and temporal dynamics utilization methods. In addition, we also divide these categorizations into finer granularities further to better organize and introduce them. Finally, the challenges and trends that may be faced in future research are prospected.
Learning Representations without Compositional Assumptions
Authors: Tennison Liu, Jeroen Berrevoets, Zhaozhi Qian, Mihaela van der Schaar
Abstract
This paper addresses unsupervised representation learning on tabular data containing multiple views generated by distinct sources of measurement. Traditional methods, which tackle this problem using the multi-view framework, are constrained by predefined assumptions that assume feature sets share the same information and representations should learn globally shared factors. However, this assumption is not always valid for real-world tabular datasets with complex dependencies between feature sets, resulting in localized information that is harder to learn. To overcome this limitation, we propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges. Furthermore, we introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically. This approach results in latent graph components that specialize in capturing localized information from different regions of the input, leading to superior downstream performance.
Abstract
Markov jump processes are continuous-time stochastic processes with a wide range of applications in both natural and social sciences. Despite their widespread use, inference in these models is highly non-trivial and typically proceeds via either Monte Carlo or expectation-maximization methods. In this work we introduce an alternative, variational inference algorithm for Markov jump processes which relies on neural ordinary differential equations, and is trainable via back-propagation. Our methodology learns neural, continuous-time representations of the observed data, that are used to approximate the initial distribution and time-dependent transition probability rates of the posterior Markov jump process. The time-independent rates of the prior process are in contrast trained akin to generative adversarial networks. We test our approach on synthetic data sampled from ground-truth Markov jump processes, experimental switching ion channel data and molecular dynamics simulations. Source code to reproduce our experiments is available online.
Tracking Power System Events with Accuracy-Based PMU Adaptive Reporting Rate
Authors: Guglielmo Frigo, Paolo Attilio Pegoraro, Sergio Toscani
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
Fast dynamics and transient events are becoming more and more frequent in power systems, due to the high penetration of renewable energy sources and the consequent lack of inertia. In this scenario, Phasor Measurement Units (PMUs) are expected to track the monitored quantities. Such functionality is related not only to the PMU accuracy (as per the IEC/IEEE 60255-118-1 standard) but also to the PMU reporting rate (RR). High RRs allow tracking fast dynamics, but produce many redundant measurement data in normal conditions. In view of an effective tradeoff, the present paper proposes an adaptive RR mechanism based on a real-time selection of the measurements, with the target of preserving the information content while reducing the data rate. The proposed method has been tested considering real-world datasets and applied to four different PMU algorithms. The results prove the method effectiveness in reducing the average data throughput as well as its scalability at PMU concentrator or storage level.
Evolutionary Solution Adaption for Multi-Objective Metal Cutting Process Optimization
Authors: Leo Francoso Dal Piccol Sotto, Sebastian Mayer, Hemanth Janarthanam, Alexander Butz, Jochen Garcke
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Optimizing manufacturing process parameters is typically a multi-objective problem with often contradictory objectives such as production quality and production time. If production requirements change, process parameters have to be optimized again. Since optimization usually requires costly simulations based on, for example, the Finite Element method, it is of great interest to have means to reduce the number of evaluations needed for optimization. To this end, we consider optimizing for different production requirements from the viewpoint of a framework for system flexibility that allows us to study the ability of an algorithm to transfer solutions from previous optimization tasks, which also relates to dynamic evolutionary optimization. Based on the extended Oxley model for orthogonal metal cutting, we introduce a multi-objective optimization benchmark where different materials define related optimization tasks, and use it to study the flexibility of NSGA-II, which we extend by two variants: 1) varying goals, that optimizes solutions for two tasks simultaneously to obtain in-between source solutions expected to be more adaptable, and 2) active-inactive genotype, that accommodates different possibilities that can be activated or deactivated. Results show that adaption with standard NSGA-II greatly reduces the number of evaluations required for optimization to a target goal, while the proposed variants further improve the adaption costs, although further work is needed towards making the methods advantageous for real applications.
An Empirical Study of Fault Localization in Python Programs
Abstract
Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several findings about programming languages like C/C++ and Java -- the most common choices for fault localization research -- carry over to other languages, whether the dynamic nature of Python and how the language is used in practice affect the capabilities of classic fault localization approaches remain open questions to investigate. This paper is the first large-scale empirical study of fault localization on real-world Python programs and faults. Using Zou et al.'s recent large-scale empirical study of fault localization in Java as the basis of our study, we investigated the effectiveness (i.e., localization accuracy), efficiency (i.e., runtime performance), and other features (e.g., different entity granularities) of seven well-known fault-localization techniques in four families (spectrum-based, mutation-based, predicate switching, and stack-trace based) on 135 faults from 13 open-source Python projects from the BugsInPy curated collection. The results replicate for Python several results known about Java, and shed light on whether Python's peculiarities affect the capabilities of fault localization. The replication package that accompanies this paper includes detailed data about our experiments, as well as the tool FauxPy that we implemented to conduct the study.
Learning Task-preferred Inference Routes for Gradient De-conflict in Multi-output DNNs
Abstract
Multi-output deep neural networks(MONs) contain multiple task branches, and these tasks usually share partial network filters that lead to the entanglement of different task inference routes. Due to the inconsistent optimization objectives, the task gradients used for training MONs will interfere with each other on the shared routes, which will decrease the overall model performance. To address this issue, we propose a novel gradient de-conflict algorithm named DR-MGF(Dynamic Routes and Meta-weighted Gradient Fusion) in this work. Different from existing de-conflict methods, DR-MGF achieves gradient de-conflict in MONs by learning task-preferred inference routes. The proposed method is motivated by our experimental findings: the shared filters are not equally important to different tasks. By designing the learnable task-specific importance variables, DR-MGF evaluates the importance of filters for different tasks. Through making the dominances of tasks over filters be proportional to the task-specific importance of filters, DR-MGF can effectively reduce the inter-task interference. The task-specific importance variables ultimately determine task-preferred inference routes at the end of training iterations. Extensive experimental results on CIFAR, ImageNet, and NYUv2 illustrate that DR-MGF outperforms the existing de-conflict methods both in prediction accuracy and convergence speed of MONs. Furthermore, DR-MGF can be extended to general MONs without modifying the overall network structures.
Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive
Abstract
Modern consumer cameras usually employ the rolling shutter (RS) mechanism, where images are captured by scanning scenes row-by-row, yielding RS distortions for dynamic scenes. To correct RS distortions, existing methods adopt a fully supervised learning manner, where high framerate global shutter (GS) images should be collected as ground-truth supervision. In this paper, we propose a Self-supervised learning framework for Dual reversed RS distortions Correction (SelfDRSC), where a DRSC network can be learned to generate a high framerate GS video only based on dual RS images with reversed distortions. In particular, a bidirectional distortion warping module is proposed for reconstructing dual reversed RS images, and then a self-supervised loss can be deployed to train DRSC network by enhancing the cycle consistency between input and reconstructed dual reversed RS images. Besides start and end RS scanning time, GS images at arbitrary intermediate scanning time can also be supervised in SelfDRSC, thus enabling the learned DRSC network to generate a high framerate GS video. Moreover, a simple yet effective self-distillation strategy is introduced in self-supervised loss for mitigating boundary artifacts in generated GS images. On synthetic dataset, SelfDRSC achieves better or comparable quantitative metrics in comparison to state-of-the-art methods trained in the full supervision manner. On real-world RS cases, our SelfDRSC can produce high framerate GS videos with finer correction textures and better temporary consistency. The source code and trained models are made publicly available at https://github.com/shangwei5/SelfDRSC.
Stability and convergence of in time approximations of hyperbolic elastodynamics via stepwise minimization
Authors: Antonín Češík, Sebastian Schwarzacher
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We study step-wise time approximations of non-linear hyperbolic initial value problems. The technique used here is a generalization of the minimizing movements method, using two time-scales: one for velocity, the other (potentially much larger) for acceleration. The main applications are from elastodynamics namely so-called generalized solids. The evolution follows an underlying variational structure exploited by step-wise minimisation. We show for a large family of (elastic) energies that the introduced scheme is stable; allowing for non-linearities of highest order. If the highest order can assumed to be linear, we show that the limit solutions are regular and that the minimizing movements scheme converges with optimal linear rate. Thus this work extends numerical time-step minimization methods to the realm of hyperbolic problems.
Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction
Authors: Fabian Akkerman, Julius Luy, Wouter van Heeswijk, Maximilian Schiffer
Abstract
Large discrete action spaces remain a central challenge for reinforcement learning methods. Such spaces are encountered in many real-world applications, e.g., recommender systems, multi-step planning, and inventory replenishment. The mapping of continuous proxies to discrete actions is a promising paradigm for handling large discrete action spaces. Existing continuous-to-discrete mapping approaches involve searching for discrete neighboring actions in a static pre-defined neighborhood, which requires discrete neighbor lookups across the entire action space. Hence, scalability issues persist. To mitigate this drawback, we propose a novel Dynamic Neighborhood Construction (DNC) method, which dynamically constructs a discrete neighborhood to map the continuous proxy, thus efficiently exploiting the underlying action space. We demonstrate the robustness of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across three different environments. Our results show that DNC matches or outperforms state-of-the-art approaches while being more computationally efficient. Furthermore, our method scales to action spaces that so far remained computationally intractable for existing methodologies.
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues
Abstract
Reconstructing deformable tissues from endoscopic stereo videos in robotic surgery is crucial for various clinical applications. However, existing methods relying only on implicit representations are computationally expensive and require dozens of hours, which limits further practical applications. To address this challenge, we introduce LerPlane, a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields, leading to a compact memory footprint and significantly accelerated optimization. The efficient factorization is accomplished by fusing features obtained through linear interpolation of each plane and enables using lightweight neural networks to model surgical scenes. Besides, LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling. We also propose a novel sample scheme to boost optimization and improve performance in regions with tool occlusion and large motions. Experiments on DaVinci robotic surgery videos demonstrate that LerPlane accelerates optimization by over 100$\times$ while maintaining high quality across various non-rigid deformations, showing significant promise for future intraoperative surgery applications.
Fully Dynamic Submodular Maximization over Matroids
Abstract
Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this classic problem in the fully dynamic setting, where elements can be both inserted and deleted in real-time. Our main result is a randomized algorithm that maintains an efficient data structure with an $\tilde{O}(k^2)$ amortized update time (in the number of additions and deletions) and yields a $4$-approximate solution, where $k$ is the rank of the matroid.
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
Authors: Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang
Abstract
Recently, diffusion model shines as a promising backbone for the sequence modeling paradigm in offline reinforcement learning(RL). However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL(MetaDiffuser), which considers the generalization problem as conditional trajectory generation task with contextual representation. The key is to learn a context conditioned diffusion model which can generate task-oriented trajectories for planning across diverse tasks. To enhance the dynamics consistency of the generated trajectories while encouraging trajectories to achieve high returns, we further design a dual-guided module in the sampling process of the diffusion model. The proposed framework enjoys the robustness to the quality of collected warm-start data from the testing task and the flexibility to incorporate with different task representation method. The experiment results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the outstanding conditional generation ability of diffusion architecture.
A Geometric Perspective on Diffusion Models
Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
Abstract
Recent years have witnessed significant progress in developing efficient training and fast sampling approaches for diffusion models. A recent remarkable advancement is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation to their sampling dynamics. Through carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected with an explicit, quasi-linear sampling trajectory, and another implicit denoising trajectory, which even converges faster in terms of visual quality. We also establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, as well as re-explain the working principles of distillation-based fast sampling techniques.
Regulated Pure Pursuit for Robot Path Tracking
Authors: Steve Macenski, Shrijit Singh, Francisco Martin, Jonatan Gines
Abstract
The accelerated deployment of service robots have spawned a number of algorithm variations to better handle real-world conditions. Many local trajectory planning techniques have been deployed on practical robot systems successfully. While most formulations of Dynamic Window Approach and Model Predictive Control can progress along paths and optimize for additional criteria, the use of pure path tracking algorithms is still commonplace. Decades later, Pure Pursuit and its variants continues to be one of the most commonly utilized classes of local trajectory planners. However, few Pure Pursuit variants have been proposed with schema for variable linear velocities - they either assume a constant velocity or fails to address the point at all. This paper presents a variant of Pure Pursuit designed with additional heuristics to regulate linear velocities, built atop the existing Adaptive variant. The Regulated Pure Pursuit algorithm makes incremental improvements on state of the art by adjusting linear velocities with particular focus on safety in constrained and partially observable spaces commonly negotiated by deployed robots. We present experiments with the Regulated Pure Pursuit algorithm on industrial-grade service robots. We also provide a high-quality reference implementation that is freely included ROS 2 Nav2 framework at https://github.com/ros-planning/navigation2 for fast evaluation.
Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Abstract
Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria or multiple learning tasks. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practical gap, we focus on a new stochastic variant of MGDA - the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along conflict-avoidant direction - may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL.
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Abstract
Recent years have witnessed considerable achievements in editing images with text instructions. When applying these editors to dynamic scene editing, the new-style scene tends to be temporally inconsistent due to the frame-by-frame nature of these 2D editors. To tackle this issue, we propose Control4D, a novel approach for high-fidelity and temporally consistent 4D portrait editing. Control4D is built upon an efficient 4D representation with a 2D diffusion-based editor. Instead of using direct supervisions from the editor, our method learns a 4D GAN from it and avoids the inconsistent supervision signals. Specifically, we employ a discriminator to learn the generation distribution based on the edited images and then update the generator with the discrimination signals. For more stable training, multi-level information is extracted from the edited images and used to facilitate the learning of the generator. Experimental results show that Control4D surpasses previous approaches and achieves more photo-realistic and consistent 4D editing performances. The link to our project website is https://control4darxiv.github.io.
Keyword: adaptive
Indoor Localization using Bluetooth and Inertial Motion Sensors in Distributed Edge and Cloud Computing Environment
Authors: Yashar Kiarashi, Chaitra Hedge, Venkata Siva Krishna Madala, ArjunSinh Nakum, Ratan Singh, Robert Tweedy, Gari D. Clifford, Hyeokhyen Kwon
Abstract
Spatial navigation of indoor space usage patterns reveals important cues about the cognitive health of individuals. In this work, we present a low-cost, scalable, open-source edge computing system using Bluetooth Low Energy (BLE) and Inertial Measurement Unit sensors (IMU) for tracking indoor movements for a large indoor facility (over 1600 m^2) that was designed to facilitate therapeutic activities for individuals with Mild Cognitive Impairment. The facility is instrumented with 39 edge computing systems with an on-premise fog server, and subjects carry BLE beacon and IMU sensors on-body. We proposed an adaptive trilateration approach that considers the temporal density of hits from the BLE beacon to surrounding edge devices to handle inconsistent coverage of edge devices in large spaces with varying signal strength that leads to intermittent detection of beacons. The proposed BLE-based localization is further enhanced by fusing with an IMU-based tracking method using a dead-reckoning technique. Our experiment results, achieved in a real clinical environment, suggest that an ordinary medical facility can be transformed into a smart space that enables automatic assessment of the individual patients' movements.
Abstract
The projection operation is a critical component in a wide range of optimization algorithms, such as online gradient descent (OGD), for enforcing constraints and achieving optimal regret bounds. However, it suffers from computational complexity limitations in high-dimensional settings or when dealing with ill-conditioned constraint sets. Projection-free algorithms address this issue by replacing the projection oracle with more efficient optimization subroutines. But to date, these methods have been developed primarily in the Euclidean setting, and while there has been growing interest in optimization on Riemannian manifolds, there has been essentially no work in trying to utilize projection-free tools here. An apparent issue is that non-trivial affine functions are generally non-convex in such domains. In this paper, we present methods for obtaining sub-linear regret guarantees in online geodesically convex optimization on curved spaces for two scenarios: when we have access to (a) a separation oracle or (b) a linear optimization oracle. For geodesically convex losses, and when a separation oracle is available, our algorithms achieve $O(T^{1/2}\:)$ and $O(T^{3/4}\;)$ adaptive regret guarantees in the full information setting and the bandit setting, respectively. When a linear optimization oracle is available, we obtain regret rates of $O(T^{3/4}\;)$ for geodesically convex losses and $O(T^{2/3}\; log T )$ for strongly geodesically convex losses
Bearing-Constrained Leader-Follower Formation of Single-Integrators with Disturbance Rejection: Adaptive Variable-Structure Approaches
Authors: Thanh Truong Nguyen, Dung Van Vu, Tuynh Van Pham, Minh Hoang Trinh
Abstract
This paper studies the problem of stabilizing a leader-follower formation specified by a set of bearing constraints and being disturbed by unknown uniformly bounded disturbance. A set of leaders are positioned at the desired positions, while each follower agent is modeled by a single integrator with disturbance of which the upper bound is unavailable for the control design. Adaptive variable-structure formation control laws using only displacements or bearing vectors are provided to stabilize the agents to a desired formation. Thanks to the adaptive mechanism, the control laws require neither the information of the bearing Laplacian nor the upper bounds of the disturbance. It is further proved that when the leaders are moving with the same bounded uniformly continuous velocity, the moving target formation can still be achieved under the proposed control laws. Simulation results are also given to support the stability analysis.
Generating Finite Element Codes combining Adaptive Octrees with Complex Geometries
Authors: Eric Heisler, Cheng-Hau Yang, Aadesh Deshmukh, Baskar Ganapathysubramanian, Hari Sundar
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We present a high-level domain-specific language (DSL) interface to drive an adaptive incomplete $k$-d tree-based framework for finite element (FEM) solutions to PDEs. This DSL provides three key advances: (a) it abstracts out the complexity of implementing non-trivial FEM formulations, (b) it simplifies deploying these formulations on arbitrarily complicated and adaptively refined meshes, and (c) it exhibits good parallel performance. Taken together, the DSL interface allows end-users to rapidly and efficiently prototype new mathematical approaches, and deploy them on large clusters for solving practical problems. We illustrate this DSL by implementing a workflow for solving PDEs using the recently developed shifted boundary method (SBM). The SBM requires approximating relatively complicated integrals over boundary surfaces. Using a high-level DSL greatly simplifies this process and allows rapid exploration of variations. We demonstrate these tools on a variety of 2-D and 3-D configurations. With fewer than 20 lines of input, we can produce a parallel code that scales well to thousands of processes. This generated code is made accessible and readable to be easily modified and hand-tuned, making this tool useful even to experts with the target software.
Incremental Learning for Heterogeneous Structure Segmentation in Brain Tumor MRI
Authors: Xiaofeng Liu, Helen A. Shih, Fangxu Xing, Emiliano Santarnecchi, Georges El Fakhri, Jonghye Woo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
Abstract
Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect that well-trained static models are updated, following continually evolving target domain data -- e.g., additional lesions or structures of interest -- collected from different sites, without catastrophic forgetting. This, however, poses challenges, due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data in a source domain. To address these challenges, in this work, we seek to progressively evolve an ``off-the-shelf" trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, which is guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains -- i.e., new MRI scanners/modalities with incremental structures. Our framework was able to well retain the discriminability of previously learned structures, hence enabling the realistic life-long segmentation model extension along with the widespread accumulation of big medical data.
Adapting Fairness Interventions to Missing Values
Authors: Raymond Feng, Flavio P. Calmon, Hao Wang
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Information Theory (cs.IT); Machine Learning (stat.ML)
Abstract
Missing values in real-world data pose a significant and unique challenge to algorithmic fairness. Different demographic groups may be unequally affected by missing data, and the standard procedure for handling missing values where first data is imputed, then the imputed data is used for classification -- a procedure referred to as "impute-then-classify" -- can exacerbate discrimination. In this paper, we analyze how missing values affect algorithmic fairness. We first prove that training a classifier from imputed data can significantly worsen the achievable values of group fairness and average accuracy. This is because imputing data results in the loss of the missing pattern of the data, which often conveys information about the predictive label. We present scalable and adaptive algorithms for fair classification with missing values. These algorithms can be combined with any preexisting fairness-intervention algorithm to handle all possible missing patterns while preserving information encoded within the missing patterns. Numerical experiments with state-of-the-art fairness interventions demonstrate that our adaptive algorithms consistently achieve higher fairness and accuracy than impute-then-classify across different datasets.
AdANNS: A Framework for Adaptive Semantic Search
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
OWAdapt: An adaptive loss function for deep learning using OWA operators
Authors: Sebastián Maldonado, Carla Vairetti, Katherine Jara, Miguel Carrasco, Julio López
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a fuzzy adaptive loss function for enhancing deep learning performance in classification tasks. Specifically, we redefine the cross-entropy loss to effectively address class-level noise conditions, including the challenging problem of class imbalance. Our approach introduces aggregation operators, leveraging the power of fuzzy logic to improve classification accuracy. The rationale behind our proposed method lies in the iterative up-weighting of class-level components within the loss function, focusing on those with larger errors. To achieve this, we employ the ordered weighted average (OWA) operator and combine it with an adaptive scheme for gradient-based learning. Through extensive experimentation, our method outperforms other commonly used loss functions, such as the standard cross-entropy or focal loss, across various binary and multiclass classification tasks. Furthermore, we explore the influence of hyperparameters associated with the OWA operators and present a default configuration that performs well across different experimental settings.
Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable Spiking Neural Network on FPGA
Authors: Ali Mehrabi, Yeshwanth Bethi, André van Schaik, Andrew Wabnitz, Saeed Afshar
Abstract
This paper presents an efficient hardware implementation of the recently proposed Optimized Deep Event-driven Spiking Neural Network Architecture (ODESA). ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients and has the combined adaptation of weights and thresholds in an efficient hierarchical structure. This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware. The implementation consists of a multi-layer Spiking Neural Network (SNN) and individual training modules for each layer that enable online self-learning without using back-propagation. By using simple local adaptive selection thresholds, a Winner-Takes-All (WTA) constraint on each layer, and a modified weight update rule that is more amenable to hardware, the trainer module allocates neuronal resources optimally at each layer without having to pass high-precision error measurements across layers. All elements in the system, including the training module, interact using event-based binary spikes. The hardware-optimized implementation is shown to preserve the performance of the original algorithm across multiple spatial-temporal classification problems with significantly reduced hardware requirements.
Is Learning in Games Good for the Learners?
Authors: William Brown, Jon Schneider, Kiran Vodrahalli
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract
We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of {\it generalized equilibrium} which allows for asymmetric regret constraints, and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case one agent is no-swap and the other's regret is unconstrained. We show that this captures an extension of {\it Stackelberg} equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization for the maximal reward obtainable against {\it some} no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common ``mean-based'' no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-{\it adaptive}-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
Perception and Semantic Aware Regularization for Sequential Confidence Calibration
Abstract
Deep sequence recognition (DSR) models receive increasing attention due to their superior application to various applications. Most DSR models use merely the target sequences as supervision without considering other related sequences, leading to over-confidence in their predictions. The DSR models trained with label smoothing regularize labels by equally and independently smoothing each token, reallocating a small value to other tokens for mitigating overconfidence. However, they do not consider tokens/sequences correlations that may provide more effective information to regularize training and thus lead to sub-optimal performance. In this work, we find tokens/sequences with high perception and semantic correlations with the target ones contain more correlated and effective information and thus facilitate more effective regularization. To this end, we propose a Perception and Semantic aware Sequence Regularization framework, which explore perceptively and semantically correlated tokens/sequences as regularization. Specifically, we introduce a semantic context-free recognition and a language model to acquire similar sequences with high perceptive similarities and semantic correlation, respectively. Moreover, over-confidence degree varies across samples according to their difficulties. Thus, we further design an adaptive calibration intensity module to compute a difficulty score for each samples to obtain finer-grained regularization. Extensive experiments on canonical sequence recognition tasks, including scene text and speech recognition, demonstrate that our method sets novel state-of-the-art results. Code is available at https://github.com/husterpzh/PSSR.
Abstract
We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) $G^$ while minimizing the number of interventions made. In our setting, we are additionally given side information about $G^$ as advice, e.g. a DAG $G$ purported to be $G^$. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on algorithms with predictions. When the advice is a DAG $G$, we design an adaptive search algorithm to recover $G^$ whose intervention cost is at most $O(\max{1, \log \psi})$ times the cost for verifying $G^$; here, $\psi$ is a distance measure between $G$ and $G^$ that is upper bounded by the number of variables $n$, and is exactly 0 when $G=G^*$. Our approximation factor matches the state-of-the-art for the advice-less setting.
Federated Learning on Heterogeneous Data via Adaptive Self-Distillation
Abstract
Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid data distributions across clients, FL suffers from the 'client-drift' problem where every client converges to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to the client's training data based on: (1) the closeness of the local model's predictions with that of the global model and (2) the client's label distribution. The proposed regularization can be easily integrated atop existing, state-of-the-art FL algorithms leading to a further boost in the performance of these off-the-shelf methods. We demonstrate the efficacy of our proposed FL approach through extensive experiments on multiple real-world benchmarks (including datasets with common corruptions and perturbations) and show substantial gains in performance over the state-of-the-art methods.
Adaptive Compatible Performance Control for Spacecraft Attitude Control under Motion Constraints with Guaranteed Accuracy
Authors: Jiakun Lei, Tao Meng, Yang Zhu, Kun Wang, Weijia Wang
Abstract
This paper focuses on the problem of spacecraft attitude control in the presence of time-varying parameter uncertainties and multiple constraints, accounting for angular velocity limitation, performance requirements, and input saturation. To tackle this problem, we propose a modified framework called Compatible Performance Control (CPC), which integrates the Prescribed Performance Control (PPC) scheme with a contradiction detection and alleviation strategy. Firstly, by introducing the Zeroing Barrier Function (ZBF) concept, we propose a detection strategy to yield judgment on the compatibility between the angular velocity constraint and the performance envelope constraint. Subsequently, we propose a projection operator-governed dynamical system with a varying upper bound to generate an appropriate bounded performance envelope-modification signal if a contradiction exists, thereby alleviating the contradiction and promoting compatibility within the system. Next, a dynamical filter technique is introduced to construct a bounded reference velocity signal to address the angular velocity limitation. Furthermore, we employ a time-varying gain technique to address the challenge posed by time-varying parameter uncertainties, further developing an adaptive strategy that exhibits robustness on disturbance rejection. By utilizing the proposed CPC scheme and time-varying gain adaptive strategy, we construct an adaptive CPC controller, which guarantees the ultimate boundedness of the system, and all constraints are satisfied simultaneously during the whole control process. Finally, numerical simulation results are presented to show the effectiveness of the proposed framework.
Adaptive Reduced Attitude Control for Spacecraft Boresight Alignment under Multiple Constraints with Guaranteed Accuracy
Authors: Jiakun Lei, Tao Meng, Kun Wang, Weijia Wang, Shujian Sun, Lei Wang
Abstract
This paper investigates the boresight-alignment attitude control problem of spacecraft under time-varying parameter uncertainties in a manner that ensures both safety and performance, accounting for pointing-forbidden constraints, attitude rotation rate limitations, and pointing accuracy requirements. To tackle this problem, we propose a modified composite framework that combines the Artificial Potential Field (APF) methodology with the Prescribed Performance Control (PPC) scheme. The APF scheme is employed to ensure safety, while the PPC scheme is employed for performance consideration. Further, a switching prescribed performance function (S-PPF) is proposed to promote compatibility between safety and performance when the system encounters obstacle regions. To further address the concern regarding parameter uncertainties, we utilize the Immersion-and-Invariance (I\&I) adaptive control technique, and we further introduce the concept called "congelation of variables" to build a compensating auxiliary system, facilitating the stability by providing a feedforwardly compensating for time-varying unknown parameter dynamics. Based on the proposed scheme, an adaptive composite controller is derived, guaranteeing the asymptotic convergence of the closed-loop system. Finally, numerical simulation results are carried out to validate the effectiveness of the proposed scheme.
Isogeometric Multi-Resolution Full Waveform Inversion based on the Finite Cell Method
Authors: Tim Bürchner, Philipp Kopp, Stefan Kollmannsberger, Ernst Rank
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Full waveform inversion (FWI) is an iterative identification process that serves to minimize the misfit of model-based simulated and experimentally measured wave field data, with the goal of identifying a field of parameters for a given physical object. The inverse optimization process of FWI is based on forward and backward solutions of the (elastic or acoustic) eave equation. In a previous paper [1], we explored opportunities of using the finite cell method (FCM) as the wave field solver to incorporate highly complex geometric models. Furthermore, we demonstrated that the identification of the model's density outperforms that of the velocity -- particularly in cases where unknown voids characterized by homogeneous Neumann boundary conditions need to be detected. The paper at hand extends this previous study: The isogeometric finite cell analysis (IGA-FCM) -- a combination of isogeometric analysis (IGA) and FCM -- is applied for the wave field solver, with the advantage that the polynomial degree and subsequently also the sampling frequency of the wave field can be increased quite easily. Since the inversion efficiency strongly depends on the accuracy of the forward and backward wave field solution and of the gradient of the functional, consistent and lumped mass matrix discretization are compared. The resolution of the grid describing the unknown material density is the decouple from the knot span grid. Finally, we propose an adaptive multi-resolution algorithm that refines the material grid only locally using an image processing-based refinement indicator. The developed inversion framework allows fast and memory-efficient wave simulation and object identification. While we study the general behavior of the proposed approach on 2D benchmark problems, a final 3D problem shows that it can also be used to identify voids in geometrically complex spatial structures.
Adaptive and Explainable Deployment of Navigation Skills via Hierarchical Deep Reinforcement Learning
Abstract
For robotic vehicles to navigate robustly and safely in unseen environments, it is crucial to decide the most suitable navigation policy. However, most existing deep reinforcement learning based navigation policies are trained with a hand-engineered curriculum and reward function which are difficult to be deployed in a wide range of real-world scenarios. In this paper, we propose a framework to learn a family of low-level navigation policies and a high-level policy for deploying them. The main idea is that, instead of learning a single navigation policy with a fixed reward function, we simultaneously learn a family of policies that exhibit different behaviors with a wide range of reward functions. We then train the high-level policy which adaptively deploys the most suitable navigation skill. We evaluate our approach in simulation and the real world and demonstrate that our method can learn diverse navigation skills and adaptively deploy them. We also illustrate that our proposed hierarchical learning framework presents explainability by providing semantics for the behavior of an autonomous agent.
Tracking Power System Events with Accuracy-Based PMU Adaptive Reporting Rate
Authors: Guglielmo Frigo, Paolo Attilio Pegoraro, Sergio Toscani
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
Fast dynamics and transient events are becoming more and more frequent in power systems, due to the high penetration of renewable energy sources and the consequent lack of inertia. In this scenario, Phasor Measurement Units (PMUs) are expected to track the monitored quantities. Such functionality is related not only to the PMU accuracy (as per the IEC/IEEE 60255-118-1 standard) but also to the PMU reporting rate (RR). High RRs allow tracking fast dynamics, but produce many redundant measurement data in normal conditions. In view of an effective tradeoff, the present paper proposes an adaptive RR mechanism based on a real-time selection of the measurements, with the target of preserving the information content while reducing the data rate. The proposed method has been tested considering real-world datasets and applied to four different PMU algorithms. The results prove the method effectiveness in reducing the average data throughput as well as its scalability at PMU concentrator or storage level.
Off-By-One Implementation Error in J-UNIWARD
Authors: Benedikt Lorch
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract
J-UNIWARD is a popular steganography method for hiding secret messages in JPEG cover images. As a content-adaptive method, J-UNIWARD aims to embed into textured image regions where changes are difficult to detect. To this end, J-UNIWARD first assigns to each DCT coefficient an embedding cost calculated based on the image's Wavelet residual, and then uses a coding method that minimizes the cost while embedding the desired payload. Changing one DCT coefficient affects a 23x23 window of Wavelet coefficients. To speed up the costmap computation, the original implementation pre-computes the Wavelet residual and then considers per changed DCT coefficient a 23x23 window of the Wavelet residual. However, the implementation accesses a window accidentally shifted by one pixel to the bottom right. In this report, we evaluate the effect of this off-by-one error on the resulting costmaps. Some image blocks are over-priced while other image blocks are under-priced, but the difference is relatively small. The off-by-one error seems to make little difference for learning-based steganalysis.
Adaptive Conformal Regression with Jackknife+ Rescaled Scores
Authors: Nicolas Deutschmann, Mattia Rigotti, Maria Rodriguez Martinez
Abstract
Conformal regression provides prediction intervals with global coverage guarantees, but often fails to capture local error distributions, leading to non-homogeneous coverage. We address this with a new adaptive method based on rescaling conformal scores with an estimate of local score distribution, inspired by the Jackknife+ method, which enables the use of calibration data in conformal scores without breaking calibration-test exchangeability. Our approach ensures formal global coverage guarantees and is supported by new theoretical results on local coverage, including an a posteriori bound on any calibration score. The strength of our approach lies in achieving local coverage without sacrificing calibration set size, improving the applicability of conformal prediction intervals in various settings. As a result, our method provides prediction intervals that outperform previous methods, particularly in the low-data regime, making it especially relevant for real-world applications such as healthcare and biomedical domains where uncertainty needs to be quantified accurately despite low sample data.
Joint Adaptive Representations for Image-Language Learning
Authors: AJ Piergiovanni, Anelia Angelova
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image-language learning has made unprecedented progress in visual understanding. These developments have come at high costs, as contemporary vision-language models require large model scales and amounts of data. We here propose a much easier recipe for image-language learning, which produces effective models, outperforming bigger and more expensive ones, often trained on orders of magnitude larger datasets. Our key finding is the joint learning of a compact vision and language representation, which adaptively and iteratively fuses the multi-modal features. This results in a more effective image-language learning, greatly lowering the FLOPs by combining and reducing the number of tokens for both text and images, e.g. a 33\% reduction in FLOPs is achieved, compared to baseline fusion techniques used by popular image-language models, while improving performance. This also allows the model to scale without a large increase in FLOPs or memory. In addition, we propose adaptive pre-training data sampling which improves the data efficiency. The proposed approach achieves competitive performance compared to much larger models, and does so with significantly less data and FLOPs. With only 40M training examples and with 39 GFLOPs our lightweight model outperforms many times larger state-of-the-art models of 2-20x more FLOPs and using bigger datasets some of which with close to 1B training examples.
Regulated Pure Pursuit for Robot Path Tracking
Authors: Steve Macenski, Shrijit Singh, Francisco Martin, Jonatan Gines
Abstract
The accelerated deployment of service robots have spawned a number of algorithm variations to better handle real-world conditions. Many local trajectory planning techniques have been deployed on practical robot systems successfully. While most formulations of Dynamic Window Approach and Model Predictive Control can progress along paths and optimize for additional criteria, the use of pure path tracking algorithms is still commonplace. Decades later, Pure Pursuit and its variants continues to be one of the most commonly utilized classes of local trajectory planners. However, few Pure Pursuit variants have been proposed with schema for variable linear velocities - they either assume a constant velocity or fails to address the point at all. This paper presents a variant of Pure Pursuit designed with additional heuristics to regulate linear velocities, built atop the existing Adaptive variant. The Regulated Pure Pursuit algorithm makes incremental improvements on state of the art by adjusting linear velocities with particular focus on safety in constrained and partially observable spaces commonly negotiated by deployed robots. We present experiments with the Regulated Pure Pursuit algorithm on industrial-grade service robots. We also provide a high-quality reference implementation that is freely included ROS 2 Nav2 framework at https://github.com/ros-planning/navigation2 for fast evaluation.
A Unified Conditional Framework for Diffusion-based Image Restoration
Authors: Yi Zhang, Xiaoyu Shi, Dasong Li, Xiaogang Wang, Jian Wang, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion Probabilistic Models (DPMs) have recently shown remarkable performance in image generation tasks, which are capable of generating highly realistic images. When adopting DPMs for image restoration tasks, the crucial aspect lies in how to integrate the conditional information to guide the DPMs to generate accurate and natural output, which has been largely overlooked in existing works. In this paper, we present a unified conditional framework based on diffusion models for image restoration. We leverage a lightweight UNet to predict initial guidance and the diffusion model to learn the residual of the guidance. By carefully designing the basic module and integration module for the diffusion model block, we integrate the guidance and other auxiliary conditional information into every block of the diffusion model to achieve spatially-adaptive generation conditioning. To handle high-resolution images, we propose a simple yet effective inter-step patch-splitting strategy to produce arbitrary-resolution images without grid artifacts. We evaluate our conditional framework on three challenging tasks: extreme low-light denoising, deblurring, and JPEG restoration, demonstrating its significant improvements in perceptual quality and the generalization to restoration tasks.
Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision
Abstract
Attribution algorithms are frequently employed to explain the decisions of neural network models. Integrated Gradients (IG) is an influential attribution method due to its strong axiomatic foundation. The algorithm is based on integrating the gradients along a path from a reference image to the input image. Unfortunately, it can be observed that gradients computed from regions where the output logit changes minimally along the path provide poor explanations for the model decision, which is called the saturation effect problem. In this paper, we propose an attribution algorithm called integrated decision gradients (IDG). The algorithm focuses on integrating gradients from the region of the path where the model makes its decision, i.e., the portion of the path where the output logit rapidly transitions from zero to its final value. This is practically realized by scaling each gradient by the derivative of the output logit with respect to the path. The algorithm thereby provides a principled solution to the saturation problem. Additionally, we minimize the errors within the Riemann sum approximation of the path integral by utilizing non-uniform subdivisions determined by adaptive sampling. In the evaluation on ImageNet, it is demonstrated that IDG outperforms IG, left-IG, guided IG, and adversarial gradient integration both qualitatively and quantitatively using standard insertion and deletion metrics across three common models.
Keyword: quantization
Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function
Authors: Ayan Shymyrbay, Mohammed E. Fouda, Ahmed Eltawil
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Deep neural networks have been proven to be highly effective tools in various domains, yet their computational and memory costs restrict them from being widely deployed on portable devices. The recent rapid increase of edge computing devices has led to an active search for techniques to address the above-mentioned limitations of machine learning frameworks. The quantization of artificial neural networks (ANNs), which converts the full-precision synaptic weights into low-bit versions, emerged as one of the solutions. At the same time, spiking neural networks (SNNs) have become an attractive alternative to conventional ANNs due to their temporal information processing capability, energy efficiency, and high biological plausibility. Despite being driven by the same motivation, the simultaneous utilization of both concepts has yet to be thoroughly studied. Therefore, this work aims to bridge the gap between recent progress in quantized neural networks and SNNs. It presents an extensive study on the performance of the quantization function, represented as a linear combination of sigmoid functions, exploited in low-bit weight quantization in SNNs. The presented quantization function demonstrates the state-of-the-art performance on four popular benchmarks, CIFAR10-DVS, DVS128 Gesture, N-Caltech101, and N-MNIST, for binary networks (64.05\%, 95.45\%, 68.71\%, and 99.43\% respectively) with small accuracy drops and up to 31$\times$ memory savings, which outperforms existing methods.
AdANNS: A Framework for Adaptive Semantic Search
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN
Authors: Yangfan Hu, Qian Zheng, Xudong Jiang, Gang Pan
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace weight multiplications in ANNs with additions, which are more energy-efficient and less computationally intensive. However, it remains a challenge to train deep SNNs due to the discrete spike function. A popular approach to circumvent this challenge is ANN-to-SNN conversion. However, due to the quantization error and accumulating error, it often requires lots of time steps (high inference latency) to achieve high performance, which negates SNN's advantages. To this end, this paper proposes Fast-SNN that achieves high performance with low latency. We demonstrate the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the minimization of the quantization error, we show that the sequential error is the primary cause of the accumulating error, which is addressed by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Codes are available at: https://github.com/yangfan-hu/Fast-SNN.
Keyword: efficient
Perimeter Control Using Deep Reinforcement Learning: A Model-free Approach towards Homogeneous Flow Rate Optimization
Cones 2: Customizable Image Synthesis with Multiple Subjects
On Riemannian Projection-free Online Learning
An absolutely convergent fixed-point fast sweeping WENO method on triangular meshes for steady state of hyperbolic conservation laws
Blockwise Parallel Transformer for Long Context Large Models
Compositional diversity in visual concept learning
Generating Finite Element Codes combining Adaptive Octrees with Complex Geometries
FRAMM: Fair Ranking with Missing Modalities for Clinical Trial Site Selection
Are Large Kernels Better Teachers than Transformers for ConvNets?
Efficient Training of Energy-Based Models Using Jarzynski Equality
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable Spiking Neural Network on FPGA
Exploring Lottery Prompts for Pre-trained Language Models
Kaczmarz-Type Methods for Solving Matrix Equations
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
Replicability in Reinforcement Learning
Zero-Shot Automatic Pronunciation Assessment
Confidential Signal Cancellation Phenomenon in Interference Alignment Networks: Cause and Cure
Towards Omni-generalizable Neural Methods for Vehicle Routing Problems
Neural Kernel Surface Reconstruction
Monitoring the evolution of dimensional accuracy and product properties in property-controlled forming processes
Measuring and Predicting the Quality of a Join for Data Discovery
Improving Expressivity of Graph Neural Networks using Localization
Vandermonde Neural Operators
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
An Efficient Machine Learning-based Channel Prediction Technique for OFDM Sub-Bands
Isogeometric Multi-Resolution Full Waveform Inversion based on the Finite Cell Method
Low-Complexity Dynamic Directional Modulation: Vulnerability and Information Leakage
APPRAISER: DNN Fault Resilience Analysis Employing Approximation Errors
Off-By-One Implementation Error in J-UNIWARD
IDAS: Intent Discovery with Abstractive Summarization
A Survey of Label-Efficient Deep Learning for 3D Point Clouds
Homogenization of nondivergence-form elliptic equations with discontinuous coefficients and finite element approximation of the homogenized problem
Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization
Parametric Shape Holomorphy of Boundary Integral Operators with Applications
Multi-Channel Operation for the Release 2 of ETSI Cooperative Intelligent Transport Systems
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN
Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction
Hidden Stabilizers, the Isogeny To Endomorphism Ring Problem and the Cryptanalysis of pSIDH
Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues
Fully Dynamic Submodular Maximization over Matroids
MSKdeX: Musculoskeletal (MSK) decomposition from an X-ray image for fine-grained estimation of lean muscle mass and muscle volume
Image Registration of In Vivo Micro-Ultrasound and Ex Vivo Pseudo-Whole Mount Histopathology Images of the Prostate: A Proof-of-Concept Study
A Geometric Perspective on Diffusion Models
MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting
A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
Simulation and Retargeting of Complex Multi-Character Interactions
Deception by Omission: Using Adversarial Missingness to Poison Causal Structure Learning
Latent Exploration for Reinforcement Learning
Decision-Oriented Dialogue for Human-AI Collaboration
Efficient Diffusion Policies for Offline Reinforcement Learning
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Too Large; Data Reduction for Vision-Language Pre-Training
Keyword: faster
AdANNS: A Framework for Adaptive Semantic Search
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Graph Entropy Minimization for Semi-supervised Node Classification
Recasting Self-Attention with Holographic Reduced Representations
Vandermonde Neural Operators
Deep learning and MCMC with aggVAE for shifting administrative boundaries: mapping malaria prevalence in Kenya
A technique to jointly estimate depth and depth uncertainty for unmanned aerial vehicles
A Geometric Perspective on Diffusion Models
Feature Learning in Image Hierarchies using Functional Maximal Correlation
Keyword: mobile
LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Vision Transformers for Mobile Applications: A Short Survey
User Driven Functionality Deletion for Mobile Apps
Rare Life Event Detection via Mobile Sensing Using Multi-Task Learning
Keyword: pruning
Budget-Aware Graph Convolutional Network Design using Probabilistic Magnitude Pruning
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Keyword: diffusion
Fine-grained Text Style Transfer with Diffusion-Based Language Models
Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels
Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model
Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards
Mask, Stitch, and Re-Sample: Enhancing Robustness and Generalizability in Anomaly Detection through Automatic Diffusion Models
Deep Stochastic Mechanics
Spontaneous symmetry breaking in generative diffusion models
A hybridizable discontinuous Galerkin method for magnetic advection-diffusion problems
Direct Diffusion Bridge using Data Consistency for Inverse Problems
Homogenization of nondivergence-form elliptic equations with discontinuous coefficients and finite element approximation of the homogenized problem
Inverse-design of nonlinear mechanical metamaterials via video denoising diffusion models
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
A Geometric Perspective on Diffusion Models
GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations
Protein Design with Guided Discrete Diffusion
Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
A Unified Conditional Framework for Diffusion-based Image Restoration
Efficient Diffusion Policies for Offline Reinforcement Learning
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Understanding and Mitigating Copying in Diffusion Models
Keyword: dynamic
Memory as a Mass-based Graph: Towards a Conceptual Framework for the Simulation Model of Human Memory in AI
Tracking the whole-body centre of mass while seated in a wheelchair using motion capture
DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling
Efficient Training of Energy-Based Models Using Jarzynski Equality
Dynamic Sparsity Is Channel-Level Sparsity Learner
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
About Decisiveness of Dynamic Probabilistic Models
Economics of Spot Instance Service: A Two-stage Dynamic Game Apporach
Medication Recommendation via Domain Knowledge Informed Deep Learning
Adaptive Compatible Performance Control for Spacecraft Attitude Control under Motion Constraints with Guaranteed Accuracy
Composite Triggered Intermittent Control for Constrained Spacecraft Attitude Tracking
Adaptive Reduced Attitude Control for Spacecraft Boresight Alignment under Multiple Constraints with Guaranteed Accuracy
Equivalence between State-Space Stability Analysis and Generalized Nyquist Criterion with Frequency Dynamics Included
Spontaneous symmetry breaking in generative diffusion models
Optimal Decision Trees for Separable Objectives: Pushing the Limits of Dynamic Programming
Low-Complexity Dynamic Directional Modulation: Vulnerability and Information Leakage
Direct Learning-Based Deep Spiking Neural Networks: A Review
Learning Representations without Compositional Assumptions
Neural Markov Jump Processes
Tracking Power System Events with Accuracy-Based PMU Adaptive Reporting Rate
Evolutionary Solution Adaption for Multi-Objective Metal Cutting Process Optimization
An Empirical Study of Fault Localization in Python Programs
Learning Task-preferred Inference Routes for Gradient De-conflict in Multi-output DNNs
Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive
Stability and convergence of in time approximations of hyperbolic elastodynamics via stepwise minimization
Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues
Fully Dynamic Submodular Maximization over Matroids
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
A Geometric Perspective on Diffusion Models
Regulated Pure Pursuit for Robot Path Tracking
Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Keyword: adaptive
Indoor Localization using Bluetooth and Inertial Motion Sensors in Distributed Edge and Cloud Computing Environment
On Riemannian Projection-free Online Learning
Bearing-Constrained Leader-Follower Formation of Single-Integrators with Disturbance Rejection: Adaptive Variable-Structure Approaches
Generating Finite Element Codes combining Adaptive Octrees with Complex Geometries
Incremental Learning for Heterogeneous Structure Segmentation in Brain Tumor MRI
Adapting Fairness Interventions to Missing Values
AdANNS: A Framework for Adaptive Semantic Search
OWAdapt: An adaptive loss function for deep learning using OWA operators
Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable Spiking Neural Network on FPGA
Is Learning in Games Good for the Learners?
Perception and Semantic Aware Regularization for Sequential Confidence Calibration
Active causal structure learning with advice
Federated Learning on Heterogeneous Data via Adaptive Self-Distillation
Adaptive Compatible Performance Control for Spacecraft Attitude Control under Motion Constraints with Guaranteed Accuracy
Adaptive Reduced Attitude Control for Spacecraft Boresight Alignment under Multiple Constraints with Guaranteed Accuracy
Isogeometric Multi-Resolution Full Waveform Inversion based on the Finite Cell Method
Adaptive and Explainable Deployment of Navigation Skills via Hierarchical Deep Reinforcement Learning
Tracking Power System Events with Accuracy-Based PMU Adaptive Reporting Rate
Off-By-One Implementation Error in J-UNIWARD
Adaptive Conformal Regression with Jackknife+ Rescaled Scores
Joint Adaptive Representations for Image-Language Learning
Regulated Pure Pursuit for Robot Path Tracking
A Unified Conditional Framework for Diffusion-based Image Restoration
Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision
Keyword: quantization
Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function
AdANNS: A Framework for Adaptive Semantic Search
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN