Abstract
Quasi-Newton methods still face significant challenges in training large-scale neural networks due to the additional compute cost of Hessian-related computations and instability issues in stochastic training. L-BFGS, a well-known method that efficiently approximates the Hessian using historical parameter and gradient changes, suffers from convergence instability in stochastic training. So far, attempts to adapt L-BFGS to large-scale stochastic training have incurred considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into the L-BFGS update that greatly reduces stochastic noise in the Hessian, thereby stabilizing convergence during stochastic optimization. For model training at large scale, mL-BFGS approximates a block-wise Hessian, enabling compute and memory costs to be distributed across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedups.
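The abstract describes the momentum scheme only at a high level; as an illustrative sketch (not the authors' exact update rule), one can exponentially average the curvature pairs (s, y) before feeding them to the standard L-BFGS two-loop recursion. The `smoothed_pair` helper and the smoothing coefficient `beta` below are hypothetical stand-ins for the paper's momentum scheme.

```python
import numpy as np

def lbfgs_direction(g, S, Y):
    """Standard L-BFGS two-loop recursion: returns r ~ H^{-1} g."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in reversed(list(zip(S, Y, rhos))):  # newest pair first
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    gamma = np.dot(S[-1], Y[-1]) / np.dot(Y[-1], Y[-1])  # initial Hessian scale
    r = gamma * q
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):  # oldest first
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return r

def smoothed_pair(s, y, state, beta=0.9):
    """EMA-smooth the curvature pair to damp stochastic noise
    (illustrative stand-in for the paper's momentum scheme;
    beta is a hypothetical hyperparameter)."""
    state['s'] = beta * state.get('s', np.zeros_like(s)) + (1 - beta) * s
    state['y'] = beta * state.get('y', np.zeros_like(y)) + (1 - beta) * y
    return state['s'], state['y']
```

As long as each smoothed pair satisfies the curvature condition s.y > 0, the recursion's output is a descent-compatible direction (r.g > 0), which is the stability property the momentum averaging is meant to protect under gradient noise.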
A Novel Computationally Efficient Group Signature for Anonymous and Secure V2X Communications
Authors: Jia Liu, Liqun Chen, Mehrdad Dianati, Carsten Maple, Yan Yan
Abstract
The use of vehicle-to-everything (V2X) communication is expected to significantly improve road safety and traffic management. We present an efficient protocol, called the AEE protocol, for protecting data authenticity and user privacy in V2X applications. Our protocol provides event-based linkability, which enables messages from a subject vehicle to be linked to a specific event in order to prevent Sybil attacks. Messages on different events are unlinkable, preserving the long-term privacy of vehicles. Moreover, our protocol introduces a new method for generating temporary public keys to reduce computation and transmission overheads. Such a temporary public key is bound to a certain event and is automatically revoked when the event is over. We describe how to apply our protocol in vehicular communications using two exemplar use cases. To further reduce the real-time computational complexity, our protocol allows the cryptographic operations to be decomposed into offline processes for complex operations and real-time processes for fast computations.
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor
Authors: Gavin Parpart, Sumedh R. Risbud, Garrett T. Kenyon, Yijing Watkins
Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)
Abstract
Neuromorphic processors have garnered considerable interest in recent years for their potential in energy-efficient and high-speed computing. The Locally Competitive Algorithm (LCA) has been utilized for power-efficient sparse coding on neuromorphic processors, including the first Loihi processor. With the Loihi 2 processor enabling custom neuron models and graded spike communication, more complex implementations of LCA are possible. We present a new implementation of LCA designed for the Loihi 2 processor and perform an initial set of benchmarks comparing it to LCA on CPU and GPU devices. In these experiments, LCA on Loihi 2 is orders of magnitude more efficient and faster for large sparsity penalties, while maintaining similar reconstruction quality. We find this performance improvement increases as the LCA parameters are tuned towards greater representation sparsity. Our study highlights the potential of neuromorphic processors, particularly Loihi 2, in enabling intelligent, autonomous, real-time processing on small robots and satellites with strict SWaP (size, weight, and power) requirements. By demonstrating the superior performance of LCA on Loihi 2 compared to conventional computing devices, our study suggests that Loihi 2 could be a valuable tool in advancing these types of applications. Overall, our study highlights the potential of neuromorphic processors for efficient and accurate data processing on resource-constrained devices.
A real-time material breakage detection for offshore wind turbines based on improved neural network algorithm
Authors: Yantong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
The integrity of offshore wind turbines, pivotal for sustainable energy generation, is often compromised by surface material defects. Despite the availability of various detection techniques, limitations persist regarding cost-effectiveness, efficiency, and applicability. Addressing these shortcomings, this study introduces a novel approach leveraging an advanced version of the YOLOv8 object detection model, supplemented with a Convolutional Block Attention Module (CBAM) for improved feature recognition. The optimized loss function further refines the learning process. Employing a dataset of 5,432 images from the Saemangeum offshore wind farm and a publicly available dataset, our method underwent rigorous testing. The findings reveal a substantial enhancement in defect detection stability, marking a significant stride towards efficient turbine maintenance. This study's contributions illuminate the path for future research, potentially revolutionizing sustainable energy practices.
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Abstract
As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation. Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning. Moreover, we design a prompt pruning procedure to systematically prune low importance prompts while preserving model performance, which largely enhances the model's efficiency. Empirical results demonstrate that our approach outperforms several state-of-the-art baselines on two benchmarks, with considerably low parameter usage (e.g., 0.32% of model parameters on VTAB-1k). Our code is available at https://github.com/ChengHan111/E2VPT.
Upward Planarity Testing of Biconnected Outerplanar DAGs Solves Partition
Authors: Fabrizio Frati
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Abstract
We show an $O(n)$-time reduction from the problem of testing whether a multiset of positive integers can be partitioned into two multisets so that the sum of the integers in each multiset is equal to $n/2$ to the problem of testing whether an $n$-vertex biconnected outerplanar DAG admits an upward planar drawing. This constitutes the first barrier to the existence of efficient algorithms for testing the upward planarity of DAGs with no large triconnected minor. We also show a result in the opposite direction. Suppose that partitioning a multiset of positive integers into two multisets so that the sum of the integers in each multiset is $n/2$ can be solved in $f(n)$ time. Let $G$ be an $n$-vertex biconnected outerplanar DAG and $e$ be an edge incident to the outer face of an outerplanar drawing of $G$. Then it can be tested in $O(f(n))$ time whether $G$ admits an upward planar drawing with $e$ on the outer face.
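The source problem of this reduction, Partition, asks exactly the question stated above. For concreteness, here is the standard pseudo-polynomial subset-sum check for Partition (a textbook routine, not part of the paper); its $O(n \cdot \mathrm{sum})$ runtime is consistent with the problem being hard only when the integers are large.

```python
def can_partition(nums):
    """Decide whether a multiset of positive integers splits into two
    multisets of equal sum (the Partition problem).
    Standard subset-sum DP: O(len(nums) * sum(nums)) time."""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}  # subset sums attainable so far, capped at target
    for v in nums:
        reachable |= {r + v for r in reachable if r + v <= target}
    return target in reachable
```

For example, `can_partition([3, 1, 1, 2, 2, 1])` succeeds via the split {3, 2} vs {1, 1, 2, 1}, while `[2, 2, 3]` fails since its total is odd.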
Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT
Authors: Taha Emre, Marzieh Oghbaie, Arunava Chakravarty, Antoine Rivail, Sophie Riedl, Julia Mai, Hendrik P.N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje Bogunović
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
In the field of medical imaging, 3D deep learning models play a crucial role in building powerful predictive models of disease progression. However, the size of these models presents significant challenges, both in terms of computational resources and data requirements. Moreover, achieving high-quality pretraining of 3D models proves to be even more challenging. To address these issues, hybrid 2.5D approaches provide an effective solution for utilizing 3D volumetric data efficiently using 2D models. Combining 2D and 3D techniques offers a promising avenue for optimizing performance while minimizing memory requirements. In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers. In addition, leveraging the benefits of recent non-contrastive pretraining approaches in 2D, we enhanced the performance and data efficiency of 2.5D techniques even further. We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period on two large longitudinal OCT datasets.
Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory
Abstract
Physics-informed neural networks (PINNs) offer a novel and efficient approach to solving partial differential equations (PDEs). Their success lies in the physics-informed loss, which trains a neural network to satisfy a given PDE at specific points and to approximate the solution. However, the solutions to PDEs are inherently infinite-dimensional, and the distance between the output and the solution is defined by an integral over the domain. Therefore, the physics-informed loss only provides a finite approximation, and selecting appropriate collocation points becomes crucial to suppress the discretization errors, although this aspect has often been overlooked. In this paper, we propose a new technique called good lattice training (GLT) for PINNs, inspired by number theoretic methods for numerical analysis. GLT offers a set of collocation points that are effective even with a small number of points and for multi-dimensional spaces. Our experiments demonstrate that GLT requires 2--20 times fewer collocation points (resulting in lower computational cost) than uniformly random sampling or Latin hypercube sampling, while achieving competitive performance.
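The specific generating vectors used by GLT come from the paper's number-theoretic constructions; as a hedged sketch of the underlying idea, a rank-1 lattice places collocation points at $x_i = \{i \cdot z / N\}$ for a generating vector $z$. The Fibonacci lattice below is a classical 2-D example of a good lattice, chosen here purely for illustration.

```python
import numpy as np

def rank1_lattice(n_points, gen_vector):
    """Rank-1 lattice point set x_i = frac(i * z / N) in [0,1)^d.
    A 'good' generating vector z comes from number-theoretic search;
    the Fibonacci choice below is a known good 2-D example."""
    i = np.arange(n_points)[:, None]
    z = np.asarray(gen_vector, dtype=float)[None, :]
    return (i * z / n_points) % 1.0

# Fibonacci lattice: N = 144 and z = (1, 89) are consecutive Fibonacci numbers.
pts = rank1_lattice(144, [1, 89])
```

Because gcd(89, 144) = 1, the 144 points are distinct and evenly spread over the unit square, which is the low-discrepancy property that lets lattice collocation suppress discretization error with fewer points than random sampling.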
Efficient Estimation of the Local Robustness of Machine Learning Models
Abstract
Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. However, the na\"ive approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing these analytical estimators, this work not only advances conceptual understanding of local robustness, but also makes its computation practical, enabling the use of local robustness in critical downstream applications.
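For intuition on the local-linear + Normal-CDF idea, consider the smallest possible instance: a binary linear classifier under Gaussian input noise (a toy example, not the paper's multi-class estimators). Here the local robustness has a closed form that the naive Monte-Carlo estimate converges to, but only slowly.

```python
import math
import numpy as np

def analytic_robustness(w, b, x, sigma):
    """P[sign(w.(x+d)+b) = sign(w.x+b)] for d ~ N(0, sigma^2 I).
    Since w.d ~ N(0, sigma^2 ||w||^2), robustness = Phi(|margin| / (sigma ||w||))."""
    margin = abs(np.dot(w, x) + b)
    z = margin / (sigma * np.linalg.norm(w))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mc_robustness(w, b, x, sigma, n=200_000, seed=0):
    """Naive Monte-Carlo baseline: fraction of noisy inputs with an
    unchanged prediction. Statistically inefficient for high robustness."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n, x.size))
    clean = np.sign(np.dot(w, x) + b)
    noisy = np.sign((x + noise) @ w + b)
    return float(np.mean(noisy == clean))
```

The analytic value is exact and costs one CDF evaluation, while the Monte-Carlo estimate needs many thousands of model calls to reach two-decimal accuracy, which is the efficiency gap the paper's estimators close for deep multi-class models.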
Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies
Authors: Yu Qin, Duo Zhang, Yuren Pang
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Abstract
In this paper, we propose a dynamic grouping negotiation model for climate mitigation based on real-world business and political negotiation protocols. Within the AI4GCC competition framework, we develop a three-stage process: group formation and updates, intra-group negotiation, and inter-group negotiation. Our model promotes efficient and effective cooperation between various stakeholders to achieve global climate change objectives. By implementing a group-forming method and group updating strategy, we address the complexities and imbalances in multi-region climate negotiations. Intra-group negotiations ensure that all members contribute to mitigation efforts, while inter-group negotiations use the proposal-evaluation framework to set mitigation and savings rates. We demonstrate our negotiation model within the RICE-N framework, illustrating a promising approach for facilitating international cooperation on climate change mitigation.
Low-Parameter Federated Learning with Large Language Models
Authors: Jingang Jiang, Xiangyang Liu, Chenyou Fan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
We study few-shot Natural Language Understanding (NLU) tasks with Large Language Models (LLMs) in federated learning (FL) scenarios. This is a challenging setting due to the limited labeled data and communication capacity in FL, especially with mobile devices. Recent studies show LLMs can be prompted to perform few-shot NLU tasks such as sentiment analysis and arithmetic reasoning. However, the huge sizes of LLMs result in high computation and communication costs, making classical FL schemes impractical. To address these challenges, we propose Low-Parameter Federated Learning (LP-FL). LP-FL combines few-shot prompt learning from LLMs with efficient communication and federating techniques. Our approach enables federated clients to assign soft labels to unlabeled data using knowledge gradually learned from the global model. Through iterative soft-label assignment, we continually expand the labeled set during the FL process. Additionally, to reduce computation and communication costs, LP-FL utilizes the Low-Rank Adaptation (LoRA) technique for compact learnable parameter construction, efficient local model fine-tuning, and affordable global model federation. LP-FL consistently outperforms Full-Parameter Federated Learning (FP-FL) in sentiment analysis tasks across various FL settings. Its resistance to overfitting allows LP-FL to equal or surpass centralized training in few-shot scenarios.
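LoRA, which LP-FL uses for compact learnable parameters, freezes the pretrained weight W and trains only a low-rank update DeltaW = BA. A minimal sketch (the shapes, rank, and initialization scale below are illustrative, not LP-FL's settings):

```python
import numpy as np

def lora_delta(d_out, d_in, r, rng):
    """Low-Rank Adaptation: DeltaW = B @ A with rank r << min(d_out, d_in).
    Only A and B are trained and communicated; B is zero-initialized so
    fine-tuning starts exactly at the pretrained weights."""
    A = rng.normal(0.0, 0.02, size=(r, d_in))  # trainable, small random init
    B = np.zeros((d_out, r))                   # trainable, zero init
    return B @ A, A.size + B.size

delta, n_trainable = lora_delta(768, 768, 8, np.random.default_rng(0))
```

For a 768x768 layer at rank 8, only about 2% of the full matrix's parameters are trained, and only those need to be exchanged in each federated round, which is what makes the communication affordable.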
AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets
Authors: Siyi Du, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are considered an auspicious solution for SLS, but they require more training data compared to convolutional neural networks (CNNs) due to their inherent parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images to lower the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViTs to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which grasps fine-grained information and CNN's inductive biases to guide the segmentation task on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to SOTA but with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
Authors: Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view-inconsistency problem, and implicit NeRF modeling can also yield arbitrary shapes, leading to less realistic and less controllable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E as the geometric prior, conditioned on a single reference image. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we propose to optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.
BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery
Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong
Abstract
Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or to continuous relaxations of adjacency matrices constrained by a DAG regularizer, which cannot ensure that the resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples, and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to permutation-based DAG learning, which opens up the possibility of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling to causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.
On the hardness of finding balanced independent sets in random bipartite graphs
Authors: Will Perkins, Yuzhou Wang
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR)
Abstract
We consider the algorithmic problem of finding large \textit{balanced} independent sets in sparse random bipartite graphs, and more generally the problem of finding independent sets with specified proportions of vertices on each side of the bipartition. In a bipartite graph it is trivial to find an independent set of density at least one half (take one of the partition classes). In contrast, in a random bipartite graph of average degree $d$, the largest balanced independent sets (containing an equal number of vertices from each class) are typically of density $(2+o_d(1)) \frac{\log d}{d}$. Can we find such large balanced independent sets in these graphs efficiently? By utilizing the overlap gap property and the low-degree algorithmic framework, we prove that local and low-degree algorithms (even those that know the bipartition) cannot find balanced independent sets of density greater than $(1+\epsilon) \frac{\log d}{d}$ for any fixed $\epsilon>0$ and $d$ large but constant. This factor-$2$ statistical-computational gap between what exists and what local algorithms can achieve is analogous to the gap for finding large independent sets in (non-bipartite) random graphs. Our results therefore suggest that this gap is pervasive across many models, and that hard computational problems can lurk inside otherwise tractable ones. A particularly striking aspect of the gap in bipartite graphs is that the algorithm achieving the lower bound is extremely simple and can be implemented as a $1$-local algorithm and as a degree-$1$ polynomial (a linear function).
trajdata: A Unified Interface to Multiple Human Trajectory Datasets
Authors: Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone
Abstract
The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and proposing suggestions for future datasets from these insights. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata
Fourier Growth of Communication Protocols for XOR Functions
Authors: Uma Girish, Makrand Sinha, Avishay Tal, Kewen Wu
Abstract
The level-$k$ $\ell_1$-Fourier weight of a Boolean function is the sum of the absolute values of its level-$k$ Fourier coefficients. Fourier growth refers to the growth of these weights as $k$ grows. It has been extensively studied for various computational models, and bounds on the Fourier growth, even for the first few levels, have proven useful in learning theory, circuit lower bounds, pseudorandomness, and quantum-classical separations. We investigate the Fourier growth of certain functions that naturally arise from communication protocols for XOR functions (partial functions evaluated on the bitwise XOR of the inputs to Alice and Bob). If a protocol $\mathcal C$ computes an XOR function, then $\mathcal C(x,y)$ is a function of the parity $x\oplus y$. This motivates us to analyze the XOR-fiber of $\mathcal C$, defined as $h(z):=\mathbb{E}_{x,y}[\mathcal C(x,y) \mid x\oplus y=z]$. We present improved Fourier growth bounds for the XOR-fibers of protocols that communicate $d$ bits. For the first level, we show a tight $O(\sqrt d)$ bound and obtain a new coin theorem, as well as an alternative proof of the tight randomized communication lower bound for Gap-Hamming. For the second level, we show a $d^{3/2}\cdot\mathrm{polylog}(n)$ bound, which improves the previous $O(d^2)$ bound by Girish, Raz, and Tal (ITCS 2021) and implies a polynomial improvement on the randomized communication lower bound for the XOR-lift of Forrelation, extending its quantum-classical gap. Our analysis is based on a new way of adaptively partitioning a relatively large set in Gaussian space to control its moments in all directions. We achieve this via martingale arguments and by allowing protocols to transmit real values. We also show a connection between Fourier growth and lifting theorems with constant-sized gadgets as a potential approach to proving optimal bounds for the second level and beyond.
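To make the central quantity concrete (a brute-force computation on tiny inputs, unrelated to the protocol analysis itself), the level-$k$ $\ell_1$-Fourier weight can be computed directly from a truth table:

```python
def level_k_l1_weight(f_table, n, k):
    """Sum of |hat f(S)| over all S of Hamming weight k, for
    f: {0,1}^n -> {+1,-1} given as a truth table indexed by x in [0, 2^n).
    Here hat f(S) = E_x[f(x) * (-1)^{|S & x|}]. Exponential time: demo only."""
    total = 0.0
    for S in range(2 ** n):
        if bin(S).count('1') != k:
            continue
        coeff = sum(f_table[x] * ((-1) ** bin(S & x).count('1'))
                    for x in range(2 ** n)) / 2 ** n
        total += abs(coeff)
    return total

# Parity on 3 bits: all Fourier mass is a single coefficient at level 3.
parity = [(-1) ** bin(x).count('1') for x in range(8)]
```

Parity illustrates the extreme case: its weight is 1 at the top level and 0 at every other level, whereas the bounds above control how much weight the XOR-fibers of low-communication protocols can place on the first few levels.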
Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics
Authors: Aamal Hussain, Francesco Belardinelli, Georgios Piliouras
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.
The stabilized exponential-SAV approach preserving maximum bound principle for nonlocal Allen-Cahn equation
Authors: Xiaoqing Meng, Aijie Cheng, Zhengguang Liu
Abstract
The nonlocal Allen-Cahn equation with a nonlocal diffusion operator is a generalization of the classical Allen-Cahn equation. It satisfies the energy dissipation law and the maximum bound principle (MBP), and is important for simulating a range of physical and biological phenomena involving long-range interactions in space. In this paper, we construct first- and second-order (in time) accurate, unconditionally energy stable and MBP-preserving schemes for the nonlocal Allen-Cahn type model based on the stabilized exponential scalar auxiliary variable (sESAV) approach. On the one hand, we prove the MBP and unconditional energy stability carefully and rigorously at the fully discrete level. On the other hand, we adopt an efficient FFT-based fast solver to compute the nearly full coefficient matrix generated from the spatial discretization, which improves the computational efficiency. Finally, typical numerical experiments are presented to demonstrate the performance of our proposed schemes.
Formal Verification of Robotic Contact Tasks via Reachability Analysis
Authors: Chencheng Tang, Matthias Althoff
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Verifying the correct behavior of robots in contact tasks is challenging due to model uncertainties associated with contacts. Standard testing methods often fall short since they cannot cover all (uncountably many) solutions. Instead, we propose to formally and efficiently verify robot behaviors in contact tasks using reachability analysis, which enables checking all reachable states against user-provided specifications. To this end, we extend the state of the art in reachability analysis for hybrid (mixed discrete and continuous) dynamics subject to discrete-time input trajectories. In particular, we present a novel and scalable guard-intersection approach to reliably compute the complex behavior caused by contacts. We model robots subject to contacts as hybrid automata in which crucial time delays are included. The usefulness of our approach is demonstrated by verifying safe human-robot interaction in the presence of constrained collisions, which was out of reach for existing methods.
Time multiscale modeling of sorption kinetics I: uniformly accurate schemes for highly oscillatory advection-diffusion equation
Authors: Clarissa Astuto, Mohammed Lemou, Giovanni Russo
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this paper we propose a numerical method to solve a 2D advection-diffusion equation in the highly oscillatory regime. We use an efficient and robust integrator which leads to an accurate approximation of the solution without any time step-size restriction. Uniform first- and second-order numerical approximations in time are obtained with errors, and at a cost, that are independent of the oscillation frequency. This work is part of a long-term project whose final goal is the resolution of a Stokes-advection-diffusion system, in which the velocity appearing in the advection term is the solution of the Stokes equations. This paper focuses on the time-multiscale challenge arising from the velocity, which is an $\varepsilon$-periodic function whose expression is explicitly known. We also introduce a two-scale formulation as a first step towards the numerical resolution of the complete oscillatory Stokes-advection-diffusion system, which is currently under investigation. This two-scale formulation is also useful for understanding the asymptotic behaviour of the solution.
Adaptive Frequency Filters As Efficient Global Token Mixers
Abstract
Recent vision transformers, large-kernel CNNs, and MLPs have attained remarkable success in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployment, especially on mobile devices, still suffers from notable challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this, and reveal that adaptive frequency filters can serve as efficient global token mixers. With this insight, we propose the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers a latent representation to the frequency domain via a Fourier transform and performs semantic-adaptive frequency filtering via an element-wise multiplication, which is mathematically equivalent to a token-mixing operation in the original latent space with a dynamic convolution kernel as large as the spatial resolution of this latent representation. We take AFF token mixers as the primary neural operators to build a lightweight neural network, dubbed AFFNet. Extensive experiments demonstrate the effectiveness of our proposed AFF token mixer and show that AFFNet achieves superior accuracy-efficiency trade-offs compared to other lightweight network designs on broad visual tasks, including visual recognition and dense prediction.
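The convolution-theorem claim above can be checked in one dimension (a toy sketch; the actual AFF mixer filters 2-D latent features with learned, semantic-adaptive filters):

```python
import numpy as np

def freq_filter(tokens, filt):
    """Element-wise multiplication in the frequency domain (the AFF idea in 1-D)."""
    return np.fft.ifft(np.fft.fft(tokens) * np.fft.fft(filt)).real

def circular_conv(tokens, kernel):
    """Token mixing as circular convolution with a kernel as large as the
    whole sequence -- exactly what freq_filter computes implicitly."""
    n = len(tokens)
    return np.array([sum(tokens[(i - j) % n] * kernel[j] for j in range(n))
                     for i in range(n)])
```

The frequency-domain route costs O(n log n) per channel instead of the O(n^2) of an explicit global kernel (or of self-attention over n tokens), which is the efficiency argument behind using frequency filters as global token mixers.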
Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images
Authors: Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Compositional neural scene graph studies have shown that radiance fields can be an efficient tool in an editable autonomous driving simulator. However, previous studies learned from sequences in autonomous driving datasets, resulting in unsatisfactory blurring when the car is rotated in the simulator. In this letter, we propose a pipeline for learning from unconstrained images and for building a dataset from the processed images. To meet the requirements of the simulator, which demands that the vehicle remain clear when the perspective changes and that its contour stay sharp against the background to avoid artifacts when editing, we design a radiance field for the vehicle, a crucial part of the urban scene foreground. Through experiments, we demonstrate that our model achieves competitive performance compared to baselines. Using the dataset built from in-the-wild images, our method additionally provides a controllable appearance editing function. We will release the dataset and code on https://lty2226262.github.io/car-studio/ to facilitate further research in the field.
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
Authors: Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. However, the prevailing CNN-based approaches have shown limitations in building long-range dependencies and capturing interaction information between spectral features. This results in inadequate utilization of spectral information and artifacts after upsampling. To address this issue, we propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure. Specifically, we first introduce a robust and spectral-friendly similarity metric, i.e., the spectral correlation coefficient of the spectrum (SCC), to replace the original attention matrix and incorporate inductive biases into the model to facilitate training. Built upon it, we further utilize the kernelizable attention technique with theoretical support to form a novel efficient SCC-kernel-based self-attention (ESSA) and reduce attention computation to linear complexity. ESSA enlarges the receptive field for features after upsampling without bringing much computation and allows the model to effectively utilize spatial-spectral information from different scales, resulting in the generation of more natural high-resolution images. Without the need for pretraining on large-scale datasets, our experiments demonstrate ESSA's effectiveness in both visual quality and quantitative results.
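The spectral correlation coefficient is, in essence, the Pearson correlation of two pixels' band values, which makes it invariant to additive brightness offsets and scaling of the spectrum. A small sketch of the plain (non-kernelized) similarity; the linear-complexity kernelized form used in ESSA is not reproduced here:

```python
import numpy as np

def scc_attention_matrix(Q, K):
    """Pairwise spectral correlation coefficients between the rows of
    Q and K, each row being one pixel's spectrum over the bands."""
    Qc = Q - Q.mean(axis=1, keepdims=True)    # centre each spectrum
    Kc = K - K.mean(axis=1, keepdims=True)
    Qn = Qc / np.linalg.norm(Qc, axis=1, keepdims=True)
    Kn = Kc / np.linalg.norm(Kc, axis=1, keepdims=True)
    return Qn @ Kn.T                          # entries lie in [-1, 1]

spec = np.array([[1.0, 2.0, 4.0, 3.0]])
S = scc_attention_matrix(spec, 2.0 * spec + 5.0)  # scaled + shifted copy
assert np.isclose(S[0, 0], 1.0)  # SCC ignores brightness offset and scale
```

Bounded, scale-invariant entries like these are one way the metric injects a spectral inductive bias that raw dot-product attention lacks.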
3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability
Abstract
Shape generation is the practice of producing 3D shapes as various representations for 3D content creation. Previous studies on 3D shape generation have focused on shape quality and structure, with little or no consideration of the importance of semantic information. Consequently, such generative models often fail to preserve the semantic consistency of shape structure or enable manipulation of the semantic attributes of shapes during generation. In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that utilizes semantic attributes for category-specific 3D shape generation and editing. Our method utilizes implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes. Each dimension of the subspace corresponds to a particular semantic attribute, and we can edit the attributes of generated shapes by traversing the coefficients of those dimensions. Experimental results demonstrate that our method can produce plausible shapes with complex structures and enable the editing of semantic attributes. The code and trained models are available at https://github.com/TrepangCat/3D_Semantic_Subspace_Traverser.
Relay-Enabled Backscatter Communications: Linear Mapping and Resource Allocation
Authors: Rui Xu, Liqin Shi, Yinghui Ye, Haijian Sun, Gan Zheng
Abstract
Relay-enabled backscatter communication (BC) is an intriguing paradigm to alleviate the energy shortage and improve the throughput of Internet-of-Things (IoT) devices. Most existing works focus on resource allocation that considers unequal and continuous time allocation for both the source-relay and relay-destination links. However, continuous time allocation may be infeasible since, in practice, time must be allocated in integer multiples of the subframe duration unit. In this article, we study a discrete time scheme from the perspective of frame structure, where one transmission block is divided into two phases and linear mapping is employed as a re-encoding method to determine the number of subframes for both phases and the power allocation for each subframe in a relay-enabled BC system. Based on this, we derive an accurate system-throughput expression and formulate a mixed-integer non-convex optimization problem to maximize the system throughput by jointly optimizing the power reflection coefficient (PRC) of the IoT node, the power allocation of the hybrid access point (HAP) and the linear mapping matrix, and solve it via a three-step approach. Accordingly, we propose a low-complexity iterative algorithm to obtain the throughput-maximizing resource allocation solution. Numerical results analyze the performance of our proposed algorithm, verify the superiority of our proposed scheme, and evaluate the impacts of network parameters on the system throughput.
Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation
Authors: Long Liu, Bo Zhou, Zhipeng Zhao, Zening Liu
Abstract
Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain. While recent MUDA methods have shown promising results, most focus on aligning the overall feature distributions across source domains, which can lead to negative effects due to redundant features within each domain. Moreover, there is a significant performance gap between MUDA and supervised methods. To address these challenges, we propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA). Firstly, we establish a multi-source dynamic modulation mechanism during the training process based on the degree of distribution differences between source and target domains. This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains. Additionally, we propose a Multi-source Active Boundary Sample Selection (MABS) strategy, which utilizes a guided dynamic boundary loss to design an efficient query function for selecting important samples. This strategy achieves improved generalization to the target domain with minimal sampling costs. We extensively evaluate our proposed method on commonly used domain adaptation datasets, comparing it against existing UDA and ADA methods. The experimental results unequivocally demonstrate the superiority of our approach.
Gleam: An RDMA-accelerated Multicast Protocol for Datacenter Networks
Abstract
RDMA has been widely adopted for high-speed datacenter networks. However, native RDMA supports only one-to-one reliable connections, which mismatches applications with group communication patterns (e.g., one-to-many). While there are some multicast enhancements to address this, they all fail to simultaneously achieve optimal multicast forwarding and fully unleash the distinguished RDMA capabilities. In this paper, we present Gleam, an RDMA-accelerated multicast protocol that simultaneously supports optimal multicast forwarding, efficient utilization of the prominent RDMA capabilities, and compatibility with commodity RNICs. At its core, Gleam re-purposes the existing RDMA RC logic, with careful switch coordination, as an efficient multicast transport. Gleam performs one-to-many connection maintenance and many-to-one feedback aggregation, based on an extended multicast forwarding table structure, to achieve integration between standard RC logic and in-fabric multicast. We implement a fully functional Gleam prototype. With extensive testbed experiments and simulations, we demonstrate Gleam's significant improvement in accelerating the multicast communication of realistic applications. For instance, Gleam achieves 2.9X lower communication time for an HPC benchmark application and 2.7X higher data replication throughput.
Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks
Abstract
We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. Specifically, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by the leader's policy. The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward and needs to infer the follower's quantal response model from his actions against the leader's policies. We propose sample-efficient algorithms for both the online and offline settings, in the context of function approximation. Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem, and we show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage the uncertainty to implement optimistic and pessimistic algorithms for the online and offline settings. Besides, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma which incorporates the error of the quantal response model, which might be of independent interest.
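In its myopic form, a quantal response is the closed-form solution of an entropy-regularized best-response problem, namely a softmax over the follower's action values. A one-line sketch (the action values and the regularization weight eta below are illustrative, not from the paper):

```python
import numpy as np

def quantal_response(action_values, eta=1.0):
    """Solve max_p <p, v> + (1/eta) * H(p), whose optimum is
    p_i proportional to exp(eta * v_i), i.e. a softmax."""
    z = eta * (action_values - np.max(action_values))  # stabilised
    p = np.exp(z)
    return p / p.sum()

p = quantal_response(np.array([1.0, 2.0, 0.5]), eta=2.0)
assert np.isclose(p.sum(), 1.0) and p.argmax() == 1
```

As eta grows the response concentrates on the best action (approaching a classical best response), while eta = 0 gives the uniform policy; this is the smooth model the leader must infer from observed follower actions.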
Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Authors: Tian-Hao Zhang, Dinghao Zhou, Guiping Zhon, Baoxiang Li
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract
RNN-T models are widely used in ASR and rely on the RNN-T loss to achieve length alignment between the input audio and the target sequence. However, the implementation complexity and the alignment-based optimization target of the RNN-T loss lead to computational redundancy and a reduced role for the predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T), which incorporates the Continuous Integrate-and-Fire (CIF) mechanism into the RNN-T model to achieve efficient alignment. In this way, the RNN-T loss is abandoned, which reduces computation and allows the predictor network a more significant role. We also introduce Funnel-CIF, Context Blocks, a Unified Gating and Bilinear Pooling joint network, and an auxiliary training strategy to further improve performance. Experiments on the 178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
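The Continuous Integrate-and-Fire mechanism itself is simple: per-frame weights are accumulated, and a token is emitted ("fired") whenever the accumulation crosses a threshold, with the crossing frame's weight split between the finished token and the next one. A scalar-feature sketch (real implementations use vector features and predict the weights from the encoder output):

```python
def cif(features, weights, threshold=1.0):
    """Continuous Integrate-and-Fire over scalar frame features."""
    tokens, acc, token_feat = [], 0.0, 0.0
    for h, a in zip(features, weights):
        while acc + a >= threshold:       # fire (possibly repeatedly)
            need = threshold - acc        # part of this frame's weight
            token_feat += need * h        # ...finishes the current token
            tokens.append(token_feat / threshold)
            a -= need
            acc, token_feat = 0.0, 0.0
        acc += a                          # leftover starts the next token
        token_feat += a * h
    return tokens

# Weights sum to 2.0, so exactly two tokens are fired.
assert cif([1.0, 3.0, 5.0], [0.5, 0.5, 1.0]) == [2.0, 5.0]
```

Because the number of fired tokens tracks the total accumulated weight, the audio-to-text length alignment falls out of the integration itself, with no alignment-marginalizing loss needed.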
Learning Disentangled Discrete Representations
Authors: David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert
Abstract
Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.
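A common way to make a categorical VAE's discrete latent differentiable, and thus trainable like its Gaussian counterpart, is the Gumbel-softmax relaxation; the sketch below is that standard trick, offered as an illustration since the abstract does not spell out the tailored model's sampling mechanism:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed sample from a categorical distribution: perturb the
    logits with Gumbel(0, 1) noise and apply a temperature softmax."""
    rng = rng if rng is not None else np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1)
    z = (logits + g) / tau
    e = np.exp(z - z.max())
    return e / e.sum()                    # a point on the simplex

y = gumbel_softmax(np.array([0.5, 1.5, -1.0]), tau=0.5,
                   rng=np.random.default_rng(0))
assert np.isclose(y.sum(), 1.0) and (y >= 0).all()
```

At low temperature tau the samples approach one-hot vectors on the corners of the simplex, which is exactly the grid structure the paper credits with breaking the rotational invariance of a multivariate Gaussian latent.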
An Antithetic Multilevel Monte Carlo-Milstein Scheme for Stochastic Partial Differential Equations
Authors: Abdul-Lateef Haji-Al, Andreas Stein
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Abstract
We present a novel multilevel Monte Carlo (MLMC) approach for estimating quantities of interest for stochastic partial differential equations (SPDEs). Drawing inspiration from [Giles and Szpruch: Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without L\'evy area simulation, Annals of Appl. Prob., 2014], we extend the antithetic Milstein scheme for finite-dimensional stochastic differential equations to Hilbert space-valued SPDEs. Our method has the advantages of both Euler and Milstein discretizations, as it is easy to implement and does not involve intractable L\'evy area terms. Moreover, the antithetic correction in our method leads to the same variance decay in an MLMC algorithm as the standard Milstein method, resulting in significantly lower computational complexity than a corresponding MLMC Euler scheme. Our approach is applicable to a broader range of non-linear diffusion coefficients and does not require any commutative properties. The key component of our MLMC algorithm is a truncated Milstein-type time stepping scheme for SPDEs, which accelerates the rate of variance decay in the MLMC method when combined with an antithetic coupling on the fine scales. We combine the truncated Milstein scheme with appropriate spatial discretizations and noise approximations on all scales to obtain a fully discrete scheme and show that the antithetic coupling does not introduce an additional bias.
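For a scalar geometric Brownian motion dX = mu X dt + sigma X dW, the Milstein scheme adds the correction 0.5 sigma^2 X (dW^2 - h) to the Euler step, and the antithetic trick pairs each fine path with a twin whose Brownian increments are swapped. A finite-dimensional toy sketch of one MLMC level sample (the paper's Hilbert-space SPDE setting is of course far richer; parameter values are illustrative):

```python
import math

def milstein_step(x, mu, sigma, h, dw):
    # Milstein step for dX = mu*X dt + sigma*X dW
    return x * (1 + mu * h + sigma * dw + 0.5 * sigma**2 * (dw**2 - h))

mu, sigma, h = 0.05, 0.2, 0.01
dw1, dw2 = 0.03, -0.02   # two fine-scale Brownian increments

# One coarse step vs. two fine half-steps; the antithetic twin
# traverses the same increments in swapped order.
coarse = milstein_step(1.0, mu, sigma, h, dw1 + dw2)
fine = milstein_step(milstein_step(1.0, mu, sigma, h / 2, dw1),
                     mu, sigma, h / 2, dw2)
anti = milstein_step(milstein_step(1.0, mu, sigma, h / 2, dw2),
                     mu, sigma, h / 2, dw1)
correction = 0.5 * (fine + anti) - coarse   # one MLMC level sample

# Sanity check: Milstein tracks the exact GBM solution over one step.
exact = math.exp((mu - 0.5 * sigma**2) * h + sigma * (dw1 + dw2))
assert abs(coarse - exact) < 1e-4
```

Note that for scalar multiplicative noise the twin coincides with the fine path (the steps commute, which is also why no Lévy areas arise here); the antithetic pairing earns its keep in the non-commutative, multi-dimensional settings the paper targets.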
Complexity results for the Pilot Assignment problem in Cell-Free Massive MIMO
Abstract
Wireless communication is enabling billions of people to connect to each other and the internet, transforming every sector of the economy, and building the foundations for powerful new technologies that hold great promise to improve lives at an unprecedented rate and scale. The rapid increase in the number of devices and the associated demands for higher data rates and broader network coverage fuel the need for more robust wireless technologies. The key technology identified to address this problem is referred to as Cell-Free Massive MIMO (CF-mMIMO). CF-mMIMO is accompanied by many challenges, one of which is efficiently allocating limited resources. In this paper, we focus on a major resource allocation problem in wireless networks, namely the Pilot Assignment problem (PA). We show that PA is strongly NP-hard and that it does not admit a polynomial-time constant-factor approximation algorithm. Further, we show that PA cannot be approximated in polynomial time within $\mathcal{O}(K^2)$ (where $K$ is the number of users) when the system consists of at least three pilots. Finally, we present an approximation lower bound of $1.058$ (resp. $\epsilon K^2$, for $\epsilon > 0$) in the special cases where the system consists of exactly two (resp. three) pilots.
ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
Authors: Görkay Aydemir, Adil Kaan Akan, Fatma Güney
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Forecasting the future trajectories of agents in complex traffic scenes requires reliable and efficient predictions for all agents in the scene. However, existing methods for trajectory prediction are either inefficient or sacrifice accuracy. To address this challenge, we propose ADAPT, a novel approach for jointly predicting the trajectories of all agents in the scene with dynamic weight learning. Our approach outperforms state-of-the-art methods in both single-agent and multi-agent settings on the Argoverse and Interaction datasets, with a fraction of their computational overhead. We attribute the improvement in performance, first, to the adaptive head, which augments the model capacity without increasing the model size, and, second, to our design choices in the endpoint-conditioned prediction, reinforced by gradient stopping. Our analyses show that ADAPT can focus on each agent with adaptive prediction, allowing for accurate and efficient predictions. https://KUIS-AI.github.io/adapt
Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling
Authors: Masoume Kazemi, Davood Moradkhani, Alireza Abbas Alipour
Abstract
The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation process is very important since the solid residue contains some moisture that can reduce the amount of zinc recovered. This study modeled the pressure filtration process with Random Forest (RF) and Support Vector Machine (SVM) models. The models take continuous variables (extracted features) from the lab samples as inputs; hence, regression models, namely Random Forest Regression (RFR) and Support Vector Regression (SVR), were chosen. The full dataset was obtained from the pressure filtration process under two conditions: 1) Polypropylene (S1) and 2) Polyester (S2) fabrics. To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 °C), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10 and 15 min) and filtration time were used as input variables. The models' predictive accuracy was evaluated with the coefficient of determination (R²). The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.
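The coefficient of determination used here compares a model's squared error against an always-predict-the-mean baseline. A small self-contained helper (the RFR/SVR models themselves, e.g. from scikit-learn, are omitted; the moisture values are illustrative, not the study's data):

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 matches the
    mean-only baseline, and negative values are worse than that baseline."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

moisture = [21.0, 23.5, 22.0, 24.5]             # illustrative cake-moisture values
assert r_squared(moisture, moisture) == 1.0     # perfect predictions
assert r_squared(moisture, [22.75] * 4) == 0.0  # mean-only baseline
```

Comparing RFR and SVR by this single score is then a like-for-like comparison of how much of the moisture variance each model explains.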
Sliding Mode Control of Active Magnetic Bearings -- A Cascaded Architecture
Abstract
Accurate and robust positioning of rotor axle is essential for efficient and safe operation of high-speed rotational machines with active magnetic bearings. This study presents a cascaded nonlinear control strategy for vertical axial positioning of an active magnetic bearing system. The proposed scheme employs two sliding mode controllers for regulating rotor vertical position and current and an adaptive estimator to invert the nonlinear input mapping. Uniform asymptotic stability is proven for the closed-loop system and the efficacy and performance of the proposed design is evaluated in simulation.
Large-scale Fully-Unsupervised Re-Identification
Authors: Gabriel Bertocco, Fernanda Andaló, Terrance E. Boult, Anderson Rocha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Fully-unsupervised Person and Vehicle Re-Identification have received increasing attention due to their broad applicability in surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most of the prior art has been evaluated in datasets that have just a couple thousand samples. Such small-data setups often allow the use of costly techniques in time and memory footprints, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. In this context, this work tackles a more realistic scenario and proposes two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n. To avoid the pre-selection of specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, to leverage the diversity of samples and keep the learning robust to noisy labeling. Finally, due to the complementary knowledge learned by different models, we also introduce a co-training strategy that relies upon the permutation of predicted pseudo-labels, among the backbones, with no need for any hyper-parameters or weighting optimization. The proposed methodology outperforms the state-of-the-art methods in well-known benchmarks and in the challenging large-scale Veri-Wild dataset, with a faster and memory-efficient Re-Ranking strategy, and a large-scale, noisy-robust, and ensemble-based learning approach.
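The O(kn) memory bound comes from keeping only each sample's k nearest neighbours rather than the full n x n affinity matrix. A sketch of that storage scheme (the similarity matrix is computed densely here for brevity; at true scale it would be built block-wise so the full matrix never materializes):

```python
import numpy as np

def topk_neighbor_lists(features, k):
    """Keep k neighbour ids per sample: O(kn) memory instead of O(n^2)."""
    sims = features @ features.T           # done block-wise in practice
    np.fill_diagonal(sims, -np.inf)        # a sample is not its own neighbour
    return np.argpartition(-sims, k, axis=1)[:, :k]

rng = np.random.default_rng(0)
n, k = 100, 5
nbrs = topk_neighbor_lists(rng.standard_normal((n, 16)), k)
assert nbrs.shape == (n, k)                # n*k entries stored, not n*n
assert all(i not in nbrs[i] for i in range(n))
```

Downstream steps such as Re-Ranking or clustering then operate only on these neighbour lists, which is what keeps the pipeline feasible on million-sample datasets.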
Differentiable Programming & Network Calculus: Configuration Synthesis under Delay Constraints
Authors: Fabien Geyer, Steffen Bondorf
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
With the advent of standards for deterministic network behavior, synthesizing network designs under delay constraints becomes the natural next task to tackle. Network Calculus (NC) has become a key method for validating industrial networks, as it computes formally verified end-to-end delay bounds. However, analyses in the NC framework have been designed to bound the delay of one flow at a time. Attempts to use these classical analyses to derive a network configuration have shown that this approach is poorly suited to practical use cases. Consider finding a delay-optimal routing configuration: one model had to be created for each routing alternative, then each flow delay had to be bounded, and then the bounds had to be compared to the given constraints. To overcome this three-step process, we introduce Differential Network Calculus. We extend NC to allow the differentiation of delay bounds w.r.t. a wide range of network parameters, such as flow paths or priorities. This opens up NC to a class of efficient nonlinear optimization techniques that exploit the gradient of the delay bound. Our numerical evaluation on the routing and priority assignment problem shows that our novel method can synthesize flow paths and priorities in a matter of seconds, outperforming existing methods by several orders of magnitude.
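As a toy instance of differentiating an NC bound: for a token-bucket arrival curve with burst b served by a rate-latency server with rate R and latency T, the classical delay bound is D(R) = T + b/R (for R at least the sustained rate), so its gradient with respect to R is -b/R^2, which is exactly what a gradient-based synthesis step would follow. Parameter values below are illustrative, and this scalar case stands in for the paper's much richer parameter space:

```python
def delay_bound(R, T=0.5e-3, b=12_000.0):
    """Classical NC delay bound: token-bucket burst b (bits) through a
    rate-latency server with rate R (bit/s) and latency T (s)."""
    return T + b / R

def delay_grad(R, b=12_000.0):
    return -b / R**2              # analytic derivative w.r.t. the rate

R = 1e6                            # current service rate (illustrative)
eps = 1.0
numeric = (delay_bound(R + eps) - delay_bound(R - eps)) / (2 * eps)
assert abs(numeric - delay_grad(R)) < 1e-12   # gradient checks out

# One gradient step on R (negative gradient => raise the rate)
R_new = R - 1e10 * delay_grad(R)
assert delay_bound(R_new) < delay_bound(R)    # the bound tightened
```

In the actual framework such gradients flow through entire multi-hop analyses, letting a nonlinear optimizer trade off rates, paths, and priorities against all delay constraints at once.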
Keyword: faster
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor
Authors: Gavin Parpart, Sumedh R. Risbud, Garrett T. Kenyon, Yijing Watkins
Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)
Abstract
Neuromorphic processors have garnered considerable interest in recent years for their potential in energy-efficient and high-speed computing. The Locally Competitive Algorithm (LCA) has been utilized for power-efficient sparse coding on neuromorphic processors, including the first Loihi processor. With the Loihi 2 processor enabling custom neuron models and graded spike communication, more complex implementations of LCA are possible. We present a new implementation of LCA designed for the Loihi 2 processor and perform an initial set of benchmarks comparing it to LCA on CPU and GPU devices. In these experiments, LCA on Loihi 2 is orders of magnitude more efficient and faster for large sparsity penalties, while maintaining similar reconstruction quality. We find that this performance improvement increases as the LCA parameters are tuned towards greater representation sparsity. Our study highlights the potential of neuromorphic processors, particularly Loihi 2, in enabling intelligent, autonomous, real-time processing on small robots and satellites, where there are strict SWaP (size, weight, and power) requirements. By demonstrating the superior performance of LCA on Loihi 2 compared to conventional computing devices, our study suggests that Loihi 2 could be a valuable tool in advancing these types of applications. Overall, our study highlights the potential of neuromorphic processors for efficient and accurate data processing on resource-constrained devices.
EasyNet: An Easy Network for 3D Industrial Anomaly Detection
Abstract
3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing (IM). Recently, many advanced algorithms have been published, but most of them cannot meet the needs of IM. Their disadvantages are several: i) they are difficult to deploy on production lines, since they heavily rely on large pre-trained models; ii) they hugely increase storage overhead due to an overuse of memory banks; iii) their inference cannot run in real time. To overcome these issues, we propose an easy and deployment-friendly network (called EasyNet) that uses neither pre-trained models nor memory banks: firstly, we design a multi-scale, multi-modality feature encoder-decoder to accurately reconstruct the segmentation maps of anomalous regions and encourage the interaction between RGB images and depth images; secondly, we adopt a multi-modality anomaly segmentation network to achieve a precise anomaly map; thirdly, we propose an attention-based information entropy fusion module for feature fusion during inference, making it suitable for real-time deployment. Extensive experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks. In addition, EasyNet is faster than existing methods, with a high frame rate of 94.55 FPS on a Tesla V100 GPU.
Stochastic $p$th root approximation of a stochastic matrix: A Riemannian optimization approach
Abstract
We propose two approaches, based on Riemannian optimization, for computing a stochastic approximation of the $p$th root of a stochastic matrix $A$. In the first approach, the approximation is found in the Riemannian manifold of positive stochastic matrices. In the second approach, we introduce the Riemannian manifold of positive stochastic matrices sharing the Perron eigenvector with $A$, and we compute the approximation of the $p$th root of $A$ in such a manifold. In this way, differently from the available methods based on constrained optimization, $A$ and its $p$th root approximation share the Perron eigenvector. Such a property is relevant, from a modelling point of view, in the embedding problem for Markov chains. Extensive numerical experimentation shows that, in the first approach, the Riemannian optimization methods are generally faster and more accurate than the available methods based on constrained optimization. In the second approach, even though the stochastic approximation of the $p$th root is found in a smaller set, the approximation is generally more accurate than the one obtained by standard constrained optimization.
Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models
Authors: Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
This study introduces and evaluates tiny, mini, small, and medium-sized uncased Turkish BERT models, aiming to bridge the research gap in less-resourced languages. We trained these models on a diverse dataset encompassing over 75GB of text from multiple sources and tested them on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification. Despite their smaller size, our models exhibited robust performance, including on zero-shot tasks, while ensuring computational efficiency and faster execution times. Our findings provide valuable insights into the development and application of smaller language models, especially in the context of the Turkish language.
Reinforcement Learning by Guided Safe Exploration
Authors: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
Abstract
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
Keyword: mobile
Exploring the Lottery Ticket Hypothesis with Explainability Methods: Insights into Sparse Network Performance
Authors: Shantanu Ghosh, Kayhan Batmanghelich
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Discovering a high-performing sparse network within a massive neural network is advantageous for deploying such models on devices with limited storage, such as mobile phones. Additionally, model explainability is essential to fostering trust in AI. The Lottery Ticket Hypothesis (LTH) finds a subnetwork within a deep network with performance comparable or superior to the original model. However, little study has been conducted on the success or failure of LTH in terms of explainability. In this work, we examine why the performance of pruned networks gradually increases or decreases. Using Grad-CAM and Post-hoc Concept Bottleneck Models (PCBMs), we investigate the explainability of pruned networks in terms of pixels and high-level concepts, respectively. We perform extensive experiments across vision and medical imaging datasets. As more weights are pruned, the performance of the network degrades, and the concepts and pixels discovered by the pruned networks become inconsistent with the original network -- a possible reason for the drop in performance.
Low-Parameter Federated Learning with Large Language Models
Authors: Jingang Jiang, Xiangyang Liu, Chenyou Fan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
We study few-shot Natural Language Understanding (NLU) tasks with Large Language Models (LLMs) in federated learning (FL) scenarios. This is a challenging task due to the limited labeled data and communication capacity in FL, especially with mobile devices. Recent studies show that LLMs can be prompted to perform few-shot NLU tasks such as sentiment analysis and arithmetic reasoning. However, the huge size of LLMs results in high computation and communication costs, making classical FL schemes impractical. To address these challenges, we propose Low-Parameter Federated Learning (LP-FL). LP-FL combines few-shot prompt learning from LLMs with efficient communication and federating techniques. Our approach enables federated clients to assign soft labels to unlabeled data using the knowledge gradually learned by the global model. Through iterative soft-label assignment, we continually expand the labeled set during the FL process. Additionally, to reduce computation and communication costs, LP-FL utilizes the Low-Rank Adaptation (LoRA) technique for compact learnable parameter construction, efficient local model fine-tuning, and affordable global model federation. LP-FL consistently outperforms Full-Parameter Federated Learning (FP-FL) in sentiment analysis tasks across various FL settings. Its resistance to overfitting allows LP-FL to equal or surpass centralized training in few-shot scenarios.
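The LoRA mechanism that LP-FL builds on can be sketched in a few lines; the dimensions, scaling, and initialization below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Minimal LoRA sketch: a frozen weight W is augmented with a trainable
# low-rank update (alpha / r) * B @ A. Only A and B are trained and
# communicated, so a client sends r * (d_in + d_out) extra parameters
# instead of d_out * d_in. All dimensions below are made up.
d_out, d_in, r, alpha = 6, 4, 2, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A.
    return (W + (alpha / r) * B @ A) @ x

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(lora_forward(x), W @ x)
```

The zero initialization of B is what makes fine-tuning start exactly from the pretrained behaviour.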
Adaptive Frequency Filters As Efficient Global Token Mixers
Abstract
Recent vision transformers, large-kernel CNNs, and MLPs have attained remarkable success in broad vision tasks thanks to their effective information fusion at the global scope. However, their efficient deployment, especially on mobile devices, still suffers from noteworthy challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this problem and reveal that adaptive frequency filters can serve as efficient global token mixers. With this insight, we propose the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers a latent representation to the frequency domain via a Fourier transform and performs semantic-adaptive frequency filtering via an elementwise multiplication, which is mathematically equivalent to a token-mixing operation in the original latent space with a dynamic convolution kernel as large as the spatial resolution of the latent representation. We take AFF token mixers as the primary neural operators to build a lightweight neural network, dubbed AFFNet. Extensive experiments demonstrate the effectiveness of the proposed AFF token mixer and show that AFFNet achieves superior accuracy-efficiency trade-offs compared to other lightweight network designs on broad visual tasks, including visual recognition and dense prediction.
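The equivalence the abstract invokes, elementwise multiplication in the frequency domain equals a circular convolution with a kernel as large as the input, can be checked numerically; all values below are made up:

```python
import numpy as np

# Toy check of the convolution theorem underlying frequency-domain token
# mixing: FFT -> elementwise multiply -> inverse FFT equals a circular
# convolution in the original (token) domain.
rng = np.random.default_rng(0)
tokens = rng.standard_normal(8)   # a 1-D stand-in for a latent representation
kernel = rng.standard_normal(8)   # a spatial kernel as large as the input

# Frequency-domain mixing with the kernel's spectrum as the (hypothetical) filter.
freq_filter = np.fft.fft(kernel)
mixed = np.fft.ifft(np.fft.fft(tokens) * freq_filter).real

# Direct circular convolution with the same kernel.
n = len(tokens)
direct = np.array([sum(tokens[(i - j) % n] * kernel[j] for j in range(n))
                   for i in range(n)])
assert np.allclose(mixed, direct)
```

A learned frequency filter therefore acts as a dynamic, input-sized convolution kernel at FFT cost.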
Mining Reddit Data to Elicit Students' Requirements During COVID-19 Pandemic
Abstract
Data-driven requirements engineering leverages the abundance of openly accessible and crowdsourced information on the web. By incorporating user feedback about a software product, such as reviews in mobile app stores, these approaches facilitate the identification of issues, bug fixes, and implementation of change requests. However, relying solely on user feedback about a software product limits the possibility of eliciting all requirements: users may not always have a clear understanding of their exact needs from the software, despite their wealth of experience with the problem, event, or challenge they use the software to address. In this study, we propose a shift in requirements elicitation, focusing on gathering feedback related to the problem itself rather than relying solely on feedback about the software product. We conducted a case study on student requirements during the COVID-19 pandemic at a higher education institution. We gathered students' communications from Reddit during the pandemic and employed multiple machine-learning and natural language processing techniques to identify requirement sentences. Benchmarking multiple techniques, we achieved an F-score of 0.79 using Naive Bayes with TF-IDF. These results lead us to believe that mining requirements from communication about a problem is feasible. While we present preliminary results, we envision a future where such requirements complement conventionally elicited requirements and help to close the requirements gap.
CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor
Abstract
Navigation of a mobile robot is conditioned on knowledge of its pose. In observer-based localisation configurations, the initial pose may not be knowable in advance, leading to the need for its estimation. Solutions to the problem of global localisation either are robust against noise and environment arbitrariness but require motion and time, which may need to be economised on, or require minimal estimation time but assume environmental structure, may be sensitive to noise, and demand preprocessing and tuning. This article proposes a method that retains the strengths and avoids the weaknesses of the two approaches. The method leverages properties of the Cumulative Absolute Error per Ray metric with respect to the errors of pose estimates of a 2D LIDAR sensor, and utilises scan-to-map-scan matching for finer pose approximations. A large number of tests, in real and simulated conditions, involving disparate environments and sensor properties, illustrate that the proposed method outperforms state-of-the-art methods of both classes in terms of pose discovery rate and execution time. The source code is available for download.
Keyword: pruning
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Abstract
As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation. Specifically, we introduce a set of learnable key-value prompts and visual prompts into the self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning. Moreover, we design a prompt pruning procedure to systematically prune low-importance prompts while preserving model performance, which greatly enhances the model's efficiency. Empirical results demonstrate that our approach outperforms several state-of-the-art baselines on two benchmarks, with remarkably low parameter usage (e.g., 0.32% of model parameters on VTAB-1k). Our code is available at https://github.com/ChengHan111/E2VPT.
Abstract
For an artist or a graphic designer, the spatial layout of a scene is a critical design choice. However, existing text-to-image diffusion models provide limited support for incorporating spatial information. This paper introduces Composite Diffusion as a means for artists to generate high-quality images by composing them from sub-scenes. Artists can specify the arrangement of these sub-scenes through a flexible free-form segment layout. They can describe the content of each sub-scene primarily using natural text and additionally by utilizing reference images or control inputs such as line art, scribbles, human pose, canny edges, and more. We provide a comprehensive and modular method for Composite Diffusion that enables alternative ways of generating, composing, and harmonizing sub-scenes. Further, we evaluate the composite image for effectiveness in both image quality and fidelity to the artist's intent. We argue that existing image quality metrics lack a holistic evaluation of image composites; to address this, we propose novel quality criteria especially relevant to composite generation. We believe that our approach provides an intuitive method of art creation. Through extensive user surveys and quantitative and qualitative analysis, we show how it achieves greater spatial, semantic, and creative control over image generation. In addition, our methods do not require retraining or modifying the architecture of the base diffusion models, and can work in a plug-and-play manner with fine-tuned models.
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
Authors: Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view-inconsistency problem, and implicit NeRF modeling can also lead to arbitrary shapes, resulting in less realistic and less controllable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E as the geometric prior, conditioned on a single reference image. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation, providing users with a new way to improve and control text-to-3D generation.
The high-order exponential semi-implicit scalar auxiliary variable approach for nonlocal Cahn-Hilliard equation
Authors: Xiaoqing Meng, Aijie Cheng, Zhengguang Liu
Abstract
The nonlocal Cahn-Hilliard (NCH) equation with a nonlocal diffusion operator is more suitable than the local Cahn-Hilliard (LCH) equation for simulating microstructure phase transitions. In this paper, based on the exponential semi-implicit scalar auxiliary variable (ESI-SAV) method, we propose highly efficient and accurate time-stepping schemes with unconditional energy stability for solving the NCH equation. On the one hand, we carefully and rigorously demonstrate the unconditional energy stability of the high-order semi-discrete schemes for the NCH equation. On the other hand, to reduce the computation and storage costs in numerical simulation, we use a fast solver based on FFT and FCG for the spatial discretization. Numerical simulations involving the Gaussian kernel are presented and show the stability, accuracy, efficiency, and unconditional energy stability of the proposed schemes.
The stabilized exponential-SAV approach preserving maximum bound principle for nonlocal Allen-Cahn equation
Authors: Xiaoqing Meng, Aijie Cheng, Zhengguang Liu
Abstract
The nonlocal Allen-Cahn equation with a nonlocal diffusion operator is a generalization of the classical Allen-Cahn equation. It satisfies the energy dissipation law and maximum bound principle (MBP), and is important for simulating a series of physical and biological phenomena involving long-distance interactions in space. In this paper, we construct first- and second-order (in time) accurate, unconditionally energy-stable and MBP-preserving schemes for the nonlocal Allen-Cahn type model based on the stabilized exponential scalar auxiliary variable (sESAV) approach. On the one hand, we carefully and rigorously prove the MBP and unconditional energy stability at the fully discrete level. On the other hand, we adopt an efficient FFT-based fast solver to handle the nearly full coefficient matrix generated by the spatial discretization, which improves the computational efficiency. Finally, typical numerical experiments are presented to demonstrate the performance of the proposed schemes.
How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?
Abstract
Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is good out-of-distribution (OOD) robustness. Recently, a growing line of work has applied diffusion models to PLMs, but how diffusion influences PLMs on OOD data remains under-explored. The core of diffusion models is a forward diffusion process that gradually applies Gaussian noise to inputs, and a reverse denoising process that removes the noise. Reconstructing noised inputs is a fundamental ability of diffusion models. We directly analyze OOD robustness by measuring the reconstruction loss, including testing the ability to reconstruct OOD data and to detect OOD samples. Experiments are conducted across eight datasets, analyzing different training parameters and data statistical features. The results show that finetuning PLMs with diffusion degrades the reconstruction ability on OOD data. The comparison also shows that diffusion models can effectively detect OOD samples, achieving state-of-the-art performance on most of the datasets with an absolute accuracy improvement of up to 18%. These results indicate that diffusion reduces the OOD robustness of PLMs.
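The forward noising process described above has a well-known closed form; the sketch below uses an illustrative noise schedule (not the paper's settings):

```python
import numpy as np

# Closed form of the forward diffusion process:
#   x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I),
# where abar_t is the cumulative product of (1 - beta_t).
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)   # hypothetical linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal(512)          # stand-in for an input embedding
x_late = q_sample(x0, 99)              # heavily noised input

# The signal fraction sqrt(abar_t) shrinks over time, so late-step inputs are
# noise-dominated; a denoiser's reconstruction loss on such inputs is the
# quantity the paper uses to probe OOD robustness.
assert alpha_bar[0] > alpha_bar[99] > 0
```

Measuring reconstruction error of `x0` from `x_late` across in- and out-of-distribution inputs is the kind of probe the abstract describes.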
Time multiscale modeling of sorption kinetics I: uniformly accurate schemes for highly oscillatory advection-diffusion equation
Authors: Clarissa Astuto, Mohammed Lemou, Giovanni Russo
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this paper we propose a numerical method to solve a 2D advection-diffusion equation in the highly oscillatory regime. We use an efficient and robust integrator that leads to an accurate approximation of the solution without any time step-size restriction. Uniform first- and second-order numerical approximations in time are obtained with errors, and at a cost, that are independent of the oscillation frequency. This work is part of a long-term project whose final goal is the resolution of a Stokes-advection-diffusion system, in which the velocity appearing in the advection term is the solution of the Stokes equations. This paper focuses on the time multiscale challenge, coming from the velocity, which is an $\varepsilon$-periodic function whose expression is explicitly known. We also introduce a two-scale formulation as a first step towards the numerical resolution of the complete oscillatory Stokes-advection-diffusion system, which is currently under investigation. This two-scale formulation is also useful for understanding the asymptotic behaviour of the solution.
Pre-Training with Diffusion models for Dental Radiography segmentation
Authors: Jérémy Rousseau, Christian Alaka, Emma Covili, Hippolyte Mayard, Laura Misrachi, Willy Au
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
Authors: Zhihao Hu, Dong Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, diffusion models like Stable Diffusion have achieved impressive image generation results. However, the generation process of such diffusion models is uncontrollable, which makes it hard to generate videos with continuous and consistent content. In this work, using the diffusion model with ControlNet, we propose a new motion-guided video-to-video translation framework called VideoControlNet to generate various videos based on given prompts and the condition from the input video. Inspired by video codecs that use motion information to reduce temporal redundancy, our framework uses motion information to avoid regenerating redundant areas, preserving content consistency. Specifically, we generate the first frame (i.e., the I-frame) using the diffusion model with ControlNet. Then we generate the other key frames (i.e., the P-frames) based on the previous I/P-frame using our newly proposed motion-guided P-frame generation (MgPG) method, in which P-frames are generated from the motion information and occluded areas are inpainted by the diffusion model. Finally, the remaining frames (i.e., the B-frames) are generated by our motion-guided B-frame interpolation (MgBI) module. Our experiments demonstrate that VideoControlNet inherits the generation capability of the pretrained large diffusion model and extends the image diffusion model to the video domain by using motion information. More results are provided on our project page.
An Antithetic Multilevel Monte Carlo-Milstein Scheme for Stochastic Partial Differential Equations
Authors: Abdul-Lateef Haji-Ali, Andreas Stein
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Abstract
We present a novel multilevel Monte Carlo approach for estimating quantities of interest for stochastic partial differential equations (SPDEs). Drawing inspiration from [Giles and Szpruch: Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without L\'evy area simulation, Annals of Appl. Prob., 2014], we extend the antithetic Milstein scheme for finite-dimensional stochastic differential equations to Hilbert space-valued SPDEs. Our method has the advantages of both Euler and Milstein discretizations: it is easy to implement and does not involve intractable L\'evy area terms. Moreover, the antithetic correction in our method leads to the same variance decay in an MLMC algorithm as the standard Milstein method, resulting in significantly lower computational complexity than a corresponding MLMC Euler scheme. Our approach is applicable to a broader range of non-linear diffusion coefficients and does not require any commutativity properties. The key component of our MLMC algorithm is a truncated Milstein-type time-stepping scheme for SPDEs, which accelerates the rate of variance decay in the MLMC method when combined with an antithetic coupling on the fine scales. We combine the truncated Milstein scheme with appropriate spatial discretizations and noise approximations on all scales to obtain a fully discrete scheme, and show that the antithetic coupling does not introduce an additional bias.
Founding a mathematical diffusion model in linguistics. The case study of German syntactic features in the North-Eastern Italian dialects
Abstract
We take as a case study the spread of Germanic syntactic features into the Romance dialects of North-Eastern Italy, which occurred after the immigration of German people to Tyrol during the High Middle Ages. An interactive map is produced using tools of what is called Geographic Data Science. A smooth two-dimensional surface $\mathcal{G}$ expresses locally the fraction of territory that uses a given German language feature: it is obtained by interpolating a discrete function recording whether, at each surveyed locality, the feature is used or not. This surface $\mathcal{G}$ is thought of as the value at the present time of a function describing a diffusion-convection phenomenon in two dimensions (here called the \emph{tidal} mode), which is governed, in a very natural way, by the same equation, suitably contextualized, used in physics for phenomena such as heat diffusion. It is shown that solutions of this equation, evaluated at the present time, fit well with the data as interpolated by $\mathcal{G}$, thus providing convincing pictures of the diffusion-convection of the linguistic features in the case study, albeit with simplifications and approximations. Very importantly, it is shown that Schmidt's 'waves' can be counted among the solutions of the diffusion equation: superimposing Schmidt's 'waves' on a 'tidal flooding' can reproduce the complexities of real linguistic diffusion events.
Visual Instruction Inversion: Image Editing via Visual Prompting
Authors: Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given an example pair that represents the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that, with just one example pair, we can achieve results competitive with state-of-the-art text-conditioned image editing frameworks.
Keyword: adaptive
FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning
Authors: Leiming Chen, Cihao Dong, Sibo Qiao, Ziling Huang, Kai Wang, Yuming Nie, Zhaoxiang Hou, Cheewei Tan
Abstract
Traditional federated learning uses the number of samples to calculate the weight of each client model and uses these fixed weights to fuse the global model. However, in practical scenarios, the heterogeneity of each client's device and data leads to differences in the quality of each client's model, so the contribution to the global model is not wholly determined by the sample size. In addition, if clients intentionally upload low-quality or malicious models, using these models for aggregation leads to a severe decrease in global model accuracy. Traditional federated learning algorithms do not address these issues. To solve this problem, we propose FedDRL, a model fusion approach using reinforcement learning based on a two-stage approach. In the first stage, our method filters out malicious models and selects trusted client models to participate in the model fusion. In the second stage, the FedDRL algorithm adaptively adjusts the weights of the trusted client models and aggregates the optimal global model. We also define five model fusion scenarios and compare our method with two baseline algorithms in those scenarios. The experimental results show that our algorithm has higher reliability than the others while maintaining accuracy.
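The two-stage fusion described above can be caricatured as filter-then-weight model averaging; the trust scores below are hypothetical stand-ins for the RL-learned quantities, not the paper's method:

```python
import numpy as np

# Filter-then-weight model fusion sketch. Stage 1 drops untrusted clients;
# stage 2 aggregates the rest with adaptive weights. Scores are made up.
rng = np.random.default_rng(0)
client_models = [rng.standard_normal(5) for _ in range(4)]  # flattened params
quality = np.array([0.9, 0.85, 0.1, 0.8])  # stage 1: per-client trust scores

trusted = quality > 0.5                    # filter out low-quality/malicious
weights = quality * trusted
weights = weights / weights.sum()          # stage 2: normalized adaptive weights

global_model = sum(w * m for w, m in zip(weights, client_models))

assert np.isclose(weights.sum(), 1.0)
assert weights[2] == 0.0                   # the untrusted client is excluded
```

Contrast this with classical FedAvg, where the weights would be fixed sample counts regardless of model quality.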
Fully Dynamic Consistent $k$-Center Clustering
Authors: Jakub Łącki, Bernhard Haeupler, Christoph Grunau, Václav Rozhoň, Rajesh Jayaram
Abstract
We study the consistent $k$-center clustering problem. In this problem, the goal is to maintain a constant-factor approximate $k$-center solution during a sequence of $n$ point insertions and deletions while minimizing the recourse, i.e., the number of changes made to the set of centers after each point insertion or deletion. Previous works by Lattanzi and Vassilvitskii [ICML '12] and Fichtenberger, Lattanzi, Norouzi-Fard, and Svensson [SODA '21] showed that in the incremental setting, where deletions are not allowed, one can obtain $k \cdot \textrm{polylog}(n) / n$ amortized recourse for both $k$-center and $k$-median, and demonstrated a matching lower bound. However, no algorithm for the fully dynamic setting achieves fewer than the trivial $O(k)$ changes per update, which can be obtained by simply reclustering the full dataset after every update. In this work, we give the first algorithm for consistent $k$-center clustering in the fully dynamic setting, i.e., when both point insertions and deletions are allowed, that improves upon the trivial $O(k)$ recourse bound. Specifically, our algorithm maintains a constant-factor approximate solution while ensuring worst-case constant recourse per update, which is optimal in the fully dynamic setting. Moreover, our algorithm is deterministic and is therefore correct even if an adaptive adversary chooses the insertions and deletions.
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
Authors: Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view-inconsistency problem, and implicit NeRF modeling can also lead to arbitrary shapes, resulting in less realistic and less controllable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E as the geometric prior, conditioned on a single reference image. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation, providing users with a new way to improve and control text-to-3D generation.
Fourier Growth of Communication Protocols for XOR Functions
Authors: Uma Girish, Makrand Sinha, Avishay Tal, Kewen Wu
Abstract
The level-$k$ $\ell_1$-Fourier weight of a Boolean function is the sum of the absolute values of its level-$k$ Fourier coefficients. Fourier growth refers to the growth of these weights as $k$ increases. It has been extensively studied for various computational models, and bounds on the Fourier growth, even for the first few levels, have proven useful in learning theory, circuit lower bounds, pseudorandomness, and quantum-classical separations. We investigate the Fourier growth of certain functions that naturally arise from communication protocols for XOR functions (partial functions evaluated on the bitwise XOR of the inputs to Alice and Bob). If a protocol $\mathcal C$ computes an XOR function, then $\mathcal C(x,y)$ is a function of the parity $x\oplus y$. This motivates us to analyze the XOR-fiber of $\mathcal C$, defined as $h(z):=\mathbb{E}_{x,y}[\mathcal C(x,y)\mid x\oplus y=z]$. We present improved Fourier growth bounds for the XOR-fibers of protocols that communicate $d$ bits. For the first level, we show a tight $O(\sqrt d)$ bound and obtain a new coin theorem, as well as an alternative proof of the tight randomized communication lower bound for Gap-Hamming. For the second level, we show a $d^{3/2}\cdot\mathrm{polylog}(n)$ bound, which improves the previous $O(d^2)$ bound by Girish, Raz, and Tal (ITCS 2021) and implies a polynomial improvement on the randomized communication lower bound for the XOR-lift of Forrelation, extending its quantum-classical gap. Our analysis is based on a new way of adaptively partitioning a relatively large set in Gaussian space to control its moments in all directions. We achieve this via martingale arguments and by allowing protocols to transmit real values. We also show a connection between Fourier growth and lifting theorems with constant-sized gadgets, as a potential approach to proving optimal bounds for the second level and beyond.
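For intuition, the XOR-fiber $h(z)=\mathbb{E}_{x,y}[\mathcal C(x,y)\mid x\oplus y=z]$ can be computed by brute force on a toy function; the $C$ below is an illustrative stand-in, not an actual communication protocol:

```python
from itertools import product

# Toy XOR-fiber computation on 3-bit inputs. C depends on (x, y) only
# through x XOR y, as any protocol computing an XOR function does; here
# C is simply the +/-1 parity of x XOR y.
n = 3

def C(x, y):
    z = [xi ^ yi for xi, yi in zip(x, y)]
    return (-1) ** sum(z)

def xor_fiber(z):
    # Average C(x, y) over all pairs with x XOR y = z; picking y = x XOR z
    # enumerates exactly those pairs once per x.
    vals = [C(x, tuple(xi ^ zi for xi, zi in zip(x, z)))
            for x in product((0, 1), repeat=n)]
    return sum(vals) / len(vals)

# Because this C is already a function of the parity, the fiber recovers it.
for z in product((0, 1), repeat=n):
    assert xor_fiber(z) == (-1) ** sum(z)
```

For a real protocol, $C(x,y)$ need not depend only on $x\oplus y$, and the averaging over the fiber is what produces the function $h$ whose Fourier growth the paper bounds.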
Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception
Authors: Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-agent collaborative perception as a potential application for vehicle-to-everything communication could significantly improve the perception performance of autonomous vehicles over single-agent perception. However, several challenges remain in achieving pragmatic information sharing in this emerging research. In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. Specifically, SCOPE has three distinct strengths: i) it considers effective semantic cues of the temporal context to enhance current representations of the target agent; ii) it aggregates perceptually critical spatial information from heterogeneous agents and overcomes localization errors via multi-scale feature interactions; iii) it integrates multi-source representations of the target agent based on their complementary contributions by an adaptive fusion paradigm. To thoroughly evaluate SCOPE, we consider both real-world and simulated scenarios of collaborative 3D object detection tasks on three datasets. Extensive experiments demonstrate the superiority of our approach and the necessity of the proposed components.
Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space
Abstract
Personalized federated learning (PFL) is a popular framework that allows clients to have different models to address application scenarios where clients' data are in different domains. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client's local data. Nonetheless, due to the differences between the data distributions of different clients (i.e., domain gaps), the universal features produced by the global encoder largely encompass numerous components irrelevant to a certain client's local task. Some recent PFL methods address the above problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and non-linearity of neural network parameter space. In contrast, the feature space exhibits a lower dimensionality, providing greater intuitiveness and interpretability as compared to the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space by selecting task-relevant features adaptively for each client from the features generated by the global encoder based on its local data distribution. It presents a more accessible and interpretable implementation of PFL compared to those methods working in the parameter space. Extensive experimental results show that FedPick can effectively select task-relevant features for each client and improve model performance in cross-domain FL.
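The core idea of selecting task-relevant channels in the low-dimensional feature space can be sketched as a per-client mask over encoder outputs. The hard top-k selection below is a hypothetical simplification, not FedPick's exact mechanism:

```python
import numpy as np

def select_features(features, scores, keep_ratio=0.5):
    """Keep the top-scoring fraction of feature channels for one client,
    zeroing the rest. `scores` would come from the client's local data
    distribution; here it is an arbitrary input (illustrative only)."""
    d = features.shape[-1]
    k = max(1, int(d * keep_ratio))
    top = np.argsort(scores)[-k:]            # indices of task-relevant channels
    mask = np.zeros(d)
    mask[top] = 1.0
    return features * mask, mask
```

Operating on feature channels keeps the selection low-dimensional and directly inspectable, in contrast to masking raw network parameters.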
Adaptive Frequency Filters As Efficient Global Token Mixers
Abstract
Recent vision transformers, large-kernel CNNs, and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployment, especially on mobile devices, still suffers from noteworthy challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this and reveal that adaptive frequency filters can serve as efficient global token mixers. With this insight, we propose the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers a latent representation to the frequency domain via a Fourier transform and performs semantic-adaptive frequency filtering via an elementwise multiplication, which is mathematically equivalent to a token-mixing operation in the original latent space with a dynamic convolution kernel as large as the spatial resolution of this latent representation. We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet. Extensive experiments demonstrate the effectiveness of our proposed AFF token mixer and show that AFFNet achieves superior accuracy-efficiency trade-offs compared to other lightweight network designs on broad visual tasks, including visual recognition and dense prediction tasks.
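The claimed equivalence rests on the convolution theorem: elementwise multiplication in the frequency domain equals circular convolution with a spatial kernel as large as the input. A small NumPy check, in 1-D for clarity (AFF applies this over 2-D latent maps):

```python
import numpy as np

def freq_filter(x, filt_freq):
    """Elementwise multiply in the frequency domain (one AFF-style 'mix')."""
    return np.fft.ifft(np.fft.fft(x) * filt_freq).real

def circular_conv(x, kernel):
    """Circular convolution with a kernel as large as the sequence itself."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * kernel[j] for j in range(n))
                     for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
kernel = rng.standard_normal(8)     # spatial kernel, full resolution
filt = np.fft.fft(kernel)           # its frequency response
```

One FFT plus an elementwise product is far cheaper than explicitly convolving with a kernel the size of the whole feature map, which is the efficiency argument behind AFF.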
Consensus-Adaptive RANSAC
Authors: Luca Cavalli, Daniel Barath, Marc Pollefeys, Viktor Larsson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
RANSAC and its variants are widely used for robust estimation; however, they commonly follow a greedy approach, seeking the highest-scoring model while ignoring other model hypotheses. In contrast, Iteratively Reweighted Least Squares (IRLS) techniques gradually approach the model by iteratively updating the weight of each correspondence based on the residuals from previous iterations. Inspired by these methods, we propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer. The attention mechanism operates on a batch of point-to-model residuals and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer. This rich state then guides the minimal sampling between iterations as well as the model refinement. We evaluate the proposed approach on essential and fundamental matrix estimation on a number of indoor and outdoor datasets. It outperforms state-of-the-art estimators by a significant margin while adding only a small runtime overhead. Moreover, we demonstrate good generalization properties of our trained model, indicating its effectiveness across different datasets and tasks. The proposed attention mechanism and one-step transformer provide an adaptive behavior that enhances the performance of RANSAC, making it a more effective tool for robust estimation. Code is available at https://github.com/cavalli1234/CA-RANSAC.
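The residual-driven exploration can be caricatured as follows: residuals accumulated over previous hypotheses are turned into sampling weights for the next minimal sample. This hand-crafted softmax is a stand-in for CA-RANSAC's learned attention state, not the paper's actual module:

```python
import numpy as np

def update_sampling_weights(residuals_history, temperature=1.0):
    """Turn per-point residuals from all hypotheses seen so far into a
    sampling distribution over points for the next minimal sample.
    Points that consistently fit past models are sampled more often."""
    avg_res = np.mean(residuals_history, axis=0)   # (num_points,)
    scores = -avg_res / temperature                # low residual -> high score
    w = np.exp(scores - scores.max())
    return w / w.sum()
```

Such state carried across iterations is what distinguishes this family of methods from vanilla RANSAC's memoryless uniform sampling.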
A superconvergent stencil-adaptive SBP-SAT finite difference scheme
Authors: Viktor Linders, Mark Carpenter, Jan Nordström
Abstract
A stencil-adaptive SBP-SAT finite difference scheme is shown to display superconvergent behavior. Applied to the linear advection equation, it has a convergence rate $\mathcal{O}(\Delta x^4)$ in contrast to a conventional scheme, which converges at a rate $\mathcal{O}(\Delta x^3)$.
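Convergence rates such as $\mathcal{O}(\Delta x^4)$ versus $\mathcal{O}(\Delta x^3)$ are typically verified numerically by estimating the observed order from errors on successively refined grids, via $e_h \approx C h^p \Rightarrow p \approx \log(e_h/e_{h/r})/\log r$. A generic helper for this standard check:

```python
import math

def observed_order(errors, refinement=2.0):
    """Estimate the convergence order p from errors on successively
    refined grids (each grid `refinement` times finer than the last)."""
    return [math.log(e1 / e2) / math.log(refinement)
            for e1, e2 in zip(errors, errors[1:])]
```

Fourth-order errors shrink by a factor of 16 per halving of the grid spacing, which is the signature the superconvergence claim would exhibit in such a study.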
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching
Authors: Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Correlation-based stereo matching, which builds a cost volume between two feature maps, has achieved outstanding performance. Unfortunately, current methods with a fixed model do not work uniformly well across various datasets, greatly limiting their real-world applicability. To tackle this issue, this paper proposes a new perspective to dynamically calculate correlation for robust stereo matching. A novel Uncertainty Guided Adaptive Correlation (UGAC) module is introduced to robustly adapt the same model to different scenarios. Specifically, a variance-based uncertainty estimation is employed to adaptively adjust the sampling area during the warping operation. Additionally, we improve the traditional non-parametric warping with learnable parameters, such that position-specific weights can be learned. We show that by empowering the recurrent network with the UGAC module, stereo matching can be exploited more robustly and effectively. Extensive experiments demonstrate that our method achieves state-of-the-art performance over the ETH3D, KITTI, and Middlebury datasets when employing the same fixed model over these datasets without any retraining procedure. To target real-time applications, we further design a lightweight model based on UGAC, which also outperforms other methods over KITTI benchmarks with only 0.6M parameters.
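One simplified reading of variance-based uncertainty guiding the sampling area: treat a per-pixel matching-score slice as a distribution over candidate disparities and widen the warping radius with its variance. The scaling rule below is hypothetical, chosen only to illustrate the monotone relationship:

```python
import numpy as np

def adaptive_sampling_radius(score_slice, base_radius=4.0):
    """Map the variance of a per-pixel matching-score distribution to a
    warping sampling radius: uncertain (flat) distributions search wider.
    Illustrative scaling, not UGAC's exact parameterization."""
    s = np.asarray(score_slice, dtype=float)
    p = np.exp(s - s.max())
    p /= p.sum()                                # softmax over candidates
    idx = np.arange(len(p))
    mean = (p * idx).sum()
    var = (p * (idx - mean) ** 2).sum()         # variance-based uncertainty
    return base_radius * (1.0 + var / len(p))
```

A sharply peaked score slice (confident match) keeps the radius near its base value, while a flat slice widens the search.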
Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards
Authors: Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi
Abstract
We study the piecewise-stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must track both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. In addition, we introduce the notion of group restart as a new alternative restarting strategy in the decision-making process in structured environments. Finally, our algorithm integrates a mechanism to trace the variations of the underlying graph structure, which captures the causal relationships between the rewards in the bandit setting. Theoretically, we establish a regret upper bound that reflects the effects of the number of structural and distributional changes on the performance. The outcome of our numerical experiments in real-world scenarios exhibits the applicability and superior performance of our proposal compared to state-of-the-art benchmarks.
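A minimal version of the GLR change-point test for a mean shift in a unit-variance Gaussian stream, scanning all split points, conveys the detector's mechanics (illustrative; the paper's detector operates per base arm inside the bandit loop, with a calibrated threshold):

```python
def glr_change_detected(samples, threshold):
    """GLR test for a mean shift in a unit-variance Gaussian stream:
    compare the best two-mean split against the single-mean model and
    flag a change if the log-likelihood gain exceeds `threshold`."""
    n = len(samples)
    if n < 2:
        return False
    total_mean = sum(samples) / n
    best = 0.0
    for s in range(1, n):
        m1 = sum(samples[:s]) / s
        m2 = sum(samples[s:]) / (n - s)
        # gain of the two-mean model over the single-mean model
        gain = 0.5 * (s * (m1 - total_mean) ** 2
                      + (n - s) * (m2 - total_mean) ** 2)
        best = max(best, gain)
    return best > threshold
```

On detection, the policy would restart the affected UCB statistics (or, with group restart, those of a whole related group of arms).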
ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
Authors: Görkay Aydemir, Adil Kaan Akan, Fatma Güney
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Forecasting future trajectories of agents in complex traffic scenes requires reliable and efficient predictions for all agents in the scene. However, existing methods for trajectory prediction are either inefficient or sacrifice accuracy. To address this challenge, we propose ADAPT, a novel approach for jointly predicting the trajectories of all agents in the scene with dynamic weight learning. Our approach outperforms state-of-the-art methods in both single-agent and multi-agent settings on the Argoverse and Interaction datasets, with a fraction of their computational overhead. We attribute this performance improvement, first, to the adaptive head, which augments the model capacity without increasing the model size, and second, to our design choices in the endpoint-conditioned prediction, reinforced by gradient stopping. Our analyses show that ADAPT can focus on each agent with adaptive prediction, enabling accurate and efficient predictions. https://KUIS-AI.github.io/adapt
Online Modeling and Monitoring of Dependent Processes under Resource Constraints
Authors: Tanapol Kosolwattana, Huazheng Wang, Ying Lin
Abstract
Monitoring a population of dependent processes under limited resources is critical for abnormal event detection. A novel online collaborative learning method is proposed to adaptively allocate the resources for exploitation of high-risk processes and exploration of dependent dynamics. The efficiency of the proposed method is demonstrated through theoretical analysis and experiments.
Sliding Mode Control of Active Magnetic Bearings -- A Cascaded Architecture
Abstract
Accurate and robust positioning of the rotor axle is essential for efficient and safe operation of high-speed rotational machines with active magnetic bearings. This study presents a cascaded nonlinear control strategy for vertical axial positioning of an active magnetic bearing system. The proposed scheme employs two sliding mode controllers for regulating rotor vertical position and current, and an adaptive estimator to invert the nonlinear input mapping. Uniform asymptotic stability is proven for the closed-loop system, and the efficacy and performance of the proposed design are evaluated in simulation.
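A textbook sliding-mode control law for a second-order error dynamic illustrates the basic mechanism behind each controller in such a cascade (the paper's cascaded design with adaptive estimation of the input map is considerably richer):

```python
def smc_step(e, e_dot, lam=2.0, k=5.0):
    """One sliding-mode control evaluation: define the sliding surface
    s = e_dot + lam * e and apply the switching law u = -k * sign(s),
    driving the error state onto s = 0 and then to the origin.
    Gains lam and k are arbitrary illustrative values."""
    s = e_dot + lam * e
    sign = (s > 0) - (s < 0)
    return -k * sign
```

In practice the discontinuous sign term is often smoothed (e.g., with a saturation or tanh function) to limit chattering in the magnetic actuator.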
A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch
Authors: Shengren Hou, Edgar Mauricio Salazar Duque, Peter Palensky, Pedro P. Vergara
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to the uncertainty introduced by fluctuations in dynamic prices, demand consumption, and renewable-based energy generation. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' stochastic nature. However, current DRL algorithms lack the ability to enforce operational constraints strictly, often providing infeasible control actions. To address this issue, we propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the operational constraints of the environment and action space during online operation. First, the proposed framework trains an action-value function modeled using DNNs. Subsequently, this action-value function is formulated as a mixed-integer programming (MIP) problem, enabling the consideration of the environment's operational constraints. Comprehensive numerical simulations show the superior performance of the proposed MIP-DRL framework, which enforces all constraints while delivering high-quality dispatch decisions when compared with state-of-the-art DRL algorithms and the optimal solution obtained with a perfect forecast of the stochastic variables.
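Embedding a trained DNN into a MIP hinges on encoding each ReLU with a binary variable and big-M constraints. The checker below verifies the standard encoding on given values; it is a generic building block for such formulations, not the paper's exact model:

```python
def relu_bigM_feasible(x, y, z, M=100.0, tol=1e-9):
    """Check the standard big-M MIP encoding of y = ReLU(x) with
    binary indicator z (z = 1 when x > 0):
        y >= x,  y >= 0,  y <= x + M*(1 - z),  y <= M*z.
    With valid bounds M on |x|, the only feasible y is max(0, x)."""
    return (y >= x - tol and y >= -tol
            and y <= x + M * (1 - z) + tol
            and y <= M * z + tol)
```

Stacking one such gadget per neuron turns the trained action-value network into a MIP whose optimum, subject to the added operational constraints, yields a feasible dispatch action.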
Keyword: efficient
mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization
A Novel Computationally Efficient Group Signature for Anonymous and Secure V2X Communications
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor
A real-time material breakage detection for offshore wind turbines based on improved neural network algorithm
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Upward Planarity Testing of Biconnected Outerplanar DAGs Solves Partition
Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT
Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory
Efficient Estimation of the Local Robustness of Machine Learning Models
Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies
Low-Parameter Federated Learning with Large Language Models
AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery
On the hardness of finding balanced independent sets in random bipartite graphs
trajdata: A Unified Interface to Multiple Human Trajectory Datasets
Fourier Growth of Communication Protocols for XOR Functions
Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics
The stabilized exponential-SAV approach preserving maximum bound principle for nonlocal Allen-Cahn equation
Formal Verification of Robotic Contact Tasks via Reachability Analysis
Time multiscale modeling of sorption kinetics I: uniformly accurate schemes for highly oscillatory advection-diffusion equation
Adaptive Frequency Filters As Efficient Global Token Mixers
Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability
Relay-Enabled Backscatter Communications: Linear Mapping and Resource Allocation
Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation
Gleam: An RDMA-accelerated Multicast Protocol for Datacenter Networks
Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks
Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Learning Disentangled Discrete Representations
An Antithetic Multilevel Monte Carlo-Milstein Scheme for Stochastic Partial Differential Equations
Complexity results for the Pilot Assignment problem in Cell-Free Massive MIMO
ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling
Sliding Mode Control of Active Magnetic Bearings -- A Cascaded Architecture
Large-scale Fully-Unsupervised Re-Identification
Differentiable Programming & Network Calculus: Configuration Synthesis under Delay Constraints
Keyword: faster
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor
EasyNet: An Easy Network for 3D Industrial Anomaly Detection
Stochastic $p$th root approximation of a stochastic matrix: A Riemannian optimization approach
Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models
Large-scale Fully-Unsupervised Re-Identification
Reinforcement Learning by Guided Safe Exploration
Keyword: mobile
Exploring the Lottery Ticket Hypothesis with Explainability Methods: Insights into Sparse Network Performance
Low-Parameter Federated Learning with Large Language Models
Adaptive Frequency Filters As Efficient Global Token Mixers
Mining Reddit Data to Elicit Students' Requirements During COVID-19 Pandemic
CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor
Keyword: pruning
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Keyword: diffusion
Composite Diffusion | whole >= Σparts
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
The high-order exponential semi-implicit scalar auxiliary variable approach for nonlocal Cahn-Hilliard equation
The stabilized exponential-SAV approach preserving maximum bound principle for nonlocal Allen-Cahn equation
How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?
Time multiscale modeling of sorption kinetics I: uniformly accurate schemes for highly oscillatory advection-diffusion equation
Pre-Training with Diffusion models for Dental Radiography segmentation
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
An Antithetic Multilevel Monte Carlo-Milstein Scheme for Stochastic Partial Differential Equations
Founding a mathematical diffusion model in linguistics. The case study of German syntactic features in the North-Eastern Italian dialects
Visual Instruction Inversion: Image Editing via Visual Prompting
Keyword: adaptive
FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning
Fully Dynamic Consistent $k$-Center Clustering
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation
Fourier Growth of Communication Protocols for XOR Functions
Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception
Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space
Adaptive Frequency Filters As Efficient Global Token Mixers
Consensus-Adaptive RANSAC
A superconvergent stencil-adaptive SBP-SAT finite difference scheme
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching
Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards
ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
Online Modeling and Monitoring of Dependent Processes under Resource Constraints
Sliding Mode Control of Active Magnetic Bearings -- A Cascaded Architecture
A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch
Keyword: quantization
There is no result