Abstract
Recently, neural heuristics based on deep reinforcement learning have exhibited promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they are still struggling to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned with a few steps to solve corresponding single-objective subproblems. Specifically, for the training process, a (partial) architecture-shared multi-task model is leveraged to achieve parallel learning for the meta-model, so as to speed up the training; meanwhile, a scaled symmetric sampling method with respect to the weight vectors is designed to stabilize the training. For the fine-tuning process, an efficient hierarchical method is proposed to systematically tackle all the subproblems. Experimental results on the multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) show that, EMNH is able to outperform the state-of-the-art neural heuristics in terms of solution quality and learning efficiency, and yield competitive solutions to the strong traditional heuristics while consuming much shorter time.
Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages
Authors: Alexandra Butoi, Tim Vieira, Ryan Cotterell, David Chiang
Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
Abstract
The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of $\mathcal{O}(n|\mathcal{N}|)$ (where $n$ is the string length and $|\mathcal{N}|$ is the size of the nonterminal set) and more space-efficient by a factor of $\mathcal{O}(|\Gamma|)$ (where $|\Gamma|$ is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and time-efficient than the algorithm of Alonso et al. (2001) by factors of $\mathcal{O}(|\Gamma|^2)$ and $\mathcal{O}(|\Gamma|^3)$, respectively. Finally, we give the first PAA stringsum and allsum algorithms.
DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
Abstract
In the burgeoning field of natural language processing, Neural Topic Models (NTMs) and Large Language Models (LLMs) have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are not optimal for clustering or capable for topic generation. Our study addresses this gap by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages ncoder-Decoder-based LLMs to produce highly clusterable embeddings that could generate topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion, our framework also provides the capability to generate content relevant to the identified topics. This dual functionality allows users to efficiently produce highly clustered topics and related content simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework proves to be efficient to train and exhibits high adaptability, demonstrating its potential for a wide array of applications.
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Abstract
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that assimilates their expertise. Our proposed method integrates multi-task learning, continual learning techniques, and teacher-student distillation. This strategy entails significantly less computational cost compared to traditional multi-task training from scratch. Additionally, it only demands a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we derive SAM-CLIP: a unified model that amalgamates the strengths of SAM and CLIP into a single backbone, making it apt for edge device applications. We show that SAM-CLIP learns richer visual representations, equipped with both localization and semantic features, suitable for a broad range of vision tasks. SAM-CLIP obtains improved performance on several head probing tasks when compared with SAM and CLIP. We further show that SAM-CLIP not only retains the foundational strengths of its precursor models but also introduces synergistic functionalities, most notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.
Specialist or Generalist? Instruction Tuning for Specific NLP Tasks
Authors: Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai
Abstract
The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. We hypothesize that its efficacy depends on task specificity and skill requirements. Our experiments assess four target tasks with distinct coverage levels, revealing that integrating generalist instruction tuning consistently enhances model performance when the task coverage is broad. The effect is particularly pronounced when the amount of task-specific training data is limited. Further investigation into three target tasks focusing on different capabilities demonstrates that generalist instruction tuning improves understanding and reasoning abilities. However, for tasks requiring factual knowledge, generalist data containing hallucinatory information may negatively affect the model's performance. Overall, our work provides a systematic guide for developing specialist models with general instruction tuning. Our code and other related resources can be found at https://github.com/DavidFanzz/Generalist_or_Specialist.
Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network
Abstract
Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.
Algorithm for Invalidation of Cached Results of Queries to a Single Table
Abstract
One of the most popular setups for a back-end of a high performance website consists of a relational database and a cache which stores results of performed queries. Several application frameworks support caching of queries made to the database, but few of them handle cache invalidation correctly, resorting to simpler solutions such as short TTL values, or flushing the whole cache after any write to the database. In this paper a simple, correct, efficient and tested in real world application solution is presented, which allows for infinite TTL, and very fine grained cache invalidation. Algorithm is proven to be correct in a concurrent environment, both theoretically and in practice.
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
Abstract
We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$\footnote{In the main body of this work, we use $\tilde{O}(\cdot), \tilde{\Theta}(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}{\epsilon}, \frac{1}{\delta})$}, under the assumption that the Tsybakov noise parameter $\alpha \in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
zonoLAB: A MATLAB toolbox for set-based control systems analysis using hybrid zonotopes
Authors: Justin Koeln, Trevor J. Bird, Jacob Siefert, Justin Ruths, Herschel Pangborn, Neera Jain
Abstract
This paper introduces zonoLAB, a MATLAB-based toolbox for set-based control system analysis using the hybrid zonotope set representation. Hybrid zonotopes have proven to be an expressive set representation that can exactly represent the reachable sets of mixed-logical dynamical systems and tightly approximate the reachable sets of nonlinear dynamic systems. Moreover, hybrid zonotopes can exactly represent the continuous piecewise linear control laws associated with model predictive control and the input-output mappings of neural networks with piecewise linear activation functions. The hybrid zonotope set representation is also highly exploitable, where efficient methods developed for mixed-integer linear programming can be directly used for set operation and analysis. The zonoLAB toolbox is designed to make these capabilities accessible to the dynamic systems and controls community, with functionality spanning fundamental operations with hybrid zonotope, constrained zonotope, and zonotope set representations, powerful set analysis tools, and general-purpose algorithms for reachability analysis of open- and closed-loop systems.
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry
Abstract
Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.
A Review of Economic Incentives for Efficient Operation of Flexible Transmission
Abstract
The growing penetration of renewable energy requires upgrades to the transmission network to ensure the deliverability of renewable generation. As an efficient alternative to transmission expansion, flexible transmission technologies, whose benefits have been widely studied, can alleviate transmission system congestion and enhance renewable energy integration. However, under the current market structure, investments for these technologies only receive a regulated rate of return, providing little to no incentive for efficient operation. Additionally, a regulated rate of return creates an incentive for building more transmission lines rather than efficient utilization of the existing system. Therefore, investments in flexible transmission technologies remain rather limited. To facilitate the deployment of flexible transmission, improve system efficiency, and accommodate renewable energy integration, a proper incentive structure for flexible transmission technologies, compatible with the current market design, is vital. This paper reviews the current market-based mechanisms for various flexible transmission technologies, including impedance control, dynamic line rating, and transmission switching. This review pinpoints current challenges of the market-based operation of flexible transmission and provides insights for future endeavors in designing efficient price signals for flexible transmission operation.
Abstract
Multi-port DC-DC converters are gaining more significance in modern power system environments by enabling the connection of multiple renewable energy sources, so the efficient operation of these converters is paramount. Soft switching methods increase efficiency in DC-DC converters and increase the reliability and lifespan of devices by relieving stress on components. This paper proposes a method for soft-switching of a dual-output step-down current-fed full-bridge push-pull DC-DC converter. The converter enables two independent outputs to supply different loads. The topology achieves zero-current switching on the primary side and zero-voltage switching on the secondary side, eliminating the need for active-clamp circuits and passive snubbers to absorb surge voltage. This reduces switching losses and lower voltage and current stresses on power electronic devices. The paper thoroughly investigates the proposed converter's operation principle, control strategy, and characteristics. Equations for the voltage and current of all components are derived, and the conditions for achieving soft switching are calculated. Simulation results in EMTDC/PSCAD software validate the accuracy of the proposed method.
PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows
Authors: Savvas Petridis, Michael Terry, Carrie J. Cai
Abstract
Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.
Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks
Authors: Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai, Xiaochun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves the single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically, we propose to dynamically sample lightweight subnetworks as a surrogate model during training. By doing this, both the forward and backward passes can be accelerated for efficient adversarial training. Besides, we provide theoretical analysis to show the model robustness can be improved by the single-step adversarial training with sampled subnetworks. Furthermore, we propose a novel sampling strategy where the sampling varies from layer to layer and from iteration to iteration. Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness. Evaluations on a series of popular datasets demonstrate the effectiveness of the proposed FB-Better. Our code has been released at https://github.com/jiaxiaojunQAQ/FP-Better.
A distributed-memory parallel algorithm for discretized integral equations using Julia
Authors: Tianyu Liang, Chao Chen, Per-Gunnar Martinsson, George Biros
Abstract
Boundary value problems involving elliptic PDEs such as the Laplace and the Helmholtz equations are ubiquitous in physics and engineering. Many such problems have alternative formulations as integral equations that are mathematically more tractable than their PDE counterparts. However, the integral equation formulation poses a challenge in solving the dense linear systems that arise upon discretization. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well established. More recently, linear complexity direct solvers that sidestep convergence issues by directly computing an invertible factorization have been developed. However, storage and compute costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named ``strong recursive skeletonization factorization.'' The analysis of its parallel scalability applies generally to a class of existing methods that exploit the so-called strong admissibility. Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes data movement. Given a compression tolerance, our method constructs an approximate factorization of a discretized integral operator (dense matrix), which can be used to solve linear systems efficiently in parallel. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to demonstrate the performance of our implementation using the Julia language.
HL-DPoS: An Enhanced Anti-Long-Range Attack DPoS Algorithm
Authors: Yang Li, Chunhe Xia, Chunyan Li, Yuan Zhao, Chen Chen, Tianbo Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The consensus algorithm is crucial in blockchain for ensuring the validity and security of transactions across the decentralized network. However, achieving consensus among nodes and packaging blocks in blockchain networks is a complex task that requires efficient and secure consensus algorithms. The DPoS consensus algorithm has emerged as a popular choice due to its fast transaction processing and high throughput. Despite these advantages, the algorithm still suffers from weaknesses such as centralization and vulnerability to long-range attacks, which can compromise the integrity of the blockchain network. To combat these problems, we developed an Enhanced Anti-Long-Range Attack DPoS algorithm (HL-DPoS). First, we split nodes into pieces to reduce centralization issues while giving witness nodes the power to report and benefit from malicious node's reports, maintaining high efficiency and high security. Second, we propose a validation method in HL-DPoS that compares consensuses transactions with the longest chain to detect long-range attacks. Algorithm analysis and simulation experiment results demonstrate that our HL-DPoS consensus algorithm improves security while achieving better consensus performance.
EKGNet: A 10.96μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification
Abstract
We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that archives high accuracy with low power consumption. The proposed architecture leverages the energy efficiency of transistors operating in the subthreshold region, eliminating the need for analog-to-digital converters (ADC) and static random access memory (SRAM). The system design includes a novel analog sequential Multiply-Accumulate (MAC) circuit that mitigates process, supply voltage, and temperature variations. Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed method, achieving average balanced accuracy of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively. This innovative approach presents a promising avenue for developing low-power arrhythmia classification systems with enhanced accuracy and transferability in biomedical applications.
Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization
Authors: Mohammad Mohammadi, Ali Mohammadi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
This study delves into the shift from centralized to decentralized approaches in the electricity industry, with a particular focus on how machine learning (ML) advancements play a crucial role in empowering renewable energy sources and improving grid management. ML models have become increasingly important in predicting renewable energy generation and consumption, utilizing various techniques like artificial neural networks, support vector machines, and decision trees. Furthermore, data preprocessing methods, such as data splitting, normalization, decomposition, and discretization, are employed to enhance prediction accuracy. The incorporation of big data and ML into smart grids offers several advantages, including heightened energy efficiency, more effective responses to demand, and better integration of renewable energy sources. Nevertheless, challenges like handling large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed. The research investigates various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, illustrating their potential to optimize energy systems. To sum up, this research demonstrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately shaping a more efficient and sustainable energy future.
RIS-based IMT-2030 Testbed for MmWave Multi-stream Ultra-massive MIMO Communications
Abstract
As one enabling technique of the future sixth generation (6G) network, ultra-massive multiple-input-multiple-output (MIMO) can support high-speed data transmissions and cell coverage extension. However, it is hard to realize the ultra-massive MIMO via traditional phased arrays due to unacceptable power consumption. To address this issue, reconfigurable intelligent surface-based (RIS-based) antennas are an energy-efficient enabler of the ultra-massive MIMO, since they are free of energy-hungry phase shifters. In this article, we report the performances of the RIS-enabled ultra-massive MIMO via a project called Verification of MmWave Multi-stream Transmissions Enabled by RIS-based Ultra-massive MIMO for 6G (V4M), which was proposed to promote the evolution towards IMT-2030. In the V4M project, we manufacture RIS-based antennas with 1024 one-bit elements working at 26 GHz, based on which an mmWave dual-stream ultra-massive MIMO prototype is implemented for the first time. To approach practical settings, the Tx and Rx of the prototype are implemented by one commercial new radio base station and one off-the-shelf user equipment, respectively. The measured data rate of the dual-stream prototype approaches the theoretical peak rate. Our contributions to the V4M project are also discussed by presenting technological challenges and corresponding solutions.
Multi-split configuration design for fluid-based thermal management systems
Authors: Saeid Bayat, Nastaran Shahmansouri, Satya RT Peddada, Alexander Tessier, Adrian Butscher, James T Allison
Abstract
High power density systems require efficient cooling to maintain their thermal performance. Despite this, as systems get larger and more complex, human practice and insight may not suffice to determine the desired thermal management system designs. To this end, a framework for automatic architecture exploration is presented in this article for a class of single-phase, multi-split cooling systems. For this class of systems, heat generation devices are clustered based on their spatial information, and flow-split are added only when required and at the location of heat devices. To generate different architectures, candidate architectures are represented as graphs. From these graphs, dynamic physics models are created automatically using a graph-based thermal modeling framework. Then, an optimal fluid flow distribution problem is solved by addressing temperature constraints in the presence of exogenous heat loads to achieve optimal performance. The focus in this work is on the design of general multi-split heat management systems. The architectures discussed here can be used for various applications in the domain of configuration design. The multi-split algorithm can produce configurations where splitting can occur at any of the vertices. The results presented include 3 categories of cases and are discussed in detail.
The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not)
Authors: Sukwoong Choi, William S. Moses, Neil Thompson
Subjects: Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET); General Economics (econ.GN); Quantum Physics (quant-ph)
Abstract
Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases, or practitioners processing very large data.
Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs
Abstract
Recently, Deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably to arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) proposed by (Corberan et al., 2018) on large benchmark datasets for CPP-LC regarding both solution quality and running time; while the EA gives the best solution quality with much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Authors: Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
Abstract
Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.
SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation
Authors: Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozi`ere et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert size as only 0.06% of number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on one single 80Gb A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average of 73.76 CodeBLEU score in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is release here: https://github.com/sade-adrien/SteloCoder.
Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary
Abstract
The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness. A striking phenomenon of such faulty behaviours is the generation of inconsistent predictions, which produces logically contradictory results, such as generating different predictions for texts delivering the same meaning or violating logical properties. Previous studies exploited data augmentation or implemented specialised loss functions to alleviate the issue. However, their usage is limited, because they consume expensive training resources for large-sized PLMs and can only handle a certain consistency type. To this end, we propose a practical approach that alleviates the inconsistent behaviour issue by fundamentally improving PLMs' meaning awareness. Based on the conceptual role theory, our method allows PLMs to capture accurate meaning by learning precise interrelationships between concepts from word-definition pairs in a dictionary. Next, we propose an efficient parameter integration technique that updates only a few additional parameters to combine the learned interrelationship with PLMs' pre-trained knowledge. Our experimental results reveal that the approach can concurrently improve multiple types of consistency, enables efficient knowledge integration, and easily applies to other languages.
Symmetry-preserving graph attention network to solve routing problems at multiple resolutions
Abstract
Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard
Capacity-based Spatial Modulation Constellation and Pre-scaling Design
Abstract
Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of SM system is firstly derived. The constellation and pre-scaling coefficients are optimized by maximizing the BICM capacity without channel state information (CSI) feedback. Optimization results are given for the multiple-input-single-output (MISO) system with Rayleigh channel. Simulation result shows the proposed scheme provides a meaningful performance gain compared to conventional SM system without CSI feedback. The proposed optimization design scheme can be a promising technology for future 6G to achieve high-efficiency.
CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction
Abstract
Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning techniques for the task. Instead, our motivation is to come up with a generic approach that can improve the downstream performances of multiple ABSA tasks simultaneously. Towards this, we present CONTRASTE, a novel pre-training strategy using CONTRastive learning to enhance the ASTE performance. While we primarily focus on ASTE, we also demonstrate the advantage of our proposed technique on other ABSA tasks such as ACOS, TASD, and AESC. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder model by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder model is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art ASTE results.
Accelerating Split Federated Learning over Wireless Communication Networks
Abstract
The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into two parts which are deployed on device and server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
Closed Loop Molecular Communication Testbed: Setup, Interference Analysis, and Experimental Results
Authors: Lukas Brand, Maike Scherer, Teena tom Dieck, Sebastian Lotter, Maximilian Schäfer, Andreas Burkovski, Heinrich Sticht, Kathrin Castiglione, Robert Schober
Abstract
In this paper, we present a fluid-based experimental molecular communication (MC) testbed that, similar to the human cardiovascular system, operates in a closed circuit pipe system. The proposed system is designed to be biocompatible, resource-efficient, and controllable from outside the pipe. As signaling molecule the testbed employs the green fluorescent protein variant "Dreiklang" (GFPD). GFPDs can be reversibly switched via light of different wavelengths between a bright fluorescent state and a less fluorescent state. Hence, this property allows for writing and erasing information encoded in the state of the GFPDs already existing in the fluid via radiation from outside the pipe. The concept of modulating the GFPDs existing in the channel at the transmitter for information transmission instead of releasing new molecules is a form of media modulation. In our testbed, due to the closed loop setup and the long experiment durations of up to 250 min, we observe new forms of inter-symbol interferences (ISI), which do not occur in short experiments and open loop systems. In particular, four different forms of inter-symbol interference (ISI), namely channel ISI, inter-loop ISI, offset ISI, and permanent ISI, can occur in the considered system. To mitigate inter-loop ISI and offset ISI, we propose a powerful light based eraser. Finally, we experimentally demonstrate the reliability of information transmission in our testbed achieving error-free transmission of 500 bit at a data rate of 6 bit/min, while employing a sub-optimal low-complexity detection scheme.
Emergent Communication in Interactive Sketch Question Answering
Abstract
Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players are interacting through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results including human evaluation demonstrate that multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.
An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering
Authors: Taha Shahroodi, Michael Miao, Joel Lindegger, Stephan Wong, Onur Mutlu, Said Hamdioui
Abstract
With the recent move towards sequencing of accurate long reads, finding solutions that support efficient analysis of these reads becomes more necessary. The long execution time required for sequence alignment of long reads negatively affects genomic studies relying on sequence alignment. Although pre-alignment filtering as an extra step before alignment was recently introduced to mitigate sequence alignment for short reads, these filters do not work as efficiently for long reads. Moreover, even with efficient pre-alignment filters, the overall end-to-end (i.e., filtering + original alignment) execution time of alignment for long reads remains high, while the filtering step is now a major portion of the end-to-end execution time. Our paper makes three contributions. First, it identifies data movement of sequences between memory units and computing units as the main source of inefficiency for pre-alignment filters of long reads. This is because although filters reject many of these long sequencing pairs before they get to the alignment stage, they still require a huge cost regarding time and energy consumption for the large data transferred between memory and processor. Second, this paper introduces an adaptation of a short-read pre-alignment filtering algorithm suitable for long reads. We call this LongGeneGuardian. Finally, it presents Filter-Fuse as an architecture that supports LongGeneGuardian inside the memory. FilterFuse exploits the Computation-In-Memory computing paradigm, eliminating the cost of data movement in LongGeneGuardian. Our evaluations show that FilterFuse improves the execution time of filtering by 120.47x for long reads compared to State-of-the-Art (SoTA) filter, SneakySnake. FilterFuse also improves the end-to-end execution time of sequence alignment by up to 49.14x and 5207.63x compared to SneakySnake with SoTA aligner and only SoTA aligner, respectively.
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Authors: Florian Schmid, Khaled Koutini, Gerhard Widmer
Abstract
The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.
Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting
Authors: Linxiao Yang, Rui Ren, Xinyue Gu, Liang Sun
Abstract
Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting in holiday, or under extreme weather conditions. As high-stakes decision-making usually follows after load forecasting, model interpretability is crucial for the adoption of forecasting models. In this paper, we propose an interactive GAM which is not only interpretable but also can incorporate specific domain knowledge in electric power industry for improved performance. This boosting-based GAM leverages piecewise linear functions and can be learned through our efficient algorithm. In both public benchmark and electricity datasets, our interactive GAM outperforms current state-of-the-art methods and demonstrates good generalization ability in the cases of extreme weather events. We launched a user-friendly web-based tool based on interactive GAM and already incorporated it into our eForecaster product, a unified AI platform for electricity forecasting.
A posteriori error estimates for nonconforming discretizations of singularly perturbed biharmonic operators
Abstract
For the pure biharmonic equation and a biharmonic singular perturbation problem, a residual-based error estimator is introduced which applies to many existing nonconforming finite elements. The error estimator involves the local best-approximation error of the finite element function by piecewise polynomial functions of the degree determining the expected approximation order, which need not coincide with the maximal polynomial degree of the element, for example if bubble functions are used. The error estimator is shown to be reliable and locally efficient up to this polynomial best-approximation error and oscillations of the right-hand side.
Robust Methods for Multiscale Coarse Approximations of Diffusion Models in Perforated Domains
Authors: Miranda Boutilier, Konstantin Brenner, Victorita Dolean
Abstract
For the Poisson equation posed in a domain containing a large number of polygonal perforations, we propose a low-dimensional coarse approximation space based on a coarse polygonal partitioning of the domain. Similarly to other multiscale numerical methods, this coarse space is spanned by locally discrete harmonic basis functions. Along the subdomain boundaries, the basis functions are piecewise polynomial. The main contribution of this article is an error estimate regarding the H1-projection over the coarse space which depends only on the regularity of the solution over the edges of the coarse partitioning. For a specific edge refinement procedure, the error analysis establishes superconvergence of the method even if the true solution has a low general regularity. Combined with domain decomposition (DD) methods, the coarse space leads to an efficient two-level iterative linear solver which reaches the fine-scale finite element error in few iterations. It also bodes well as a preconditioner for Krylov methods and provides scalability with respect to the number of subdomains. Numerical experiments showcase the increased precision of the coarse approximation as well as the efficiency and scalability of the coarse space as a component of a DD algorithm.
DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention
Abstract
Integrating rule-based policies into reinforcement learning promises to improve data efficiency and generalization in cooperative pursuit problems. However, most implementations do not properly distinguish the influence of neighboring robots in observation embedding or inter-robot interaction rules, leading to information loss and inefficient cooperation. This paper proposes a cooperative pursuit algorithm named Decentralized Adaptive COOperative Pursuit via Attention (DACOOP-A) by empowering reinforcement learning with artificial potential field and attention mechanisms. An attention-based framework is developed to emphasize important neighbors by concurrently integrating the learned attention scores into observation embedding and inter-robot interaction rules. A KL divergence regularization is introduced to alleviate the resultant learning stability issue. Improvements in data efficiency and generalization are demonstrated through numerical simulations. Extensive quantitative analysis and ablation studies are performed to illustrate the advantages of the proposed modules. Real-world experiments are performed to justify the feasibility of deploying DACOOP-A in physical systems.
Efficient Online String Matching through Linked Weak Factors
Authors: Matthew N. Palmer, Simone Faro, Stefano Scafiti
Abstract
Online string matching is a computational problem involving the search for patterns or substrings in a large text dataset, with the pattern and text being processed sequentially, without prior access to the entire text. Its relevance stems from applications in data compression, data mining, text editing, and bioinformatics, where rapid and efficient pattern matching is crucial. Various solutions have been proposed over the past few decades, employing diverse techniques. Recently, weak recognition approaches have attracted increasing attention. This paper presents Hash Chain, a new algorithm based on a robust weak factor recognition approach that connects adjacent factors through hashing. Despite its O(nm) complexity, the algorithm exhibits a sublinear behavior in practice and achieves superior performance compared to the most effective algorithms.
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Abstract
Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.
SharkGraph: A Time Series Distributed Graph System
Authors: Derong Tang (tencent)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Abstract
Current graph systems can easily process billions of data, however when increased to exceed hundred billions, the performance decreases dramatically, time series data always be very huge, consequently computation on time series graphs still remains challenging nowadays. In current piece of work, we introduces SharkGraph, a (distributed file system) DFS-based time series graph system, used a novel storage structure (Time Series Graph Data File) TGF, By reading file stream to iterate graph computation, SharkGraph is able to execute batch graph query, simulation, data mining, or clustering algorithm on exceed hundred billions edge size industry graph. Through well defined experiments that shows SharkGraph performs well on large-scale graph processing, also can support time traversal for graphs, and recover state at any position in the timeline. By repeating experiments reported for existing distributed systems like GraphX, we demonstrate that SharkGraph can easily handle hundreds billions of data, rather than GraphX which met many problems such as memory issues and skewed distribution on graph traversal. Compared with other graph systems SharkGraph uses less memory and more efficiently to process the same graph.
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
Authors: Khanh-Binh Nguyen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is the confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch ($\times4$) and CoMatch ($\times2$) while having higher accuracy. Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46\% error rate. Code is available at https://github.com/beandkay/SequenceMatch.
Improving generalization in large language models by learning prefix subspaces
Authors: Louis Falissard, Vincent Guigue, Laure Soulier
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performances compared to sota methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
Numerical Derivative-based Flexible Integration Algorithm for Power Electronic Systems Simulation Considering Nonlinear Components
Abstract
Simulation is an efficient tool in the design and control of power electronic systems. However, quick and accurate simulation of them is still challenging, especially when the system contains a large number of switches and state variables. Conventional general-purpose integration algorithms assume nonlinearity within systems but face inefficiency in handling the piecewise characteristics of power electronic switches. While some specialized algorithms can adapt to the piecewise characteristics, most of these methods require systems to be piecewise linear. In this article, a numerical derivative-based flexible integration algorithm is proposed. This algorithm can adapt to the piecewise characteristic caused by switches and have no difficulty when nonlinear non-switching components are present in the circuit. This algorithm consists of a recursive numerical scheme that obtains high-order time derivatives of nonlinear components and a decoupling strategy that further increases computational efficiency. The proposed method is applied to solve a motor derive system and a large-scale power conversion system (PCS) to verify its accuracy and efficiency by comparing experimental waveforms and simulated results given by commercial software. Our proposed method demonstrates several-fold acceleration compared to multiple commonly used algorithms in Simulink.
A Spline-Based Collocation Method for Stokes and Navier-Stokes equations
Abstract
In the paper, we propose a collocation method based on multivariate polynomial splines over triangulation or tetrahedralization for solving Stokes and Navier-Stokes equations. We start with a detailed explanation of the method for the Stokes equation and then extend the study to the Navier-Stokes equations. We shall show that the numerical solution can approximate the exact PDE solution very well over several domains. Then we present several numerical experimental results to demonstrate the performance of the method over the 2D and 3D settings. Also, we apply the IPBM method to our method to find the solution over several curved domains effectively. In addition, we present a comparison with the existing multivariate spline methods in \cite{AL02} and several existing methods to show that the new method produces a similar and sometimes more accurate approximation in a more efficient fashion.
Hybridized Formulations of Flux Reconstruction Schemes for Advection-Diffusion Problems
Abstract
We present the hybridization of flux reconstruction methods for advection-diffusion problems. Hybridization introduces a new variable into the problem so that it can be reduced via static condensation. This allows the solution of implicit discretizations to be done more efficiently. We derive an energy statement from a stability analysis considering a range of correction functions on hybridized and embedded flux reconstruction schemes. Then, we establish connections to standard formulations. We devise a post-processing scheme that leverages existing flux reconstruction operators to enhance accuracy for diffusion-dominated problems. Results show that the implicit convergence of these methods for advection-diffusion problems can result in performance benefits of over an order of magnitude. In addition, we observe that the superconvergence property of hybridized methods can be extended to the family of FR schemes for a range of correction functions.
State Sequences Prediction via Fourier Transform for Representation Learning
Authors: Mingxuan Ye, Yufei Kuang, Jie Wang, Rui Yang, Wengang Zhou, Houqiang Li, Feng Wu
Abstract
While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement while not requiring storage of infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
Beam Design and Signal Enhancement in RIS-Aided Multi-User Communication Systems: A Maximization-Minimization Approach
Authors: Rujing Xiong, Jialong Lu, Ke Yin, Tiebin Mi, Robert Caiming Qiu
Abstract
Most beam design schemes in RIS-aided multi-user systems suffer from imbalanced signal power gains and computation complexity. This paper tackles the crucial beam design issue by focusing on received power optimization. Concretely, we leverage the passive characteristics of RIS, modeling the reflecting signals and articulating a comprehensive max-min optimization framework. To efficiently address the beamforming issue, we propose the Moreau-Yosida approximation (MA) algorithm, which pursues the optimal beamforming designs based on arbitrary utility functions in the user's received power. The comprehensive numerical simulations and prototype experiments substantiate the effectiveness of our proposed algorithm.
GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning
Abstract
Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles(UAVs). Existing methods, however, are demonstrated to static local optima and two-dimensional exploration. To address these challenges, this paper introduces GO-FEAP (Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning), aiming to achieve efficient and complete three-dimensional exploration. Frontier-Omission-Aware Exploration module presented in this work takes into account multiple pivotal factors, encompassing frontier distance, nearby frontier count, frontier duration, and frontier categorization, for a comprehensive assessment of frontier importance. Furthermore, to tackle scenarios with substantial vertical variations, we introduce the Altitude-Stratified Planning strategy, which stratifies the three-dimensional space based on altitude, conducting global-local planning for each stratum. The objective of global planning is to identify the most optimal frontier for exploration, followed by viewpoint selection and local path optimization based on frontier type, ultimately generating dynamically feasible three-dimensional spatial exploration trajectories. We present extensive benchmark and real-world tests, in which our method completes the exploration tasks with unprecedented completeness compared to state-of-the-art approaches.
Online Robust Mean Estimation
Authors: Daniel M. Kane, Ilias Diakonikolas, Hanshen Xiao, Sihan Liu
Abstract
We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T],$ such that with high probability it holds that $|\mu-\mu^\ast|_2 = O(\delta \log(T))$, where $\mu = (\mut){t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
An Efficient Method for Realizing Contractions of Access Structures in Cloud Storage
Abstract
In single-cloud storage, ciphertext-policy attribute-based encryption (CP-ABE) allows one to encrypt any data under an access structure to a cloud server, specifying what attributes are required to decrypt. In multi-cloud storage, a secret sharing scheme (SSS) allows one to split any data into multiple shares, one to a single server, and specify which subset of the servers are able to recover the data. It is an interesting problem to remove some attributes/servers but still enable the remaining attributes/servers in every authorized set to recover the data. The problem is related to the contraction problem of access structures for SSSs. In this paper, we propose a method that can efficiently transform a given SSS for an access structure to SSSs for contractions of the access structure. We show its applications in solving the attribute removal problem in the CP-ABE based single-cloud storage and the data relocating problem in multi-cloud storage. Our method results in solutions that require either less server storage or even no additional server storage.
Data-driven Traffic Simulation: A Comprehensive Review
Authors: Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang
Abstract
Autonomous vehicles (AVs) have the potential to significantly revolutionize society by providing a secure and efficient mode of transportation. Recent years have witnessed notable advance-ments in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has be-come an important tool for autonomous driving testing due to 1) availability of high-fidelity traffic data; 2) its advantages of ena-bling large-scale testing and scenario reproducibility; and 3) its potential in reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This pa-per aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research ef-forts and provide a futuristic perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After overviewing traffic simulation, various datasets and evalua-tion metrics commonly used are reviewed. The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, generative and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state-of-the-art, existing challenges, and future research directions.
TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories
Abstract
Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes learning efficient representations of those trajectories challenging. For many problems, such as for handwriting or for quasistatic dexterous manipulation, the exact timings of the trajectories should be factored from their spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variations in small handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
Abstract
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
Authors: Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem, Heng Ji
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials.
A Unified, Scalable Framework for Neural Population Decoding
Authors: Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael J. Mendelson, Blake Richards, Matthew G. Perich, Guillaume Lajoie, Eva L. Dyer
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
Keyword: faster
The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not)
Authors: Sukwoong Choi, William S. Moses, Neil Thompson
Subjects: Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET); General Economics (econ.GN); Quantum Physics (quant-ph)
Abstract
Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases, or practitioners processing very large data.
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Authors: Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
Abstract
Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.
Hypergraph Motifs and Their Extensions Beyond Binary
Abstract
Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify domains from which hypergraphs are? We first define hypergraph motifs (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those of randomized hypergraphs. Then, we demonstrate that CPs capture local structural patterns unique to each domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. The concept of CP is extended to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is to propose MoCHy, a family of parallel algorithms for counting h-motifs' occurrences in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version MoCHy-A+ is more accurate and faster than the basic approximate and exact versions, respectively. Furthermore, we explore ternary hypergraph motifs that extends h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all previously mentioned applications.
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing due to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, asking LLM to chat with the mobile apps by passing the GUI page information to LLM to elicit testing scripts, and executing them to keep passing the app feedback to LLM, iterating the whole process. Within this framework, we have also introduced a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate. Moreover, GPTDroid identify 53 new bugs on Google Play, of which 35 have been confirmed and fixed.
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
Authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
Redactable Signature Schemes and Zero-knowledge Proofs: A comparative examination for applications in Decentralized Digital Identity Systems
Authors: Bryan Kumara, Mark Hooper, Carsten Maple, Timothy Hobson, Jon Crowcroft
Abstract
Redactable Signature Schemes and Zero-Knowledge Proofs are two radically different approaches to enable privacy. This paper analyses their merits and drawbacks when applied to decentralized identity system. Redactable Signatures, though competitively quick and compact, are not as expressive as zero-knowledge proofs and do not provide the same level of privacy. On the other hand, zero-knowledge proofs can be much faster but some protocols require a trusted set-up. We conclude that given the benefits and drawbacks, redactable signatures are more appropriate at an earlier stage and zero-knowledge proofs are more appropriate at a later stage for decentralized identity systems
Keyword: mobile
Containerized Vertical Farming Using Cobots
Authors: Dasharadhan Mahalingam, Aditya Patankar, Khiem Phi, Nilanjan Chakraborty, Ryan McGann, IV Ramakrishnan
Abstract
Containerized vertical farming is a type of vertical farming practice using hydroponics in which plants are grown in vertical layers within a mobile shipping container. Space limitations within shipping containers make the automation of different farming operations challenging. In this paper, we explore the use of cobots (i.e., collaborative robots) to automate two key farming operations, namely, the transplantation of saplings and the harvesting of grown plants. Our method uses a single demonstration from a farmer to extract the motion constraints associated with the tasks, namely, transplanting and harvesting, and can then generalize to different instances of the same task. For transplantation, the motion constraint arises during insertion of the sapling within the growing tube, whereas for harvesting, it arises during extraction from the growing tube. We present experimental results to show that using RGBD camera images (obtained from an eye-in-hand configuration) and one demonstration for each task, it is feasible to perform transplantation of saplings and harvesting of leafy greens using a cobot, without task-specific programming.
UI Layout Generation with LLMs Guided by UI Grammar
Abstract
The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.
Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays an important role in core functionality such as search queries, authentication, messaging, etc. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, and generating diversified unusual inputs for fully testing the app is highly demanded. Nevertheless, this is also challenging due to the combination of explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster which leverages the LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. In detail, InputBlaster leverages LLM to produce the test generators together with the mutation rules serving as the reasoning chain, and utilizes the in-context learning schema to demonstrate the LLM with examples for boosting the performance. InputBlaster is evaluated on 36 text input widgets with cash bugs involving 31 popular Android apps, and results show that it achieves 78% bug detection rate, with 136% higher than the best baseline. Besides, we integrate it with the automated GUI testing tool and detect 37 unseen crashes in real-world apps from Google Play.
Improving Diffusion Models for ECG Imputation with an Augmented Template Prior
Authors: Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic
Abstract
Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present an template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior of missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms on the template for prior augmentation, which considers the beat-level variance of positioning and amplitude; 3) we finally design a confidence score to consider the health condition of subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal PulseDiff improves the performance of two strong DDPMs baseline models, CSDI and SSSD$^{S4}$, verifying our method guides the generation of DDPMs while managing the uncertainty. When combining with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing due to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, asking LLM to chat with the mobile apps by passing the GUI page information to LLM to elicit testing scripts, and executing them to keep passing the app feedback to LLM, iterating the whole process. Within this framework, we have also introduced a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate. Moreover, GPTDroid identify 53 new bugs on Google Play, of which 35 have been confirmed and fixed.
A Resilient Framework for 5G-Edge-Connected UAVs based on Switching Edge-MPC and Onboard-PID Control
Abstract
In recent years, the need for resources for handling processes with high computational complexity for mobile robots is becoming increasingly urgent. More specifically, robots need to autonomously operate in a robust and continuous manner, while keeping high performance, a need that led to the utilization of edge computing to offload many computationally demanding and time-critical robotic procedures. However, safe mechanisms should be implemented to handle situations when it is not possible to use the offloaded procedures, such as if the communication is challenged or the edge cluster is not available. To this end, this article presents a switching strategy for safety, redundancy, and optimized behavior through an edge computing-based Model Predictive Controller (MPC) and a low-level onboard-PID controller for edge-connected Unmanned Aerial Vehicles (UAVs). The switching strategy is based on the communication Key Performance Indicators (KPIs) over 5G to decide whether the UAV should be controlled by the edge-based or have a safe fallback based on the onboard controller.
CDSD: Chinese Dysarthria Speech Database
Authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Among these participants, one recorded an additional 10 hours of speech data, while each recorded one hour, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. When participants read these texts, they must use a mobile device or the ZOOM F8n multi-track field recorder to record their speeches. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using an additional 10 hours of speech data from one of our participants. Our research findings indicate that, through extensive data-driven model training, fine-tuning limited quantities of specific individual data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.
Abstract
If a robot masters folding a kitchen towel, we would also expect it to master folding a beach towel. However, existing works for policy learning that rely on data set augmentations are still limited in achieving this level of generalization. Our insight is to add equivariance to both the visual object representation and policy architecture. We propose EquivAct which utilizes SIM(3)-equivariant network structures that guarantee generalization across all possible object translations, 3D rotations, and scales by construction. Training of EquivAct is done in two phases. We first pre-train a SIM(3)-equivariant visual representation on simulated scene point clouds. Then, we learn a SIM(3)-equivariant visuomotor policy on top of the pre-trained visual representation using a small amount of source task demonstrations. We demonstrate that after training, the learned policy directly transfers to objects that substantially differ in scale, position and orientation from the source demonstrations. In simulation, we evaluate our method in three manipulation tasks involving deformable and articulated objects thereby going beyond the typical rigid object manipulation tasks that prior works considered. We show that our method outperforms prior works that do not use equivariant architectures or do not use our contrastive pre-training procedure. We also show quantitative and qualitative experiments on three real robot tasks, where the robot watches twenty demonstrations of a tabletop task and transfers zero-shot to a mobile manipulation task in a much larger setup. Project website: https://equivact.github.io
Keyword: pruning
LXMERT Model Compression for Visual Question Answering
Authors: Maryam Hashemi, Ghazaleh Mahmoudi, Sara Kodeiri, Hadi Sheikhi, Sauleh Eetemadi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks capable of being trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without significant loss in accuracy. Our experiment results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with 3% loss in accuracy.
Sparse Bayesian neural networks for regression: Tackling overfitting and computational challenges in uncertainty quantification
Abstract
Neural networks (NNs) are primarily developed within the frequentist statistical framework. Nevertheless, frequentist NNs lack the capability to provide uncertainties in the predictions, and hence their robustness can not be adequately assessed. Conversely, the Bayesian neural networks (BNNs) naturally offer predictive uncertainty by applying Bayes' theorem. However, their computational requirements pose significant challenges. Moreover, both frequentist NNs and BNNs suffer from overfitting issues when dealing with noisy and sparse data, which render their predictions unwieldy away from the available data space. To address both these problems simultaneously, we leverage insights from a hierarchical setting in which the parameter priors are conditional on hyperparameters to construct a BNN by applying a semi-analytical framework known as nonlinear sparse Bayesian learning (NSBL). We call our network sparse Bayesian neural network (SBNN) which aims to address the practical and computational issues associated with BNNs. Simultaneously, imposing a sparsity-inducing prior encourages the automatic pruning of redundant parameters based on the automatic relevance determination (ARD) concept. This process involves removing redundant parameters by optimally selecting the precision of the parameters prior probability density functions (pdfs), resulting in a tractable treatment for overfitting. To demonstrate the benefits of the SBNN algorithm, the study presents an illustrative regression problem and compares the results of a BNN using standard Bayesian inference, hierarchical Bayesian inference, and a BNN equipped with the proposed algorithm. Subsequently, we demonstrate the importance of considering the full parameter posterior by comparing the results with those obtained using the Laplace approximation with and without NSBL.
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
Authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
Abstract
Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract repetitive actions onsets from a video, which are then used - in conjunction with audio or textual embeddings - to condition a diffusion model trained to generate a new synchronized sound effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to faciliate reproducibility
Fast and Reliable Generation of EHR Time Series via Diffusion Models
Authors: Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang
Abstract
Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six datasets, comparing our proposed method with seven existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data utility while requiring less training effort. Our approach also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR data.
DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
Abstract
In the burgeoning field of natural language processing, Neural Topic Models (NTMs) and Large Language Models (LLMs) have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are not optimal for clustering or capable for topic generation. Our study addresses this gap by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages ncoder-Decoder-based LLMs to produce highly clusterable embeddings that could generate topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion, our framework also provides the capability to generate content relevant to the identified topics. This dual functionality allows users to efficiently produce highly clustered topics and related content simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework proves to be efficient to train and exhibits high adaptability, demonstrating its potential for a wide array of applications.
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Authors: Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
Abstract
Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.
ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts
Authors: Lena S. Bolliger, David R. Reich, Patrick Haller, Deborah N. Jakobi, Paul Prasse, Lena A. Jäger
Abstract
Eye movements in reading play a crucial role in psycholinguistic research studying the cognitive mechanisms underlying human language processing. More recently, the tight coupling between eye movements and cognition has also been leveraged for language-related machine learning tasks such as the interpretability, enhancement, and pre-training of language models, as well as the inference of reader- and text-specific properties. However, scarcity of eye movement data and its unavailability at application time poses a major challenge for this line of research. Initially, this problem was tackled by resorting to cognitive models for synthesizing eye movement data. However, for the sole purpose of generating human-like scanpaths, purely data-driven machine-learning-based methods have proven to be more suitable. Following recent advances in adapting diffusion processes to discrete data, we propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts. By leveraging pre-trained word representations and jointly embedding both the stimulus text and the fixation sequence, our model captures multi-modal interactions between the two inputs. We evaluate ScanDL within- and across-dataset and demonstrate that it significantly outperforms state-of-the-art scanpath generation methods. Finally, we provide an extensive psycholinguistic analysis that underlines the model's ability to exhibit human-like reading behavior. Our implementation is made available at https://github.com/DiLi-Lab/ScanDL.
Semantic-preserving image coding based on Conditional Diffusion models
Authors: Francesco Pezone, Osman Musa, Giuseppe Caire, Sergio Barbarossa
Abstract
Semantic communication, rather than on a bit-by-bit recovery of the transmitted messages, focuses on the meaning and the goal of the communication itself. In this paper, we propose a novel semantic image coding scheme that preserves the semantic content of an image, while ensuring a good trade-off between coding rate and image quality. The proposed Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC) transmitter encodes a Semantic Segmentation Map (SSM) and a low-resolution version of the image to be transmitted. The receiver then reconstructs a high-resolution image using a Denoising Diffusion Probabilistic Models (DDPM) doubly conditioned to the SSM and the low-resolution image. As shown by the numerical examples, compared to state-of-the-art (SOTA) approaches, the proposed SPIC exhibits a better balance between the conventional rate-distortion trade-off and the preservation of semantically-relevant features.
Improving Diffusion Models for ECG Imputation with an Augmented Template Prior
Authors: Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic
Abstract
Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present an template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior of missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms on the template for prior augmentation, which considers the beat-level variance of positioning and amplitude; 3) we finally design a confidence score to consider the health condition of subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal PulseDiff improves the performance of two strong DDPMs baseline models, CSDI and SSSD$^{S4}$, verifying our method guides the generation of DDPMs while managing the uncertainty. When combining with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.
Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations
Authors: Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, MingZhong Wang
Abstract
Imitation Learning (IL) aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations. However, IL is susceptible to limitations imposed by noisy demonstrations from non-expert behaviors, presenting a significant challenge due to the lack of supplementary information to assess their expertise. In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy, eliminating the need for additional information. We utilize the forward and reverse processes of Diffusion Models to emulate the shift in demonstration expertise from low to high and vice versa, thereby extracting the noise information that diffuses expertise. Then, the noise information is leveraged to predict the diffusion steps between the current policy and demonstrators, which we theoretically demonstrate its equivalence to their expertise gap. We further explain in detail how the predicted diffusion steps are applied to filter out noisy demonstrations in a self-motivated manner and provide its theoretical grounds. Through empirical evaluations on MuJoCo tasks, we demonstrate that our method is proficient in learning the expert policy amidst noisy demonstrations, and effectively filters out demonstrations with expertise inferior to the current policy.
Discriminator Guidance for Autoregressive Diffusion Models
Authors: Filip Ekström Kelvinius, Fredrik Lindsten
Abstract
We introduce discriminator guidance in the setting of Autoregressive Diffusion Models. The use of a discriminator to guide a diffusion process has previously been used for continuous diffusion models, and in this work we derive ways of using a discriminator together with a pretrained generative model in the discrete case. First, we show that using an optimal discriminator will correct the pretrained model and enable exact sampling from the underlying data distribution. Second, to account for the realistic scenario of using a sub-optimal discriminator, we derive a sequential Monte Carlo algorithm which iteratively takes the predictions from the discrimiator into account during the generation process. We test these approaches on the task of generating molecular graphs and show how the discriminator improves the generative performance over using only the pretrained model.
A Diffusion Weighted Graph Framework for New Intent Discovery
Abstract
New Intent Discovery (NID) aims to recognize both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents. Without considering structure relationships between samples, previous methods generate noisy supervisory signals which cannot strike a balance between quantity and quality, hindering the formation of new intent clusters and effective transfer of the pre-training knowledge. To mitigate this limitation, we propose a novel Diffusion Weighted Graph Framework (DWGF) to capture both semantic similarities and structure relationships inherent in data, enabling more sufficient and reliable supervisory signals. Specifically, for each sample, we diffuse neighborhood relationships along semantic paths guided by the nearest neighbors for multiple hops to characterize its local structure discriminately. Then, we sample its positive keys and weigh them based on semantic similarities and local structures for contrastive learning. During inference, we further propose Graph Smoothing Filter (GSF) to explicitly utilize the structure relationships to filter high-frequency noise embodied in semantically ambiguous samples on the cluster boundary. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/yibai-shi/DWGF.
Hybridized Formulations of Flux Reconstruction Schemes for Advection-Diffusion Problems
Abstract
We present the hybridization of flux reconstruction methods for advection-diffusion problems. Hybridization introduces a new variable into the problem so that it can be reduced via static condensation. This allows the solution of implicit discretizations to be done more efficiently. We derive an energy statement from a stability analysis considering a range of correction functions on hybridized and embedded flux reconstruction schemes. Then, we establish connections to standard formulations. We devise a post-processing scheme that leverages existing flux reconstruction operators to enhance accuracy for diffusion-dominated problems. Results show that the implicit convergence of these methods for advection-diffusion problems can result in performance benefits of over an order of magnitude. In addition, we observe that the superconvergence property of hybridized methods can be extended to the family of FR schemes for a range of correction functions.
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Authors: An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supportive. The intensive experiment results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.
Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles
Authors: Xing Shen, Hengguan Huang, Brennan Nichyporuk, Tal Arbel
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
While deep learning models have achieved remarkable success across a range of medical image analysis tasks, deployment of these models in real clinical contexts requires that they be robust to variability in the acquired images. While many methods apply predefined transformations to augment the training data to enhance test-time robustness, these transformations may not ensure the model's robustness to the diverse variability seen in patient images. In this paper, we introduce a novel three-stage approach based on transformers coupled with conditional diffusion models, with the goal of improving model robustness to the kinds of imaging variability commonly encountered in practice without the need for pre-determined data augmentation strategies. To this end, multiple image encoders first learn hierarchical feature representations to build discriminative latent spaces. Next, a reverse diffusion process, guided by the latent code, acts on an informative prior and proposes prediction candidates in a generative manner. Finally, several prediction candidates are aggregated in a bi-level aggregation protocol to produce the final output. Through extensive experiments on medical imaging benchmark datasets, we show that our method improves upon state-of-the-art methods in terms of robustness and confidence calibration. Additionally, we introduce a strategy to quantify the prediction uncertainty at the instance level, increasing their trustworthiness to clinicians using them in clinical practice.
CVPR 2023 Text Guided Video Editing Competition
Authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no standard benchmark. So, we propose a new dataset for text-guided video editing (TGVE), and we run a competition at CVPR to evaluate models on our TGVE dataset. In this paper we present a retrospective on the competition and describe the winning method. The competition dataset is available at https://sites.google.com/view/loveucvpr23/track4.
From Posterior Sampling to Meaningful Diversity in Image Restoration
Abstract
Image restoration problems are typically ill-posed in the sense that each degraded image can be restored in infinitely many valid ways. To accommodate this, many works generate a diverse set of outputs by attempting to randomly sample from the posterior distribution of natural images given the degraded input. Here we argue that this strategy is commonly of limited practical value because of the heavy tail of the posterior distribution. Consider for example inpainting a missing region of the sky in an image. Since there is a high probability that the missing region contains no object but clouds, any set of samples from the posterior would be entirely dominated by (practically identical) completions of sky. However, arguably, presenting users with only one clear sky completion, along with several alternative solutions such as airships, birds, and balloons, would better outline the set of possibilities. In this paper, we initiate the study of meaningfully diverse image restoration. We explore several post-processing approaches that can be combined with any diverse image restoration method to yield semantically meaningful diversity. Moreover, we propose a practical approach for allowing diffusion based image restoration methods to generate meaningfully diverse outputs, while incurring only negligent computational overhead. We conduct extensive user studies to analyze the proposed techniques, and find the strategy of reducing similarity between outputs to be significantly favorable over posterior sampling. Code and examples are available in https://noa-cohen.github.io/MeaningfulDiversityInIR
Keyword: adaptive
Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot Filling
Abstract
Recently slot filling has witnessed great development thanks to deep learning and the availability of large-scale annotated data. However, it poses a critical challenge to handle a novel domain whose samples are never seen during training. The recognition performance might be greatly degraded due to severe domain shifts. Most prior works deal with this problem in a two-pass pipeline manner based on metric learning. In practice, these dominant pipeline models may be limited in computational efficiency and generalization capacity because of non-parallel inference and context-free discrete label embeddings. To this end, we re-examine the typical metric-based methods, and propose a new adaptive end-to-end metric learning scheme for the challenging zero-shot slot filling. Considering simplicity, efficiency and generalizability, we present a cascade-style joint learning framework coupled with context-aware soft label representations and slot-level contrastive representation learning to mitigate the data and label shift problems effectively. Extensive experiments on public benchmarks demonstrate the superiority of the proposed approach over a series of competitive baselines.
Finite-Time Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems
Authors: Ju Wu, Tong Wang, Min Ma
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
This paper investigates the finite-time adaptive fuzzy tracking control problem for a class of pure-feedback system with full-state constraints. With the help of Mean-Value Theorem, the pure-feedback nonlinear system is transformed into strict-feedback case. By employing finite-time-stable like function and state transformation for output tracking error, the output tracking error converges to a predefined set in a fixed finite interval. To tackle the problem of state constraints, integral Barrier Lyapunov functions are utilized to guarantee that the state variables remain within the prescribed constraints with feasibility check. Fuzzy logic systems are utilized to approximate the unknown nonlinear functions. In addition, all the signals in the closed-loop system are guaranteed to be semi-global ultimately uniformly bounded. Finally, two simulation examples are given to show the effectiveness of the proposed control strategy.
SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation
Authors: Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozi`ere et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert size as only 0.06% of number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on one single 80Gb A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average of 73.76 CodeBLEU score in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is release here: https://github.com/sade-adrien/SteloCoder.
I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
Abstract
Recent progresses on self-supervised 3D human action representation learning are largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, optimized with distinguishing self-augmented samples, models struggle with numerous similar positive instances in the case of limited action categories. In this work, we tackle the aforementioned problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy, In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly-correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records.
3D Multi-Target Localization Via Intelligent Reflecting Surface: Protocol and Analysis
Abstract
With the emerging environment-aware applications, ubiquitous sensing is expected to play a key role in future networks. In this paper, we study a 3-dimensional (3D) multi-target localization system where multiple intelligent reflecting surfaces (IRSs) are applied to create virtual line-of-sight (LoS) links that bypass the base station (BS) and targets. To fully unveil the fundamental limit of IRS for sensing, we first study a single-target-single-IRS case and propose a novel \textit{two-stage localization protocol} by controlling the on/off state of IRS. To be specific, in the IRS-off stage, we derive the Cram\'{e}r-Rao bound (CRB) of the azimuth/elevation direction-of-arrival (DoA) of the BS-target link and design a DoA estimator based on the MUSIC algorithm. In the IRS-on stage, the CRB of the azimuth/elevation DoA of the IRS-target link is derived and a simple DoA estimator based on the on-grid IRS beam scanning method is proposed. Particularly, the impact of echo signals reflected by IRS from different paths on sensing performance is analyzed. Moreover, we prove that the single-beam of the IRS is not capable of sensing, but it can be achieved with \textit{multi-beam}. Based on the two obtained DoAs, the 3D single-target location is constructed. We then extend to the multi-target-multi-IRS case and propose an \textit{IRS-adaptive sensing protocol} by controlling the on/off state of multiple IRSs, and a multi-target localization algorithm is developed. Simulation results demonstrate the effectiveness of our scheme and show that sub-meter-level positioning accuracy can be achieved.
Learning Agility and Adaptive Legged Locomotion via Curricular Hindsight Reinforcement Learning
Authors: Sicen Li, Yiming Pang, Panju Bai, Zhaojin Liu, Jiawei Li, Shihao Hu, Liquan Wang, Gang Wang
Abstract
Agile and adaptive maneuvers such as fall recovery, high-speed turning, and sprinting in the wild are challenging for legged systems. We propose a Curricular Hindsight Reinforcement Learning (CHRL) that learns an end-to-end tracking controller that achieves powerful agility and adaptation for the legged robot. The two key components are (I) a novel automatic curriculum strategy on task difficulty and (ii) a Hindsight Experience Replay strategy adapted to legged locomotion tasks. We demonstrated successful agile and adaptive locomotion on a real quadruped robot that performed fall recovery autonomously, coherent trotting, sustained outdoor speeds up to 3.45 m/s, and tuning speeds up to 3.2 rad/s. This system produces adaptive behaviours responding to changing situations and unexpected disturbances on natural terrains like grass and dirt.
DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention
Abstract
Integrating rule-based policies into reinforcement learning promises to improve data efficiency and generalization in cooperative pursuit problems. However, most implementations do not properly distinguish the influence of neighboring robots in observation embedding or inter-robot interaction rules, leading to information loss and inefficient cooperation. This paper proposes a cooperative pursuit algorithm named Decentralized Adaptive COOperative Pursuit via Attention (DACOOP-A) by empowering reinforcement learning with artificial potential field and attention mechanisms. An attention-based framework is developed to emphasize important neighbors by concurrently integrating the learned attention scores into observation embedding and inter-robot interaction rules. A KL divergence regularization is introduced to alleviate the resultant learning stability issue. Improvements in data efficiency and generalization are demonstrated through numerical simulations. Extensive quantitative analysis and ablation studies are performed to illustrate the advantages of the proposed modules. Real-world experiments are performed to justify the feasibility of deploying DACOOP-A in physical systems.
Query-adaptive DETR for Crowded Pedestrian Detection
Authors: Feng Gao, Jiaxu Leng, Ji Gan, Xinbo Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, which achieved promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually, otherwise, the performance would degrade to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. Our method is simple and effective, which can be plugged into any DETRs to make it query-adaptive in theory. The experimental results on Crowdhuman dataset and Citypersons dataset show that our method can adaptively generate queries for DETRs and achieve competitive results. Especially, our method achieves state-of-the-art 39.4% MR on Crowdhuman dataset.
LiCROM: Linear-Subspace Continuous Reduced Order Modeling with Neural Fields
Authors: Yue Chang, Peter Yichen Chen, Zhecheng Wang, Maurizio M. Chiaramonte, Kevin Carlberg, Eitan Grinspun
Abstract
Linear reduced-order modeling (ROM) simplifies complex simulations by approximating the behavior of a system using a simplified kinematic representation. Typically, ROM is trained on input simulations created with a specific spatial discretization, and then serves to accelerate simulations with the same discretization. This discretization-dependence is restrictive. Becoming independent of a specific discretization would provide flexibility to mix and match mesh resolutions, connectivity, and type (tetrahedral, hexahedral) in training data; to accelerate simulations with novel discretizations unseen during training; and to accelerate adaptive simulations that temporally or parametrically change the discretization. We present a flexible, discretization-independent approach to reduced-order modeling. Like traditional ROM, we represent the configuration as a linear combination of displacement fields. Unlike traditional ROM, our displacement fields are continuous maps from every point on the reference domain to a corresponding displacement vector; these maps are represented as implicit neural fields. With linear continuous ROM (LiCROM), our training set can include multiple geometries undergoing multiple loading conditions, independent of their discretization. This opens the door to novel applications of reduced order modeling. We can now accelerate simulations that modify the geometry at runtime, for instance via cutting, hole punching, and even swapping the entire mesh. We can also accelerate simulations of geometries unseen during training. We demonstrate one-shot generalization, training on a single geometry and subsequently simulating various unseen geometries.
Combining Behaviors with the Successor Features Keyboard
Authors: Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran
Abstract
The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
Keyword: quantization
Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation
Authors: Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.
Keyword: efficient
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization
Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages
DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Specialist or Generalist? Instruction Tuning for Specific NLP Tasks
Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network
Algorithm for Invalidation of Cached Results of Queries to a Single Table
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
zonoLAB: A MATLAB toolbox for set-based control systems analysis using hybrid zonotopes
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
A Review of Economic Incentives for Efficient Operation of Flexible Transmission
Improved Dual-Output Step-Down Soft-Switching Current-Fed Push-Pull DC-DC Converter
PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows
Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks
A distributed-memory parallel algorithm for discretized integral equations using Julia
HL-DPoS: An Enhanced Anti-Long-Range Attack DPoS Algorithm
EKGNet: A 10.96μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification
Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization
RIS-based IMT-2030 Testbed for MmWave Multi-stream Ultra-massive MIMO Communications
Multi-split configuration design for fluid-based thermal management systems
The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not)
Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation
Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary
Symmetry-preserving graph attention network to solve routing problems at multiple resolutions
Capacity-based Spatial Modulation Constellation and Pre-scaling Design
CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction
Accelerating Split Federated Learning over Wireless Communication Networks
Closed Loop Molecular Communication Testbed: Setup, Interference Analysis, and Experimental Results
Emergent Communication in Interactive Sketch Question Answering
An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting
A posteriori error estimates for nonconforming discretizations of singularly perturbed biharmonic operators
Robust Methods for Multiscale Coarse Approximations of Diffusion Models in Perforated Domains
DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention
Efficient Online String Matching through Linked Weak Factors
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
SharkGraph: A Time Series Distributed Graph System
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
Improving generalization in large language models by learning prefix subspaces
Numerical Derivative-based Flexible Integration Algorithm for Power Electronic Systems Simulation Considering Nonlinear Components
A Spline-Based Collocation Method for Stokes and Navier-Stokes equations
Hybridized Formulations of Flux Reconstruction Schemes for Advection-Diffusion Problems
State Sequences Prediction via Fourier Transform for Representation Learning
Beam Design and Signal Enhancement in RIS-Aided Multi-User Communication Systems: A Maximization-Minimization Approach
GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning
Online Robust Mean Estimation
An Efficient Method for Realizing Contractions of Access Structures in Cloud Storage
Data-driven Traffic Simulation: A Comprehensive Review
TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories
Finetuning Offline World Models in the Real World
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
A Unified, Scalable Framework for Neural Population Decoding
Keyword: faster
The Quantum Tortoise and the Classical Hare: A simple framework for understanding which problems quantum computing will accelerate (and which it will not)
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Hypergraph Motifs and Their Extensions Beyond Binary
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
Redactable Signature Schemes and Zero-knowledge Proofs: A comparative examination for applications in Decentralized Digital Identity Systems
Keyword: mobile
Containerized Vertical Farming Using Cobots
UI Layout Generation with LLMs Guided by UI Grammar
Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model
Improving Diffusion Models for ECG Imputation with an Augmented Template Prior
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
A Resilient Framework for 5G-Edge-Connected UAVs based on Switching Edge-MPC and Onboard-PID Control
CDSD: Chinese Dysarthria Speech Database
EquivAct: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object Manipulation
Keyword: pruning
LXMERT Model Compression for Visual Question Answering
Sparse Bayesian neural networks for regression: Tackling overfitting and computational challenges in uncertainty quantification
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
Keyword: diffusion
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Fast and Reliable Generation of EHR Time Series via Diffusion Models
DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts
Semantic-preserving image coding based on Conditional Diffusion models
Improving Diffusion Models for ECG Imputation with an Augmented Template Prior
Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations
Discriminator Guidance for Autoregressive Diffusion Models
A Diffusion Weighted Graph Framework for New Intent Discovery
Hybridized Formulations of Flux Reconstruction Schemes for Advection-Diffusion Problems
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles
CVPR 2023 Text Guided Video Editing Competition
From Posterior Sampling to Meaningful Diversity in Image Restoration
Keyword: adaptive
Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot Filling
Finite-Time Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems
SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation
I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
3D Multi-Target Localization Via Intelligent Reflecting Surface: Protocol and Analysis
Learning Agility and Adaptive Legged Locomotion via Curricular Hindsight Reinforcement Learning
DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention
Query-adaptive DETR for Crowded Pedestrian Detection
LiCROM: Linear-Subspace Continuous Reduced Order Modeling with Neural Fields
Combining Behaviors with the Successor Features Keyboard
Keyword: quantization
Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation