Abstract
We present new refinement heuristics for the balanced graph partitioning problem that break with an age-old rule. Traditionally, local search only permits moves that keep the block sizes balanced (below a size constraint). In this work, we demonstrate that admitting large temporary balance violations drastically improves solution quality. The effects are particularly strong on irregular instances such as social networks. Designing efficient implementations of this general idea involves both careful selection of candidates for unconstrained moves as well as algorithms for rebalancing the solution later on. We explore a wide array of design choices to achieve this, in addition to our third goal of high parallel scalability. We present compelling experimental results, demonstrating that our parallel unconstrained local search techniques outperform the prior state of the art by a substantial margin. Compared with four state-of-the-art solvers, our new technique finds 91% of the best solutions on irregular graphs. We achieve a 13.8% improvement in edge cut over the next best competitor, while being only 11.4% slower in the geometric mean.
Chunked Lists versus Extensible Arrays for Text Inversion
Abstract
In our 2017 work on in-memory list-based text inversion [Hawking and Billerbeck. Efficient In-Memory, List-Based Text Inversion. ADCS 2017] we compared the memory use and indexing speed of a considerable number of variants of chunked linked lists. In the present work we compare the best-performing of those variants (FBB, dynamic Fibonacci chunking) with the extensible SQ array technique (SQA) presented in [Moffat and Mackenzie. Immediate-Access Indexing Using Space-Efficient Extensible Arrays. ADCS 2023].
Efficient Ray Sampling for Radiance Fields Reconstruction
Authors: Shilei Sun, Ming Liu, Zhongyi Fan, Yuxue Liu, Chengwei Lv, Liquan Dong, Lingqin Kong (Beijing Institute of Technology, China)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Accelerating neural radiance fields training is of substantial practical value, as the ray sampling strategy profoundly impacts network convergence. More efficient ray sampling can thus directly enhance existing NeRF models' training efficiency. We therefore propose a novel ray sampling approach for neural radiance fields that improves training efficiency while retaining photorealistic rendering results. First, we analyze the relationship between the pixel loss distribution of sampled rays and rendering quality. This reveals redundancy in the original NeRF's uniform ray sampling. Guided by this finding, we develop a sampling method leveraging pixel regions and depth boundaries. Our main idea is to sample fewer rays in training views, yet with each ray more informative for scene fitting. Sampling probability increases in pixel areas exhibiting significant color and depth variation, greatly reducing wasteful rays from other regions without sacrificing precision. Through this method, not only can the convergence of the network be accelerated, but the spatial geometry of a scene can also be perceived more accurately. Rendering outputs are enhanced, especially for texture-complex regions. Experiments demonstrate that our method significantly outperforms state-of-the-art techniques on public benchmark datasets.
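As a concrete illustration of the idea, the sketch below builds a per-pixel sampling distribution that concentrates rays where color and depth vary strongly. It is a minimal NumPy rendition under our own assumptions (gradient magnitude as the variation measure, a blending weight alpha); the paper's exact probability construction may differ.

    import numpy as np

    def sampling_probabilities(rgb, depth, alpha=0.7, eps=1e-6):
        """Per-pixel ray-sampling weights from color and depth variation."""
        gray = rgb.mean(axis=-1)                  # (H, W) luminance proxy
        gy, gx = np.gradient(gray)                # color-variation cue
        color_var = np.hypot(gx, gy)
        dy, dx = np.gradient(depth)               # depth-boundary cue
        depth_var = np.hypot(dx, dy)
        score = alpha * color_var / (color_var.max() + eps) \
              + (1 - alpha) * depth_var / (depth_var.max() + eps)
        probs = score + eps                       # keep every pixel reachable
        return probs / probs.sum()

    def sample_rays(probs, n_rays, rng=None):
        rng = rng or np.random.default_rng()
        flat = rng.choice(probs.size, size=n_rays, replace=False, p=probs.ravel())
        return np.unravel_index(flat, probs.shape)   # (rows, cols) of sampled rays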
Pure Exploration under Mediators' Feedback
Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli
Abstract
Stochastic multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a stochastic reward. Within the context of best-arm identification (BAI) problems, the goal of the agent lies in finding the optimal arm, i.e., the one with the highest expected reward, as accurately and efficiently as possible. Nevertheless, the sequential interaction protocol of classical BAI problems, where the agent has complete control over the arm being pulled at each round, does not effectively model several decision-making problems of interest (e.g., off-policy learning, partially controllable environments, and human feedback). For this reason, in this work, we propose a novel strict generalization of the classical BAI problem that we refer to as best-arm identification under mediators' feedback (BAI-MF). More specifically, we consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a stochastic and possibly unknown policy. The mediator, then, communicates back to the agent the pulled arm together with the observed reward. In this setting, the agent's goal lies in sequentially choosing which mediator to query to identify with high probability the optimal arm while minimizing the identification time, i.e., the sample complexity. To this end, we first derive and analyze a statistical lower bound on the sample complexity specific to our general mediator feedback scenario. Then, we propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner. As our theory verifies, this algorithm matches the lower bound both almost surely and in expectation. Finally, we extend these results to cases where the mediators' policies are unknown to the learner, obtaining comparable results.
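The interaction protocol is easy to state in code. The following simulation sketches BAI-MF with known mediator policies; the mediator-selection rule used here (query the mediator most likely to pull the empirical best arm) is a naive placeholder of our own, not the paper's lower-bound-matching strategy.

    import numpy as np

    rng = np.random.default_rng(0)
    K, M = 5, 3                                    # arms, mediators
    mu = rng.uniform(0, 1, K)                      # unknown mean rewards
    policies = rng.dirichlet(np.ones(K), size=M)   # mediator m pulls arm ~ policies[m]

    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(10_000):
        est = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
        m = np.argmax(policies[:, np.argmax(est)])  # placeholder mediator choice
        arm = rng.choice(K, p=policies[m])          # mediator pulls the arm
        reward = rng.normal(mu[arm], 1.0)           # agent observes (arm, reward)
        counts[arm] += 1
        sums[arm] += reward

    est = sums / np.maximum(counts, 1)
    print("guessed best arm:", np.argmax(est), "true best arm:", np.argmax(mu))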
Everything Perturbed All at Once: Enabling Differentiable Graph Attacks
Abstract
As powerful tools for representation learning on graphs, graph neural networks (GNNs) have played an important role in applications including social networks, recommendation systems, and online web services. However, GNNs have been shown to be vulnerable to adversarial attacks, which can significantly degrade their effectiveness. Recent state-of-the-art approaches in adversarial attacks rely on gradient-based meta-learning to selectively perturb a single edge with the highest attack score until they reach the budget constraint. While effective in identifying vulnerable links, these methods are plagued by high computational costs. By leveraging continuous relaxation and parameterization of the graph structure, we propose a novel attack method called Differentiable Graph Attack (DGA) to efficiently generate effective attacks while eliminating the need for costly retraining. Compared to the state-of-the-art, DGA achieves nearly equivalent attack performance with 6 times less training time and an 11 times smaller GPU memory footprint on different benchmark datasets. Additionally, we provide extensive experimental analyses of the transferability of the DGA among different graph models, as well as its robustness against widely-used defense mechanisms.
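To make the continuous relaxation concrete, the PyTorch sketch below perturbs a dense adjacency matrix with differentiable edge-flip scores and discretizes the top scores at the end. This is our simplified reading of the idea, not the authors' implementation; loss_fn stands for the attacker's surrogate objective on the perturbed graph, and the soft budget penalty is an assumption.

    import torch

    def dga_attack(adj, loss_fn, budget, lr=0.1, steps=100):
        """Differentiable edge-flip attack on a dense float adjacency matrix."""
        n = adj.size(0)
        p = torch.zeros(n, n, requires_grad=True)       # edge perturbation logits
        opt = torch.optim.Adam([p], lr=lr)
        for _ in range(steps):
            s = torch.sigmoid(p)
            s = (s + s.T) / 2                           # keep the graph undirected
            adj_pert = adj + (1 - 2 * adj) * s          # soft edge flips
            loss = -loss_fn(adj_pert)                   # ascend the attack objective
            loss = loss + torch.relu(s.sum() / 2 - budget)  # soft budget penalty
            opt.zero_grad()
            loss.backward()
            opt.step()
        s = torch.sigmoid(p.detach())
        s = (s + s.T) / 2
        flat = torch.triu(s, diagonal=1).flatten()      # one score per vertex pair
        idx = torch.topk(flat, budget).indices
        rows = torch.div(idx, n, rounding_mode="floor")
        return torch.stack((rows, idx % n), dim=1)      # (budget, 2) edges to flip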
Streaming, Local, and Multi-Level (Hyper)Graph Decomposition
Abstract
(Hyper)Graph decomposition is a family of problems that aim to break down large (hyper)graphs into smaller sub(hyper)graphs for easier analysis. The importance of this lies in its ability to enable efficient computation on large and complex (hyper)graphs, such as social networks, chemical compounds, and computer networks. This dissertation explores several types of (hyper)graph decomposition problems, including graph partitioning, hypergraph partitioning, local graph clustering, process mapping, and signed graph clustering. Our main focus is on streaming algorithms, local algorithms and multilevel algorithms. In terms of streaming algorithms, we make contributions with highly efficient and effective algorithms for (hyper)graph partitioning and process mapping. In terms of local algorithms, we propose sub-linear algorithms which are effective in detecting high-quality local communities around a given seed node in a graph based on the distribution of a given motif. In terms of multilevel algorithms, we engineer high-quality multilevel algorithms for process mapping and signed graph clustering. We provide a thorough discussion of each algorithm along with experimental results demonstrating their superiority over existing state-of-the-art techniques. The results show that the proposed algorithms achieve improved performance and better solutions in various metrics, making them highly promising for practical applications. Overall, this dissertation showcases the effectiveness of advanced combinatorial algorithmic techniques in solving challenging (hyper)graph decomposition problems.
Flexible Handover with Real-Time Robust Dynamic Grasp Trajectory Generation
Authors: Gu Zhang, Hao-Shu Fang, Hongjie Fang, Cewu Lu
Abstract
In recent years, there has been a significant effort dedicated to developing efficient, robust, and general human-to-robot handover systems. However, the area of flexible handover in the context of complex and continuous object motion remains relatively unexplored. In this work, we propose an approach for effective and robust flexible handover, which enables the robot to grasp objects moving along flexible trajectories with a high success rate. The key innovation of our approach is the generation of real-time robust grasp trajectories. We also design a future grasp prediction algorithm to enhance the system's adaptability to dynamic handover scenes. We conduct one-motion handover experiments and motion-continuous handover experiments on our novel benchmark that includes 31 diverse household objects. The system we have developed allows users to move and rotate objects in their hands within a relatively large range. The success rate of the robot grasping such moving objects is 78.15% over the entire household object benchmark.
Parameterized and Approximation Algorithms for the Maximum Bimodal Subgraph Problem
Authors: Walter Didimo, Fedor V. Fomin, Petr A. Golovach, Tanmay Inamdar, Stephen Kobourov, Marie Diana Sieper
Abstract
A vertex of a plane digraph is bimodal if all its incoming edges (and hence all its outgoing edges) are consecutive in the cyclic order around it. A plane digraph is bimodal if all its vertices are bimodal. Bimodality is at the heart of many types of graph layouts, such as upward drawings, level-planar drawings, and L-drawings. If the graph is not bimodal, the Maximum Bimodal Subgraph (MBS) problem asks for an embedding-preserving bimodal subgraph with the maximum number of edges. We initiate the study of the MBS problem from the parameterized complexity perspective with two main results: (i) we describe an FPT algorithm parameterized by the branchwidth (and hence by the treewidth) of the graph; (ii) we establish that MBS parameterized by the number of non-bimodal vertices admits a polynomial kernel. As the byproduct of these results, we obtain a subexponential FPT algorithm and an efficient polynomial-time approximation scheme for MBS.
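Checking bimodality of a single vertex is straightforward once the cyclic (rotational) order of its incident edges is known: the incoming edges must form one contiguous arc, i.e., the cyclic in/out sequence has at most two direction changes. A small self-contained check (our illustration, not code from the paper):

    def is_bimodal(cyclic_directions):
        """True iff the 'in' labels are consecutive in the cyclic order.

        `cyclic_directions` lists 'in'/'out' for the incident edges of one
        vertex in cyclic order; at most two direction changes around the
        cycle means the in-edges (and hence out-edges) form one arc.
        """
        n = len(cyclic_directions)
        changes = sum(
            cyclic_directions[i] != cyclic_directions[(i + 1) % n]
            for i in range(n)
        )
        return changes <= 2

    print(is_bimodal(["in", "in", "out", "out"]))   # True
    print(is_bimodal(["in", "out", "in", "out"]))   # False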
A Task-Parallel Approach for Localized Topological Data Structures
Authors: Guoxi Liu, Federico Iuricich
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Graphics (cs.GR)
Abstract
Unstructured meshes are characterized by data points irregularly distributed in Euclidean space. Due to the irregular nature of these data, computing connectivity information between the mesh elements requires much more time and memory than on uniformly distributed data. To lower storage costs, dynamic data structures have been proposed. These data structures compute connectivity information on the fly and discard it when no longer needed. However, on-the-fly computation slows down algorithms and degrades time performance. To address this issue, we propose a new task-parallel approach to proactively compute mesh connectivity. Unlike previous approaches implementing data-parallel models, where all threads run the same type of instructions, our task-parallel approach allows threads to run different functions. Specifically, some threads run the algorithm of choice while other threads compute connectivity information before it is actually needed. The approach was implemented in the new Accelerated Clustered TOPOlogical (ACTOPO) data structure, which can support any processing algorithm requiring mesh connectivity information. Our experiments show that ACTOPO combines the benefits of state-of-the-art memory-efficient (TTK CompactTriangulation) and time-efficient (TTK ExplicitTriangulation) topological data structures. It occupies a similar amount of memory as TTK CompactTriangulation while providing up to 5x speedup. Moreover, it achieves time performance comparable to TTK ExplicitTriangulation while using only half of the memory space.
Clustering Without an Eigengap
Authors: Matthew Zurek, Yudong Chen
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous approaches achieving exact recovery do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes. Mid-sized clusters pose unique challenges to the analysis, since their proximity to the recovery threshold makes them highly sensitive to small noise perturbations and precludes a closed-form candidate solution. We develop novel techniques, including a leave-one-out-style argument which controls the correlation between SDP solutions and noise vectors even when the removal of one row of noise can drastically change the SDP solution. We also develop improved eigenvalue perturbation bounds of potential independent interest. Using our gap-free clustering procedure, we obtain efficient algorithms for the problem of clustering with a faulty oracle with superior query complexities, notably achieving $o(n^2)$ sample complexity even in the presence of a large number of small clusters. Our gap-free clustering procedure also leads to improved algorithms for recursive clustering. Our results extend to certain heterogeneous probability settings that are challenging for alternative algorithms.
AskIt: Unified Programming Interface for Programming with Large Language Models
Authors: Katsumi Okuda, Saman Amarasinghe
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract
In the evolving landscape of software development, Large Language Models (LLMs) exhibit a unique phenomenon known as emergent abilities, demonstrating adeptness across numerous tasks, from text summarization to code generation. While these abilities open up novel avenues in software design and crafting, their incorporation presents substantial challenges. Developers grapple with decisions surrounding the direct embedding of LLMs within applications versus employing them for code generation. Moreover, effective prompt design becomes a critical concern, given the necessity of data extraction from natural language outputs. To address these intricacies, this paper introduces AskIt, a domain-specific language (DSL) specifically designed for LLMs. AskIt simplifies LLM integration, offering type-guided output control, template-based function definitions, and a unified interface that diminishes the distinction between LLM-based code generation and application integration. Furthermore, through Programming by Example (PBE), AskIt harnesses the power of few-shot learning at the programming language level. Our evaluations underscore AskIt's potency. Across 50 tasks, AskIt generated concise prompts for the given tasks, achieving a 16.14% reduction in prompt length relative to benchmarks. Additionally, by enabling the transition from direct LLM application usage to function generation, AskIt achieved significant speedups, as observed in our GSM8K benchmark experiments. Through these advancements, AskIt streamlines the integration of LLMs in software development, offering a more efficient, versatile approach for leveraging emergent abilities. The implementations of AskIt in TypeScript and Python are available at https://github.com/katsumiok/ts-askit and https://github.com/katsumiok/pyaskit, respectively.
Ensuring User-side Fairness in Dynamic Recommender Systems
Abstract
User-side group fairness is crucial for modern recommender systems, as it aims to alleviate performance disparity between groups of users defined by sensitive attributes such as gender, race, or age. We find that the disparity tends to persist or even increase over time. This calls for effective ways to address user-side fairness in a dynamic environment, which has been infrequently explored in the literature. However, fairness-constrained re-ranking, a typical method to ensure user-side fairness (i.e., reducing performance disparity), faces two fundamental challenges in the dynamic setting: (1) non-differentiability of the ranking-based fairness constraint, which hinders the end-to-end training paradigm, and (2) time-inefficiency, which impedes quick adaptation to changes in user preferences. In this paper, we propose FAir Dynamic rEcommender (FADE), an end-to-end framework with fine-tuning strategy to dynamically alleviate performance disparity. To tackle the above challenges, FADE uses a novel fairness loss designed to be differentiable and lightweight to fine-tune model parameters to ensure both user-side fairness and high-quality recommendations. Via extensive experiments on the real-world dataset, we empirically demonstrate that FADE effectively and efficiently reduces performance disparity, and furthermore, FADE improves overall recommendation quality over time compared to not using any new data.
Deep Reinforcement Learning Based Framework for Mobile Energy Disseminator Dispatching to Charge On-the-Road Electric Vehicles
Authors: Jiaming Wang, Jiqian Dong, Sikai Chen, Shreyas Sundaram, Samuel Labi
Abstract
The exponential growth of electric vehicles (EVs) presents novel challenges in preserving battery health and in addressing the persistent problem of vehicle range anxiety. To address these concerns, wireless charging, particularly via Mobile Energy Disseminators (MEDs), has emerged as a promising solution. The MED is mounted behind a large vehicle and charges all participating EVs within a radius upstream of it. Unfortunately, during such V2V charging, the MED and EVs inadvertently form platoons, thereby occupying multiple lanes and impairing overall corridor travel efficiency. In addition, constrained budgets for MED deployment necessitate the development of an effective dispatching strategy to determine optimal timing and locations for introducing the MEDs into traffic. This paper proposes a deep reinforcement learning (DRL) based methodology to develop a vehicle dispatching framework. In the first component of the framework, we develop a realistic reinforcement learning environment termed "ChargingEnv" which incorporates a reliable charging simulation system that accounts for common practical issues in wireless charging deployment, specifically, charging panel misalignment. The second component, the Proximal-Policy Optimization (PPO) agent, is trained to control MED dispatching through continuous interactions with ChargingEnv. Numerical experiments were carried out to demonstrate the efficacy of the proposed MED deployment decision processor. The experiment results suggest that the proposed model can significantly enhance EV travel range while efficiently deploying an optimal number of MEDs. The proposed model is not only practical in its applicability but also promising in its real-world effectiveness: it can help travelers maximize EV range and help road agencies or private-sector vendors manage the deployment of MEDs efficiently.
Enabling Cooperative Hybrid Beamforming in TDD-based Distributed MIMO Systems
Authors: Nariman Torkzaban, Amir Khojastepour, John S. Baras
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Abstract
Distributed massive MIMO networks are envisioned to realize cooperative multi-point transmission in next-generation wireless systems. For efficient cooperative hybrid beamforming, the cluster of access points (APs) needs to obtain precise estimates of the uplink channel to perform reliable downlink precoding. However, due to the radio frequency (RF) impairments between the transceivers at the two end-points of the wireless channel, full channel reciprocity does not hold, which results in performance degradation in the cooperative hybrid beamforming (CHBF) unless a suitable reciprocity calibration mechanism is in place. We propose a two-step approach to calibrate any two hybrid nodes in the distributed MIMO system. We then present and utilize the novel concept of a reciprocal tandem to propose a low-complexity approach for jointly calibrating the cluster of APs and estimating the downlink channel. Finally, we validate our calibration technique's effectiveness through numerical simulation.
Convergence of non-linear diagonal frame filtering for regularizing inverse problems
Abstract
Inverse problems are core issues in several scientific areas, including signal processing and medical imaging. As inverse problems typically suffer from instability with respect to data perturbations, a variety of regularization techniques have been proposed. In particular, the use of filtered diagonal frame decompositions has proven to be effective and computationally efficient. However, the existing convergence analysis applies only to linear filters and a few non-linear filters such as soft thresholding. In this paper, we analyze the filtered diagonal frame decomposition with general non-linear filters. In particular, our results generalize SVD-based spectral filtering from linear to non-linear filters as a special case. We present three strategies to demonstrate convergence. The first two strategies relate non-linear diagonal frame filtering to variational regularization and plug-and-play regularization, respectively. The third strategy allows us to relax the assumptions involved and still obtain a full convergence analysis.
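For orientation, in our notation (a generic form of such decompositions, not necessarily the paper's exact definitions): given a diagonal frame decomposition $(u_\lambda, v_\lambda, \kappa_\lambda)_{\lambda \in \Lambda}$ of the forward operator with dual frame $(\bar v_\lambda)$, the filtered reconstruction from data $y$ reads

$$ x_\alpha \;=\; \sum_{\lambda \in \Lambda} \varphi_\alpha\big(\kappa_\lambda,\, \langle y, u_\lambda \rangle\big)\, \bar v_\lambda, $$

where the linear choice $\varphi_\alpha(\kappa, c) = g_\alpha(\kappa)\, c$ recovers SVD-based spectral filtering, and the non-linear choice $\varphi_\alpha(\kappa, c) = \operatorname{sign}(c)\max(|c|/\kappa - \alpha, 0)$ corresponds to soft thresholding of the rescaled coefficients; the paper treats general non-linear $\varphi_\alpha$.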
Lower Bound for Independence Covering in $C_4$-Free Graphs
Authors: Michael Kuhn, Daniel Lokshtanov, Zachary Miller
Abstract
An independent set in a graph $G$ is a set $S$ of pairwise non-adjacent vertices in $G$. A family $\mathcal{F}$ of independent sets in $G$ is called a $k$-independence covering family if for every independent set $I$ in $G$ of size at most $k$, there exists an $S \in \mathcal{F}$ such that $I \subseteq S$. Lokshtanov et al. [ACM Transactions on Algorithms, 2018] showed that graphs of degeneracy $d$ admit $k$-independence covering families of size $\binom{k(d+1)}{k} \cdot 2^{o(kd)} \cdot \log n$, and used this result to design efficient parameterized algorithms for a number of problems, including STABLE ODD CYCLE TRANSVERSAL and STABLE MULTICUT. In light of the results of Lokshtanov et al., it is quite natural to ask whether even more general families of graphs admit $k$-independence covering families of size $f(k)n^{O(1)}$. Graphs that exclude a complete bipartite graph $K_{d+1,d+1}$ with $d+1$ vertices on both sides as a subgraph, called $K_{d+1,d+1}$-free graphs, are a frequently considered generalization of $d$-degenerate graphs. This motivates the question whether $K_{d,d}$-free graphs admit $k$-independence covering families of size $f(k,d)n^{O(1)}$. Our main result is a resounding "no" to this question -- specifically we prove that even $K_{2,2}$-free graphs (or equivalently $C_4$-free graphs) do not admit $k$-independence covering families of size $f(k)n^{\frac{k}{4}-\epsilon}$.
Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images
Authors: Ayush Garg, Julia Lu, Anika Maji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Oral diseases such as periodontal (gum) diseases and dental caries (cavities) affect billions of people across the world today. However, previous state-of-the-art models have relied on X-ray images to detect oral diseases, making them inaccessible to remote monitoring, developing countries, and telemedicine. To combat this reliance on X-ray imagery, we propose a lightweight machine learning model capable of detecting calculus (also known as hardened plaque or tartar) in RGB images while running efficiently on low-end devices. The model, a modified MobileNetV3-Small neural network transfer-learned from ImageNet, achieved an accuracy of 72.73% (comparable to state-of-the-art solutions) while still being able to run on mobile devices due to its reduced memory requirements and processing times. A ResNet34-based model was also constructed and achieved an accuracy of 81.82%. Both of these models were tested on a mobile app, demonstrating their potential to limit the number of serious oral disease cases, as their predictions can help patients schedule appointments earlier without the need to go to the clinic.
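A transfer-learning setup of this kind is compact in PyTorch. The sketch below follows one common recipe (freeze the backbone, retrain a new two-class head); the paper's exact fine-tuning depth, input pipeline, and hyperparameters are not specified in the abstract, so the choices here are assumptions.

    import torch
    import torch.nn as nn
    from torchvision import models

    # ImageNet-pretrained MobileNetV3-Small with a new two-class head
    # (calculus vs. no calculus).
    model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
    model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)

    # Freeze the backbone so only the new head trains.
    for p in model.features.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )
    criterion = nn.CrossEntropyLoss()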
Quantifying and Analyzing Entity-level Memorization in Large Language Models
Authors: Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su
Abstract
Large language models (LLMs) have been proven capable of memorizing their training data, which can be extracted through specifically designed prompts. As the scale of datasets continues to grow, privacy risks arising from memorization have attracted increasing attention. Quantifying language model memorization helps evaluate potential privacy risks. However, prior works on quantifying memorization require access to the precise original data or incur substantial computational overhead, making them difficult to apply to real-world language models. To this end, we propose a fine-grained, entity-level definition to quantify memorization with conditions and metrics closer to real-world scenarios. In addition, we also present an approach for efficiently extracting sensitive entities from autoregressive language models. We conduct extensive experiments based on the proposed definition, probing language models' ability to reconstruct sensitive entities under different settings. We find that language models have strong memorization at the entity level and are able to reproduce the training data even with partial leakage. The results demonstrate that LLMs not only memorize their training data but also understand associations between entities. These findings necessitate that trainers of LLMs exercise greater prudence regarding model memorization, adopting memorization mitigation techniques to preclude privacy violations.
Computing Geodesic Paths Encoding a Curvature Prior
Authors: Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen
Abstract
In this paper, we introduce an efficient method for computing curves minimizing a variant of the Euler-Mumford elastica energy, with fixed endpoints and tangents at these endpoints, where the bending energy is enhanced with a user-defined and data-driven scalar-valued term referred to as the curvature prior. In order to guarantee that the globally optimal curve is extracted, the proposed method involves the numerical computation of the viscosity solution to a specific static Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE). For that purpose, we derive the explicit Hamiltonian associated with this variant model equipped with a curvature prior, discretize the resulting HJB PDE using an adaptive finite difference scheme, and solve it in a single pass using a generalized Fast-Marching method. In addition, we also present a practical method for estimating the curvature prior values from image data, designed for the task of accurately tracking curvilinear structure centerlines. Numerical experiments on synthetic and real image data illustrate the advantages of the considered variant of the elastica model with curvature-prior enhancement in complex scenarios where challenging geometric structures appear.
Drone-NeRF: Efficient NeRF Based 3D Scene Reconstruction for Large-Scale Drone Survey
Authors: Zhihao Jia, Bing Wang, Changhao Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Neural rendering has garnered substantial attention owing to its capacity for creating realistic 3D scenes. However, its applicability to extensive scenes remains challenging, with limitations in effectiveness. In this work, we propose the Drone-NeRF framework to enhance the efficient reconstruction of unbounded large-scale scenes suited for drone oblique photography using Neural Radiance Fields (NeRF). Our approach involves dividing the scene into uniform sub-blocks based on camera position and depth visibility. Sub-scenes are trained in parallel using NeRF, then merged for a complete scene. We refine the model by optimizing camera poses and guiding NeRF with a uniform sampler. Integrating chosen samples enhances accuracy. A hash-coded fusion MLP accelerates density representation, yielding RGB and Depth outputs. Our framework accounts for sub-scene constraints, reduces parallel-training noise, handles shadow occlusion, and merges sub-regions for a polished rendering result. This Drone-NeRF framework demonstrates promising capabilities in addressing challenges related to scene complexity, rendering efficiency, and accuracy in drone-obtained imagery.
Efficient and Explainable Graph Neural Architecture Search via Monte-Carlo Tree Search
Abstract
Graph neural networks (GNNs) are powerful tools for performing data science tasks in various domains. Although GNNs are used in wide application scenarios, it is a laborious task for researchers and practitioners to design/select optimal GNN architectures for diverse graphs. To save human effort and computational costs, graph neural architecture search (Graph NAS) has been used to search for a sub-optimal GNN architecture that combines existing components. However, no existing Graph NAS methods satisfy explainability, efficiency, and adaptability to various graphs. Therefore, we propose an efficient and explainable Graph NAS method, called ExGNAS, which consists of (i) a simple search space that can adapt to various graphs and (ii) a search algorithm that makes the decision process explainable. The search space includes only fundamental functions that can handle homophilic and heterophilic graphs. The search algorithm efficiently searches for the best GNN architecture via Monte-Carlo tree search without neural models. The combination of our search space and algorithm succeeds in finding accurate GNN models and identifying the important functions within the search space. We comprehensively evaluate our method against twelve hand-crafted GNN architectures and three Graph NAS methods on four graphs. Our experimental results show that ExGNAS improves AUC by up to 3.6 and reduces run time by up to 78\% compared with the state-of-the-art Graph NAS methods. Furthermore, we show ExGNAS is effective in analyzing the difference between GNN architectures in homophilic and heterophilic graphs.
Random Shortening of Linear Codes and Applications
Authors: Xue Chen, Kuan Cheng, Xin Li, Songtao Mao
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
Abstract
Random linear codes (RLCs) are well known to have nice combinatorial properties and near-optimal parameters in many different settings. However, getting explicit constructions matching the parameters of RLCs is challenging, and RLCs are hard to decode efficiently. This motivated several previous works to study the problem of partially derandomizing RLCs, by applying certain operations to an explicit mother code. Among them, one of the most well studied operations is random puncturing, where a series of works culminated in the work of Guruswami and Mosheiff (FOCS' 22), which showed that a random puncturing of a low-biased code is likely to possess almost all interesting local properties of RLCs. In this work, we provide an in-depth study of another, dual operation of random puncturing, known as random shortening, which can be viewed equivalently as random puncturing on the dual code. Our main results show that for any small $\varepsilon$, by starting from a mother code with certain weaker conditions (e.g., having a large distance) and performing a random (or even pseudorandom) shortening, the new code is $\varepsilon$-biased with high probability. Our results hold for any field size and yield a shortened code with constant rate. This can be viewed as a complement to random puncturing, and together, we can obtain codes with properties like RLCs from weaker initial conditions. Our proofs involve several non-trivial methods of estimating the weight distribution of codewords, which may be of independent interest.
Reimagining Sense Amplifiers: Harnessing Phase Transition Materials for Current and Voltage Sensing
Authors: Md Mazharul Islam, Shamiul Alam, Mohammad Adnan Jahangir, Garrett S. Rose, Suman Datta, Vijaykrishnan Narayanan, Sumeet Kumar Gupta, Ahmedullah Aziz
Abstract
Energy-efficient sense amplifier (SA) circuits are essential for reliable detection of stored memory states in emerging memory systems. In this work, we present four novel sense amplifier (SA) topologies based on phase transition materials (PTMs), tailored for non-volatile memory applications. We utilize the abrupt switching and volatile hysteretic characteristics of PTMs, which enable efficient and fast sensing operation in our proposed SA topologies. We provide comprehensive details of their functionality and assess how process variations impact their performance metrics. Our proposed sense amplifier topologies deliver notable performance enhancements. We achieve a ~67% reduction in sensing delay and a ~80% decrease in sensing power for current sensing. For voltage sensing, we achieve a ~75% reduction in sensing delay and a ~33% decrease in sensing power. Moreover, the proposed SA topologies exhibit improved variation robustness compared to conventional SAs. We also scrutinize the dependence of the transistor mirroring window and the PTM transition voltages on several device parameters to determine the optimum operating conditions and the scope of tunability for each of the proposed SA topologies.
Task-Based MoE for Multitask Multilingual Machine Translation
Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla
Abstract
The mixture-of-experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks in many applications. However, current MoE implementations are task-agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translation. With task-specific adapters, our models can additionally generalize to new tasks efficiently.
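As a rough sketch of task-aware routing (our illustration only; the paper's shared dynamic task-based adapters are a different and richer mechanism), a router can condition expert selection on a task embedding added to the token representation:

    import torch
    import torch.nn as nn

    class TaskAwareRouter(nn.Module):
        """Expert routing conditioned on token features plus a task embedding."""

        def __init__(self, d_model, n_experts, n_tasks):
            super().__init__()
            self.task_emb = nn.Embedding(n_tasks, d_model)
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, tokens, task_id):
            # tokens: (batch, seq, d_model); task_id: (batch,)
            h = tokens + self.task_emb(task_id)[:, None, :]
            return torch.softmax(self.gate(h), dim=-1)  # expert mixture weights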
Securing Blockchain Systems: A Novel Collaborative Learning Framework to Detect Attacks in Transactions and Smart Contracts
Authors: Tran Viet Khoa, Do Hai Son, Chi-Hieu Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Nguyen Linh Trung, Tran Thi Thuy Quynh, Trong-Minh Hoang, Nguyen Viet Ha, Eryk Dutkiewicz
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
With the escalating prevalence of malicious activities exploiting vulnerabilities in blockchain systems, there is an urgent requirement for robust attack detection mechanisms. To address this challenge, this paper presents a novel collaborative learning framework designed to detect attacks in blockchain transactions and smart contracts by analyzing transaction features. Our framework exhibits the capability to classify various types of blockchain attacks, including intricate attacks at the machine code level (e.g., injecting malicious codes to withdraw coins from users unlawfully), which typically necessitate significant time and security expertise to detect. To achieve that, the proposed framework incorporates a unique tool that transforms transaction features into visual representations, facilitating efficient analysis and classification of low-level machine codes. Furthermore, we propose a customized collaborative learning model to enable real-time detection of diverse attack types at distributed mining nodes. In order to create a comprehensive dataset, we deploy a pilot system based on a private Ethereum network and conduct multiple attack scenarios. To the best of our knowledge, our dataset is the most comprehensive and diverse collection of transactions and smart contracts synthesized in a laboratory for cyberattack detection in blockchain systems. Our framework achieves a detection accuracy of approximately 94\% through extensive simulations and real-time experiments with a throughput of over 1,100 transactions per second. These compelling results validate the efficacy of our framework and showcase its adaptability in addressing real-world cyberattack scenarios.
Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals
Abstract
The Red Palm Weevil (RPW), also known as the palm weevil, is considered among the world's most damaging insect pests of palms. Current detection techniques include the detection of symptoms of RPW using visual or sound inspection and chemical detection of volatile signatures generated by infested palm trees. However, efficient detection of RPW infestation at an early stage is considered one of the most challenging issues in cultivating date palms. In this paper, an efficient approach to the early detection of RPW is proposed. The proposed approach is based on recording and analyzing RPW sound activities. The first step involves the conversion of sound data into images based on a selected set of features. The second step involves the combination of images from the same sound file but computed by different features into a single image. The third step involves the application of different Deep Learning (DL) techniques to classify the resulting images into two classes: infested and not infested. Experimental results show the good performance of the proposed approach for RPW detection using different DL techniques, namely MobileNetV2, ResNet50V2, ResNet152V2, VGG16, VGG19, DenseNet121, DenseNet201, Xception, and InceptionV3. The proposed approach outperformed existing techniques on public datasets.
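The first step, converting sound to images, can be illustrated with a mel spectrogram; we use it here as a stand-in, since the abstract does not name the selected feature set. A hedged sketch with librosa:

    import librosa
    import numpy as np

    def sound_to_image(wav_path, n_mels=128):
        """Convert a recording into a normalized spectrogram 'image'."""
        y, sr = librosa.load(wav_path, sr=None)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        img = librosa.power_to_db(mel, ref=np.max)
        # Normalize to [0, 1] so it can be stacked with other feature images.
        return (img - img.min()) / (img.max() - img.min() + 1e-9)

Step two would then stack several such feature images from the same recording (e.g., as channels of one input), and step three feeds the combined image to the listed CNN classifiers.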
Event-Triggered Polynomial Control for Trajectory Tracking of Unicycle Robots
Abstract
This paper proposes an event-triggered polynomial control method for trajectory tracking of unicycle robots. In this control method, each control input signal between two consecutive events is a polynomial. At each event, the coefficients of the polynomial control input are chosen to minimize the error in approximating a continuous-time control signal. We design an event-triggering rule that guarantees uniform ultimate boundedness of the tracking error. We ensure the absence of Zeno behavior by showing the existence of a uniform positive lower bound on the inter-event times. We illustrate our results through numerical simulations and experiments. We show that the number of events generated by the proposed controller is significantly smaller than that of a time-triggered controller and an event-triggered controller based on zero-order hold, while guaranteeing similar tracking performance.
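A minimal simulation of the scheme looks as follows. This is our sketch under simplifying assumptions: f is a placeholder for the unicycle tracking-error dynamics, poly_coeffs for the paper's coefficient-fitting step, and the norm test is a stand-in for the actual event-triggering rule.

    import numpy as np

    def simulate(f, poly_coeffs, x0, err_bound, dt=1e-3, T=5.0):
        """Hold a polynomial input between events; re-fit it at each event."""
        x = np.asarray(x0, dtype=float)
        t = t_event = 0.0
        coeffs, events = poly_coeffs(x), [0.0]
        while t < T:
            tau = t - t_event
            u = sum(c * tau**i for i, c in enumerate(coeffs))  # polynomial input
            x = x + dt * f(x, u)        # explicit Euler step of the dynamics
            t += dt
            if np.linalg.norm(x[:2]) > err_bound:  # simplified triggering test
                coeffs, t_event = poly_coeffs(x), t  # new event: re-fit input
                events.append(t)
        return x, events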
Prompting Vision Language Model with Knowledge from Large Language Model for Knowledge-Based VQA
Authors: Yang Zhou, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
Abstract
Knowledge-based visual question answering is a very challenging and widely studied task. Previous methods adopt the implicit knowledge in large language models (LLMs) to achieve excellent results, but we argue that existing methods may suffer from a biased understanding of the image and insufficient knowledge to solve the problem. In this paper, we propose PROOFREAD (PROmpting vision language model with knOwledge From laRgE lAnguage moDel), a novel, lightweight and efficient knowledge-based VQA framework, which makes the vision language model and the large language model cooperate to give full play to their respective strengths and bootstrap each other. In detail, our proposed method uses the LLM to obtain knowledge explicitly, uses the vision language model, which can see the image, to get the knowledge answer, and introduces a knowledge perceiver to filter out knowledge that is harmful for getting the correct final answer. Experimental results on two datasets demonstrate the effectiveness of our approach. Our method outperforms all state-of-the-art methods on the A-OKVQA dataset in two settings and also achieves relatively good performance on the OKVQA dataset.
Domain Generalization without Excess Empirical Risk
Abstract
Given data from diverse sets of distinct distributions, domain generalization aims to learn models that generalize to unseen distributions. A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty. We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization. We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk. This change guarantees that the domain generalization penalty cannot impair optimization of the empirical risk, i.e., in-distribution performance. To solve the proposed optimization problem, we demonstrate an exciting connection to rate-distortion theory and utilize its tools to design an efficient method. Our approach can be applied to any penalty-based domain generalization method, and we demonstrate its effectiveness by applying it to three exemplar methods from the literature, showing significant improvements.
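In our notation, the proposed change replaces the joint objective $\min_\theta \hat R(\theta) + \lambda P(\theta)$, where $\hat R$ is the empirical risk and $P$ the surrogate penalty, by the constrained problem

$$ \min_{\theta} \; P(\theta) \quad \text{subject to} \quad \hat R(\theta) \,\le\, \min_{\theta'} \hat R(\theta') + \epsilon, $$

so the penalty can only select among (near-)empirical-risk minimizers rather than trade off against the risk; the slack $\epsilon$ is our addition to make the constraint implementable.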
Inductive Learning of Declarative Domain-Specific Heuristics for ASP
Authors: Richard Comploi-Taupe (Siemens AG Österreich, Vienna, Austria)
Abstract
Domain-specific heuristics are a crucial technique for the efficient solving of problems that are large or computationally hard. Answer Set Programming (ASP) systems support declarative specifications of domain-specific heuristics to improve solving performance. So far, however, such heuristics have had to be invented manually. Inventing domain-specific heuristics for answer-set programs requires expertise with the domain under consideration and familiarity with ASP syntax, semantics, and solving technology. The process of inventing useful heuristics would highly profit from automatic support. This paper presents a novel approach to the automatic learning of such heuristics. We use Inductive Logic Programming (ILP) to learn declarative domain-specific heuristics from examples stemming from (near-)optimal answer sets of small but representative problem instances. Our experimental results indicate that the learned heuristics can improve solving performance and solution quality when solving larger, harder instances of the same problem.
Deontic Paradoxes in ASP with Weak Constraints
Authors: Christian Hatschka (TU Vienna), Agata Ciabattoni (TU Vienna), Thomas Eiter (TU Vienna)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
Abstract
The rise of powerful AI technology for a range of applications that are sensitive to legal, social, and ethical norms demands decision-making support in the presence of norms and regulations. Normative reasoning is the realm of deontic logics, which are challenged by well-known benchmark problems (deontic paradoxes) and lack efficient computational tools. In this paper, we use Answer Set Programming (ASP) for addressing these shortcomings and showcase how to encode and resolve several well-known deontic paradoxes utilizing weak constraints. By abstracting and generalizing this encoding, we present a methodology for translating normative systems into ASP with weak constraints. This methodology is applied to "ethical" versions of Pac-man, where we obtain a comparable performance with related works, but ethically preferable results.
Towards One-Shot Learning for Text Classification using Inductive Logic Programming
Authors: Ghazal Afroozi Milani (University of Surrey), Daniel Cyrus (University of Surrey), Alireza Tamaddoni-Nezhad (University of Surrey)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Logic in Computer Science (cs.LO)
Abstract
With the ever-increasing potential of AI to perform personalised tasks, it is becoming essential to develop new machine learning techniques which are data-efficient and do not require hundreds or thousands of training examples. In this paper, we explore an Inductive Logic Programming approach for one-shot text classification. In particular, we explore the framework of Meta-Interpretive Learning (MIL), along with using common-sense background knowledge extracted from ConceptNet. Results indicate that MIL can learn text classification rules from a small number of training examples. Moreover, the higher the complexity of the chosen examples, the higher the accuracy of the outcome.
Generalizing Level Ranking Constraints for Monotone and Convex Aggregates
Authors: Tomi Janhunen (Tampere University)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Abstract
In answer set programming (ASP), answer sets capture solutions to search problems of interest, and thus the efficient computation of answer sets is of utmost importance. One viable implementation strategy is provided by translation-based ASP, where logic programs are translated into other KR formalisms such as Boolean satisfiability (SAT), SAT modulo theories (SMT), and mixed-integer programming (MIP). Consequently, existing solvers can be harnessed for the computation of answer sets. Many of the existing translations rely on program completion and level rankings to capture the minimality of answer sets and default negation properly. In this work, we take level ranking constraints into reconsideration, aiming at their generalization to cover aggregate-based extensions of ASP in a more systematic way. By applying a number of program transformations, ranking constraints can be rewritten in a general form that preserves the structure of monotone and convex aggregates and thus offers a uniform basis for their incorporation into translation-based ASP. The results open up new possibilities for the implementation of translators and solver pipelines in practice.
Sorting Strategies for Interactive Conflict Resolution in ASP
Abstract
Answer set programs in practice are often subject to change. This can lead to inconsistencies in the modified program due to conflicts between rules that result from the derivation of strongly complementary literals. To facilitate the maintenance of consistency in answer set programs, in this paper we continue work on a recently presented framework that implements interactive conflict resolution by extending the bodies of conflicting rules with suitable literals, so-called $\lambda$-extensions. More precisely, we present strategies to choose $\lambda$-extensions that allow for resolving several conflicts at a time in an order that aims at minimizing (cognitive) effort. In particular, we present a graphical representation of connections between conflicts and their possible solutions. Such a representation can be utilized to efficiently guide the user through the conflict resolution process by displaying conflicts and suggesting solutions in a suitable order.
Data reduction for directed feedback vertex set on graphs without long induced cycles
Authors: Jona Dirks, Enna Gerhard, Mario Grobler, Amer E. Mouawad, Sebastian Siebertz
Abstract
We study reduction rules for Directed Feedback Vertex Set (DFVS) on instances without long cycles. A DFVS instance without cycles longer than $d$ naturally corresponds to an instance of $d$-Hitting Set; however, enumerating all cycles in an $n$-vertex graph and then kernelizing the resulting $d$-Hitting Set instance can be too costly, as already enumerating all cycles can take time $\Omega(n^d)$. We show how to compute a kernel with at most $2^dk^d$ vertices and at most $d^{3d}k^d$ induced cycles of length at most $d$ (which, however, cannot be enumerated efficiently), where $k$ is the size of a minimum directed feedback vertex set. We then study classes of graphs whose underlying undirected graphs have bounded expansion or are nowhere dense; these are very general classes of sparse graphs, containing e.g. classes excluding a minor or a topological minor. We prove that for such classes without induced cycles of length greater than $d$ we can compute a kernel with $O_d(k)$ and $O_{d,\epsilon}(k^{1+\epsilon})$ vertices for any $\epsilon>0$, respectively, in time $O_d(n^{O(1)})$ and $O_{d,\epsilon}(n^{O(1)})$, respectively. The most restricted classes we consider are strongly connected planar graphs without any (induced or non-induced) long cycles. We show that these have bounded treewidth and hence DFVS on planar graphs without cycles of length greater than $d$ can be solved in time $2^{O(d)}\cdot n^{O(1)}$. We finally present a new data reduction rule for general DFVS and prove that the rule, together with a few standard rules, subsumes all the rules applied by Bergougnoux et al. to obtain a polynomial kernel for DFVS[FVS], i.e., DFVS parameterized by the feedback vertex set number of the underlying (undirected) graph. We conclude by studying the LP-based approximation of DFVS.
Cyclophobic Reinforcement Learning
Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling
Abstract
In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first search versus breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty but punishes redundancy by avoiding cycles. By augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample-efficient than other state-of-the-art methods in a variety of tasks.
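A minimal version of the cyclophobic intrinsic reward is easy to write down: hash each observation and pay a penalty whenever a state recurs. This sketch is our reduction of the idea to a single view; the paper combines a sequence of hierarchical crops of the observation.

    from collections import defaultdict

    class CyclophobicReward:
        """Intrinsic reward that punishes returning to an already-seen state."""

        def __init__(self, penalty=-0.1):
            self.penalty = penalty
            self.visits = defaultdict(int)

        def __call__(self, observation):
            key = hash(observation.tobytes())  # assumes a NumPy array observation
            self.visits[key] += 1
            # No novelty bonus; only a penalty for closing a cycle.
            return self.penalty if self.visits[key] > 1 else 0.0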
IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation
Abstract
In the information age, recommendation systems are vital for efficiently filtering information and identifying user preferences. Online social platforms have enriched these systems by providing valuable auxiliary information. Socially connected users are assumed to share similar preferences, enhancing recommendation accuracy and addressing cold start issues. However, empirical findings challenge this assumption, revealing that certain social connections can actually harm system performance. Our statistical analysis indicates a significant amount of noise in the social network, where many socially connected users do not share common interests. To address this issue, we propose an innovative \underline{I}nterest-aware \underline{D}enoising and \underline{V}iew-guided \underline{T}uning (IDVT) method for social recommendation. The first ID part effectively denoises social connections. Specifically, the denoising process considers both the social network structure and user interaction interests in a global view. Moreover, in this global view, we also integrate denoised social information (social domain) into the propagation of user-item interactions (collaborative domain) and aggregate user representations from the two domains using a gating mechanism. To tackle potential user interest loss and enhance model robustness within the global view, our second VT part introduces two additional views (a local view and a dropout-enhanced view) for fine-tuning user representations in the global view through contrastive learning. Extensive evaluations on real-world datasets with varying noise ratios demonstrate the superiority of IDVT over state-of-the-art social recommendation methods.
Abstract
Finding dense subgraphs is a core problem in graph mining with many applications in diverse domains. At the same time, many real-world networks vary over time; that is, the dataset can be represented as a sequence of graph snapshots. Hence, it is natural to consider the question of finding dense subgraphs in a temporal network that are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and a weight $\lambda$, we find a collection of dense subgraphs such that the sum of densities of the induced subgraphs plus the sum of Jaccard indices, weighted by $\lambda$, is maximized. We prove that this problem is NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm which runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time per iteration, and a greedy algorithm which runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time, where $k$ is the length of the graph sequence and $n$ and $m$ denote the number of nodes and the total number of edges, respectively. We show experimentally that our algorithms are efficient, that they can find ground truth in synthetic datasets, and that they provide interpretable results on real-world datasets. Finally, we present a case study that shows the usefulness of our problem.
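In our notation, with $d(\cdot)$ denoting the density of an induced subgraph (e.g., the average-degree ratio $|E(G_i[S_i])|/|S_i|$; the paper's exact density measure may differ), the objective is

$$ \max_{S_1,\dots,S_k} \; \sum_{i=1}^{k} d\big(G_i[S_i]\big) \;+\; \lambda \sum_{1 \le i < j \le k} \frac{|S_i \cap S_j|}{|S_i \cup S_j|}, $$

where $S_i$ is the vertex set selected in snapshot $G_i$ and the second sum rewards pairwise Jaccard similarity between the selected sets.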
Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Authors: Paul Cardosi, Bérenger Bramas
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Abstract
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific computing. Not only does it allow for efficient parallelization across distributed heterogeneous computing nodes, but it also allows for elegant source code structuring by describing hardware-independent algorithms. In this paper, we present Specx, a task-based runtime system written in modern C++. Specx supports distributed heterogeneous computing by simultaneously exploiting CPUs and GPUs (CUDA/HIP) and incorporating communication into the task graph. We describe the specificities of Specx and demonstrate its potential by running parallel applications.
Demo: Integration of Marketplace for the 5G Open RAN Ecosystem
Authors: Tim Farnham, Sajida Gufran, Peizheng Li, Adnan Aijaz
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
The Open RAN API and interface standards facilitate new ecosystems in which distinct hardware and software components are brought together to build 5G systems. Key to this concept is a seamless and efficient integration and monetization process among stakeholders. A marketplace serves as a means to realize this collaborative revenue sharing, eliminating the need for intricate proprietary agreements or contracts between each participant. This demo presents the marketplace strategy, emphasizing software integration across diverse deployment settings and utilizing the API-centric integration Platform-as-a-Service (iPaaS) model aligned with Open RAN standards.
MerA: Merging Pretrained Adapters For Few-Shot Learning
Authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
Abstract
Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Despite this, our preliminary study reveals that even single adapters can outperform AdapterFusion in few-shot learning, urging us to propose \textbf{\texttt{Merging Pretrained Adapters}} (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "\textit{same-track}" setting, that merges adapters from the same track of pretraining tasks. With the implementation of the "\textit{same-track}" setting, we observe even more impressive gains, surpassing the performance of both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5\% on MRPC and 5.0\% on MNLI.
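The core of merging-based fusion can be sketched as weighted parameter averaging over adapters. This is a simplified illustration of the concept, not MerA's exact procedure (which may align parameters before merging):

    import torch

    def merge_adapters(adapter_state_dicts, weights=None):
        """Fuse pretrained adapters into one by weighted parameter averaging."""
        n = len(adapter_state_dicts)
        weights = weights or [1.0 / n] * n
        return {
            key: sum(w * sd[key] for w, sd in zip(weights, adapter_state_dicts))
            for key in adapter_state_dicts[0]
        }

    # Usage sketch (hypothetical adapter modules with matching state dicts):
    # merged = merge_adapters([adapter_a.state_dict(), adapter_b.state_dict()])
    # adapter.load_state_dict(merged)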
AI-powered Fraud Detection in Decentralized Finance: A Project Life Cycle Perspective
Abstract
In recent years, blockchain technology has introduced decentralized finance (DeFi) as an alternative to traditional financial systems. DeFi aims to create a transparent and efficient financial ecosystem using smart contracts and emerging decentralized applications. However, the growing popularity of DeFi has made it a target for fraudulent activities, resulting in losses of billions of dollars due to various types of fraud. To address these issues, researchers have explored the potential of artificial intelligence (AI) approaches to detect such fraudulent activities. Yet, there is a lack of a systematic survey that organizes and summarizes the existing works and identifies future research opportunities. In this survey, we provide a systematic taxonomy of various frauds in the DeFi ecosystem, categorized by the different stages of a DeFi project's life cycle: project development, introduction, growth, maturity, and decline. This taxonomy is based on our finding that many frauds correlate strongly with specific stages of a DeFi project. According to the taxonomy, we review existing AI-powered detection methods, including statistical modeling, natural language processing, and other machine learning techniques. We find that fraud detection in different stages employs distinct types of methods and observe the commendable performance of tree-based and graph-related models in tackling fraud detection tasks. By analyzing the challenges and trends, we present these findings to provide proactive suggestions and to guide future research in DeFi fraud detection. We believe that this survey can support researchers, practitioners, and regulators in establishing a secure and trustworthy DeFi ecosystem.
Hidden-Role Games: Equilibrium Concepts and Computation
Authors: Luca Carminati, Brian Hu Zhang, Gabriele Farina, Nicola Gatti, Tuomas Sandholm
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
In this paper, we study the class of games known as hidden-role games, in which players are privately assigned a team and are faced with the challenge of recognizing and cooperating with teammates. This model includes both popular recreational games, such as the Mafia/Werewolf family and The Resistance (Avalon), and real-world security settings, where a distributed system wants to operate while some of its nodes are controlled by adversaries. There has been little to no formal mathematical grounding of such settings in the literature, and it is not even immediately clear what the right solution concept is. In particular, the suitable notion of equilibrium depends on the communication available to the players (whether players can communicate, whether they can communicate in private, and whether they can observe who is communicating), and defining it turns out to be a nontrivial task with several surprising consequences. We show that in certain cases, including the above recreational games, near-optimal equilibria can be computed efficiently. In most other cases, we show that computing an optimal equilibrium is either NP-hard or coNP-hard. Lastly, we experimentally validate our approach by computing nearly-exact equilibria for complete Avalon instances with up to 6 players, whose size in terms of the number of information sets exceeds $10^{56}$.
Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs
Authors: Jiani Liu, Qinghua Tao, Ce Zhu, Yipeng Liu, Xiaolin Huang, Johan A.K. Suykens
Abstract
Multitask learning (MTL) leverages task-relatedness to enhance performance. With the emergence of multimodal data, tasks can now be referenced by multiple indices. In this paper, we employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices and preserve their structural relations. Based on this representation, we propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs), where the CP factorization is deployed over the coefficient tensor. Our approach allows us to model the task relation through a linear combination of shared factors weighted by task-specific factors and generalizes to both classification and regression problems. Through the alternating optimization scheme and the Lagrangian function, each subproblem is transformed into a convex problem, formulated as a quadratic program or linear system in the dual form. In contrast to previous MTL frameworks, our decision function in the dual induces a weighted kernel function with a task-coupling term characterized by the similarities of the task-specific factors, better revealing the explicit relations across tasks in MTL. Experimental results validate the effectiveness and superiority of our proposed methods compared to existing state-of-the-art approaches in MTL. The implementation code will be available at https://github.com/liujiani0216/TSVM-MTL.
P2M: A Fast Solver for Querying Distance from Point to Mesh Surface
Abstract
Most of the existing point-to-mesh distance query solvers, such as Proximity Query Package (PQP), Embree and Fast Closest Point Query (FCPW), are based on bounding volume hierarchy (BVH). The hierarchical organizational structure enables one to eliminate the vast majority of triangles that do not help find the closest point. In this paper, we develop a totally different algorithmic paradigm, named P2M, to speed up point-to-mesh distance queries. Our original intention is to precompute a KD tree (KDT) of mesh vertices to approximately encode the geometry of a mesh surface containing vertices, edges and faces. However, it is very likely that the closest primitive to the query point is an edge $e$ (resp., a face $f$), but the KDT reports a mesh vertex $\upsilon$ instead. We call $\upsilon$ an interceptor of $e$ (resp., $f$). The main contribution of this paper is to invent a simple yet effective interception inspection rule and an efficient flooding interception inspection algorithm for quickly finding out all the interception pairs. Once the KDT and the interception table are precomputed, the query stage proceeds by first searching the KDT and then looking up the interception table to retrieve the closest geometric primitive. Statistics show that our query algorithm runs many times faster than the state-of-the-art solvers.
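A toy sketch of the query stage as described, assuming the KD tree is built over the mesh vertices (e.g., `cKDTree(vertex_array)`) and that an interception table mapping each vertex to candidate primitives has been precomputed; `primitive_distance` is a placeholder for an exact point-to-edge/face distance routine.

```python
# Hedged sketch of the P2M query stage: nearest mesh vertex via a KD
# tree, then a lookup in a precomputed interception table mapping each
# vertex to the edges/faces it may intercept. Building that table (the
# paper's flooding interception inspection) is omitted here.
from scipy.spatial import cKDTree

def query_distance(q, kdtree, interception, primitive_distance):
    d_best, v = kdtree.query(q)              # nearest mesh vertex
    for prim in interception.get(v, ()):     # candidate edges/faces
        d_best = min(d_best, primitive_distance(q, prim))
    return d_best
```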
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation
Abstract
Few-shot image generation, which aims to produce plausible and diverse images for one category given a few images from this category, has drawn extensive attention. Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients. However, such an intuitive combination of images/features only exploits the most relevant information for generation, leading to poor diversity and coarse-grained semantic fusion. To remedy this, this paper proposes a novel textural modulation (TexMod) mechanism to inject external semantic signals into internal local representations. Parameterized by the feedback from the discriminator, our TexMod enables more fine-grained semantic injection while maintaining the synthesis fidelity. Moreover, a global structural discriminator (StructD) is developed to explicitly guide the model to generate images with reasonable layout and outline. Furthermore, the frequency awareness of the model is reinforced by encouraging the model to distinguish frequency signals. Together with these techniques, we build a novel and effective model for few-shot image generation. The effectiveness of our model is demonstrated by extensive experiments on three popular datasets and various settings. Besides achieving state-of-the-art synthesis performance on these datasets, our proposed techniques can be seamlessly integrated into existing models for a further performance boost.
Near-Field 3D Localization via MIMO Radar: Cramér-Rao Bound Analysis and Estimator Design
Authors: Haocheng Hua, Jie Xu, Yonina C. Eldar
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper studies a near-field multiple-input multiple-output (MIMO) radar sensing system, in which the transceivers with massive antennas aim to localize multiple near-field targets in the three-dimensional (3D) space over unknown cluttered environments. We consider a spherical wavefront propagation with both channel phase and amplitude variations over different antennas. Under this setup, the unknown parameters include the 3D coordinates and complex reflection coefficients of the targets, as well as the noise and interference covariance matrix. First, by considering general transmit signal waveforms, we derive the Fisher information matrix (FIM) corresponding to the 3D coordinates and the complex reflection coefficients of the targets and accordingly obtain the Cram\'er-Rao bound (CRB) for the 3D coordinates. This provides a performance bound for 3D near-field target localization. For the special single-target case, we obtain the CRB in an analytical form, and analyze its asymptotic scaling behaviors with respect to the target distance and antenna size of the transceiver. Next, to facilitate practical localization, we propose two estimators to localize targets based on the maximum likelihood (ML) criterion, namely the 3D approximate cyclic optimization (3D-ACO) and the 3D cyclic optimization with white Gaussian noise (3D-CO-WGN), respectively. Numerical results validate the asymptotic CRB analysis and show that the consideration of varying channel amplitudes is vital to achieve accurate CRB and localization when the targets are close to the transceivers. It is also shown that the proposed estimators achieve localization performance close to the derived CRB under various cluttered environments, thus validating their effectiveness in practical implementation. Furthermore, it is shown that transmit waveforms have a significant impact on CRB and the localization performance.
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Authors: Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs are deployed for increasingly complex tasks, they often face the need to conduct longer reasoning processes or understand larger contexts. In these situations, the length generalization failure of LLMs on long sequences becomes more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMa). LLMs often struggle to generate fluent text beyond such contexts, let alone carry out downstream tasks, even with relative positional encoding, which is designed to cope with this problem. Common solutions such as finetuning on longer corpora often involve daunting hardware and time costs and require careful training process design. To more efficiently leverage the generation capacity of existing LLMs, we theoretically and empirically investigate the main out-of-distribution (OOD) factors contributing to this problem. Inspired by this diagnosis, we propose a simple yet effective solution for on-the-fly length generalization, LM-Infinite, which involves only a $\Lambda$-shaped attention mask and a distance limit while requiring no parameter updates or learning. We find it applicable to a variety of LLMs using relative-position encoding methods. LM-Infinite is computationally efficient with $O(n)$ time and space, and demonstrates consistent fluency and generation quality for sequences as long as 32k tokens on the ArXiv and OpenWebText2 datasets, with a 2.72x decoding speedup. On downstream tasks such as passkey retrieval, it continues to work on inputs much longer than training lengths, where vanilla models fail immediately.
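A minimal sketch of a $\Lambda$-shaped causal attention mask of the kind described; the parameter names `n_global` and `n_local` and their defaults are illustrative, not the paper's exact configuration.

```python
# Hedged sketch of a Lambda-shaped causal attention mask: every token
# attends to the first `n_global` tokens plus a local window of the
# most recent `n_local` tokens. The paper's exact parameterization and
# distance limit may differ.
import numpy as np

def lambda_mask(seq_len, n_global=4, n_local=1024):
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        mask[i, :min(n_global, i + 1)] = True          # global "leg"
        mask[i, max(0, i - n_local + 1):i + 1] = True  # local "leg"
    return mask  # True = attention allowed; causal by construction
```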
Keyword: faster
Finding Optimal 2-Packing Sets on Arbitrary Graphs at Scale
Authors: Jannick Borowitz, Ernestine Großmann, Christian Schulz, Dominik Schweisgut
Abstract
A 2-packing set for an undirected graph $G=(V,E)$ is a subset $\mathcal{S} \subset V$ such that any two vertices $v_1,v_2 \in \mathcal{S}$ have no common neighbors. Finding a 2-packing set of maximum cardinality is an NP-hard problem. We develop a new approach to solve this problem on arbitrary graphs using its close relation to the independent set problem. To this end, our algorithm red2pack uses new data reduction rules specific to the 2-packing set problem as well as a graph transformation. Our experiments show that we outperform the state of the art for arbitrary graphs with respect to solution quality and are also able to compute solutions multiple orders of magnitude faster than previously possible. For example, we solve 63% of our graphs to optimality in less than a second, while the competitor for arbitrary graphs can only solve 5% of the graphs in the data set to optimality even with a 10-hour time limit. Moreover, our approach can solve a wide range of large instances that have previously been unsolved.
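A minimal validity check for the definition quoted above (`G` is a networkx-style graph); note that some definitions of a 2-packing additionally require pairwise distance greater than 2.

```python
# Hedged validity check for the stated definition: no two chosen
# vertices share a common neighbor. (Some definitions additionally
# require pairwise distance > 2, i.e., no edges inside S.)
def is_2_packing(G, S):
    seen = set()
    for v in S:
        for u in G.neighbors(v):
            if u in seen:   # u is a common neighbor of two S-vertices
                return False
            seen.add(u)
    return True
```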
Split Without a Leak: Reducing Privacy Leakage in Split Learning
Authors: Khoa Nguyen, Tanveer Khan, Antonis Michalas
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
The popularity of Deep Learning (DL) makes protecting the privacy of sensitive data more imperative than ever. As a result, various privacy-preserving techniques have been implemented to preserve user data privacy in DL. Among these, collaborative learning techniques such as Split Learning (SL) have been utilized to accelerate the learning and prediction process. Initially, SL was considered a promising approach to data privacy. However, subsequent research has demonstrated that SL is susceptible to many types of attacks and, therefore, cannot serve as a privacy-preserving technique on its own. Meanwhile, countermeasures using a combination of SL and encryption have been introduced to achieve privacy-preserving deep learning. In this work, we propose a hybrid approach using SL and Homomorphic Encryption (HE). The idea is that the client encrypts the activation map (the output of the split layer between the client and the server) before sending it to the server. Hence, during both forward and backward propagation, the server cannot reconstruct the client's input data from the intermediate activation map. This improvement is important as it reduces privacy leakage compared to other SL-based works, where the server can gain valuable information about the client's input. In addition, on the MIT-BIH dataset, our proposed hybrid approach yields faster training time (about 6 times) and significantly reduced communication overhead (almost 160 times) compared to other HE-based approaches, thereby offering improved privacy protection for sensitive data in DL.
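A toy sketch of the core idea using the python-paillier (`phe`) package, which supports the additively homomorphic operations needed for a linear layer on encrypted activations; the paper's actual HE scheme, parameters, and protocol may differ.

```python
# Hedged toy of the idea: the client encrypts its split-layer
# activations, the server applies a linear layer homomorphically, and
# only the client can decrypt the result. Values are illustrative.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

activations = [0.12, -0.53, 0.88]                  # client: split-layer output
enc = [public_key.encrypt(a) for a in activations]

w, b = [0.4, -0.1, 0.7], 0.05                      # server: plaintext weights
enc_out = enc[0] * w[0]
for x, wi in zip(enc[1:], w[1:]):
    enc_out = enc_out + x * wi                      # homomorphic dot product
enc_out = enc_out + b

print(private_key.decrypt(enc_out))                 # client: recover the output
```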
On the Independencies Hidden in the Structure of a Probabilistic Logic Program
Authors: Kilian Rückschloß (Ludwig-Maximilians-Universität München), Felix Weitkämper (Ludwig-Maximilians-Universität München)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Abstract
Pearl and Verma developed d-separation as a widely used graphical criterion to reason about the conditional independencies that are implied by the causal structure of a Bayesian network. As acyclic ground probabilistic logic programs correspond to Bayesian networks on their dependency graph, we can compute conditional independencies from d-separation in the latter. In the present paper, we generalize the reasoning above to the non-ground case. First, we abstract the notion of a probabilistic logic program away from external databases and probabilities to obtain so-called program structures. We then present a correct meta-interpreter that decides whether a certain conditional independence statement is implied by a program structure on a given external database. Finally, we give a fragment of program structures for which we obtain a completeness statement of our conditional independence oracle. We close with an experimental evaluation of our approach revealing that our meta-interpreter performs significantly faster than checking the definition of independence using exact inference in ProbLog 2.
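For reference, a compact d-separation test on a DAG via the standard moralization criterion, which is the graphical check underlying the approach (implemented here with networkx; all names are ours).

```python
# Hedged sketch of classical d-separation via moralization: restrict to
# the ancestral subgraph of X, Y, and Z; moralize (marry parents, drop
# directions); delete Z; then X and Y are d-separated given Z iff they
# are disconnected in the resulting undirected graph.
import networkx as nx

def d_separated(dag, X, Y, Z):
    keep = set().union(X, Y, Z)
    for v in list(keep):
        keep |= nx.ancestors(dag, v)
    sub = dag.subgraph(keep)
    moral = nx.Graph(sub.to_undirected())
    for v in sub:
        parents = list(sub.predecessors(v))
        moral.add_edges_from((p, q) for i, p in enumerate(parents)
                             for q in parents[i + 1:])
    moral.remove_nodes_from(Z)
    return not any(x in moral and y in moral and nx.has_path(moral, x, y)
                   for x in X for y in Y)
```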
Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting
Authors: Francesco Dalla Serra, Chaoyang Wang, Fani Deligianni, Jeffrey Dalton, Alison Q. O'Neil
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Abstract
The task of radiology reporting comprises describing and interpreting the medical findings in radiographic images, including description of their location and appearance. Automated approaches to radiology reporting require the image to be encoded into a suitable token representation for input to the language model. Previous methods commonly use convolutional neural networks to encode an image into a series of image-level feature map representations. However, the generated reports often exhibit realistic style but imperfect accuracy. Inspired by recent works for image captioning in the general domain in which each visual token corresponds to an object detected in an image, we investigate whether using local tokens corresponding to anatomical structures can improve the quality of the generated reports. We introduce a novel adaptation of Faster R-CNN in which finding detection is performed for the candidate bounding boxes extracted during anatomical structure localisation. We use the resulting bounding box feature representations as our set of finding-aware anatomical tokens. This encourages the extracted anatomical tokens to be informative about the findings they contain (required for the final task of radiology reporting). Evaluating on the MIMIC-CXR dataset of chest X-Ray images, we show that task-aware anatomical tokens give state-of-the-art performance when integrated into an automated reporting pipeline, yielding generated reports with improved clinical accuracy.
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
Authors: Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz
Abstract
For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
P2M: A Fast Solver for Querying Distance from Point to Mesh Surface
Abstract
Most of the existing point-to-mesh distance query solvers, such as Proximity Query Package (PQP), Embree and Fast Closest Point Query (FCPW), are based on bounding volume hierarchy (BVH). The hierarchical organizational structure enables one to eliminate the vast majority of triangles that do not help find the closest point. In this paper, we develop a totally different algorithmic paradigm, named P2M, to speed up point-to-mesh distance queries. Our original intention is to precompute a KD tree (KDT) of mesh vertices to approximately encode the geometry of a mesh surface containing vertices, edges and faces. However, it is very likely that the closest primitive to the query point is an edge $e$ (resp., a face $f$), but the KDT reports a mesh vertex $\upsilon$ instead. We call $\upsilon$ an interceptor of $e$ (resp., $f$). The main contribution of this paper is to invent a simple yet effective interception inspection rule and an efficient flooding interception inspection algorithm for quickly finding out all the interception pairs. Once the KDT and the interception table are precomputed, the query stage proceeds by first searching the KDT and then looking up the interception table to retrieve the closest geometric primitive. Statistics show that our query algorithm runs many times faster than the state-of-the-art solvers.
Keyword: mobile
Generative AI for Semantic Communication: Architecture, Challenges, and Outlook
Authors: Le Xia, Yao Sun, Chengsi Liang, Lei Zhang, Muhammad Ali Imran, Dusit Niyato
Subjects: Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Abstract
Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorporating generative artificial intelligence (GAI) technologies with SemCom. Recognizing GAI's powerful capability in automating and creating valuable, diverse, and personalized multimodal content, this article first highlights the principal characteristics of the combination of GAI and SemCom along with their pertinent benefits and challenges. To tackle these challenges, we further propose a novel GAI-assisted SemCom network (GAI-SCN) framework in a cloud-edge-mobile design. Specifically, by employing global and local GAI models, our GAI-SCN enables multimodal semantic content provisioning, semantic-level joint-source-channel coding, and AIGC acquisition to maximize the efficiency and reliability of semantic reasoning and resource utilization. Afterward, we present a detailed implementation workflow of GAI-SCN, followed by corresponding initial simulations for performance evaluation in comparison with two benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of GAI-SCN.
Deep Reinforcement Learning Based Framework for Mobile Energy Disseminator Dispatching to Charge On-the-Road Electric Vehicles
Authors: Jiaming Wang, Jiqian Dong, Sikai Chen, Shreyas Sundaram, Samuel Labi
Abstract
The exponential growth of electric vehicles (EVs) presents novel challenges in preserving battery health and in addressing the persistent problem of vehicle range anxiety. To address these concerns, wireless charging, particularly via Mobile Energy Disseminators (MEDs), has emerged as a promising solution. The MED is mounted behind a large vehicle and charges all participating EVs within a radius upstream of it. Unfortunately, during such V2V charging, the MED and EVs inadvertently form platoons, thereby occupying multiple lanes and impairing overall corridor travel efficiency. In addition, constrained budgets for MED deployment necessitate the development of an effective dispatching strategy to determine optimal timing and locations for introducing the MEDs into traffic. This paper proposes a deep reinforcement learning (DRL) based methodology to develop a vehicle dispatching framework. In the first component of the framework, we develop a realistic reinforcement learning environment termed "ChargingEnv", which incorporates a reliable charging simulation system that accounts for common practical issues in wireless charging deployment, specifically charging panel misalignment. The second component, the Proximal Policy Optimization (PPO) agent, is trained to control MED dispatching through continuous interactions with ChargingEnv. Numerical experiments were carried out to demonstrate the efficacy of the proposed MED deployment decision processor. The results suggest that the proposed model can significantly enhance EV travel range while efficiently deploying an optimal number of MEDs. The proposed model is found to be not only practical in its applicability but also promising in its real-world effectiveness. It can help travelers maximize EV range and help road agencies or private-sector vendors manage the deployment of MEDs efficiently.
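A purely illustrative, gymnasium-style skeleton showing the shape of such an environment; the observation/action spaces, dynamics, and reward here are placeholders, not the paper's ChargingEnv.

```python
# Purely illustrative skeleton of a gymnasium-style dispatching
# environment; spaces, dynamics, and reward are placeholders, not the
# paper's ChargingEnv.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ChargingEnvSketch(gym.Env):
    def __init__(self, n_segments=10):
        super().__init__()
        # e.g., per-segment charging demand; actions pick a segment or no-op
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_segments,))
        self.action_space = spaces.Discrete(n_segments + 1)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.demand = self.np_random.uniform(0, 1, self.observation_space.shape)
        return self.demand.astype(np.float32), {}

    def step(self, action):
        served = float(self.demand[action]) if action < len(self.demand) else 0.0
        self.demand = self.np_random.uniform(0, 1, self.observation_space.shape)
        return self.demand.astype(np.float32), served, False, False, {}
```

A PPO agent (e.g., from stable-baselines3) could then be trained against this interface.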
Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images
Authors: Ayush Garg, Julia Lu, Anika Maji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Oral diseases such as periodontal (gum) diseases and dental caries (cavities) affect billions of people across the world today. However, previous state-of-the-art models have relied on X-ray images to detect oral diseases, making them inaccessible for remote monitoring, developing countries, and telemedicine. To reduce this reliance on X-ray imagery, we propose a lightweight machine learning model capable of detecting calculus (also known as hardened plaque or tartar) in RGB images while running efficiently on low-end devices. The model, a modified MobileNetV3-Small neural network transfer-learned from ImageNet, achieved an accuracy of 72.73% (comparable to state-of-the-art solutions) while still being able to run on mobile devices due to its reduced memory requirements and processing times. A ResNet34-based model was also constructed and achieved an accuracy of 81.82%. Both of these models were tested on a mobile app, demonstrating their potential to limit the number of serious oral disease cases, as their predictions can help patients schedule appointments earlier without the need to go to the clinic.
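A plausible sketch of the described transfer-learning setup with torchvision; the freezing policy and head size are illustrative, and data loading and training are omitted.

```python
# Hedged sketch: ImageNet-pretrained MobileNetV3-Small with its
# classification head replaced for a binary calculus / no-calculus
# decision. The paper's exact configuration may differ.
import torch.nn as nn
from torchvision import models

weights = models.MobileNet_V3_Small_Weights.IMAGENET1K_V1
model = models.mobilenet_v3_small(weights=weights)
for p in model.parameters():                  # freeze pretrained backbone
    p.requires_grad = False
head = model.classifier[3]                    # final Linear layer
model.classifier[3] = nn.Linear(head.in_features, 2)  # new trainable head
```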
Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals
Abstract
The Red Palm Weevil (RPW), also known as the palm weevil, is considered among the world's most damaging insect pests of palms. Current detection techniques include the detection of RPW symptoms using visual or sound inspection, and chemical detection of volatile signatures generated by infested palm trees. However, efficient detection of RPW infestations at an early stage is considered one of the most challenging issues in cultivating date palms. In this paper, an efficient approach to the early detection of RPW is proposed. The proposed approach is based on recording and analyzing RPW sound activities. The first step involves the conversion of sound data into images based on a selected set of features. The second step involves the combination of images from the same sound file, computed by different features, into a single image. The third step involves the application of different Deep Learning (DL) techniques to classify the resulting images into two classes: infested and not infested. Experimental results show the good performance of the proposed approach for RPW detection using different DL techniques, namely MobileNetV2, ResNet50V2, ResNet152V2, VGG16, VGG19, DenseNet121, DenseNet201, Xception, and InceptionV3. The proposed approach outperformed existing techniques on public datasets.
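An illustrative version of the first step, converting a recording into a log-mel spectrogram image with librosa; the file name is hypothetical, and the paper combines several such features per sound file.

```python
# Hedged sketch of the sound-to-image step: one log-mel spectrogram per
# recording, saved as an image for the CNN classifiers listed above.
# "palm_recording.wav" is a hypothetical file name.
import numpy as np
import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load("palm_recording.wav", sr=None)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)     # log-scaled magnitudes
plt.imsave("palm_recording_mel.png", mel_db, cmap="magma")
```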
Sparse Waypoint Validity Checking for Self-Entanglement-Free Tethered Path Planning
Authors: Tong Yang, Jiangpin Liu, Yue Wang, Rong Xiong
Abstract
A novel mechanism to derive a self-entanglement-free (SEF) path for tethered differential-driven robots is proposed in this work. The problem is tailored to the deployment of tethered differential-driven robots in situations where an omni-directional tether re-tractor is not available. This is frequently encountered when it is impractical to concurrently equip an omni-directional tether retracting mechanism with other geometrically intricate devices, such as a manipulator, which is notably relevant in applications like disaster recovery, spatial exploration, etc. Without specific attention to the spatial relation between the shape of the tether and the pose of the mobile unit, the issue of self-entanglement arises when the robot moves, resulting in unsafe robot movements and the risk of damaging the tether. In this paper, the SEF constraint is first formulated as the boundedness of a relative angle function which characterises the angular difference between the tether stretching direction and the robot's heading direction. Then, a constrained searching-based path planning algorithm is proposed which produces a path that is sub-optimal whilst ensuring the avoidance of tether self-entanglement. Finally, the algorithmic efficiency of the proposed path planner is further enhanced by proving the conditioned sparsity of the primitive path validity checking module. The effectiveness of the proposed algorithm is assessed through case studies, comparing its performance against untethered differential-driven planners in challenging planning scenarios. A comparative analysis is further conducted between the normal node expansion module and the improved node expansion module which incorporates sparse waypoint validity checking. Real-world tests are also conducted to validate the algorithm's performance. An open-source implementation has also been made available for the benefit of the robotics community.
Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Authors: Paul Cardosi, Bérenger Bramas
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Abstract
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific computing. Not only does it allow for efficient parallelization across distributed heterogeneous computing nodes, but it also allows for elegant source code structuring by describing hardware-independent algorithms. In this paper, we present Specx, a task-based runtime system written in modern C++. Specx supports distributed heterogeneous computing by simultaneously exploiting CPUs and GPUs (CUDA/HIP) and incorporating communication into the task graph. We describe the specificities of Specx and demonstrate its potential by running parallel applications.
Breaking the Interference and Fading Gridlock in Backscatter Communications: State-of-the-Art, Design Challenges, and Future Directions
Abstract
With the rapid advancement of the Internet of Things (IoT) and mobile communication technologies, a multitude of devices are becoming interconnected, marking the onset of an era where all things are connected. While this growth opens opportunities for novel products and applications, it also leads to increased energy demand and battery reliance in IoT devices, creating a significant bottleneck that hinders sustainable progress. Traditional energy harvesting (EH) techniques, although promising, face limitations such as insufficient efficiency, high costs, and practical constraints that impede widespread adoption. Backscatter communication (BackCom), a low-power and passive method, emerges as a promising solution to these challenges, directly addressing the stranded-energy impasse by reducing manufacturing costs and energy consumption in IoT devices. We perform an in-depth analysis of three primary BackCom architectures: Monostatic, Bistatic, and Ambient BackComs. In our exploration, we identify fundamental challenges, such as complex interference environments (including the direct-link interference and the mutual interference among tags) and double-path fading, which contribute to suboptimal performance in BackCom systems. This review aims to furnish a comprehensive examination of existing solutions designed to combat complex interference environments and double-path fading, offering insightful analysis and comparison to select effective strategies to address these challenges. We also delve into emerging trends and challenges in BackCom, forecasting potential paths for technological advancement and providing insights into navigating the intricate landscape of future communication needs. Our work provides researchers and engineers with a clear and comprehensive perspective, enabling them to better understand and creatively tackle the ongoing challenges in BackCom systems.
Rate-Splitting for CF Massive MIMO Systems With Channel Aging
Authors: Jiakang Zheng, Jiayi Zhang, Hongyang Du, Dusit Niyato, Dong In Kim, Bo Ai
Abstract
The cell-free (CF) massive multiple-input multiple-output (MIMO) system is considered a cutting-edge technology in next-generation mobile communication due to its ability to provide high-performance coverage seamlessly and uniformly. This paper aims to mitigate the negative impact of channel aging due to the movement of user equipment in CF massive MIMO systems by utilizing rate-splitting (RS) technology. Taking into account the outdated channel state information, we obtain the achievable sum spectral efficiency (SE) of the RS-assisted CF massive MIMO system, where the private messages can directly adopt conventional maximum ratio, local minimum mean square error (MMSE), and centralized MMSE precoding schemes. Moreover, we propose a bisection-based precoding scheme that maximizes the minimum SE of common messages, which outperforms the superposition-based and random precoding schemes and exhibits strong robustness in complex mobile scenarios. Furthermore, we derive a novel closed-form sum SE expression for the considered system. The results demonstrate that RS technology can mitigate interference in mobile CF massive MIMO systems, improving overall system performance.
Keyword: pruning
There is no result
Keyword: diffusion
Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models
Abstract
Denoising probabilistic diffusion models have shown breakthrough performance that can generate more photo-realistic images or human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is indeed a double-edged sword: We identify a new type of attack, called the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their prediction even if we intentionally remove, via text prompts, the robust features that are essential to the human visual system (HVS). The NDD attack can generate low-cost, model-agnostic, and transferable adversarial attacks by exploiting the natural attack capability in diffusion models. Motivated by this finding, we construct a large-scale dataset, the Natural Denoising Diffusion Attack (NDDA) dataset, to systematically evaluate the risk of the natural attack capability of diffusion models with state-of-the-art text-to-image diffusion models. We evaluate the natural attack capability by answering 6 research questions. Through a user study to confirm the validity of the NDD attack, we find that the NDD attack can achieve an 88% detection rate while being stealthy to 93% of human subjects. We also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against an AD vehicle and find that 73% of the physically printed attacks can be detected as a stop sign. We hope that our study and dataset can help our community be aware of the risk of diffusion models and facilitate further research toward robust DNN models.
Density Stabilization Strategies for Nonholonomic Agents on Compact Manifolds
Authors: Karthik Elamvazhuthi, Spring Berman
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
In this article, we consider the problem of stabilizing a class of degenerate stochastic processes, which are constrained to a bounded Euclidean domain or a compact smooth manifold, to a given target probability density. Most existing works on modeling and control of robotic swarms that use partial differential equation (PDE) models assume that the robots' dynamics are holonomic, and hence, the associated stochastic processes have generators that are elliptic. We relax this assumption on the ellipticity of the generator of the stochastic processes, and consider the more practical case of the stabilization problem for a swarm of agents whose dynamics are given by a controllable driftless control-affine system. We construct state-feedback control laws that exponentially stabilize a swarm of nonholonomic agents to a target probability density that is sufficiently regular. State-feedback laws can stabilize a swarm only to target probability densities that are positive everywhere. To stabilize the swarm to probability densities that possibly have disconnected supports, we introduce a semilinear PDE model of a collection of interacting agents governed by a hybrid switching diffusion process. The interaction between the agents is modeled using a (mean-field) feedback law that is a function of the local density of the swarm, with the switching parameters as the control inputs. We show that under the action of this feedback law, the semilinear PDE system is globally asymptotically stable about the given target probability density. The stabilization strategies with and without agent interactions are verified numerically for agents that evolve according to the Brockett integrator; the strategy with interactions is additionally verified for agents that evolve according to an underactuated system on the sphere $S^2$.
Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models
Authors: Zhanbo Feng, Zenan Ling, Ci Gong, Feng Zhou, Jie Li, Robert C. Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Denoising diffusion models have shown outstanding performance in image editing. Existing works tend to use either image-guided methods, which provide a visual reference but lack control over semantic coherence, or text-guided methods, which ensure faithfulness to text guidance but lack visual quality. To address the problem, we propose the Zero-shot Inversion Process (ZIP), a framework that injects a fusion of generated visual reference and text guidance into the semantic latent space of a \textit{frozen} pre-trained diffusion model. Only using a tiny neural network, the proposed ZIP produces diverse content and attributes under the intuitive control of the text prompt. Moreover, ZIP shows remarkable robustness for both in-domain and out-of-domain attribute manipulation on real images. We perform detailed experiments on various benchmark datasets. Compared to state-of-the-art methods, ZIP produces images of equivalent quality while providing a realistic editing effect.
Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement
Authors: Muhammad Hamza (1), Ammar Hawbani (1), Sami Ul Rehman (1), Xingfu Wang (1), Liang Zhao (2) ((1) Computer Science and Technology, University of Science and Technology of China, (2) School of Computer Science, Shenyang Aerospace University)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Underwater image processing and analysis have become a hotspot of study in recent years, as more attention has been focused on underwater monitoring and the use of marine resources. Compared with the open environment, underwater images encounter more complicated conditions, such as light absorption, scattering, turbulence, non-uniform illumination, and color diffusion. Although considerable advances have been achieved with enhancement techniques that address these issues, they treat low-frequency information equally across the entire channel, which limits the network's representational power. We propose a deep learning and feature-attention-based end-to-end network (FA-Net) to solve this problem. In particular, we propose a Residual Feature Attention Block (RFAB), containing a channel attention, a pixel attention, and a residual learning mechanism with long and short skip connections. RFAB allows the network to focus on learning high-frequency information while skipping low-frequency information on multi-hop connections. The channel and pixel attention mechanisms consider each channel's different features and the uneven distribution of haze over different pixels in the image. Experimental results show that the proposed FA-Net provides higher accuracy, both quantitatively and qualitatively, and outperforms previous state-of-the-art methods.
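A hedged sketch of such a block in PyTorch, combining channel attention, pixel attention, and a short skip connection; the layer sizes and reduction ratio are illustrative, not the paper's exact configuration.

```python
# Hedged sketch of a residual feature attention block in the spirit of
# the RFAB described above: channel attention (global pooling -> 1x1
# convs -> sigmoid) and pixel attention (1x1 convs -> sigmoid) around a
# residual body. Sizes are illustrative.
import torch.nn as nn

class RFAB(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.pixel_att = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        y = y * self.channel_att(y)   # per-channel reweighting
        y = y * self.pixel_att(y)     # per-pixel reweighting
        return x + y                  # short skip connection
```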
Physics-Informed DeepMRI: Bridging the Gap from Heat Diffusion to k-Space Interpolation
Abstract
In the field of parallel imaging (PI), alongside image-domain regularization methods, substantial research has been dedicated to exploring $k$-space interpolation. However, the interpretability of these methods remains an unresolved issue. Furthermore, these approaches currently face acceleration limitations that are comparable to those experienced by image-domain methods. In order to enhance interpretability and overcome the acceleration limitations, this paper introduces an interpretable framework that unifies both $k$-space interpolation techniques and image-domain methods, grounded in the physical principles of heat diffusion equations. Building upon this foundational framework, a novel $k$-space interpolation method is proposed. Specifically, we model the process of high-frequency information attenuation in $k$-space as a heat diffusion equation, while the effort to reconstruct high-frequency information from low-frequency regions can be conceptualized as a reverse heat equation. However, solving the reverse heat equation poses a challenging inverse problem. To tackle this challenge, we modify the heat equation to align with the principles of magnetic resonance PI physics and employ the score-based generative method to precisely execute the modified reverse heat diffusion. Finally, experimental validation conducted on publicly available datasets demonstrates the superiority of the proposed approach over traditional $k$-space interpolation methods, deep learning-based $k$-space interpolation methods, and conventional diffusion models in terms of reconstruction accuracy, particularly in high-frequency regions.
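The forward direction of this view has a simple closed form worth recording: under the heat equation $u_t = \Delta u$, each $k$-space sample decays as $e^{-t|k|^2}$, so high frequencies attenuate fastest. A one-dimensional numerical illustration (units arbitrary; the reverse direction is the ill-posed inverse problem the paper's score-based method addresses):

```python
# Forward "heat diffusion" of k-space: evolving the heat equation for
# time t multiplies each k-space sample by exp(-t * |k|^2), i.e., high
# frequencies decay fastest. 1D illustration with arbitrary units.
import numpy as np

def heat_diffuse_kspace(kspace, t):
    n = len(kspace)
    k = 2 * np.pi * np.fft.fftfreq(n)        # spatial frequencies
    return kspace * np.exp(-t * k**2)         # closed-form heat kernel
```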
DiffuVolume: Diffusion Model for Volume based Stereo Matching
Abstract
Stereo matching is a significant part of many computer vision tasks and driving-based applications. Recently, cost volume-based methods have achieved great success, benefiting from the rich geometry information in paired images. However, the redundancy of the cost volume also interferes with model training and limits performance. To construct a more precise cost volume, we pioneeringly apply the diffusion model to stereo matching. Our method, termed DiffuVolume, considers the diffusion model as a cost volume filter, which recurrently removes the redundant information from the cost volume. Two main designs make our method non-trivial. Firstly, to make the diffusion model more adaptive to stereo matching, we eschew the traditional manner of directly adding noise to the image and instead embed the diffusion model into a task-specific module. In this way, we outperform the traditional diffusion-based stereo matching method with a 22% EPE improvement and 240 times inference acceleration. Secondly, DiffuVolume can be easily embedded into any volume-based stereo matching network, boosting performance with only a slight rise in parameters (about 2%). By adding DiffuVolume to well-performing methods, we outperform all published methods on the Scene Flow, KITTI 2012, and KITTI 2015 benchmarks and in the zero-shot generalization setting. It is worth mentioning that the proposed model has ranked 1st on the KITTI 2012 leaderboard and 2nd on the KITTI 2015 leaderboard since July 15, 2023.
SignDiff: Learning Diffusion Models for American Sign Language Production
Authors: Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The field of Sign Language Production (SLP) lacked a large-scale, pre-trained model based on deep learning for continuous American Sign Language (ASL) production in the past decade. This limitation hampers communication for all individuals with disabilities who rely on ASL. To address this issue, we undertook the secondary development and utilization of How2Sign, one of the largest publicly available ASL datasets. Despite its significance, prior researchers in the field of sign language have not effectively employed this corpus due to the intricacies involved in American Sign Language Production (ASLP). To conduct large-scale ASLP, we propose SignDiff, based on the latest work in related fields, which is a dual-condition diffusion pre-training model that can generate human sign language speakers from a skeleton pose. SignDiff has a novel Frame Reinforcement Network called FR-Net, similar to dense human pose estimation work, which enhances the correspondence between text lexical symbols and sign language dense pose frames and reduces the occurrence of multiple fingers in the diffusion model. In addition, our ASLP method proposes two new improved modules and a new loss function to improve the accuracy and quality of sign language skeletal posture and enhance the ability of the model to train on large-scale data. We propose the first baseline for ASL production and report BLEU-4 scores of 17.19 and 12.85 on the How2Sign dev/test sets. We also evaluated our model on the previous mainstream dataset PHOENIX14T, where the main experiments achieved SOTA results. In addition, our image quality far exceeds all previous results, by 10 percentage points on the SSIM indicator. Finally, we conducted ablation studies and qualitative evaluations for discussion.
Keyword: adaptive
DebSDF: Delving into the Details and Bias of Neural Indoor Scene Reconstruction
Abstract
In recent years, the neural implicit surface has emerged as a powerful representation for multi-view surface reconstruction due to its simplicity and state-of-the-art performance. However, reconstructing smooth and detailed surfaces in indoor scenes from multi-view images presents unique challenges. Indoor scenes typically contain large texture-less regions, making the photometric loss unreliable for optimizing the implicit surface. Previous work utilizes monocular geometry priors to improve the reconstruction in indoor scenes. However, monocular priors often contain substantial errors in thin structure regions due to domain gaps and the inherent inconsistencies when derived independently from different views. This paper presents \textbf{DebSDF} to address these challenges, focusing on the utilization of uncertainty in monocular priors and the bias in SDF-based volume rendering. We propose an uncertainty modeling technique that associates larger uncertainties with larger errors in the monocular priors. High-uncertainty priors are then excluded from optimization to prevent bias. This uncertainty measure also informs an importance-guided ray sampling and adaptive smoothness regularization, enhancing the learning of fine structures. We further introduce a bias-aware signed distance function to density transformation that takes into account the curvature and the angle between the view direction and the SDF normals to reconstruct fine details better. Our approach has been validated through extensive experiments on several challenging datasets, demonstrating improved qualitative and quantitative results in reconstructing thin structures in indoor scenes, thereby outperforming previous work.
Unveiling Camouflage: A Learnable Fourier-based Augmentation for Camouflaged Object Detection and Instance Segmentation
Authors: Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Camouflaged object detection (COD) and camouflaged instance segmentation (CIS) aim to recognize and segment objects that are blended into their surroundings, respectively. While several deep neural network models have been proposed to tackle those tasks, augmentation methods for COD and CIS have not been thoroughly explored. Augmentation strategies can help improve the performance of models by increasing the size and diversity of the training data and exposing the model to a wider range of variations in the data. In addition, we aim to automatically learn transformations that help to reveal the underlying structure of camouflaged objects and allow the model to learn to better identify and segment camouflaged objects. To achieve this, we propose a learnable augmentation method in the frequency domain for COD and CIS via a Fourier transform approach, dubbed CamoFourier. Our method leverages a conditional generative adversarial network and a cross-attention mechanism to generate a reference image and an adaptive hybrid swapping with parameters to mix the low-frequency component of the reference image and the high-frequency component of the input image. This approach aims to make camouflaged objects more visible for detection and segmentation models. Without bells and whistles, our proposed augmentation method boosts the performance of camouflaged object detectors and camouflaged instance segmenters by large margins.
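A simplified, non-learnable version of this frequency-domain mixing (CamoFourier learns the mixing via a GAN and cross-attention; here the low-pass cutoff is a fixed parameter and the function name is ours):

```python
# Hedged, non-learnable version of the frequency-domain mixing: take the
# low-frequency component from a reference image and the high-frequency
# component from the input image.
import numpy as np

def fourier_mix(x, ref, cutoff=0.1):
    """x, ref: float arrays of shape (H, W)."""
    H, W = x.shape
    fx = np.fft.fftshift(np.fft.fft2(x))
    fr = np.fft.fftshift(np.fft.fft2(ref))
    cy, cx = H // 2, W // 2
    ry, rx = max(1, int(cutoff * H)), max(1, int(cutoff * W))
    low = np.zeros((H, W), dtype=bool)
    low[cy - ry:cy + ry, cx - rx:cx + rx] = True      # centered low-pass box
    mixed = np.where(low, fr, fx)                     # low from ref, high from x
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```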
Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification
Authors: Atefeh Mahdavi, Neda Keivandarian, Marco Carvalho
Abstract
Adversarial example detection plays a vital role in adaptive cyber defense, especially in the face of rapidly evolving attacks. In adaptive cyber defense, the nature and characteristics of attacks continuously change, making it crucial to have robust mechanisms in place to detect and counter these threats effectively. By incorporating adversarial example detection techniques, adaptive cyber defense systems can enhance their ability to identify and mitigate attacks that attempt to exploit vulnerabilities in machine learning models or other systems. Adversarial examples are inputs that are crafted by applying intentional perturbations to natural inputs that result in incorrect classification. In this paper, we propose a novel approach that leverages the power of BERT (Bidirectional Encoder Representations from Transformers) and introduces the concept of Space Exploration Features. We utilize the feature vectors obtained from the BERT model's output to capture a new representation of feature space to improve the density estimation method.
MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
Abstract
Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are located relatively farther away from a decision boundary than clean samples. MDTD estimates the distance to a decision boundary using adversarial learning methods and uses this distance to infer whether a test-time input sample is Trojaned or not. We evaluate MDTD against state-of-the-art Trojan detection methods across five widely used image-based datasets: CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102; four graph-based datasets: AIDS, WinMal, Toxicant, and COLLAB; and the SpeechCommand audio dataset. MDTD effectively identifies samples that contain different types of Trojan triggers. We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary.
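A hedged sketch of the signal MDTD relies on: grow an adversarial perturbation until the prediction flips and use the required magnitude as a crude distance-to-boundary estimate; this iterative FGSM-style probe is illustrative, not the paper's exact estimator.

```python
# Hedged sketch: estimate distance to the decision boundary by growing
# an adversarial perturbation until the prediction flips. Trojaned
# inputs tend to need larger perturbations than clean ones.
import torch
import torch.nn.functional as F

def boundary_distance(model, x, eps=0.005, max_steps=200):
    x = x.clone().requires_grad_(True)
    label = model(x).argmax(dim=1)
    for step in range(1, max_steps + 1):
        loss = F.cross_entropy(model(x), label)
        grad, = torch.autograd.grad(loss, x)
        x = (x + eps * grad.sign()).detach().requires_grad_(True)  # FGSM step
        if (model(x).argmax(dim=1) != label).all():
            return step * eps            # crude L-infinity distance estimate
    return max_steps * eps
```

Thresholding this estimated distance then separates Trojaned from clean test inputs.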
Computing Geodesic Paths Encoding a Curvature Prior
Authors: Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen
Abstract
In this paper, we introduce an efficient method for computing curves minimizing a variant of the Euler-Mumford elastica energy, with fixed endpoints and tangents at these endpoints, where the bending energy is enhanced with a user-defined and data-driven scalar-valued term referred to as the curvature prior. In order to guarantee that the globally optimal curve is extracted, the proposed method involves the numerical computation of the viscosity solution to a specific static Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE). For that purpose, we derive the explicit Hamiltonian associated to this variant model equipped with a curvature prior, discretize the resulting HJB PDE using an adaptive finite difference scheme, and solve it in a single pass using a generalized Fast-Marching method. In addition, we also present a practical method for estimating the curvature prior values from image data, designed for the task of accurately tracking curvilinear structure centerlines. Numerical experiments on synthetic and real image data illustrate the advantages of the considered variant of the elastica model with a prior curvature enhancement in complex scenarios where challenging geometric structures appear.
Beard Segmentation and Recognition Bias
Authors: Kagan Ozturk, Grace Bezold, Aman Bhatta, Haiyu Wu, Kevin Bowyer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
A person's facial hairstyle, such as the presence and size of a beard, can significantly impact face recognition accuracy. There are publicly-available deep networks that achieve reasonable accuracy at binary attribute classification, such as beard / no beard, but few if any that segment the facial hair region. To investigate the effect of facial hair in a rigorous manner, we first created a set of fine-grained facial hair annotations to train a segmentation model and evaluate its accuracy across African-American and Caucasian face images. We then use our facial hair segmentations to categorize image pairs according to the degree of difference or similarity in the facial hairstyle. We find that the False Match Rate (FMR) for image pairs with different categories of facial hairstyle varies by a factor of over 10 for African-American males and over 25 for Caucasian males. To reduce the bias across image pairs with different facial hairstyles, we propose a scheme for adaptive thresholding based on facial hairstyle similarity. Evaluation on a subject-disjoint set of images shows that adaptive similarity thresholding based on the facial hairstyles of the image pair reduces the ratio between the highest and lowest FMR across facial hairstyle categories for African-Americans from 10.7 to 1.8 and for Caucasians from 25.9 to 1.3. The facial hair annotations and facial hair segmentation model will be made publicly available.
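A minimal sketch of adaptive similarity thresholding keyed on the hairstyle categories of the image pair; the category names and threshold values are purely illustrative and would be calibrated (e.g., on a held-out set) to equalize FMR across categories.

```python
# Hedged sketch: the match threshold depends on the facial-hairstyle
# categories of the two images. Categories and values are illustrative.
THRESHOLDS = {
    ("clean-shaven", "clean-shaven"): 0.42,
    ("clean-shaven", "full-beard"): 0.36,
    ("full-beard", "full-beard"): 0.45,
}

def is_match(similarity, style_a, style_b, default=0.40):
    key = tuple(sorted((style_a, style_b)))
    return similarity >= THRESHOLDS.get(key, default)
```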
Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding
Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions with up to -49.13% more BD-rate gains than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with smaller bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to the ones with hierarchical B-frame coding.
Funnel-based Control for Reach-Avoid-Stay Specifications
Authors: Ratnangshu Das, Pushpak Jagtap
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
The paper addresses the problem of controller synthesis for control-affine nonlinear systems to meet reach-avoid-stay specifications. Specifically, the goal is to obtain a closed-form control law ensuring that the trajectories of the nonlinear system reach a target set while avoiding all unsafe regions and adhering to the state-space constraints. To tackle this problem, we leverage the funnel-based control approach. Given an arbitrary unsafe region, we introduce a circumvent function that guarantees that the system trajectory steers clear of that region. Subsequently, an adaptive funnel framework is proposed around the target set, followed by the construction of a closed-form controller from the established funnel function that enforces the reach-avoid-stay specifications. To demonstrate the efficacy of the proposed funnel-based control approach, a series of simulation experiments is carried out.
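For readers unfamiliar with the funnel idea, a standard construction (not necessarily the exact functions used in this paper) confines the tracking error $e(t)$ inside a prescribed shrinking envelope,
\[
|e(t)| < \rho(t), \qquad \rho(t) = (\rho_0 - \rho_\infty)\,e^{-\lambda t} + \rho_\infty,
\]
with $\rho_0 > |e(0)|$, ultimate bound $\rho_\infty > 0$, and decay rate $\lambda > 0$. A typical closed-form law then acts on the normalized error $\xi(t) = e(t)/\rho(t)$, e.g. $u(t) = -\kappa\,\xi(t)/(1 - \xi(t)^2)$, whose gain grows without bound near the funnel boundary and thereby keeps the trajectory inside it. The circumvent function introduced above plays the complementary role of keeping the trajectory out of unsafe regions.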
Learning the References of Online Model Predictive Control for Urban Self-Driving
Authors: Yubin Wang, Zengqi Peng, Hakim Ghazzai, Jun Ma
Abstract
In this work, we propose a novel learning-based online model predictive control (MPC) framework for motion synthesis of self-driving vehicles. In this framework, the decision variables are generated as instantaneous references that modulate the cost functions of the online MPC, where the constraints of collision avoidance and drivable-surface boundaries are latently represented in soft form. Hence, the maneuvers of the ego vehicle can adapt to complex and dynamic traffic environments, even with unmodeled uncertainties from other traffic participants. Furthermore, we implement a deep reinforcement learning (DRL) framework for policy search, casting the step actions as the decision variables and using practical, lightweight observations as the input features of the policy network. The proposed approach is evaluated in a high-fidelity simulator with complex urban driving scenarios, and the results demonstrate remarkable adaptiveness to complex and dynamic traffic environments, with a success rate of 85%. Its advantages in terms of safety, maneuverability, and robustness are also illustrated.
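The key interface here is that the policy's action is not a control input but the reference (and weighting) fed to the online MPC. A toy sketch of that split, with hypothetical dimensions and a brute-force stand-in for the MPC solver:

    import numpy as np

    def mpc_cost(traj, ref, weights):
        """Quadratic tracking cost whose reference and weights are set
        online by the learned policy (soft constraints fold into weights)."""
        return float(np.sum(weights * (traj - ref) ** 2))

    def step(policy, observation, solve_mpc):
        # The DRL action parameterizes the MPC problem; the MPC, not the
        # policy, produces the actual control trajectory.
        ref, weights = policy(observation)
        return solve_mpc(lambda traj: mpc_cost(traj, ref, weights))

    policy = lambda obs: (np.zeros(4), np.ones(4))        # placeholder policy
    candidates = [np.full(4, a) for a in (-1.0, 0.0, 1.0)]
    solve_mpc = lambda cost: min(candidates, key=cost)    # toy "solver"
    print(step(policy, None, solve_mpc))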
Federated Two Stage Decoupling With Adaptive Personalization Layers
Abstract
Federated learning has gained significant attention due to its groundbreaking ability to enable distributed learning while maintaining privacy constraints. However, as a consequence of data heterogeneity among decentralized devices, it inherently suffers from significant learning degradation and slow convergence. It is therefore natural to cluster homogeneous clients into the same group, aggregating model weights only within each group. However, while most existing clustered federated learning methods employ either model gradients or inference outputs as the metric for client partitioning, with the goal of grouping similar devices together, heterogeneity may still remain within each cluster. Moreover, there is a scarcity of research into when clustering should take place, resulting in the common practice of assigning each client to its own individual cluster, particularly in the context of highly non-independent and identically distributed (non-IID) data. In this paper, we introduce a two-stage decoupling federated learning algorithm with adaptive personalization layers, named FedTSDP, in which client clustering is performed twice, according to inference outputs and model weights, respectively. Hopkins amended sampling is adopted to determine both the appropriate timing for clustering and the sampling weight of public unlabeled data. In addition, a simple yet effective approach is developed to adaptively adjust the personalization layers based on varying degrees of data skew. Experimental results show that our proposed method performs reliably in both IID and non-IID scenarios.
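The "Hopkins amended sampling" above builds on the classical Hopkins statistic, which scores how clusterable a point set is: about 0.5 for spatially random data, approaching 1 for clustered data. Below is a minimal sketch of the classical statistic, which such a timing trigger could threshold (the paper's amendment is not specified in the abstract):

    import numpy as np

    def hopkins(X, m=None, seed=0):
        """Classical Hopkins statistic: ~0.5 for spatially random data,
        near 1 for clusterable data. A clustered-FL trigger could defer
        client clustering until this score is high enough."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        m = m or max(1, n // 10)
        lo, hi = X.min(axis=0), X.max(axis=0)
        U = rng.uniform(lo, hi, size=(m, d))          # uniform probe points
        S = X[rng.choice(n, size=m, replace=False)]   # sampled data points
        # u: distance from each probe to its nearest data point
        u = np.linalg.norm(U[:, None] - X[None], axis=2).min(axis=1).sum()
        # w: distance from each sample to its nearest *other* data point
        D = np.linalg.norm(S[:, None] - X[None], axis=2)
        D[D == 0] = np.inf
        w = D.min(axis=1).sum()
        return u / (u + w)

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.1, (100, 2)), rng.normal(3, 0.1, (100, 2))])
    print(f"{hopkins(X):.2f}")   # near 1.0: clustering is worthwhile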
WUDI: A Human Involved Self-Adaptive Framework to Prevent Childhood Obesity in Internet of Things Environment
Abstract
The Internet of Things (IoT) connects people, devices, and information resources in various domains to improve efficiency. The healthcare domain has been transformed by the integration of the IoT, leading to digital healthcare solutions such as health monitoring, emergency detection, and remote operation. This integration has increased the volume of health data collected from a variety of IoT sources; consequently, advanced technologies are required to analyze these data, and artificial intelligence has been employed to extract meaningful insights from them. Childhood overweight and obesity have emerged as among the most serious global public health challenges, as they can lead to a variety of health problems and the early development of chronic diseases. To address this, a self-adaptive framework is proposed to prevent childhood obesity using lifelog data from IoT environments, with human involvement as an important design consideration. The framework uses an ensemble-based learning model to predict obesity from the lifelog data. Empirical experiments using lifelog data from smartphone applications were conducted to validate the effectiveness of human involvement and of the obesity prediction. The results demonstrate the efficiency of the proposed framework, with human involvement, in obesity prediction. The framework can be applied in real-world healthcare services for childhood obesity.
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Abstract
Dynamic computation has emerged as a promising avenue for enhancing the inference efficiency of deep networks. It allows selective activation of computational units, reducing unnecessary computation for each input sample. However, the actual efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) a focus on algorithm design over critical scheduling strategies, especially in CUDA-enabled GPU contexts; and 3) challenges in measuring practical latency, given that most libraries cater to static operations. Addressing these issues, we unveil Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the gap between theoretical and practical efficiency, LAUDNet merges algorithmic design with scheduling optimization, guided by a latency predictor that accurately gauges dynamic operator latency. We have tested LAUDNet across multiple vision tasks, demonstrating its capacity to reduce the latency of models like ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. LAUDNet also stands out in balancing accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
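As a hedged illustration of one of the three paradigms, the sketch below implements a naive dynamic channel-skipping gate (our construction, not LAUDNet's module): a tiny head decides per sample which output channels matter, and a straight-through estimator keeps the hard decision trainable. Note that masking alone only zeroes channels; realizing actual latency gains requires a scheduler that skips the masked filters' computation, which is precisely the gap the paper targets.

    import torch
    import torch.nn as nn

    class ChannelSkipGate(nn.Module):
        """Naive per-sample channel-skipping gate around a 3x3 conv."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(in_ch, out_ch))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.gate(x)                  # (N, out_ch) gate scores
            soft = torch.sigmoid(logits)
            hard = (logits > 0).float()
            mask = hard + soft - soft.detach()     # straight-through trick
            return self.conv(x) * mask[:, :, None, None]

    m = ChannelSkipGate(16, 32)
    print(m(torch.randn(2, 16, 8, 8)).shape)      # torch.Size([2, 32, 8, 8])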
Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems
Authors: Hengchang Hu, Wei Guo, Yong Liu, Min-Yen Kan
Abstract
In sequential recommendation, multi-modal information (e.g., text or image) can provide a more comprehensive view of an item's profile. The optimal stage (early or late) to fuse modality features into item representations is still debated. We propose a graph-based approach (named MMSR) to fuse modality features in an adaptive order, enabling each modality to prioritize either its inherent sequential nature or its interplay with other modalities. MMSR represents each user's history as a graph, where the modality features of each item in a user's history sequence are denoted by cross-linked nodes. The edges between homogeneous nodes represent intra-modality sequential relationships, and the ones between heterogeneous nodes represent inter-modality interdependence relationships. During graph propagation, MMSR incorporates dual attention, differentiating homogeneous and heterogeneous neighbors. To adaptively assign nodes with distinct fusion orders, MMSR allows each node's representation to be asynchronously updated through an update gate. In scenarios where modalities exhibit stronger sequential relationships, the update gate prioritizes updates among homogeneous nodes. Conversely, when the interdependent relationships between modalities are more pronounced, the update gate prioritizes updates among heterogeneous nodes. Consequently, MMSR establishes a fusion order that spans a spectrum from early to late modality fusion. In experiments across six datasets, MMSR consistently outperforms state-of-the-art models, and our graph propagation methods surpass other graph neural networks. Additionally, MMSR naturally manages missing modalities.
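The asynchronous update gate is the crux of the adaptive fusion order. A small sketch of the idea as we read it (dimensions, initialization, and the exact gating form are our assumptions): the gate decides how much of each step's update comes from same-modality (sequential) versus cross-modality (interdependence) messages.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def node_update(h, homo_msg, hetero_msg, Wg):
        """Gated node update: z near 1 favours same-modality (sequential)
        neighbours, z near 0 favours cross-modality (fusion) neighbours."""
        z = sigmoid(Wg @ np.concatenate([h, homo_msg, hetero_msg]))
        return h + z * homo_msg + (1.0 - z) * hetero_msg

    rng = np.random.default_rng(0)
    d = 8
    h = rng.normal(size=d)
    h_new = node_update(h, rng.normal(size=d), rng.normal(size=d),
                        rng.normal(size=(d, 3 * d)))
    print(h_new.shape)

Nodes whose gates saturate toward cross-modality messages behave like early fusion, while nodes that keep favouring sequential messages defer fusion, spanning the early-to-late spectrum the abstract describes.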
Support Testing in the Huge Object Model
Abstract
The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We investigate the problem of testing whether a distribution is supported on $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing as compared to adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
DiffuVolume: Diffusion Model for Volume based Stereo Matching
Abstract
Stereo matching is a significant component of many computer vision tasks and driving-related applications. Recently, cost-volume-based methods have achieved great success, benefiting from the rich geometry information in paired images. However, the redundancy of the cost volume also interferes with model training and limits performance. To construct a more precise cost volume, we pioneer the application of the diffusion model to stereo matching. Our method, termed DiffuVolume, treats the diffusion model as a cost volume filter that recurrently removes redundant information from the cost volume. Two main designs make our method non-trivial. First, to make the diffusion model better suited to stereo matching, we eschew the traditional approach of adding noise directly to the image and instead embed the diffusion model into a task-specific module. In this way, we outperform the traditional diffusion stereo matching method with a 22% EPE improvement and a 240x inference speedup. Second, DiffuVolume can be easily embedded into any volume-based stereo matching network, boosting performance with only a slight (2%) rise in parameters. By adding DiffuVolume to well-performing methods, we outperform all published methods on the Scene Flow, KITTI 2012, and KITTI 2015 benchmarks and in the zero-shot generalization setting. It is worth mentioning that the proposed model has ranked 1st on the KITTI 2012 leaderboard and 2nd on the KITTI 2015 leaderboard since July 15, 2023.
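A hedged sketch of the "diffusion model as cost-volume filter" reading of the abstract (the iteration count, the time embedding, and the toy denoiser are our placeholders for the paper's task-specific module):

    import torch

    @torch.no_grad()
    def filter_cost_volume(volume, denoiser, steps=4):
        """Iteratively denoise the cost volume itself, rather than noising
        images as in a conventional diffusion pipeline; each reverse step
        is meant to strip redundant matching information."""
        v = volume
        for t in reversed(range(steps)):
            t_embed = torch.full((v.shape[0],), float(t))
            v = denoiser(v, t_embed)
        return v

    denoiser = lambda v, t: 0.9 * v   # stand-in for the learned module
    out = filter_cost_volume(torch.randn(1, 32, 64, 64), denoiser)
    print(out.shape)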
Optimal Non-Adaptive Cell Probe Dictionaries and Hashing
Authors: Kasper Green Larsen, Rasmus Pagh, Toniann Pitassi, Or Zamir
Abstract
We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of $n$ key-value pairs from $[u]\times [u]$ using $s$ words of space and answering key lookup queries in $t = O(\lg(u/n)/\lg(s/n))$ non-adaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due to Buhrman et al. and matches a recent lower bound by Persiano and Yeo. Using the ideas underlying our data structure, we also obtain the first implementation of an $n$-wise independent family of hash functions with optimal evaluation time in the cell probe model.
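To make the trade-off concrete with illustrative numbers of our own choosing: for a universe $u = 2^{64}$, $n = 2^{24}$ keys, and space $s = 2^{34} = 2^{10} n$ words,
\[
t = O\!\left(\frac{\lg(u/n)}{\lg(s/n)}\right) = O\!\left(\frac{\lg 2^{40}}{\lg 2^{10}}\right) = O\!\left(\frac{40}{10}\right) = O(4)
\]
non-adaptive probes, whereas near-linear space $s = O(n)$ makes the denominator constant and the bound degrades to $O(\lg(u/n))$ probes.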
Semantic Image Synthesis via Class-Adaptive Cross-Attention
Authors: Tomaso Fontanini, Claudio Ferrari, Giuseppe Lisanti, Massimo Bertozzi, Andrea Prati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In semantic image synthesis, the state of the art is dominated by methods that use spatially adaptive normalization layers, which allow for excellent visual generation quality and editing versatility. Given their efficacy, recent research efforts have focused on finer-grained local style control and multi-modal generation. By construction, however, such layers tend to overlook global image statistics, leading to unconvincing local style editing and to global inconsistencies such as color or illumination distribution shifts. Also, the semantic layout is required for mapping styles in the generator, imposing a strict alignment constraint on the features. In response, we designed a novel architecture where cross-attention layers are used in place of de-normalization ones for conditioning the image generation. Our model inherits the advantages of both solutions, retaining state-of-the-art reconstruction quality as well as improved global and local style transfer. Code and models are available at https://github.com/TFonta/CA2SIS.
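A minimal sketch of conditioning by cross-attention rather than de-normalization, under our own assumptions about the interface (per-class learnable style tokens as keys/values; the paper's actual style encoding may differ): spatial features attend to class styles, so the style source no longer needs to be spatially aligned with the layout.

    import torch
    import torch.nn as nn

    class ClassCrossAttention(nn.Module):
        """Image features (queries) attend to per-class style tokens
        (keys/values) instead of being de-normalized by a spatial map."""
        def __init__(self, dim: int, num_classes: int):
            super().__init__()
            self.style = nn.Parameter(torch.randn(num_classes, dim))
            self.attn = nn.MultiheadAttention(dim, num_heads=4,
                                              batch_first=True)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            n, c, h, w = feats.shape
            q = feats.flatten(2).transpose(1, 2)             # (N, HW, C)
            kv = self.style.unsqueeze(0).expand(n, -1, -1)   # (N, K, C)
            out, _ = self.attn(q, kv, kv)
            return (q + out).transpose(1, 2).reshape(n, c, h, w)

    m = ClassCrossAttention(dim=64, num_classes=19)   # e.g. 19 face-part classes
    print(m(torch.randn(2, 64, 16, 16)).shape)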
Algebraic, Topological, and Mereological Foundations of Existential Granules
Authors: Mani A
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic (math.LO); Rings and Algebras (math.RA)
Abstract
In this research, new concepts of existential granules that determine themselves are introduced and characterized from algebraic, topological, and mereological perspectives. Existential granules are those that determine themselves initially and interact with their environment subsequently. Examples of the concept, such as granular balls, are already used in applications of rough sets and soft computing, even though earlier works by others defined them inadequately, established them only algorithmically, and theorized them insufficiently. It is shown that they fit into multiple theoretical frameworks (axiomatic, adaptive, and others) of granular computing. The characterization is intended for algorithm development, application to classification problems, and possible mathematical foundations of generalizations of the approach. Additionally, many open problems are posed and directions provided.
Boosting Detection in Crowd Analysis via Underutilized Output Features
Authors: Shaokai Wu, Fengyu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Detection-based methods have been viewed unfavorably in crowd analysis due to their poor performance in dense crowds. However, we argue that the potential of these methods has been underestimated, as they offer crucial information for crowd analysis that is often ignored. Specifically, the area size and confidence score of output proposals and bounding boxes provide insight into the scale and density of the crowd. To leverage these underutilized features, we propose Crowd Hat, a plug-and-play module that can be easily integrated with existing detection models. This module uses a mixed 2D-1D compression technique to refine the output features and obtain the spatial and numerical distribution of crowd-specific information. Based on these features, we further propose region-adaptive NMS thresholds and a decouple-then-align paradigm that address the major limitations of detection-based methods. Our extensive evaluations on various crowd analysis tasks, including crowd counting, localization, and detection, demonstrate the effectiveness of utilizing output features and the potential of detection-based methods in crowd analysis.
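Of the two proposed fixes, region-adaptive NMS is easy to sketch. In the hedged illustration below (our construction; Crowd Hat derives its density signal from the refined output features, and the exact threshold schedule is not given in the abstract), the IoU suppression threshold rises with estimated local crowd density, so tightly packed true positives in dense regions are not suppressed as duplicates.

    import numpy as np

    def iou(a, b):
        """IoU of box a (4,) against boxes b (M, 4); boxes as x1,y1,x2,y2."""
        x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
        x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        return inter / (area_a + area_b - inter + 1e-9)

    def region_adaptive_nms(boxes, scores, density, base=0.5, span=0.3):
        """Greedy NMS whose IoU threshold grows with local density in [0,1]."""
        order = np.argsort(scores)[::-1]
        keep = []
        while order.size:
            i, rest = order[0], order[1:]
            keep.append(int(i))
            thr = base + span * density(boxes[i])  # laxer in dense regions
            order = rest[iou(boxes[i], boxes[rest]) <= thr]
        return keep

    boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60.]])
    print(region_adaptive_nms(boxes, np.array([0.9, 0.8, 0.7]),
                              density=lambda b: 0.5))   # -> [0, 2]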
Keyword: quantization
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Authors: Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. As a prevalent compression technique, quantization has emerged as the mainstream practice to tackle this issue, mainly centered on two recipes, W8A8 and W4A16 (i.e., weights and activations in those bit widths). In this study, we propose a novel W4A8 post-training quantization method for available open-source LLMs that combines the advantages of both recipes: the I/O benefit of 4-bit weight quantization and the acceleration afforded by 8-bit matrix computation. Nevertheless, W4A8 is notorious for performance degradation. As a remedy, we employ layerwise activation quantization strategies featuring a novel logarithmic equalization for the most intractable layers, and combine them with fine-grained weight quantization. Without bells and whistles, we eliminate the need for further fine-tuning and obtain state-of-the-art W4A8 quantized performance on BLOOM, LLaMA, and LLaMA-2 on standard benchmarks. We confirm that W4A8 quantization is achievable for the deployment of large language models, fostering their widespread real-world adoption.
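For orientation, here is a generic W4A8 recipe in miniature (our sketch of common practice, not FPTQ's method; in particular its fine-grained grouping and logarithmic activation equalization are richer than this): weights are quantized per group to 4 bits, activations per tensor to 8 bits.

    import numpy as np

    def quantize_weights_w4(W, group_size=128):
        """Per-group asymmetric 4-bit weight quantization (16 levels)."""
        Wg = W.reshape(-1, group_size)
        lo = Wg.min(axis=1, keepdims=True)
        hi = Wg.max(axis=1, keepdims=True)
        scale = (hi - lo) / 15.0 + 1e-12
        q = np.clip(np.round((Wg - lo) / scale), 0, 15).astype(np.uint8)
        return q, scale, lo

    def quantize_activations_a8(x):
        """Per-tensor symmetric 8-bit activation quantization."""
        scale = np.abs(x).max() / 127.0 + 1e-12
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 256)).astype(np.float32)
    q, s, lo = quantize_weights_w4(W)
    W_hat = (q * s + lo).reshape(W.shape)          # dequantized weights
    a, sa = quantize_activations_a8(rng.normal(size=(8, 256)))
    print("weight reconstruction MSE:", float(((W - W_hat) ** 2).mean()))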
Keyword: efficient
Parallel Unconstrained Local Search for Partitioning Irregular Graphs
Chunked Lists versus Extensible Arrays for Text Inversion
Efficient Ray Sampling for Radiance Fields Reconstruction
Pure Exploration under Mediators' Feedback
Everything Perturbed All at Once: Enabling Differentiable Graph Attacks
Streaming, Local, and Multi-Level (Hyper)Graph Decomposition
Flexible Handover with Real-Time Robust Dynamic Grasp Trajectory Generation
Parameterized and Approximation Algorithms for the Maximum Bimodal Subgraph Problem
A Task-Parallel Approach for Localized Topological Data Structures
Clustering Without an Eigengap
AskIt: Unified Programming Interface for Programming with Large Language Models
Ensuring User-side Fairness in Dynamic Recommender Systems
Deep Reinforcement Learning Based Framework for Mobile Energy Disseminator Dispatching to Charge On-the-Road Electric Vehicles
Enabling Cooperative Hybrid Beamforming in TDD-based Distributed MIMO Systems
Convergence of non-linear diagonal frame filtering for regularizing inverse problems
Lower Bound for Independence Covering in $C_4$-Free Graphs
Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images
Quantifying and Analyzing Entity-level Memorization in Large Language Models
Computing Geodesic Paths Encoding a Curvature Prior
Drone-NeRF: Efficient NeRF Based 3D Scene Reconstruction for Large-Scale Drone Survey
Efficient and Explainable Graph Neural Architecture Search via Monte-Carlo Tree Search
Random Shortening of Linear Codes and Applications
Reimagining Sense Amplifiers: Harnessing Phase Transition Materials for Current and Voltage Sensing
Task-Based MoE for Multitask Multilingual Machine Translation
Securing Blockchain Systems: A Novel Collaborative Learning Framework to Detect Attacks in Transactions and Smart Contracts
Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals
Event-Triggered Polynomial Control for Trajectory Tracking of Unicycle Robots
Prompting Vision Language Model with Knowledge from Large Language Model for Knowledge-Based VQA
Domain Generalization without Excess Empirical Risk
Inductive Learning of Declarative Domain-Specific Heuristics for ASP
Deontic Paradoxes in ASP with Weak Constraints
Towards One-Shot Learning for Text Classification using Inductive Logic Programming
Generalizing Level Ranking Constraints for Monotone and Convex Aggregates
Sorting Strategies for Interactive Conflict Resolution in ASP
Data reduction for directed feedback vertex set on graphs without long induced cycles
Cyclophobic Reinforcement Learning
IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation
Jaccard-constrained dense subgraph discovery
Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Demo: Integration of Marketplace for the 5G Open RAN Ecosystem
MerA: Merging Pretrained Adapters For Few-Shot Learning
AI-powered Fraud Detection in Decentralized Finance: A Project Life Cycle Perspective
Hidden-Role Games: Equilibrium Concepts and Computation
Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs
P2M: A Fast Solver for Querying Distance from Point to Mesh Surface
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation
Near-Field 3D Localization via MIMO Radar: Cramér-Rao Bound Analysis and Estimator Design
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Keyword: faster
Finding Optimal 2-Packing Sets on Arbitrary Graphs at Scale
Split Without a Leak: Reducing Privacy Leakage in Split Learning
On the Independencies Hidden in the Structure of a Probabilistic Logic Program
Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
P2M: A Fast Solver for Querying Distance from Point to Mesh Surface
Keyword: mobile
Generative AI for Semantic Communication: Architecture, Challenges, and Outlook
Deep Reinforcement Learning Based Framework for Mobile Energy Disseminator Dispatching to Charge On-the-Road Electric Vehicles
Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images
Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals
Sparse Waypoint Validity Checking for Self-Entanglement-Free Tethered Path Planning
Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Breaking the Interference and Fading Gridlock in Backscatter Communications: State-of-the-Art, Design Challenges, and Future Directions
Rate-Splitting for CF Massive MIMO Systems With Channel Aging
Keyword: pruning
There is no result
Keyword: diffusion
Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models
Density Stabilization Strategies for Nonholonomic Agents on Compact Manifolds
Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models
Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement
Physics-Informed DeepMRI: Bridging the Gap from Heat Diffusion to k-Space Interpolation
DiffuVolume: Diffusion Model for Volume based Stereo Matching
SignDiff: Learning Diffusion Models for American Sign Language Production
Keyword: adaptive
DebSDF: Delving into the Details and Bias of Neural Indoor Scene Reconstruction
Unveiling Camouflage: A Learnable Fourier-based Augmentation for Camouflaged Object Detection and Instance Segmentation
Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification
MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
Computing Geodesic Paths Encoding a Curvature Prior
Beard Segmentation and Recognition Bias
Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding
Funnel-based Control for Reach-Avoid-Stay Specifications
Learning the References of Online Model Predictive Control for Urban Self-Driving
Federated Two Stage Decoupling With Adaptive Personalization Layers
WUDI: A Human Involved Self-Adaptive Framework to Prevent Childhood Obesity in Internet of Things Environment
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems
Support Testing in the Huge Object Model
DiffuVolume: Diffusion Model for Volume based Stereo Matching
Optimal Non-Adaptive Cell Probe Dictionaries and Hashing
Semantic Image Synthesis via Class-Adaptive Cross-Attention
Algebraic, Topological, and Mereological Foundations of Existential Granules
Boosting Detection in Crowd Analysis via Underutilized Output Features
Keyword: quantization
FPTQ: Fine-grained Post-Training Quantization for Large Language Models