Abstract
The integration of emotional intelligence in machines is an important step in advancing human-computer interaction. This demands the development of reliable end-to-end emotion recognition systems. However, the scarcity of public affective datasets presents a challenge. In this literature review, we emphasize the use of generative models to address this issue in neurophysiological signals, particularly Electroencephalogram (EEG) and Functional Near-Infrared Spectroscopy (fNIRS). We provide a comprehensive analysis of different generative models used in the field, examining their input formulation, deployment strategies, and methodologies for evaluating the quality of synthesized data. This review serves as a comprehensive overview, offering insights into the advantages, challenges, and promising future directions in the application of generative models in emotion recognition systems. Through this review, we aim to facilitate the progression of neurophysiological data augmentation, thereby supporting the development of more efficient and reliable emotion recognition systems.
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Authors: Alexander Isenko, Ruben Mayer, Hans-Arno Jacobsen
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
Abstract
Training deep learning models in the cloud or on dedicated hardware is expensive. A more cost-efficient option is to use hyperscale clouds that offer spot instances, a cheap but ephemeral alternative to on-demand resources. As spot instance availability can change depending on the time of day, continent, and cloud provider, it could be more cost-efficient to distribute resources over the world. Still, it has not been investigated whether geo-distributed, data-parallel spot deep learning training could be a more cost-efficient alternative to centralized training. This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV and NLP models. To expand the current training options further, we compare the scalability potential for hybrid-cloud scenarios by adding cloud resources to on-premise hardware to improve training throughput. Finally, we show how leveraging spot instance pricing enables a new cost-efficient way to train models with multiple cheap VMs, trumping both more centralized and powerful hardware and even on-demand cloud offerings at competitive prices.
Segregated FLS Processing Cores for V/STOL Autonomous Landing Guidance Assistant System using FPGA
Abstract
Roads and parking areas are widely predicted to become so congested with vehicles that finding a novel solution will be a necessity for sustaining humanity's overall development. This issue could be overcome by developing modified generations of Urban Air Mobility (UAM) vehicles that essentially depend on the Vertical and/or Short Take-Off and Landing (V/STOL) feature to increase the efficiency of landing on limited-space parking areas. Integrating an efficient and safe V/STOL feature in such UAM vehicles is notably more difficult than conventional landing and take-off techniques. An efficient V/STOL feature should be carried out by a complete and collaborative Cyber-Physical System (CPS) processing architecture, such as the CPS-5C architecture. In this paper, we propose only two CPS-5C physical layers of a V/STOL Autonomous Landing Guidance Assistant System (ALGAS2) processing unit to increase the reliability of the vertical landing mechanism. The proposed V/STOL-ALGAS2 system depends on a Fuzzy Logic System (FLS) as the advanced control unit. Furthermore, the proposed ALGAS2 system depends on four symmetric and segregated ALGAS2 processing cores that process the data in a fully parallel and independent manner to enhance many essential security and safety factors for futuristic UAM vehicles. The proposed ALGAS2 digital circuit architecture has been designed using MATLAB and VHDL. It has also been analyzed in implementation and validation tests using the Intel Altera OpenVINO FPGA board. The proposed ALGAS processing unit attained a maximum computational processing performance of about 21.22 Giga Operations per Second (GOPS).
DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation
Authors: Evgenii Indenbom, Nicolae-Catalin Ristea, Ando Saabas, Tanel Parnamaa, Jegor Guzvin, Ross Cutler
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Abstract
Acoustic echo cancellation (AEC), noise suppression (NS) and dereverberation (DR) are an integral part of modern full-duplex communication systems. As the demand for teleconferencing systems increases, addressing these tasks is required for an effective and efficient online meeting experience. Most prior research proposes solutions for these tasks separately, combining them with digital signal processing (DSP) based components, resulting in complex pipelines that are often impractical to deploy in real-world applications. This paper proposes a real-time cross-attention deep model, named DeepVQE, based on residual convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to simultaneously address AEC, NS, and DR. We conduct several ablation studies to analyze the contributions of different components of our model to the overall performance. DeepVQE achieves state-of-the-art performance on non-personalized tracks from the ICASSP 2023 Acoustic Echo Cancellation Challenge and ICASSP 2023 Deep Noise Suppression Challenge test sets, showing that a single model can handle multiple tasks with excellent performance. Moreover, the model runs in real-time and has been successfully tested for the Microsoft Teams platform.
Lumos in the Night Sky: AI-enabled Visual Tool for Exploring Night-Time Light Patterns
Authors: Jakob Hederich, Shreya Ghosh, Zeyu He, Prasenjit Mitra
Abstract
We introduce NightPulse, an interactive tool for Night-time light (NTL) data visualization and analytics, which enables researchers and stakeholders to explore and analyze NTL data with a user-friendly platform. Powered by efficient system architecture, NightPulse supports image segmentation, clustering, and change pattern detection to identify urban development and sprawl patterns. It captures temporal trends of NTL and semantics of cities, answering questions about demographic factors, city boundaries, and unusual differences.
On the Parameterized Complexity of Computing $st$-Orientations with Few Transitive Edges
Abstract
Orienting the edges of an undirected graph such that the resulting digraph satisfies some given constraints is a classical problem in graph theory, with multiple algorithmic applications. In particular, an $st$-orientation orients each edge of the input graph such that the resulting digraph is acyclic, and it contains a single source $s$ and a single sink $t$. Computing an $st$-orientation of a graph can be done efficiently, and it finds notable applications in graph algorithms and in particular in graph drawing. On the other hand, finding an $st$-orientation with at most $k$ transitive edges is more challenging and it was recently proven to be NP-hard already when $k=0$. We strengthen this result by showing that the problem remains NP-hard even for graphs of bounded diameter, and for graphs of bounded vertex degree. These computational lower bounds naturally raise the question about which structural parameters can lead to tractable parameterizations of the problem. Our main result is a fixed-parameter tractable algorithm parameterized by treewidth.
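To make the objects in this abstract concrete, here is a minimal Python sketch that checks whether a given set of directed edges is an $st$-orientation (acyclic, single source $s$, single sink $t$) and counts its transitive edges. It assumes the usual definition, which the abstract does not spell out: an edge (u, v) is transitive if some other directed path leads from u to v. The function names and the simple-graph assumption are illustrative and are not taken from the paper; this is a verifier for a candidate orientation, not the paper's fixed-parameter algorithm.

```python
from collections import defaultdict, deque

def is_st_orientation(vertices, edges, s, t):
    """Check acyclicity plus 'single source s, single sink t' for a digraph."""
    out_adj = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for u, v in edges:
        out_adj[u].append(v)
        indeg[v] += 1
        outdeg[u] += 1
    sources = [v for v in vertices if indeg[v] == 0]
    sinks = [v for v in vertices if outdeg[v] == 0]
    if sources != [s] or sinks != [t]:
        return False
    # Kahn's algorithm: the digraph is acyclic iff every vertex gets removed.
    remaining = dict(indeg)
    queue = deque(sources)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in out_adj[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return removed == len(vertices)

def count_transitive_edges(edges):
    """Count edges (u, v) for which another directed u-to-v path exists.

    Assumes a simple acyclic digraph (no parallel edges).
    """
    out_adj = defaultdict(list)
    for u, v in edges:
        out_adj[u].append(v)

    def has_detour(u, v):
        # Depth-first search from u that skips the direct edge (u, v).
        stack = [w for w in out_adj[u] if w != v]
        visited = set(stack)
        while stack:
            x = stack.pop()
            if x == v:
                return True
            for w in out_adj[x]:
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
        return False

    return sum(1 for u, v in edges if has_detour(u, v))

# Triangle oriented as s->a, a->t, s->t: the edge s->t is transitive.
triangle = [("s", "a"), ("a", "t"), ("s", "t")]
triangle_ok = is_st_orientation(["s", "a", "t"], triangle, "s", "t")
triangle_transitive = count_transitive_edges(triangle)
```

In this example the orientation is valid but has one transitive edge, so it would not be a solution for $k=0$.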
A Static Evaluation of Code Completion by Large Language Models
Abstract
Large language models trained on code have shown great potential to increase productivity of software developers. Several execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects considering the execution cost. In contrast, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models. In this work, we propose a static evaluation framework to quantify static errors in Python code completions by leveraging Abstract Syntax Trees. Compared with execution-based evaluation, our method is not only more efficient, but also applicable to code in the wild. For experiments, we collect code context from open source repos to generate one million function bodies using public models. Our static analysis reveals that Undefined Name and Unused Variable are the most common errors made by language models. Through extensive studies, we also show the impact of sampling temperature, model size, and context on static errors in code completions.
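As a rough illustration of AST-based static checking, the sketch below uses Python's standard `ast` module to test whether a completion parses and to flag names that are assigned but never read. It is a deliberately simplified, scope-insensitive toy, not the paper's framework or a real linter; the function names `parses` and `unused_variables` are hypothetical.

```python
import ast
from typing import List

def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def unused_variables(source: str) -> List[str]:
    """Names that are assigned somewhere but never loaded anywhere.

    Simplification: scopes are ignored, so a name read in any function
    counts as used everywhere. Function arguments are not ast.Name nodes
    and are therefore never reported.
    """
    tree = ast.parse(source)
    stored, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded)

completion = "def f(x):\n    y = x + 1\n    z = 2\n    return y\n"
unused = unused_variables(completion)
```

Here `z` is reported because it is assigned and never read. A production checker would also need scope analysis, imports, and builtins to report undefined names reliably.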
End-to-end Differentiable Clustering with Associative Memories
Authors: Bishwajit Saha, Dmitry Krotov, Mohammed J. Zaki, Parikshit Ram
Abstract
Clustering is a widely used unsupervised learning technique involving an intensive discrete optimization problem. Associative Memory models or AMs are differentiable neural networks defining a recursive dynamical system, which have been integrated with various deep learning architectures. We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. Leveraging the pattern completion ability of AMs, we further develop a novel self-supervised clustering loss. Our evaluations on varied datasets demonstrate that ClAM benefits from the self-supervision, and significantly improves upon both the traditional Lloyd's k-means algorithm and more recent continuous clustering relaxations (by up to 60% in terms of the Silhouette Coefficient).
CONCORD: Clone-aware Contrastive Learning for Source Code
Abstract
Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. On the one hand, human developers tend to write repetitive programs referencing existing code snippets from the current codebase or online resources (e.g., Stack Overflow website) rather than implementing functions from scratch; such behaviors result in a vast number of code clones. In contrast, a deviant clone by mistake might trigger malicious program behaviors. Thus, as a proxy to incorporate developers' coding behavior into the pre-training scheme, we propose to include code clones and their deviants. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart. We show that CONCORD's clone-aware contrastive learning drastically reduces the need for expensive pre-training resources while improving the performance of downstream SE tasks. We also empirically demonstrate that CONCORD can improve existing pre-trained models to learn better representations that consequently become more efficient in both identifying semantically equivalent programs and differentiating buggy from non-buggy code.
Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models
Authors: Sunny Sanyal, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Training LLMs is expensive, and recent evidence indicates training all the way to convergence is inefficient. In this paper, we investigate the ability of a simple idea, checkpoint averaging along the trajectory of a training run, to improve the quality of models before they have converged. This approach incurs no extra cost during training or inference. Specifically, we analyze the training trajectories of Pythia LLMs with 1 to 12 billion parameters and demonstrate that, particularly during the early to mid stages of training, this idea accelerates convergence and improves both test and zero-shot generalization. Loss spikes are a well-recognized problem in LLM training; in our analysis we encountered two instances of this in the underlying trajectories, and both instances were mitigated by our averaging. For a 6.9B parameter LLM, for example, our early weight averaging recipe can save up to 4200 hours of GPU time, which corresponds to significant savings in cloud compute costs.
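The core operation, averaging checkpoints saved along one training run, can be sketched in a few lines of Python. This is a minimal illustration that assumes a uniform average over a hand-picked list of checkpoints, each stored as a dictionary from parameter name to a flat list of floats. The abstract does not specify which checkpoints are averaged or how they are weighted, so those choices, along with the name `average_checkpoints`, are assumptions.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]

def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Uniformly average same-named parameters across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name, first in checkpoints[0].items():
        sums = [0.0] * len(first)
        for ckpt in checkpoints:
            for i, value in enumerate(ckpt[name]):
                sums[i] += value
        averaged[name] = [total / n for total in sums]
    return averaged

# Three hypothetical checkpoints from the same run.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
avg = average_checkpoints(ckpts)
```

Because the average is computed offline from checkpoints that are already saved, it adds nothing to the training loop and yields a single model of the original size for inference.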
Building a Constraint-Based Recommender System Using Knowledge Graphs
Authors: Ngoc Luyen Le, Marie-Hélène Abel, Philippe Gouspillou
Abstract
Knowledge graphs in RDF model entities and their relations using ontologies, and have gained popularity for information modeling. In recommender systems, knowledge graphs help represent more links and relationships between users and items. Constraint-based recommender systems leverage deep recommendation knowledge to identify relevant suggestions. When combined with knowledge graphs, they offer benefits in constraint sets. This paper explores a constraint-based recommender system using RDF knowledge graphs for the vehicle purchase/sale domain. Our experiments demonstrate that the proposed approach efficiently identifies recommendations based on user preferences.
Generating Private Synthetic Data with Genetic Algorithms
Authors: Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu
Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
We study the problem of efficiently generating differentially private synthetic data that approximate the statistical properties of an underlying sensitive dataset. In recent years, there has been a growing line of work that approaches this problem using first-order optimization techniques. However, such techniques are restricted to optimizing differentiable objectives only, severely limiting the types of analyses that can be conducted. For example, first-order mechanisms have been primarily successful in approximating statistical queries only in the form of marginals for discrete data domains. In some cases, one can circumvent such issues by relaxing the task's objective to maintain differentiability. However, even when possible, these approaches impose a fundamental limitation in which modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective. As a result, it avoids the aforementioned limitations of first-order optimization. We empirically evaluate Private-GSD against baseline algorithms on data derived from the American Community Survey across a variety of statistics--otherwise known as statistical queries--both for discrete and real-valued attributes. We show that Private-GSD outperforms the state-of-the-art methods on non-differentiable queries while matching accuracy in approximating differentiable ones.
Efficient automatic design of robots
Authors: David Matthews, Andrew Spielberg, Daniela Rus, Sam Kriegman, Josh Bongard
Abstract
Robots are notoriously difficult to design because of complex interdependencies between their physical structure, sensory and motor layouts, and behavior. Despite this, almost every detail of every robot built to date has been manually determined by a human designer after several months or years of iterative ideation, prototyping, and testing. Inspired by evolutionary design in nature, the automated design of robots using evolutionary algorithms has been attempted for two decades, but it too remains inefficient: days of supercomputing are required to design robots in simulation that, when manufactured, exhibit desired behavior. Here we show for the first time de-novo optimization of a robot's structure to exhibit a desired behavior, within seconds on a single consumer-grade computer, and the manufactured robot's retention of that behavior. Unlike other gradient-based robot design methods, this algorithm does not presuppose any particular anatomical form; starting instead from a randomly-generated apodous body plan, it consistently discovers legged locomotion, the most efficient known form of terrestrial movement. If combined with automated fabrication and scaled up to more challenging tasks, this advance promises near instantaneous design, manufacture, and deployment of unique and useful machines for medical, environmental, vehicular, and space-based tasks.
Switching Autoregressive Low-rank Tensor Models
Authors: Hyun Dong Lee, Andrew Warrington, Joshua I. Glaser, Scott W. Linderman
Abstract
An important problem in time-series analysis is modeling systems with time-varying dynamics. Probabilistic models with joint continuous and discrete latent states offer interpretable, efficient, and experimentally useful descriptions of such data. Commonly used models include autoregressive hidden Markov models (ARHMMs) and switching linear dynamical systems (SLDSs), each with its own advantages and disadvantages. ARHMMs permit exact inference and easy parameter estimation, but are parameter intensive when modeling long dependencies, and hence are prone to overfitting. In contrast, SLDSs can capture long-range dependencies in a parameter efficient way through Markovian latent dynamics, but present an intractable likelihood and a challenging parameter estimation task. In this paper, we propose switching autoregressive low-rank tensor (SALT) models, which retain the advantages of both approaches while ameliorating the weaknesses. SALT parameterizes the tensor of an ARHMM with a low-rank factorization to control the number of parameters and allow longer range dependencies without overfitting. We prove theoretical and discuss practical connections between SALT, linear dynamical systems, and SLDSs. We empirically demonstrate quantitative advantages of SALT models on a range of simulated and real prediction tasks, including behavioral and neural datasets. Furthermore, the learned low-rank tensor provides novel insights into temporal dependencies within each discrete state.
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Authors: Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, Peter Stone
Abstract
Lifelong learning offers a promising paradigm for building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and concepts, lifelong learning in decision-making (LLDM) also necessitates the transfer of procedural knowledge, such as actions and behaviors. To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. Specifically, LIBERO highlights five key research topics in LLDM: 1) how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both; 2) how to design effective policy architectures and 3) effective algorithms for LLDM; 4) the robustness of a lifelong learner with respect to task ordering; and 5) the effect of model pretraining for LLDM. We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks. For benchmarking purposes, we create four task suites (130 tasks in total) that we use to investigate the above-mentioned research topics. To support sample-efficient learning, we provide high-quality human-teleoperated demonstration data for all tasks. Our extensive experiments present several insightful or even unexpected discoveries: sequential finetuning outperforms existing lifelong learning methods in forward transfer, no single visual encoder architecture excels at all types of knowledge transfer, and naive supervised pretraining can hinder agents' performance in the subsequent LLDM. Check the website at https://libero-project.github.io for the code and the datasets.
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents
Authors: Yashar Talebirad, Amirhossein Nadiri
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
In this paper, we present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems. Our framework introduces a collaborative environment where multiple intelligent agent components, each with distinctive attributes and roles, work together to handle complex tasks more efficiently and effectively. We demonstrate the practicality and versatility of our framework through case studies in artificial general intelligence (AGI), specifically focusing on the Auto-GPT and BabyAGI models. We also examine the "Gorilla" model, which integrates external APIs into the LLM. Our framework addresses limitations and challenges such as looping issues, security risks, scalability, system evaluation, and ethical considerations. By modeling various domains such as courtroom simulations and software development scenarios, we showcase the potential applications and benefits of our proposed multi-agent system. Our framework provides an avenue for advancing the capabilities and performance of LLMs through collaboration and knowledge exchange among intelligent agents.
Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate
Abstract
Stochastic multi-level compositional optimization problems cover many new machine learning paradigms, e.g., multi-step model-agnostic meta-learning, which require efficient optimization algorithms for large-scale applications. This paper studies the decentralized stochastic multi-level optimization algorithm, which is challenging because the multi-level structure and decentralized communication scheme may make the number of levels affect the order of the convergence rate. To this end, we develop two novel decentralized optimization algorithms to deal with the multi-level function and its gradient. Our theoretical results show that both algorithms can achieve the level-independent convergence rate for nonconvex problems under much milder conditions compared with existing single-machine algorithms. To the best of our knowledge, this is the first work that achieves the level-independent convergence rate under the decentralized setting. Moreover, extensive experiments confirm the efficacy of our proposed algorithms.
A Robust Likelihood Model for Novelty Detection
Authors: Ranya Almohsen, Shivang Patel, Donald A. Adjeroh, Gianfranco Doretto
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Current approaches to novelty or anomaly detection are based on deep neural networks. Despite their effectiveness, neural networks are also vulnerable to imperceptible deformations of the input data. This is a serious issue in critical applications, or when data alterations are generated by an adversarial attack. While this is a known problem that has been studied in recent years for the case of supervised learning, the case of novelty detection has received very limited attention. Indeed, in this latter setting the learning is typically unsupervised because outlier data is not available during training, and new approaches for this case need to be investigated. We propose a new prior that aims at learning a robust likelihood for the novelty test, as a defense against attacks. We also integrate the same prior with a state-of-the-art novelty detection approach. Because of the geometric properties of that approach, the resulting robust training is computationally very efficient. An initial evaluation of the method indicates that it is effective at improving performance with respect to the standard models in the absence and presence of attacks.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Abstract
We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
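The activation shift at the heart of this idea can be illustrated with a minimal Python sketch. It assumes that a direction vector has already been found for one attention head and that the shift is simply the unit-normalized direction scaled by a tunable strength. How the directions and heads are selected, and any further scaling the authors apply, are not described in the abstract, so this function (`shift_head_activation`, a hypothetical name) shows only the general mechanism.

```python
import math
from typing import List

def shift_head_activation(activation: List[float],
                          direction: List[float],
                          strength: float) -> List[float]:
    """Move one head's activation along a fixed direction.

    The direction is normalized to unit length so that `strength`
    alone controls how far the activation moves.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [a + strength * d / norm for a, d in zip(activation, direction)]

# Hypothetical 2-dimensional head output and direction.
shifted = shift_head_activation([1.0, 0.0], [0.0, 2.0], strength=0.5)
```

Setting `strength` to zero leaves the activation unchanged, which is one way to picture the truthfulness versus helpfulness tradeoff that the abstract says is balanced by tuning the intervention strength.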
A sketch-and-project method for solving the matrix equation AXB = C
Abstract
In this paper, based on an optimization problem, a sketch-and-project method for solving the linear matrix equation AXB = C is proposed. We provide a thorough convergence analysis for the new method and derive a lower bound on the convergence rate and some convergence conditions including the case that the coefficient matrix is rank deficient. By varying three parameters in the new method and convergence theorems, the new method recovers an array of well-known algorithms and their convergence results. Meanwhile, with the use of Gaussian sampling, we can obtain the Gaussian global randomized Kaczmarz (GaussGRK) method which shows some advantages in solving the matrix equation AXB = C. Finally, numerical experiments are given to illustrate the effectiveness of recovered methods.
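For intuition, one very simple member of the projection family for AXB = C is a Kaczmarz-style step that picks a single entry (i, j) and projects the current iterate, in the Frobenius norm, onto the set of matrices satisfying that one scalar equation. The Python sketch below implements that single step with plain lists. It is an illustrative special case under stated assumptions (row i of A and column j of B are non-zero), not the paper's general method or its Gaussian sampling variant, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def entry_residual(X: Matrix, A: Matrix, B: Matrix, C: Matrix,
                   i: int, j: int) -> float:
    """Residual C[i][j] - (row i of A) X (column j of B)."""
    a = A[i]
    b = [row[j] for row in B]
    Xb = [sum(x * bq for x, bq in zip(row, b)) for row in X]
    return C[i][j] - sum(ap * v for ap, v in zip(a, Xb))

def project_onto_entry(X: Matrix, A: Matrix, B: Matrix, C: Matrix,
                       i: int, j: int) -> Matrix:
    """Closest matrix to X (Frobenius norm) satisfying equation (i, j).

    Update: X + r * a b^T / (||a||^2 ||b||^2), where a is row i of A,
    b is column j of B, and r is the current residual of that equation.
    """
    a = A[i]
    b = [row[j] for row in B]
    scale = entry_residual(X, A, B, C, i, j) / (
        sum(v * v for v in a) * sum(v * v for v in b)
    )
    return [[X[p][q] + scale * a[p] * b[q] for q in range(len(b))]
            for p in range(len(a))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[0.0, 5.0], [0.0, 0.0]]
X0 = [[0.0, 0.0], [0.0, 0.0]]
X1 = project_onto_entry(X0, A, B, C, i=0, j=1)
```

After the step, equation (0, 1) holds exactly for `X1`. An iterative solver would repeat such steps with randomly sampled indices, and sketching with whole random matrices generalizes the single-entry choice used here.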
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Abstract
Many machine learning algorithms require large amounts of labeled data to deliver state-of-the-art results. In applications such as medical diagnosis and fraud detection, though there is an abundance of unlabeled data, it is costly to have the data labeled by experts, experiments, or simulations. Active learning algorithms aim to reduce the number of required labeled data points while preserving performance. For many convex optimization problems such as linear regression and $p$-norm regression, there are theoretical bounds on the number of labels required to achieve a certain accuracy. We call this the query complexity of active learning. However, today's active learning algorithms require the underlying learned function to have an orthogonal basis. For example, when applying active learning to linear regression, the requirement is that the target function is a linear combination of a set of orthogonal linear functions, and active learning can find the coefficients of these linear functions. We present a theoretical result showing that active learning does not need an orthogonal basis but rather only requires a nearly orthogonal basis. We provide the corresponding theoretical proofs for the function family with a nearly orthogonal basis, and its applications associated with the algorithmically efficient active learning framework.
Learning Representations on the Unit Sphere: Application to Online Continual Learning
Authors: Nicolas Michel, Giovanni Chierchia, Romain Negrel, Jean-François Bercher
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We derive loss functions for the von Mises-Fisher distribution and the angular Gaussian distribution, both designed for modeling symmetric directional data. A noteworthy feature of our approach is that the learned representations are pushed toward fixed directions, allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.
ColdNAS: Search to Modulate for User Cold-Start Recommendation
Abstract
Making personalized recommendations for cold-start users, who have only a few interaction histories, is a challenging problem in recommendation systems. Recent works leverage hypernetworks to directly map user interaction histories to user-specific parameters, which are then used to modulate the predictor through a feature-wise linear modulation function. These works obtain state-of-the-art performance. However, the physical meaning of scaling and shifting in recommendation data is unclear. Instead of using a fixed modulation function and deciding the modulation position by expertise, we propose a modulation framework called ColdNAS for the user cold-start problem, where we look for a proper modulation structure, including function and position, via neural architecture search. We design a search space which covers broad models and theoretically prove that this search space can be transformed to a much smaller space, enabling an efficient and robust one-shot search algorithm. Extensive experimental results on benchmark datasets show that ColdNAS consistently performs the best. We observe that different modulation functions lead to the best performance on different datasets, which validates the necessity of designing a search-based method.
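The fixed modulation that this abstract takes as its starting point, feature-wise linear modulation, scales and shifts each hidden feature with user-specific parameters. A minimal Python sketch follows, assuming the scale and shift vectors have already been produced by some hypernetwork. The name `film` and the example values are illustrative, and the searched modulation functions of ColdNAS itself are not shown.

```python
from typing import List

def film(hidden: List[float], scale: List[float], shift: List[float]) -> List[float]:
    """Feature-wise linear modulation: scale[k] * hidden[k] + shift[k]."""
    if not (len(hidden) == len(scale) == len(shift)):
        raise ValueError("hidden, scale and shift must have equal length")
    return [g * h + b for h, g, b in zip(hidden, scale, shift)]

# Hypothetical hidden features and user-specific modulation parameters.
modulated = film([1.0, 2.0], scale=[2.0, 0.5], shift=[0.0, 1.0])
```

A search-based approach would treat the choice of this elementwise function, and the layer at which it is applied, as decisions to be learned rather than fixed in advance.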
Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search
Abstract
Customers interacting with product search engines are increasingly formulating information-seeking queries. Frequently Asked Question (FAQ) retrieval aims to retrieve common question-answer pairs for a user query with question intent. Integrating FAQ retrieval in product search can not only empower users to make more informed purchase decisions, but also enhance user retention through efficient post-purchase support. Determining when an FAQ entry can satisfy a user's information need within product search, without disrupting their shopping experience, represents an important challenge. We propose an intent-aware FAQ retrieval system consisting of (1) an intent classifier that predicts when a user's information need can be answered by an FAQ and (2) a reformulation model that rewrites a query into a natural question. Offline evaluation demonstrates that our approach improves Hit@1 by 13% on retrieving ground-truth FAQs, while reducing latency by 95% compared to baseline systems. These improvements are further validated by real user feedback, where 71% of displayed FAQs on top of product search results received explicit positive user feedback. Overall, our findings show promising directions for integrating FAQ retrieval into product search at scale.
DVIS: Decoupled Video Instance Segmentation Framework
Authors: Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors. Firstly, offline methods are limited by the tightly-coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. Consequently, this leads to the introduction of excessive noise during long-term temporal alignment. Secondly, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement. The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement. We introduce a novel referring tracker and temporal refiner to construct the Decoupled VIS framework (DVIS). DVIS achieves new SOTA performance in both VIS and VPS, surpassing the current SOTA methods by 7.3 AP and 9.6 VPQ on the OVIS and VIPSeg datasets, which are the most challenging and realistic benchmarks. Moreover, thanks to the decoupling strategy, the referring tracker and temporal refiner are super lightweight (only 1.69% of the segmenter FLOPs), allowing for efficient training and inference on a single GPU with 11G memory. The code is available at https://github.com/zhang-tao-whu/DVIS.
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Authors: Peggy Tang, Junbin Gao, Lei Zhang, Zhiyong Wang
Abstract
Recently, compressive text summarisation has offered a balance between the conciseness issue of extractive summarisation and the factual hallucination issue of abstractive summarisation. However, most existing compressive summarisation methods are supervised, relying on the expensive effort of creating a new training dataset with corresponding compressive summaries. In this paper, we propose an efficient and interpretable compressive summarisation method that utilises unsupervised dual-agent reinforcement learning to optimise a summary's semantic coverage and fluency by simulating human judgment on summarisation quality. Our model consists of an extractor agent and a compressor agent, and both agents have a multi-head attentional pointer-based structure. The extractor agent first chooses salient sentences from a document, and then the compressor agent compresses these extracted sentences by selecting salient words to form a summary without using reference summaries to compute the summary reward. To the best of our knowledge, this is the first work on unsupervised compressive summarisation. Experimental results on three widely used datasets (Newsroom, CNN/DM, and XSum) show that our model achieves promising performance and a significant improvement on Newsroom in terms of the ROUGE metric, as well as interpretability of semantic coverage of summarisation results.
GaitGCI: Generative Counterfactual Intervention for Gait Recognition
Authors: Huanzhang Dou, Pengyi Zhang, Wei Su, Yunlong Yu, Yining Lin, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Gait is one of the most promising biometrics that aims to identify pedestrians from their walking patterns. However, prevailing methods are susceptible to confounders, resulting in the networks hardly focusing on the regions that reflect effective walking patterns. To address this fundamental problem in gait recognition, we propose a Generative Counterfactual Intervention framework, dubbed GaitGCI, consisting of Counterfactual Intervention Learning (CIL) and Diversity-Constrained Dynamic Convolution (DCDC). CIL eliminates the impacts of confounders by maximizing the likelihood difference between factual/counterfactual attention while DCDC adaptively generates sample-wise factual/counterfactual attention to efficiently perceive the sample-wise properties. With matrix decomposition and diversity constraint, DCDC guarantees the model to be efficient and effective. Extensive experiments indicate that proposed GaitGCI: 1) could effectively focus on the discriminative and interpretable regions that reflect gait pattern; 2) is model-agnostic and could be plugged into existing models to improve performance with nearly no extra cost; 3) efficiently achieves state-of-the-art performance on arbitrary scenarios (in-the-lab and in-the-wild).
A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch
Authors: Federico Ceola, Elisa Maiettini, Lorenzo Rosasco, Lorenzo Natale
Abstract
Multi-fingered robotic hands could enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is an arduous problem due to the high dimensionality of state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicit environment or hand modeling. However, training these policies with state-of-the-art model-free algorithms is very challenging for multi-fingered hands. The main problem is that an efficient exploration of the environment is not possible for such high-dimensional problems, thus causing issues in the initial phases of policy optimization. One possibility to address this is to rely on off-line task demonstrations. However, this is often very demanding in terms of time and computational resources. In this work, we overcome these requirements and propose the A Grasp Pose is All You Need (G-PAYN) method for the anthropomorphic hand of the iCub humanoid. We develop an approach to automatically collect task demonstrations to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, used to initiate the movement. Then a control policy (previously trained with the proposed G-PAYN) is used to reach and grab the object. We deploy the iCub in the MuJoCo simulator and use it to test our approach with objects from the YCB-Video dataset. The results show that G-PAYN outperforms the current DRL baselines in the considered setting in terms of success rate and execution time. The code to reproduce the experiments will be released upon acceptance.
Correlated Pseudorandomness from the Hardness of Quasi-Abelian Decoding
Authors: Maxime Bombar, Geoffroy Couteau, Alain Couvreur, Clément Ducros
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Abstract
Secure computation often benefits from the use of correlated randomness to achieve fast, non-cryptographic online protocols. A recent paradigm put forth by Boyle et al. (CCS 2018, Crypto 2019) showed how pseudorandom correlation generators (PCGs) can be used to generate large amounts of useful forms of correlated (pseudo)randomness, using minimal interactions followed solely by local computations, yielding silent secure two-party computation protocols (protocols where the preprocessing phase requires almost no communication). An additional property called programmability allows extending this to build N-party protocols. However, known constructions for programmable PCGs can only produce OLEs over large fields, and rely on the rather new splittable Ring-LPN assumption. In this work, we overcome both limitations. To this end, we introduce the quasi-abelian syndrome decoding problem (QA-SD), a family of assumptions which generalises the well-established quasi-cyclic syndrome decoding assumption. Building upon QA-SD, we construct new programmable PCGs for OLEs over any field $\mathbb{F}_q$ with $q>2$. Our analysis also sheds light on the security of the ring-LPN assumption used in Boyle et al. (Crypto 2020). Using our new PCGs, we obtain the first efficient N-party silent secure computation protocols for computing general arithmetic circuits over $\mathbb{F}_q$ for any $q>2$.
Complexity of Anchored Crossing Number and Crossing Number of Almost Planar Graphs
Abstract
In this paper we deal with the problem of computing the exact crossing number of almost planar graphs and the closely related problem of computing the exact anchored crossing number of a pair of planar graphs. It was shown by [Cabello and Mohar, 2013] that both problems are NP-hard, although their proof required an unbounded number of high-degree vertices (in the first problem) or an unbounded number of anchors (in the second problem). Somewhat surprisingly, only three vertices of degree greater than 3, or only three anchors, are sufficient to maintain hardness of these problems, as we prove here. The new result also improves the previous result on hardness of joint crossing number on surfaces by [Hliněný and Salazar, 2015]. Our result is best possible in the anchored case since the anchored crossing number of a pair of planar graphs with two anchors each is trivial, and close to being best possible in the almost planar case since the crossing number is efficiently computable for almost planar graphs of maximum degree 3 [Riskin 1996, Cabello and Mohar 2011].
Distributed Flocking Control of Aerial Vehicles Based on a Markov Random Field
Abstract
The distributed flocking control of collective aerial vehicles has extraordinary advantages in scalability, reliability, etc. However, it is still challenging to design a reliable, efficient, and responsive flocking algorithm. In this paper, a distributed predictive flocking framework is presented based on a Markov random field (MRF). The MRF is used to characterize the optimization problem, which is eventually solved by discretizing the input space. Potential functions are employed to describe the interactions between aerial vehicles and as indicators of flight performance. The dynamic constraints are taken into account in the candidate feasible trajectories, which correspond to random variables. Numerical simulation shows that, compared with several recent methods, the proposed algorithm has better flocking cohesion and control efficiency. Experiments are also conducted to demonstrate the feasibility of the proposed algorithm.
Adversarial Attacks and Defenses for Semantic Communication in Vehicular Metaverses
Abstract
For vehicular metaverses, one of the ultimate user-centric goals is to optimize the immersive experience and Quality of Service (QoS) for users on board. Semantic Communication (SemCom) has been introduced as a revolutionary paradigm that significantly eases communication resource pressure for vehicular metaverse applications to achieve this goal. SemCom enables high-quality and ultra-efficient vehicular communication, even with explosively increasing data traffic among vehicles. In this article, we propose a hierarchical SemCom-enabled vehicular metaverses framework consisting of the global metaverse, local metaverses, SemCom module, and resource pool. The global and local metaverses are brand-new concepts from the metaverse's distribution standpoint. Considering the QoS of users, this article explores the potential security vulnerabilities of the proposed framework. To that purpose, this study highlights a specific security risk to the framework's SemCom module and offers a viable defense solution, so encouraging community researchers to focus more on vehicular metaverse security. Finally, we provide an overview of the open issues of secure SemCom in the vehicular metaverses, notably pointing out potential future research directions.
SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation
Abstract
Scientific writing involves retrieving, summarizing, and citing relevant papers, which can be time-consuming processes in large and rapidly evolving fields. By making these processes inter-operable, natural language processing (NLP) provides opportunities for creating end-to-end assistive writing tools. We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper, taking into consideration the user-provided context and keywords. SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system that flexibly deals with addition and removal of a paper database. We provide a convenient user interface that displays the recommended papers as extractive summaries and that offers abstractively-generated citing sentences which are aligned with the provided context and which mention the chosen keyword(s). Our assistive tool for literature discovery and scientific writing is available at https://scilit.vercel.app
State Regularized Policy Optimization on Data with Dynamics Shift
Authors: Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Abstract
In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address this issue by training context encoders to identify environment parameters. Data with dynamics shift are separated according to their environment parameters to train the corresponding policy. However, these methods can be sample inefficient as data are used ad hoc, and policies trained for one dynamics cannot benefit from data collected in all other environments with different dynamics. In this paper, we find that in many environments with similar structures and different dynamics, optimal policies have similar stationary state distributions. We exploit this property and learn the stationary state distribution from data with dynamics shift for efficient data reuse. This distribution is used to regularize the policy trained in a new environment, leading to the SRPO (State Regularized Policy Optimization) algorithm. To conduct theoretical analyses, the intuition of similar environment structures is characterized by the notion of homomorphous MDPs. We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings. Experimental results show that SRPO can make several context-based algorithms far more data efficient and significantly improve their overall performance.
Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach
Authors: Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu
Abstract
Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an algorithm agent in solving complex sequential decision making tasks in embodied environments by providing high-level instructions. However, interacting with LLMs can be time-consuming, as in many practical scenarios, they require a significant amount of storage space that can only be deployed on remote cloud server nodes. Additionally, using commercial LLMs can be costly since they may charge based on usage frequency. In this paper, we explore how to enable efficient and cost-effective interactions between the agent and an LLM. We propose a reinforcement learning based mediator model that determines when it is necessary to consult LLMs for high-level instructions to accomplish a target task. Experiments on 4 MiniGrid environments that entail planning sub-goals demonstrate that our method can learn to solve target tasks with only a few necessary interactions with an LLM, significantly reducing interaction costs in testing environments, compared with baseline methods. Experimental results also suggest that by learning a mediator model to interact with the LLM, the agent's performance becomes more robust against both exploratory and stochastic environments.
BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs
Authors: Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg, Michael Cochez, Paul Groth
Abstract
Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find settings involving low degree entities, which make up for a substantial amount of the set of entities in the KG, where our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP .
A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics
Authors: Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Achim J. Lilienthal, Martin Magnusson
Abstract
Human motion prediction is essential for the safe and smooth operation of mobile service robots and intelligent vehicles around people. Commonly used neural network-based approaches often require large amounts of complete trajectories to represent motion dynamics in complex semantically-rich spaces. This requirement may complicate deployment of physical systems in new environments, especially when the data is being collected online from onboard sensors. In this paper we explore a data-efficient alternative using maps of dynamics (MoD) to represent place-dependent multi-modal spatial motion patterns, learned from prior observations. Our approach can perform efficient human motion prediction in the long-term perspective of up to 60 seconds. We quantitatively evaluate its accuracy with limited amount of training data in comparison to an LSTM-based baseline, and qualitatively show that the predicted trajectories reflect the natural semantic properties of the environment, e.g. the locations of short- and long-term goals, navigation in narrow passages, around obstacles, etc.
FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
Authors: Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The dynamic request patterns of machine learning (ML) inference workloads have driven an increasing trend towards exploiting serverless computing for scalable ML model serving. However, today's serverless platforms lack efficient support for GPUs: provisioning functions on GPUs incurs extremely high overhead, forcing them to be kept running even when idle in order to reduce cold starts. This leads to significant resource waste when performing ML inference and hinders pay-per-use billing for GPUs. In this paper, we present FaaSwap, a serverless platform enabling fine-grained, request-level GPU sharing for resource-efficient ML inference. FaaSwap leverages model swapping to support fast inference execution at low resource cost. It keeps models in a host which has a large amount of cheap memory and quickly swaps models to GPUs when requested, reducing per-function keep-alive cost and enabling efficient GPU sharing across many more functions. FaaSwap also supports swapping models between GPUs for load balancing and improved inference performance. In FaaSwap, we design sophisticated request scheduling and memory management algorithms that efficiently exploit model swapping to reduce GPU cost and meet latency service-level objectives (SLOs) for all inference functions. We have implemented and integrated FaaSwap into Alibaba Cloud Function Compute (FC), one of the world's largest commercial serverless platforms. Evaluation results show that FaaSwap can achieve low-latency model swapping, efficiently share a GPU across hundreds of functions, and satisfy per-function latency SLOs at scale.
On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering
Authors: Jiayan Guo, Lun Du, Xu Chen, Xiaojun Ma, Qiang Fu, Shi Han, Dongmei Zhang, Yan Zhang
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Abstract
Collaborative filtering (CF) is an important research direction in recommender systems that aims to make recommendations given the information on user-item interactions. Graph CF has attracted more and more attention in recent years due to its effectiveness in leveraging high-order information in the user-item bipartite graph for better recommendations. Specifically, recent studies show that the success of graph neural networks (GNNs) for CF is attributed to their low-pass filtering effects. However, current research lacks a study of how different signal components contribute to recommendations and of how to design strategies that use them properly. To this end, from the view of spectral transformation, we analyze the important factors that a graph filter should consider to achieve better performance. Based on these findings, we design JGCF, an efficient and effective method for CF based on Jacobi polynomial bases and frequency decomposition strategies. Extensive experiments on four widely used public datasets show the effectiveness and efficiency of the proposed method, which brings a performance gain of up to 27.06% on Alibaba-iFashion. In addition, the experimental results show that JGCF is better at handling sparse datasets, which indicates its potential for making recommendations for cold-start users.
Novel DeepONet architecture to predict stresses in elastoplastic structures with variable complex geometries and loads
Abstract
A novel deep operator network (DeepONet) with a residual U-Net (ResUNet) as the trunk network is devised to predict full-field highly nonlinear elastic-plastic stress response for complex geometries obtained from topology optimization under variable loads. The proposed DeepONet uses a ResUNet in the trunk to encode complex input geometries, and a fully-connected branch network encodes the parametric loads. Additional information fusion is introduced via an element-wise multiplication of the encoded latent space to improve prediction accuracy further. The performance of the proposed DeepONet was compared to two baseline models, a standalone ResUNet and a DeepONet with fully connected networks as the branch and trunk. The results show that ResUNet and the proposed DeepONet share comparable accuracy; both can predict the stress field and accurately identify stress concentration points. However, the novel DeepONet is more memory efficient and allows greater flexibility with framework architecture modifications. The DeepONet with fully connected networks suffers from high prediction error due to its inability to effectively encode the complex, varying geometry. Once trained, all three networks can predict the full stress distribution orders of magnitude faster than finite element simulations. The proposed network can quickly guide preliminary optimization, designs, sensitivity analysis, uncertainty quantification, and many other nonlinear analyses that require extensive forward evaluations with variable geometries, loads, and other parameters. This work marks the first time a ResUNet is used as the trunk network in the DeepONet architecture and the first time that DeepONet solves problems with complex, varying input geometries under parametric loads and elasto-plastic material behavior.
Efficient Centrality Maximization with Rademacher Averages
Authors: Leonardo Pellegrina
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS)
Abstract
The identification of the set of k most central nodes of a graph, or centrality maximization, is a key task in network analysis, with various applications ranging from finding communities in social and biological networks to understanding which seed nodes are important to diffuse information in a graph. As the exact computation of centrality measures does not scale to modern-sized networks, the most practical solution is to resort to rigorous, but efficiently computable, randomized approximations. In this work we present CentRA, the first algorithm based on progressive sampling to compute high-quality approximations of the set of k most central nodes. CentRA is based on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a powerful tool from statistical learning theory to compute sharp data-dependent approximation bounds. Then, we study the sample complexity of centrality maximization using the VC-dimension, a key concept from statistical learning theory. We show that the number of random samples required to compute high-quality approximations scales with finer characteristics of the graph, such as its vertex diameter, or of the centrality of interest, significantly improving looser bounds derived from standard techniques. We apply CentRA to analyze large real-world networks, showing that it significantly outperforms the state-of-the-art approximation algorithm in terms of number of samples, running times, and accuracy.
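For readers unfamiliar with Monte Carlo Rademacher averages, the sketch below estimates one for a finite family of functions evaluated on a fixed sample: each trial draws random signs, takes the best signed average over the family, and the trials are averaged. This is the generic textbook estimator, not the CentRA algorithm; the name `mc_rademacher_average` and the matrix layout `values[f][i]` are assumptions made for illustration.

```python
import random
from typing import Sequence


def mc_rademacher_average(values: Sequence[Sequence[float]],
                          trials: int = 100,
                          seed: int = 0) -> float:
    """Monte Carlo estimate of the empirical Rademacher average.

    values[f][i] is the value of function f on sample i (assumed layout).
    """
    if not values or not values[0]:
        raise ValueError("need at least one function and one sample")
    m = len(values[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # One vector of independent random signs per trial.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the (finite) family of the signed sample average.
        total += max(sum(s * v for s, v in zip(sigma, row)) / m for row in values)
    return total / trials


# Hypothetical family: one function and its negation on four samples.
family = [[1.0, 0.0, 1.0, 0.0], [-1.0, 0.0, -1.0, 0.0]]
print(mc_rademacher_average(family, trials=200))
```

Larger or richer families can fit random signs better and therefore yield larger estimates, which is what makes the quantity useful for data-dependent approximation bounds.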
Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Abstract
Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in-memory processing with in-memory data processing frameworks can undermine resource efficiency. Based on the findings of our trace data analysis, we compile requirements towards an automated solution for efficient cluster resource allocation.
Human-imperceptible, Machine-recognizable Images
Authors: Fusheng Hao, Fengxiang He, Yikai Wang, Fuxiang Wu, Jing Zhang, Jun Cheng, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract
Massive human-related data is collected to train neural networks for computer vision tasks. This exposes a major conflict for software engineers between developing better AI systems and keeping their distance from the sensitive training data. To reconcile this conflict, this paper proposes an efficient privacy-preserving learning paradigm, where images are first encrypted to become "human-imperceptible, machine-recognizable" via one of two encryption strategies: (1) randomly shuffling a set of equally sized patches and (2) mixing up sub-patches of the images. Then, minimal adaptations are made to the vision transformer to enable it to learn on the encrypted images for vision tasks, including image classification and object detection. Extensive experiments on ImageNet and COCO show that the proposed paradigm achieves accuracy comparable to that of competitive methods. Decrypting the encrypted images requires solving an NP-hard jigsaw puzzle or an ill-posed inverse problem, which is empirically shown to be intractable for various attackers, including the powerful vision transformer-based attacker. We thus show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information. The code is available at https://github.com/FushengHao/PrivacyPreservingML
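The first encryption strategy, random shuffling of equally sized patches, can be sketched in a few lines. The code below is an illustrative assumption and not the authors' implementation: the image is a plain 2D list of pixel values, and the names `shuffle_patches`, `patch`, and `seed` are hypothetical, with the seed standing in for a secret key.

```python
import random
from typing import List

Image = List[List[int]]


def shuffle_patches(image: Image, patch: int, seed: int) -> Image:
    """Cut the image into patch x patch blocks and reassemble them in random order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    rows, cols = height // patch, width // patch
    # Collect the blocks in row-major order; each block is a list of pixel rows.
    blocks = [
        [image[r * patch + y][c * patch:(c + 1) * patch] for y in range(patch)]
        for r in range(rows)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(blocks)
    # Lay the shuffled blocks back onto the original grid.
    out: Image = []
    for r in range(rows):
        for y in range(patch):
            line: List[int] = []
            for c in range(cols):
                line.extend(blocks[r * cols + c][y])
            out.append(line)
    return out


# Hypothetical 4x4 image with distinct pixel values, split into 2x2 patches.
image = [[4 * r + c for c in range(4)] for r in range(4)]
for line in shuffle_patches(image, patch=2, seed=7):
    print(line)
```

The pixel content of every patch is preserved, only the patch positions change, which is why a model that treats patches as an unordered set can still learn from the result.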
YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection
Abstract
Accurate polyp detection is essential for assisting clinical rectal cancer diagnoses. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. Great efforts have been made to conduct video polyp detection through multi-frame temporal/spatial aggregation. However, unlike common fixed-camera video, the moving camera in colonoscopy videos can cause rapid video jitters, leading to unstable training for existing video detection models. Additionally, the concealed nature of some polyps and the complex background environment further hinder the performance of existing video detectors. In this paper, we propose the YONA (You Only Need one Adjacent Reference-frame) method, an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with its adjacent reference frames according to their foreground similarity. For the background, YONA conducts background dynamic alignment guided by inter-frame difference to eliminate the invalid features produced by drastic spatial jitters. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground truth bounding box to improve the model's perception of polyp and background. Quantitative and qualitative experiments on three challenging public benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
ESL-SNNs: An Evolutionary Structure Learning Strategy for Spiking Neural Networks
Authors: Jiangrong Shen, Qi Xu, Jian K. Liu, Yueming Wang, Gang Pan, Huajin Tang
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
Spiking neural networks (SNNs) have shown remarkable advantages in power consumption and event-driven processing during inference. To take full advantage of low power consumption and further improve the efficiency of these models, pruning methods have been explored to find sparse SNNs without redundant connections after training. However, parameter redundancy still hinders the efficiency of SNNs during training. In the human brain, the rewiring process of neural networks is highly dynamic, while synaptic connections remain relatively sparse during brain development. Inspired by this, we propose an efficient evolutionary structure learning (ESL) framework for SNNs, named ESL-SNNs, to implement sparse SNN training from scratch. The pruning and regeneration of synaptic connections in SNNs evolve dynamically during learning, yet keep the structural sparsity at a certain level. As a result, the ESL-SNNs can search for optimal sparse connectivity by exploring all possible parameters across time. Our experiments show that the proposed ESL-SNNs framework is able to learn SNNs with sparse structures effectively while incurring only a limited loss in accuracy. The ESL-SNNs achieve merely 0.28% accuracy loss with 10% connection density on the DVS-Cifar10 dataset. Our work presents a brand-new approach for sparse training of SNNs from scratch with biologically plausible evolutionary mechanisms, closing the gap in expressibility between sparse training and dense training. Hence, it has great potential for lightweight SNN training and inference with low power consumption and small memory usage.
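The idea of pruning and regenerating connections while holding sparsity constant can be sketched as a single mask update. The criteria used here, pruning the smallest-magnitude active weights and regrowing uniformly at random among inactive positions, are assumptions borrowed from common sparse-training practice and are not confirmed by the abstract; the name `prune_and_regrow` is hypothetical.

```python
import random
from typing import List


def prune_and_regrow(weights: List[float],
                     mask: List[bool],
                     k: int,
                     rng: random.Random) -> List[bool]:
    """Drop k active connections and activate k inactive ones, keeping density fixed."""
    active = [i for i, on in enumerate(mask) if on]
    inactive = [i for i, on in enumerate(mask) if not on]
    k = min(k, len(active), len(inactive))
    # Assumed pruning rule: remove the active connections with the smallest magnitude.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Assumed regrowth rule: pick previously inactive positions uniformly at random.
    regrown = rng.sample(inactive, k)
    new_mask = list(mask)
    for i in pruned:
        new_mask[i] = False
    for i in regrown:
        new_mask[i] = True
    return new_mask


# Hypothetical layer with six connections, four of them active.
weights = [0.9, -0.1, 0.0, 0.5, 0.0, 0.05]
mask = [True, True, False, True, False, True]
new_mask = prune_and_regrow(weights, mask, k=1, rng=random.Random(0))
print(sum(mask), sum(new_mask))  # 4 4
```

Because regrowth only draws from positions that were inactive before the update, the number of active connections is unchanged after every call.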
DashQL -- Complete Analysis Workflows with SQL
Authors: André Kohn, Dominik Moritz, Thomas Neumann
Abstract
We present DashQL, a language that describes complete analysis workflows in self-contained scripts. DashQL combines SQL, the grammar of relational database systems, with a grammar of graphics in a grammar of analytics. It supports preparing and visualizing arbitrarily complex SQL statements in a single coherent language. The proximity to SQL facilitates holistic optimizations of analysis workflows covering data input, encoding, transformations, and visualizations. These optimizations use model and query metadata for visualization-driven aggregation, remote predicate pushdown, and adaptive materialization. We introduce the DashQL language as an extension of SQL and describe the efficient and interactive processing of text-based analysis workflows.
Numerical solution of the Biot/elasticity interface problem using virtual element methods
Authors: Sarvesh Kumar, David Mora, Ricardo Ruiz-Baier, Nitesh Verma
Abstract
We propose, analyze and implement a virtual element discretization for an interfacial poroelasticity-elasticity consolidation problem. The formulation of the time-dependent poroelasticity equations uses displacement, fluid pressure, and total pressure, and the elasticity equations are written in the displacement-pressure formulation. The construction of the virtual element scheme does not require Lagrange multipliers to impose the transmission conditions (continuity of displacement and total traction, and no-flux for the fluid) on the interface. We show the stability and convergence of the virtual element method for different polynomial degrees, and the error bounds are robust with respect to delicate model parameters (such as Lamé constants, permeability, and storativity coefficient). Finally, we provide numerical examples that illustrate the properties of the scheme.
Towards Memory-Efficient Training for Extremely Large Output Spaces -- Learning with 500k Labels on a Single Commodity GPU
Abstract
In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory. Using sparse connectivity would drastically reduce the memory requirements, but as we show below, it can result in much diminished predictive performance of the model. Fortunately, we found that this can be mitigated by introducing a penultimate layer of intermediate size. We further demonstrate that one can constrain the connectivity of the sparse layer to be uniform, in the sense that each output neuron will have the exact same number of incoming connections. This allows for efficient implementations of sparse matrix multiplication and connection redistribution on GPU hardware. Via a custom CUDA implementation, we show that the proposed approach can scale to datasets with 670,000 labels on a single commodity GPU with only 4GB memory.
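The uniform-connectivity constraint described above can be illustrated with a minimal sketch. It assumes that "uniform" means every output neuron has the same fan-in, so the sparse layer fits in two rectangular arrays (input indices and weights) of shape `n_out x fan_in`. The function names, the random initialisation, and the pure-Python layout are hypothetical and are not taken from the paper's CUDA implementation.

```python
import random


def make_uniform_sparse_layer(n_in, n_out, fan_in, seed=0):
    """Build a sparse layer in which every output neuron has exactly `fan_in` inputs.

    Returns two rectangular lists of shape (n_out, fan_in):
    the input indices each output reads from, and the matching weights.
    """
    rng = random.Random(seed)
    # Sample distinct input indices per output neuron (hypothetical initialisation).
    indices = [rng.sample(range(n_in), fan_in) for _ in range(n_out)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(n_out)]
    return indices, weights


def sparse_forward(x, indices, weights):
    """Compute y[j] = sum_k weights[j][k] * x[indices[j][k]] for every output j."""
    return [
        sum(w * x[i] for i, w in zip(row_idx, row_w))
        for row_idx, row_w in zip(indices, weights)
    ]


indices, weights = make_uniform_sparse_layer(n_in=8, n_out=5, fan_in=3)
y = sparse_forward([1.0] * 8, indices, weights)
```

Because every row has the same length, the layer stores `n_out * fan_in` weights instead of `n_out * n_in`, and no per-row offset array is needed. This is one plausible reason why a fixed fan-in maps well onto GPU kernels.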
GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model
Authors: Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze
Abstract
Energy consumption of memory accesses dominates the compute energy in energy-constrained robots which require a compact 3D map of the environment to achieve autonomy. Recent mapping frameworks only focused on reducing the map size while incurring significant memory usage during map construction due to multi-pass processing of each depth image. In this work, we present a memory-efficient continuous occupancy map, named GMMap, that accurately models the 3D environment using a Gaussian Mixture Model (GMM). Memory-efficient GMMap construction is enabled by the single-pass compression of depth images into local GMMs which are directly fused together into a globally-consistent map. By extending Gaussian Mixture Regression to model unexplored regions, occupancy probability is directly computed from Gaussians. Using a low-power ARM Cortex A57 CPU, GMMap can be constructed in real-time at up to 60 images per second. Compared with prior works, GMMap maintains high accuracy while reducing the map size by at least 56%, memory overhead by at least 88%, DRAM access by at least 78%, and energy consumption by at least 69%. Thus, GMMap enables real-time 3D mapping on energy-constrained robots.
Residual-based error bound for physics-informed neural networks
Abstract
Neural networks are universal approximators and are studied for their use in solving differential equations. However, a major criticism is the lack of error bounds for obtained solutions. This paper proposes a technique to rigorously evaluate the error bound of Physics-Informed Neural Networks (PINNs) on most linear ordinary differential equations (ODEs), certain nonlinear ODEs, and first-order linear partial differential equations (PDEs). The error bound is based purely on equation structure and residual information and does not depend on assumptions of how well the networks are trained. We propose algorithms that bound the error efficiently. Some proposed algorithms provide tighter bounds than others at the cost of longer run time.
Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning
Abstract
We study a sequential decision making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The principal and the agent communicate their signals to each other, and select their actions independently based on this communication. Each player receives a payoff based on the state and their joint actions, and the environment moves to a new state. The interaction continues over a finite time horizon, and both players act to optimize their own total payoffs over the horizon. Our model encompasses as special cases stochastic games of incomplete information and POMDPs, as well as sequential Bayesian persuasion and mechanism design problems. We study both computation of optimal policies and learning in our setting. While the general problems are computationally intractable, we study algorithmic solutions under a conditional independence assumption on the underlying state-observation distributions. We present a polynomial-time algorithm to compute the principal's optimal policy up to an additive approximation. Additionally, we show an efficient learning algorithm in the case where the transition probabilities are not known beforehand. The algorithm guarantees sublinear regret for both players.
MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving Graphs
Authors: Raneen Younis, Abdul Hakmeh, Zahra Ahmadi
Abstract
Conventional time series classification approaches based on bags of patterns or shapelets face significant challenges in dealing with a vast amount of feature candidates from high-dimensional multivariate data. In contrast, deep neural networks can learn low-dimensional features efficiently, and in particular, Convolutional Neural Networks (CNN) have shown promising results in classifying Multivariate Time Series (MTS) data. A key factor in the success of deep neural networks is their astonishing expressive power. However, this power comes at the cost of complex, black-boxed models, conflicting with the goals of building reliable and human-understandable models. An essential criterion in understanding such predictive deep models involves quantifying the contribution of time-varying input variables to the classification. Hence, in this work, we introduce a new framework for interpreting multivariate time series data by extracting and clustering the input representative patterns that highly activate CNN neurons. This way, we identify each signal's role and dependencies, considering all possible combinations of signals in the MTS input. Then, we construct a graph that captures the temporal relationship between the extracted patterns for each layer. An effective graph merging strategy finds the connection of each node to the previous layer's nodes. Finally, a graph embedding algorithm generates new representations of the created interpretable time-series features. To evaluate the performance of our proposed framework, we run extensive experiments on eight datasets of the UCR/UEA archive, along with HAR and PAM datasets. The experiments indicate the benefit of our time-aware graph-based representation in MTS classification while enriching them with more interpretability.
Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere
Authors: Boris Bonev, Thorsten Kurth, Christian Hundt, Jaideep Pathak, Maximilian Baust, Karthik Kashinath, Anima Anandkumar
Abstract
Fourier Neural Operators (FNOs) have proven to be an efficient and effective method for resolution-independent operator learning in a broad variety of application areas across scientific machine learning. A key reason for their success is their ability to accurately model long-range dependencies in spatio-temporal data by learning global convolutions in a computationally efficient manner. To this end, FNOs rely on the discrete Fourier transform (DFT); however, DFTs cause visual and spectral artifacts as well as pronounced dissipation when learning operators in spherical coordinates, since they incorrectly assume a flat geometry. To overcome this limitation, we generalize FNOs on the sphere, introducing Spherical FNOs (SFNOs) for learning operators on spherical geometries. We apply SFNOs to forecasting atmospheric dynamics, and demonstrate stable auto-regressive rollouts for a year of simulated time (1,460 steps), while retaining physically plausible dynamics. The SFNO has important implications for machine learning-based simulation of climate dynamics that could eventually help accelerate our response to climate change.
Faster real root decision algorithm for symmetric polynomials
Authors: George Labahn, Cordian Riener, Mohab Safey El Din, Éric Schost, Thi Xuan Vu
Abstract
In this paper, we consider the problem of deciding the existence of real solutions to a system of polynomial equations having real coefficients, and which are invariant under the action of the symmetric group. We construct and analyze a Monte Carlo probabilistic algorithm which solves this problem, under some regularity assumptions on the input, by taking advantage of the symmetry invariance property. The complexity of our algorithm is polynomial in $d^s, {{n+d} \choose d}$, and ${{n} \choose {s+1}}$, where $n$ is the number of variables and $d$ is the maximal degree of $s$ input polynomials defining the real algebraic set under study. In particular, this complexity is polynomial in $n$ when $d$ and $s$ are fixed and is equal to $n^{O(1)}2^n$ when $d=n$.
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Authors: Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments. In this paper, we propose a statistical model of Text Generation evaluation that accounts for the error-proneness of automated metrics when used to generate preference rankings between system outputs. We show that existing automated metrics are generally over-confident in assigning significant differences between systems in this setting. However, our model enables an efficient combination of human and automated ratings to remedy the error-proneness of the automated metrics. We show that using this combination, we only require about 50% of the human annotations typically used in evaluations to arrive at robust and statistically significant results while yielding the same evaluation outcome as the pure human evaluation in 95% of cases. We showcase the benefits of our approach for three text generation tasks: dialogue systems, machine translation, and text summarization.
Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation
Authors: Xinrong Hu, Yu-Jen Chen, Tsung-Yi Ho, Yiyu Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advances in denoising diffusion probabilistic models have shown great success in image synthesis tasks. While there are already works exploring the potential of this powerful tool in image semantic segmentation, its application in weakly supervised semantic segmentation (WSSS) remains relatively under-explored. Observing that conditional diffusion models (CDM) are capable of generating images subject to specific distributions, in this work, we utilize the category-aware semantic information underlying CDM to get the prediction mask of the target object with only image-level annotations. More specifically, we locate the desired class by approximating the derivative of the output of CDM w.r.t. the input condition. Our method is different from previous diffusion model methods with guidance from an external classifier, which accumulates noises in the background during the reconstruction process. Our method outperforms state-of-the-art CAM and diffusion model methods on two public medical image segmentation datasets, which demonstrates that CDM is a promising tool in WSSS. Also, experiments show that our method is more time-efficient than existing diffusion model methods, making it practical for wider applications.
Fast Context Adaptation in Cost-Aware Continual Learning
Authors: Seyyidahmed Lahmer, Federico Mason, Federico Chiariotti, Andrea Zanella
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Abstract
In the past few years, deep reinforcement learning (DRL) has become a valuable solution to automatically learn efficient resource management strategies in complex networks with time-varying statistics. However, the increased complexity of 5G and Beyond networks requires correspondingly more complex learning agents, and the learning process itself might end up competing with users for communication and computational resources. This creates friction: on the one hand, the learning process needs resources to quickly converge to an effective strategy; on the other hand, the learning process needs to be efficient, i.e., take as few resources as possible from the user's data plane, so as not to throttle users' QoS. In this paper, we investigate this trade-off and propose a dynamic strategy to balance the resources assigned to the data plane and those reserved for learning. With the proposed approach, a learning agent can quickly converge to an efficient resource allocation strategy and adapt to changes in the environment, as in the continual learning (CL) paradigm, while minimizing the impact on the users' QoS. Simulation results show that the proposed method outperforms static allocation methods with minimal learning overhead, almost reaching the performance of an ideal out-of-band CL solution.
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Abstract
Figuring out which Pre-Trained Model (PTM) from a model zoo fits the target task is essential to take advantage of plentiful model resources. With the availability of numerous heterogeneous PTMs from diverse fields, efficiently selecting the most suitable PTM is challenging due to the time-consuming costs of carrying out forward or backward passes over all PTMs. In this paper, we propose Model Spider, which tokenizes both PTMs and tasks by summarizing their characteristics into vectors to enable efficient PTM selection. By leveraging the approximated performance of PTMs on a separate set of training tasks, Model Spider learns to construct tokens and measure the fitness score between a model-task pair via their tokens. The ability to rank relevant PTMs higher than others generalizes to new tasks. With the top-ranked PTM candidates, we further learn to enrich task tokens with their PTM-specific semantics to re-rank the PTMs for better selection. Model Spider balances efficiency and selection ability, making PTM selection like a spider preying on a web. Model Spider demonstrates promising performance in various configurations of model zoos.
Keyword: faster
Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models
Authors: Alexander Lin, Bahareh Tolooshams, Yves Atchadé, Demba Ba
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Computation (stat.CO)
Abstract
Latent Gaussian models have a rich history in statistics and machine learning, with applications ranging from factor analysis to compressed sensing to time series analysis. The classical method for maximizing the likelihood of these models is the expectation-maximization (EM) algorithm. For problems with high-dimensional latent variables and large datasets, EM scales poorly because it needs to invert as many large covariance matrices as the number of data points. We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
Accelerating Range Minimum Queries with Ray Tracing Cores
Authors: Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada, Felipe A. Quezada
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
Abstract
During the last decade GPU technology has shifted from pure general purpose computation to the inclusion of application specific integrated circuits (ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these special purpose GPU cores were designed to further accelerate specific fields such as AI and real-time rendering, recent research has managed to exploit them to further accelerate other tasks that typically used regular GPU computing. In this work we present RTXRMQ, a new approach that can compute range minimum queries (RMQs) with RT cores. The main contribution is the proposal of a geometric solution for RMQ, where elements become triangles that are placed and shaped according to the element's value and position in the array, respectively, such that the closest hit of a ray launched from a point given by the query parameters corresponds to the result of that query. Experimental results show that RTXRMQ is currently best suited for small query ranges relative to the problem size, achieving up to $5\times$ and $2.3\times$ of speedup over state of the art CPU (HRMQ) and GPU (LCA) approaches, respectively. Although for medium and large query ranges RTXRMQ is currently surpassed by LCA, it is still competitive by being $2.5\times$ and $4\times$ faster than HRMQ which is a highly parallel CPU approach. Furthermore, performance scaling experiments across the latest RTX GPU architectures show that if the current RT scaling trend continues, then RTXRMQ's performance would scale at a higher rate than HRMQ and LCA, making the approach even more relevant for future high performance applications that employ batches of RMQs.
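For readers unfamiliar with the problem itself, the following sketch shows what a range minimum query computes. It uses a textbook sparse table, not the ray-tracing formulation of RTXRMQ and not the HRMQ or LCA baselines named in the abstract. All function names are illustrative.

```python
def range_min_naive(values, left, right):
    """Return min(values[left..right]) (inclusive bounds) by a linear scan."""
    if not 0 <= left <= right < len(values):
        raise IndexError("query range out of bounds")
    return min(values[left:right + 1])


def build_sparse_table(values):
    """Precompute table[k][i] = min(values[i : i + 2**k]) for every valid k and i."""
    table = [list(values)]
    k = 1
    while (1 << k) <= len(values):
        prev = table[-1]
        half = 1 << (k - 1)
        # A window of length 2**k is the union of two windows of length 2**(k-1).
        table.append(
            [min(prev[i], prev[i + half]) for i in range(len(values) - (1 << k) + 1)]
        )
        k += 1
    return table


def range_min_sparse(table, left, right):
    """Answer an inclusive query using two overlapping power-of-two windows."""
    k = (right - left + 1).bit_length() - 1
    return min(table[k][left], table[k][right - (1 << k) + 1])


values = [5, 2, 7, 1, 9, 3, 8, 4]
table = build_sparse_table(values)
```

After preprocessing, each query costs two table lookups regardless of the range length, which makes the sparse table a convenient correctness reference when experimenting with other RMQ implementations.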
CoSiNES: Contrastive Siamese Network for Entity Standardization
Authors: Jiaqing Yuan, Michele Merler, Mihir Choudhury, Raju Pavuluri, Munindar P. Singh, Maja Vukovic
Abstract
Entity standardization maps noisy mentions from free-form text to standard entities in a knowledge base. The unique challenge of this task relative to other entity-related tasks is the lack of surrounding context and numerous variations in the surface form of the mentions, especially when it comes to generalization across domains where labeled data is scarce. Previous research mostly focuses on developing models either heavily relying on context, or dedicated solely to a specific domain. In contrast, we propose CoSiNES, a generic and adaptable framework with Contrastive Siamese Network for Entity Standardization that effectively adapts a pretrained language model to capture the syntax and semantics of the entities in a new domain. We construct a new dataset in the technology domain, which contains 640 technical stack entities and 6,412 mentions collected from industrial content management systems. We demonstrate that CoSiNES yields higher accuracy and faster runtime than baselines derived from leading methods in this domain. CoSiNES also achieves competitive performance in four standard datasets from the chemistry, medicine, and biomedical domains, demonstrating its cross-domain applicability.
G-CAME: Gaussian-Class Activation Mapping Explainer for Object Detectors
Authors: Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Quoc Hung Cao
Abstract
Nowadays, deep neural networks for object detection in images are very prevalent. However, due to the complexity of these networks, users find it hard to understand why these objects are detected by models. We propose the Gaussian Class Activation Mapping Explainer (G-CAME), which generates a saliency map as the explanation for object detection models. G-CAME can be considered a CAM-based method that uses the activation maps of selected layers combined with the Gaussian kernel to highlight the important regions in the image for the predicted box. Compared with other Region-based methods, G-CAME can transcend time constraints as it takes a very short time to explain an object. We also evaluate our method qualitatively and quantitatively with YOLOX on the MS-COCO 2017 dataset and provide guidance on applying G-CAME to the two-stage Faster-RCNN model.
Rigorous Runtime Analysis of MOEA/D for Solving Multi-Objective Minimum Weight Base Problems
Authors: Anh Viet Do, Aneta Neumann, Frank Neumann, Andrew M. Sutton
Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE)
Abstract
We study the multi-objective minimum weight base problem, an abstraction of classical NP-hard combinatorial problems such as the multi-objective minimum spanning tree problem. We prove some important properties of the convex hull of the non-dominated front, such as its approximation quality and an upper bound on the number of extreme points. Using these properties, we give the first run-time analysis of the MOEA/D algorithm for this problem, an evolutionary algorithm that effectively optimizes by decomposing the objectives into single-objective components. We show that the MOEA/D, given an appropriate decomposition setting, finds all extreme points within expected fixed-parameter polynomial time in the oracle model, the parameter being the number of objectives. Experiments are conducted on random bi-objective minimum spanning tree instances, and the results agree with our theoretical findings. Furthermore, compared with GSEMO, a previously studied evolutionary algorithm for the problem, MOEA/D finds all extreme points much faster across all instances.
Machine learning in and out of equilibrium
Authors: Shishir Adhikari, Alkan Kabakçıoğlu, Alexander Strang, Deniz Yuret, Michael Hinczewski
Abstract
The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space -- for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium, exhibiting persistent currents in the space of network parameters. As in its physical analogues, the current is associated with an entropy production rate for any given training trajectory. The stationary distribution of these rates obeys the integral and detailed fluctuation theorems -- nonequilibrium generalizations of the second law of thermodynamics. We validate these relations in two numerical examples, a nonlinear regression network and MNIST digit classification. While the fluctuation theorems are universal, there are other aspects of the stationary state that are highly sensitive to the training details. Surprisingly, the effective loss landscape and diffusion matrix that determine the shape of the stationary distribution vary depending on the simple choice of minibatching done with or without replacement. We can take advantage of this nonequilibrium sensitivity to engineer an equilibrium stationary state for a particular application: sampling from a posterior distribution of network weights in Bayesian machine learning. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without replacement minibatching. In an example system where the posterior is exactly known, this SGWORLD algorithm outperforms SGLD, converging to the posterior orders of magnitude faster as a function of the learning rate.
BackpropTools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
Authors: Jonas Eschmann, Dario Albani, Giuseppe Loianno
Abstract
Deep Reinforcement Learning (RL) has been demonstrated to yield capable agents and control policies in several domains but is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited due to the lack of real-time guarantees and portability of existing deep learning libraries. To address these challenges, we present BackpropTools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Leveraging the template meta-programming capabilities of recent C++ standards, we provide composable components that can be tightly integrated by the compiler. Its novel architecture allows BackpropTools to be used seamlessly on a heterogeneous set of platforms, from HPC clusters over workstations and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, BackpropTools can solve popular RL problems like the Pendulum-v1 swing-up about 7 to 15 times faster in terms of wall-clock training time compared to other popular RL frameworks when using TD3. We also provide a low-overhead and parallelized interface to the MuJoCo simulator, showing that our PPO implementation achieves state of the art returns in the Ant-v4 environment while achieving a 25 to 30 percent faster wall-clock training time. Finally, we also benchmark the policy inference on a diverse set of microcontrollers and show that in most cases our optimized inference implementation is much faster than even the manufacturer's DSP libraries. To the best of our knowledge, BackpropTools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of Tiny Reinforcement Learning (TinyRL). Project page: https://backprop.tools
Constant Sequence Extension for Fast Search Using Weighted Hamming Distance
Authors: Zhenyu Weng, Huiping Zhuang, Haizhou Li, Zhiping Lin
Abstract
Representing visual data using compact binary codes is attracting increasing attention as binary codes are used as direct indices into hash table(s) for fast non-exhaustive search. Recent methods show that ranking binary codes using weighted Hamming distance (WHD) rather than Hamming distance (HD) by generating query-adaptive weights for each bit can better retrieve query-related items. However, search using WHD is slower than that using HD. One main challenge is that the complexity of extending a monotone increasing sequence using WHD to probe buckets in hash table(s) for existing methods is at least proportional to the square of the sequence length, while that using HD is proportional to the sequence length. To overcome this challenge, we propose a novel fast non-exhaustive search method using WHD. The key idea is to design a constant sequence extension algorithm to perform each sequence extension in constant computational complexity and the total complexity is proportional to the sequence length, which is justified by theoretical analysis. Experimental results show that our method is faster than other WHD-based search methods. Also, compared with the HD-based non-exhaustive search method, our method has comparable efficiency but retrieves more query-related items for the dataset of up to one billion items.
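The difference between the two distances can be made concrete with a small sketch. It assumes the common definition of weighted Hamming distance as the sum of per-bit weights over the positions where two codes differ. The weight values below are made up for illustration; the abstract does not say how the query-adaptive weights are generated.

```python
def hamming_distance(a, b):
    """Count the bit positions where the two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming_distance(query, code, weights):
    """Sum the per-bit weights over the positions where query and code differ."""
    return sum(w for q, c, w in zip(query, code, weights) if q != c)


query = [1, 0, 1, 1]
code_a = [1, 1, 1, 1]  # differs from the query in bit 1
code_b = [1, 0, 1, 0]  # differs from the query in bit 3

# Hypothetical query-adaptive weights: bit 1 is considered reliable, bit 3 is not.
weights = [0.5, 2.0, 0.5, 0.1]

# Both codes tie at HD = 1, but WHD ranks code_b (0.1) ahead of code_a (2.0).
ranking = sorted(
    [("a", code_a), ("b", code_b)],
    key=lambda item: weighted_hamming_distance(query, item[1], weights),
)
```

Plain Hamming distance takes only integer values, so many codes tie; the weights break those ties, which is also why probing hash buckets in WHD order is harder than in HD order.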
Tight Complexity Bounds for Counting Generalized Dominating Sets in Bounded-Treewidth Graphs Part II: Hardness Results
Authors: Jacob Focke, Dániel Marx, Fionn Mc Inerney, Daniel Neuen, Govind S. Sankar, Philipp Schepper, Philip Wellnitz
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
Abstract
For a well-studied family of domination-type problems, in bounded-treewidth graphs, we investigate whether it is possible to find faster algorithms. For sets $\sigma,\rho$ of non-negative integers, a $(\sigma,\rho)$-set of a graph $G$ is a set $S$ of vertices such that $|N(u)\cap S|\in \sigma$ for every $u\in S$, and $|N(v)\cap S|\in \rho$ for every $v\not\in S$. The problem of finding a $(\sigma,\rho)$-set (of a certain size) unifies common problems like $\text{Independent Set}$, $\text{Dominating Set}$, $\text{Independent Dominating Set}$, and many others. In an accompanying paper, it is proven that, for all pairs of finite or cofinite sets $(\sigma,\rho)$, there is an algorithm that counts $(\sigma,\rho)$-sets in time $(c_{\sigma,\rho})^{\text{tw}}\cdot n^{O(1)}$ (if a tree decomposition of width $\text{tw}$ is given in the input). Here, $c_{\sigma,\rho}$ is a constant with an intricate dependency on $\sigma$ and $\rho$. Despite this intricacy, we show that the algorithms in the accompanying paper are most likely optimal, i.e., for any pair $(\sigma, \rho)$ of finite or cofinite sets where the problem is non-trivial, and any $\varepsilon>0$, a $(c_{\sigma,\rho}-\varepsilon)^{\text{tw}}\cdot n^{O(1)}$-algorithm counting the number of $(\sigma,\rho)$-sets would violate the Counting Strong Exponential-Time Hypothesis (#SETH). For finite sets $\sigma$ and $\rho$, our lower bounds also extend to the decision version, showing that those algorithms are optimal in this setting as well.
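The $(\sigma,\rho)$-set definition above translates directly into a membership check. The sketch below is a brute-force verifier for a single candidate set, not one of the treewidth-based counting algorithms discussed in the abstract. Cofinite sets are represented by finite ranges, which suffices because a vertex in an $n$-vertex graph has at most $n-1$ neighbours. The function name and the example graph are illustrative.

```python
def is_sigma_rho_set(adjacency, selected, sigma, rho):
    """Check whether `selected` is a (sigma, rho)-set of the graph.

    adjacency: dict mapping each vertex to an iterable of its neighbours.
    sigma, rho: containers of allowed neighbour counts (anything supporting `in`).
    """
    selected = set(selected)
    for vertex, neighbours in adjacency.items():
        count = len(selected & set(neighbours))  # |N(vertex) ∩ S|
        allowed = sigma if vertex in selected else rho
        if count not in allowed:
            return False
    return True


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n = len(path)

any_count = range(0, n)    # stands in for "all non-negative integers"
at_least_one = range(1, n)  # stands in for "all positive integers"

independent = is_sigma_rho_set(path, {0, 2}, sigma={0}, rho=any_count)
dominating = is_sigma_rho_set(path, {1, 2}, sigma=any_count, rho=at_least_one)
independent_dominating = is_sigma_rho_set(path, {0, 3}, sigma={0}, rho=at_least_one)
```

Here Independent Set corresponds to $\sigma=\{0\}$ with unrestricted $\rho$, Dominating Set to unrestricted $\sigma$ with $\rho$ the positive integers, and Independent Dominating Set to $\sigma=\{0\}$ with $\rho$ the positive integers.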
Novel DeepONet architecture to predict stresses in elastoplastic structures with variable complex geometries and loads
Abstract
A novel deep operator network (DeepONet) with a residual U-Net (ResUNet) as the trunk network is devised to predict full-field highly nonlinear elastic-plastic stress response for complex geometries obtained from topology optimization under variable loads. The proposed DeepONet uses a ResUNet in the trunk to encode complex input geometries, and a fully-connected branch network encodes the parametric loads. Additional information fusion is introduced via an element-wise multiplication of the encoded latent space to improve prediction accuracy further. The performance of the proposed DeepONet was compared to two baseline models, a standalone ResUNet and a DeepONet with fully connected networks as the branch and trunk. The results show that ResUNet and the proposed DeepONet share comparable accuracy; both can predict the stress field and accurately identify stress concentration points. However, the novel DeepONet is more memory efficient and allows greater flexibility with framework architecture modifications. The DeepONet with fully connected networks suffers from high prediction error due to its inability to effectively encode the complex, varying geometry. Once trained, all three networks can predict the full stress distribution orders of magnitude faster than finite element simulations. The proposed network can quickly guide preliminary optimization, designs, sensitivity analysis, uncertainty quantification, and many other nonlinear analyses that require extensive forward evaluations with variable geometries, loads, and other parameters. This work marks the first time a ResUNet is used as the trunk network in the DeepONet architecture and the first time that DeepONet solves problems with complex, varying input geometries under parametric loads and elasto-plastic material behavior.
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Authors: Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang
Abstract
Large pre-trained transformers are show-stealers in modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale. With exploding parameter counts, the Lottery Ticket Hypothesis (LTH) and its variants have lost their pragmatism in sparsifying them, due to the high computation and memory bottleneck of the repetitive train-prune-retrain routine of iterative magnitude pruning (IMP), which worsens with increasing model size. In this paper, we comprehensively study induced sparse patterns across multiple large pre-trained vision and language transformers. We propose the existence of essential sparsity, defined by a sharp dropping point in the sparsity-performance curve beyond which performance declines much faster as the sparsity level rises, when we directly remove the weights with the smallest magnitudes in one shot. We also present an intriguing emerging phenomenon of abrupt sparsification during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse in pre-training after certain iterations. Moreover, our observations also indicate a counter-intuitive finding that BERT trained with a larger amount of pre-training data tends to have a better ability to condense knowledge in comparatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification properties than supervised learning (SL). Our codes are available at https://github.com/VITA-Group/essential_sparsity.
Abstract
The evolving data framework was first proposed by Anagnostopoulos et al., where an evolver makes small changes to a structure behind the scenes. Instead of taking a single input and producing a single output, an algorithm judiciously probes the current state of the structure and attempts to continuously maintain a sketch of the structure that is as close as possible to its actual state. A number of problems have been studied in the evolving framework, including our own work on labeled trees. We were motivated by the problem of maintaining a labeling in the plane, where updating the labels requires physically moving them. Applications involve tracking evolving disease hot-spots via mobile testing units, and tracking unmanned aerial vehicles. To be specific, we consider the problem of tracking labeled nodes in the plane, where an evolver continuously swaps labels of any two nearby nodes in the background, unknown to us. We are tasked with maintaining a hypothesis, an approximate sketch of the locations of these labels, which we can only update by physically moving them over a sparse graph. We assume the existence of an Oracle, which, when suitably probed, guides us in fixing our hypothesis.
Reconstructing human activities via coupling mobile phone data with location-based social networks
Authors: Le Huang, Fan Xia, Hui Chen, Bowen Hu, Xiao Zhou, Chunxiao Li, Yaohui Jin, Yanyan Xu
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
Abstract
In the era of big data, the ubiquity of location-aware portable devices provides an unprecedented opportunity to understand inhabitants' behavior and their interactions with the built environments. Among the widely used data resources, mobile phone data is the one passively collected and has the largest coverage in the population. However, mobile operators cannot pinpoint one user within meters, leading to difficulties in activity inference. To that end, we propose a data analysis framework to identify users' activities by coupling the mobile phone data with location-based social networks (LBSN) data. The two datasets are integrated into a Bayesian inference module, considering people's circadian rhythms in both time and space. Specifically, the framework considers the pattern of arrival time to each type of facility and the spatial distribution of facilities. The former can be observed from the LBSN data and the latter is provided by the points of interest (POIs) dataset. Taking Shanghai as an example, we reconstruct the activity chains of 1,000,000 active mobile phone users and analyze the temporal and spatial characteristics of each activity type. We assess the results with some official surveys and a real-world check-in dataset collected in Shanghai, indicating that the proposed method can capture and analyze human activities effectively. Next, we cluster users' inferred activity chains with a topic model to understand the behavior of different groups of users. This data analysis framework provides an example of reconstructing and understanding the activity of the population at an urban scale with big data fusion.
A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics
Authors: Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Achim J. Lilienthal, Martin Magnusson
Abstract
Human motion prediction is essential for the safe and smooth operation of mobile service robots and intelligent vehicles around people. Commonly used neural network-based approaches often require large amounts of complete trajectories to represent motion dynamics in complex semantically-rich spaces. This requirement may complicate deployment of physical systems in new environments, especially when the data is being collected online from onboard sensors. In this paper we explore a data-efficient alternative using maps of dynamics (MoD) to represent place-dependent multi-modal spatial motion patterns, learned from prior observations. Our approach can perform efficient human motion prediction in the long-term perspective of up to 60 seconds. We quantitatively evaluate its accuracy with a limited amount of training data in comparison to an LSTM-based baseline, and qualitatively show that the predicted trajectories reflect the natural semantic properties of the environment, e.g. the locations of short- and long-term goals, navigation in narrow passages, around obstacles, etc.
Towards Scalable Multi-View Reconstruction of Geometry and Materials
Authors: Carolin Schmitt, Božidar Antić, Andrei Neculai, Joo Ho Lee, Andreas Geiger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object-scale and hence cannot be captured with stationary light stages. The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study on the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
Keyword: pruning
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Abstract
Finetuning large language models inflates the costs of NLU applications and remains the bottleneck of development cycles. Recent works in computer vision use data pruning to reduce training time. Pruned data selection with static methods is based on a score calculated for each training example prior to finetuning, which involves important computational overhead. Moreover, the score may not necessarily be representative of sample importance throughout the entire training duration. We propose to address these issues with a refined version of dynamic data pruning, a curriculum which periodically scores and discards unimportant examples during finetuning. Our method leverages an EL2N metric that we extend to the joint intent and slot classification task, and an initial finetuning phase on the full train set. Our results on the GLUE benchmark and four joint NLU datasets show a better time-accuracy trade-off compared to static methods. Our method preserves full accuracy while training on 50% of the data points and reduces computational times by up to 41%. If we tolerate instead a minor drop of accuracy of 1%, we can prune 80% of the training examples for a reduction in finetuning time reaching 66%.
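To make the scoring step concrete, the following is a minimal sketch of the commonly used EL2N definition (the L2 norm of the difference between the softmax output and the one-hot label), followed by a selection step. It is not the authors' implementation: the function names, the plain single-label setting (the joint intent and slot extension is not covered), and the choice to keep the highest-scoring examples are assumptions made for illustration.

```python
import numpy as np

def el2n_scores(logits, labels):
    """EL2N per example: ||softmax(logits) - one_hot(label)||_2."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    one_hot = np.eye(logits.shape[1])[np.asarray(labels)]
    return np.linalg.norm(probs - one_hot, axis=1)

def keep_hardest(scores, keep_fraction):
    """Sorted indices of the examples retained for the next training period.

    Assumption: low-score (easy) examples are the ones discarded.
    """
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    return np.sort(np.argsort(scores)[::-1][:n_keep])
```

In a dynamic curriculum, the scores would be recomputed periodically with the current model and the retained subset refreshed, rather than computed once before finetuning.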
ESL-SNNs: An Evolutionary Structure Learning Strategy for Spiking Neural Networks
Authors: Jiangrong Shen, Qi Xu, Jian K. Liu, Yueming Wang, Gang Pan, Huajin Tang
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
Spiking neural networks (SNNs) have manifested remarkable advantages in power consumption and event-driven property during the inference process. To take full advantage of low power consumption and improve the efficiency of these models further, pruning methods have been explored to find sparse SNNs without redundant connections after training. However, parameter redundancy still hinders the efficiency of SNNs during training. In the human brain, the rewiring process of neural networks is highly dynamic, while synaptic connections remain relatively sparse during brain development. Inspired by this, here we propose an efficient evolutionary structure learning (ESL) framework for SNNs, named ESL-SNNs, to implement sparse SNN training from scratch. The pruning and regeneration of synaptic connections in SNNs evolve dynamically during learning, yet keep the structural sparsity at a certain level. As a result, the ESL-SNNs can search for optimal sparse connectivity by exploring all possible parameters across time. Our experiments show that the proposed ESL-SNNs framework is able to learn SNNs with sparse structures effectively with only a limited reduction in accuracy. The ESL-SNNs achieve merely 0.28% accuracy loss with 10% connection density on the DVS-Cifar10 dataset. Our work presents a brand-new approach for sparse training of SNNs from scratch with biologically plausible evolutionary mechanisms, closing the gap in the expressibility between sparse training and dense training. Hence, it has great potential for SNN lightweight training and inference with low power consumption and small memory usage.
Bayesian post-hoc regularization of random forests
Abstract
Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks. However, they have a tendency to overfit noisy or irrelevant features, which can result in decreased generalization performance. Post-hoc regularization techniques aim to mitigate this issue by modifying the structure of the learned ensemble after its training. Here, we propose Bayesian post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root, while potentially reducing the impact of more specific and potentially noisy leaf nodes deeper in the tree. This approach allows for a form of pruning that does not alter the general structure of the trees but rather adjusts the influence of leaf nodes based on their proximity to the root node. We have evaluated the performance of our method on various machine learning data sets. Our approach demonstrates competitive performance with the state-of-the-art methods and, in certain cases, surpasses them in terms of predictive accuracy and generalization.
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Authors: Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang
Abstract
Large pre-trained transformers are show-stealers in modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale. With exploding parameter counts, the Lottery Ticket Hypothesis (LTH) and its variants have lost their pragmatism in sparsifying them, due to the high computation and memory bottleneck of the repetitive train-prune-retrain routine of iterative magnitude pruning (IMP), which worsens with increasing model size. In this paper, we comprehensively study induced sparse patterns across multiple large pre-trained vision and language transformers. We propose the existence of essential sparsity, defined by a sharp dropping point in the sparsity-performance curve beyond which performance declines much faster as the sparsity level rises, when we directly remove the weights with the smallest magnitudes in one shot. We also present an intriguing emerging phenomenon of abrupt sparsification during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse in pre-training after certain iterations. Moreover, our observations also indicate a counter-intuitive finding that BERT trained with a larger amount of pre-training data tends to have a better ability to condense knowledge in comparatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification properties than supervised learning (SL). Our codes are available at https://github.com/VITA-Group/essential_sparsity.
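The one-shot removal of smallest-magnitude weights mentioned in the abstract can be illustrated with a short sketch. This is a generic global magnitude-pruning routine on a single array, not the code from the linked repository; the function name and the tie-breaking rule are assumptions for illustration.

```python
import numpy as np

def one_shot_magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest |w|, in one shot."""
    w = np.asarray(weights, dtype=float)
    k = int(sparsity * w.size)  # number of weights to remove
    mask = np.ones(w.size, dtype=bool)
    if k > 0:
        # Flat indices of the k smallest-magnitude entries (ties broken by position).
        drop = np.argsort(np.abs(w), axis=None, kind="stable")[:k]
        mask[drop] = False
    mask = mask.reshape(w.shape)
    return w * mask, mask
```

Sweeping `sparsity` over increasing values and evaluating the pruned model at each level, with no retraining in between, would trace out the sparsity-performance curve on which a sharp dropping point could be looked for.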
Utterance Classification with Logical Neural Network: Explainable AI for Mental Disorder Diagnosis
Abstract
In response to the global challenge of mental health problems, we propose a Logical Neural Network (LNN) based Neuro-Symbolic AI method for the diagnosis of mental disorders. Due to the lack of effective therapy coverage for mental disorders, there is a need for an AI solution that can assist therapists with the diagnosis. However, current Neural Network models lack explainability and may not be trusted by therapists. The LNN is a Recurrent Neural Network architecture that combines the learning capabilities of neural networks with the reasoning capabilities of classical logic-based AI. The proposed system uses input predicates from clinical interviews to output a mental disorder class, and different predicate pruning techniques are used to achieve scalability and higher scores. In addition, we provide an insight extraction method to aid therapists with their diagnosis. The proposed system addresses the lack of explainability of current Neural Network models and provides a more trustworthy solution for mental disorder diagnosis.
Keyword: diffusion
SwinRDM: Integrate SwinRNN with Diffusion Model towards High-Resolution and High-Quality Weather Forecasting
Authors: Lei Chen, Fei Du, Yuan Hu, Fan Wang, Zhibin Wang
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract
Data-driven medium-range weather forecasting has attracted much attention in recent years. However, the forecasting accuracy at high resolution is currently unsatisfactory. Pursuing high-resolution and high-quality weather forecasting, we develop a data-driven model, SwinRDM, which integrates an improved version of SwinRNN with a diffusion model. SwinRDM performs predictions at 0.25-degree resolution and achieves superior forecasting accuracy to IFS (Integrated Forecast System), the state-of-the-art operational NWP model, on representative atmospheric variables including 500 hPa geopotential (Z500), 850 hPa temperature (T850), 2-m temperature (T2M), and total precipitation (TP), at lead times of up to 5 days. We propose to leverage a two-step strategy to achieve high-resolution predictions at 0.25-degree considering the trade-off between computation memory and forecasting accuracy. Recurrent predictions for future atmospheric fields are first performed at 1.40625-degree resolution, and then a diffusion-based super-resolution model is leveraged to recover the high spatial resolution and finer-scale atmospheric details. SwinRDM pushes forward the performance and potential of data-driven models by a large margin towards operational applications.
An Upwind Finite Difference Method to Singularly Perturbed Convection Diffusion Problems on a Shishkin Mesh
Authors: Daniel T. Gregory, Charuka D. Wickramasinghe
Abstract
This paper introduces a numerical approach to solve singularly perturbed convection diffusion boundary value problems for second-order ordinary differential equations that feature a small positive parameter $\epsilon$ multiplying the highest derivative. We specifically examine Dirichlet boundary conditions. To solve this differential equation, we propose an upwind finite difference method and incorporate the Shishkin mesh scheme to capture the solution near boundary layers. Our solver is both direct and of high accuracy, with computation time that scales linearly with the number of grid points. MATLAB code of the numerical recipe is made publicly available. We present numerical results to validate the theoretical results and assess the accuracy of our method. The tables and graphs included in this paper demonstrate the numerical outcomes, which indicate that our proposed method offers a highly accurate approximation of the exact solution.
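As an illustration of the technique named in the abstract, here is a minimal Python sketch (not the authors' MATLAB code). It assumes a model problem of the form -εu'' + b·u' = f on (0, 1) with u(0) = u(1) = 0 and constant b > 0, so the boundary layer sits at x = 1. The transition point 2(ε/b)·ln N, the function names, and the use of the Thomas algorithm as the linear-time direct solver are all assumptions chosen for the example.

```python
import numpy as np

def shishkin_mesh(n, eps, beta):
    """Piecewise-uniform mesh on [0, 1] with n (even) cells, refined near x = 1."""
    sigma = min(0.5, 2.0 * eps / beta * np.log(n))  # width of the fine region
    coarse = np.linspace(0.0, 1.0 - sigma, n // 2 + 1)
    fine = np.linspace(1.0 - sigma, 1.0, n // 2 + 1)
    return np.concatenate([coarse, fine[1:]])

def solve_upwind(n, eps, b=1.0, f=lambda x: np.ones_like(x)):
    """Upwind scheme for -eps*u'' + b*u' = f, u(0) = u(1) = 0, with b > 0."""
    x = shishkin_mesh(n, eps, b)
    h = np.diff(x)
    h_left, h_right = h[:-1], h[1:]          # cell widths around interior nodes
    h_bar = 0.5 * (h_left + h_right)
    # Tridiagonal coefficients; convection uses the backward (upwind) difference.
    lower = -eps / (h_bar * h_left) - b / h_left
    diag = eps / h_bar * (1.0 / h_left + 1.0 / h_right) + b / h_left
    upper = -eps / (h_bar * h_right)
    rhs = f(x[1:-1])
    # Thomas algorithm: O(n) forward elimination and back substitution.
    m = len(diag)
    c = np.zeros(m)
    d = np.zeros(m)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    u = np.zeros(n + 1)                      # boundary values stay zero
    u[m] = d[m - 1]
    for i in range(m - 2, -1, -1):
        u[i + 1] = d[i] - c[i] * u[i + 2]
    return x, u
```

For f = 1 and b = 1 the exact solution is u(x) = x - (e^{(x-1)/ε} - e^{-1/ε}) / (1 - e^{-1/ε}), which gives a convenient reference for checking the computed values at the mesh points.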
Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models
Authors: Sriram Ravula, Brett Levac, Ajil Jalal, Jonathan I. Tamir, Alexandros G. Dimakis
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Diffusion-based generative models have been used as powerful priors for magnetic resonance imaging (MRI) reconstruction. We present a learning method to optimize sub-sampling patterns for compressed sensing multi-coil MRI that leverages pre-trained diffusion generative models. Crucially, during training we use a single-step reconstruction based on the posterior mean estimate given by the diffusion model and the MRI measurement process. Experiments across varying anatomies, acceleration factors, and pattern types show that sampling operators learned with our method lead to competitive, and in the case of 2D patterns, improved reconstructions compared to baseline patterns. Our method requires as few as five training images to learn effective sampling patterns.
ISI-Mitigating Character Encoding for Molecular communications via Diffusion
Abstract
This letter introduces a novel algorithm for generating codebooks in molecular communications (MC). The proposed algorithm utilizes character entropy to effectively mitigate inter-symbol interference (ISI) during MC via diffusion. Based on Huffman coding, the algorithm ensures that consecutive bit-1s are avoided in the resulting codebook. Additionally, the error-correction process at the receiver effectively eliminates ISI in the time slot immediately following a bit-1. We conduct an ISI analysis, which confirms that the proposed algorithm significantly reduces decoding errors. Through numerical analysis, we demonstrate that the proposed codebook exhibits superior performance in terms of character error rate compared to existing codebooks. Furthermore, we validate the performance of the algorithm through experimentation on a real-time testbed.
DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views
Authors: Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu
Abstract
Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.
Change Diffusion: Change Detection Map Generation Based on Difference-Feature Guided DDPM
Abstract
Deep learning (DL) approaches based on pure CNN or Transformer networks have demonstrated promising results in bitemporal change detection (CD). However, their performance is limited by insufficient contextual information aggregation, as they struggle to fully capture the implicit contextual dependency relationships among feature maps at different levels. Additionally, researchers have utilized pre-trained denoising diffusion probabilistic models (DDPMs) for training lightweight CD classifiers. Nevertheless, training a DDPM to generate intricately detailed, multi-channel remote sensing images requires months of training time and a substantial volume of unlabeled remote sensing datasets, making it significantly more complex than generating a single-channel change map. To overcome these challenges, we propose a novel end-to-end DDPM-based model architecture called change-aware diffusion model (CADM), which can be trained quickly using a limited annotated dataset. Furthermore, we introduce dynamic difference conditional encoding to enhance step-wise regional attention in DDPM for bitemporal images in CD datasets. This method establishes state-adaptive conditions for each sampling step, emphasizing two main innovative points of our model: 1) its end-to-end nature and 2) difference conditional encoding. We evaluate CADM on four remote sensing CD tasks with different ground scenarios, including CDD, WHU, Levier, and GVLM. Experimental results demonstrate that CADM significantly outperforms state-of-the-art methods, indicating the generalization and effectiveness of the proposed model.
Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process
Authors: Sen Peng, Yufei Chen, Cong Wang, Xiaohua Jia
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Diffusion models have emerged as state-of-the-art deep generative architectures with the increasing demands for generation tasks. Training large diffusion models for good performance requires high resource costs, making them valuable intellectual properties to protect. However, most of the existing ownership solutions, including watermarking, mainly focus on discriminative models. This paper proposes WDM, a novel watermarking method for diffusion models, including watermark embedding, extraction, and verification. WDM embeds the watermark data through training or fine-tuning the diffusion model to learn a Watermark Diffusion Process (WDP), different from the standard diffusion process for the task data. The embedded watermark can be extracted by sampling using the shared reverse noise from the learned WDP without degrading performance on the original task. We also provide theoretical foundations and analysis of the proposed method by connecting the WDP to the diffusion process with a modified Gaussian kernel. Extensive experiments are conducted to demonstrate its effectiveness and robustness against various attacks.
DFormer: Diffusion-guided Transformer for Universal Image Segmentation
Authors: Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian noise to ground-truth masks, and then learns a model to predict denoising masks from corrupted masks. Specifically, we take deep pixel-level features along with the noisy masks as inputs to generate mask features and attention masks, employing a diffusion-based decoder to perform mask prediction gradually. At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly-generated masks. Extensive experiments reveal the merits of our proposed contributions on different image segmentation tasks: panoptic segmentation, instance segmentation, and semantic segmentation. Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on MS COCO val2017 set. Further, DFormer achieves promising semantic segmentation performance, outperforming the recent diffusion-based method by 2.2% on ADE20K val set. Our source code and models will be publicly available at https://github.com/cp3wan/DFormer
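The step of adding various levels of Gaussian noise to ground-truth masks can be sketched with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε. The abstract does not give DFormer's noise schedule or mask encoding, so the linear beta schedule, the mapping of binary masks to {-1, 1}, and the function names below are assumptions for illustration only.

```python
import numpy as np

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_mask(mask, t, alpha_bar, rng):
    """Corrupt a binary mask at timestep t with the DDPM forward process."""
    x0 = 2.0 * np.asarray(mask, dtype=float) - 1.0   # map {0, 1} to {-1, 1}
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps
```

Small values of `t` leave the mask nearly intact, while values near the last step yield almost pure noise; a denoising model would be trained to recover the clean mask (or the noise `eps`) from `xt`.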
Abstract
Most recent works focus on answering first-order logical queries to explore knowledge graph reasoning via multi-hop logic predictions. However, existing reasoning models are limited by the circumscribed logical paradigms of training samples, which leads to weak generalization to unseen logic. To address these issues, we propose a plug-in module called Logic Diffusion (LoD) to discover unseen queries from surroundings and achieve dynamical equilibrium between different kinds of patterns. The basic idea of LoD is relation diffusion and sampling sub-logic by random walking, as well as a special training mechanism called gradient adaption. Besides, LoD is accompanied by a novel loss function to further achieve robust logical diffusion when facing noisy data in training or testing sets. Extensive experiments on four public datasets demonstrate the superiority of mainstream knowledge graph reasoning models with LoD over state-of-the-art. Moreover, our ablation study proves the general effectiveness of LoD on the noise-rich knowledge graph.
Machine learning in and out of equilibrium
Authors: Shishir Adhikari, Alkan Kabakçıoğlu, Alexander Strang, Deniz Yuret, Michael Hinczewski
Abstract
The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space -- for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium, exhibiting persistent currents in the space of network parameters. As in its physical analogues, the current is associated with an entropy production rate for any given training trajectory. The stationary distribution of these rates obeys the integral and detailed fluctuation theorems -- nonequilibrium generalizations of the second law of thermodynamics. We validate these relations in two numerical examples, a nonlinear regression network and MNIST digit classification. While the fluctuation theorems are universal, there are other aspects of the stationary state that are highly sensitive to the training details. Surprisingly, the effective loss landscape and diffusion matrix that determine the shape of the stationary distribution vary depending on the simple choice of minibatching done with or without replacement. We can take advantage of this nonequilibrium sensitivity to engineer an equilibrium stationary state for a particular application: sampling from a posterior distribution of network weights in Bayesian machine learning. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without replacement minibatching. In an example system where the posterior is exactly known, this SGWORLD algorithm outperforms SGLD, converging to the posterior orders of magnitude faster as a function of the learning rate.
Towards Visual Foundational Models of Physical Scenes
Authors: Chethan Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto
Abstract
We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represent the physical scene, as they lack extrapolation mechanisms. Those, however, could be provided by Diffusion Models, at least in theory. To test this hypothesis empirically, NeRFs can be combined with Diffusion Models, a process we refer to as NeRF Diffusion, used as unsupervised representations of the physical scene. Our analysis is limited to visual data, without external grounding mechanisms that can be provided by independent sensory modalities.
Abstract
Art curatorial processes are characterized by the presentation of a collection of artworks in a knowledgeable way. Machine processes are characterized by their capacity to manage and analyze large amounts of data. This paper envisages machine curation and audience interaction as a means to explore the implications of contemporary AI models for the curatorial world. This project was developed for the occasion of the 2023 Helsinki Art Biennial, entitled New Directions May Emerge. We use the Helsinki Art Museum (HAM) collection to re-imagine the city of Helsinki through the lens of machine perception. We use visual-textual models to place artworks currently hosted inside the museum in outdoor public spaces of the city, assigning fictional coordinates based on similarity scores. Synthetic 360° art panoramas are generated using diffusion-based models to propose a machinic visual style guided by the artworks. The result of this project will be virtually presented as a web-based installation, where such a re-contextualization allows the navigation of an alternative version of the city while exploring its artistic heritage. Finally, we discuss our contributions to machine curation and the ethical implications that such a process entails. The web-based installation is available online.
Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation
Authors: Xinrong Hu, Yu-Jen Chen, Tsung-Yi Ho, Yiyu Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advances in denoising diffusion probabilistic models have shown great success in image synthesis tasks. While there are already works exploring the potential of this powerful tool in image semantic segmentation, its application in weakly supervised semantic segmentation (WSSS) remains relatively under-explored. Observing that conditional diffusion models (CDMs) are capable of generating images subject to specific distributions, in this work we utilize the category-aware semantic information underlying CDMs to obtain the prediction mask of the target object with only image-level annotations. More specifically, we locate the desired class by approximating the derivative of the output of the CDM w.r.t. the input condition. Our method differs from previous diffusion model methods that rely on guidance from an external classifier, which accumulates noise in the background during the reconstruction process. Our method outperforms state-of-the-art CAM and diffusion model methods on two public medical image segmentation datasets, which demonstrates that CDM is a promising tool in WSSS. Also, experiments show that our method is more time-efficient than existing diffusion model methods, making it practical for wider applications.
Abstract
Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images. Without any additional fine-tuning or supervision on the task-specific data or annotations, DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences. Particularly for semantic correspondence, DIFT from Stable Diffusion is able to outperform DINO and OpenCLIP by 19 and 14 accuracy points respectively on the challenging SPair-71k benchmark. It even outperforms the state-of-the-art supervised methods on 9 out of 18 categories while remaining on par for the overall performance. Project page: https://diffusionfeatures.github.io
Keyword: adaptive
Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training
Authors: Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, Dacheng Tao
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Abstract
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios. Contrastive pre-training, which constructs pseudo-positive examples from unlabeled data, has shown great potential to solve this problem. However, the pseudo-positive examples crafted by data augmentations can be irrelevant. To this end, we propose relevance-aware contrastive learning. It takes the intermediate-trained model itself as an imperfect oracle to estimate the relevance of positive pairs and adaptively weighs the contrastive loss of different pairs according to the estimated relevance. Our method consistently improves the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks. Further exploration shows that our method can not only beat BM25 after further pre-training on the target corpus but also serves as a good few-shot learner. Our code is publicly available at https://github.com/Yibin-Lei/ReContriever.
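The relevance-weighted contrastive loss described above can be illustrated with a minimal sketch. The abstract does not say how relevance is estimated or normalized, so the function names, the temperature value, and the sum-to-one normalization below are all assumptions made for illustration, not the authors' implementation.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.05):
    """InfoNCE loss for one pair: -log softmax of the positive similarity."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

def relevance_weighted_loss(pairs, relevance):
    """Weighted mean of per-pair losses.

    pairs: list of (positive_similarity, [negative_similarities]).
    relevance: hypothetical relevance estimates, e.g. produced by the
    partially trained model itself; normalized here to sum to one.
    """
    total = sum(relevance)
    weights = [r / total for r in relevance]
    return sum(w * info_nce(p, n) for w, (p, n) in zip(weights, pairs))

pairs = [(0.9, [0.1, 0.2]), (0.3, [0.4, 0.5])]
print(relevance_weighted_loss(pairs, [1.0, 1.0]))  # plain mean of both losses
print(relevance_weighted_loss(pairs, [1.0, 0.0]))  # second pair is ignored
```

With equal relevance the loss reduces to the ordinary mean, so the weighting only changes training when the estimates differ across pairs.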
shs-nlp at RadSum23: Domain-Adaptive Pre-training of Instruction-tuned LLMs for Radiology Report Impression Generation
Abstract
Instruction-tuned generative Large language models (LLMs) like ChatGPT and Bloomz possess excellent generalization abilities, but they face limitations in understanding radiology reports, particularly in the task of generating the IMPRESSIONS section from the FINDINGS section. They tend to generate either verbose or incomplete IMPRESSIONS, mainly due to insufficient exposure to medical text data during training. We present a system which leverages large-scale medical text data for domain-adaptive pre-training of instruction-tuned LLMs to enhance their medical knowledge and performance on specific medical tasks. We show that this system performs better in a zero-shot setting than a number of pretrain-and-finetune adaptation methods on the IMPRESSIONS generation task, and ranks 1st among participating systems in Task 1B: Radiology Report Summarization at the BioNLP 2023 workshop.
Vid2Act: Activate Offline Videos for Visual RL
Authors: Pan Minting, Zheng Yitao, Wang Yunbo, Yang Xiaokang
Abstract
Pretraining RL models on offline video datasets is a promising way to improve their training efficiency in online tasks, but challenging due to the inherent mismatch in tasks, dynamics, and behaviors across domains. A recent model, APV, sidesteps the accompanied action records in offline datasets and instead focuses on pretraining a task-irrelevant, action-free world model within the source domains. We present Vid2Act, a model-based RL method that learns to transfer valuable action-conditioned dynamics and potentially useful action demonstrations from offline to online settings. The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the domain relevance for both dynamics representation transfer and policy transfer. Specifically, we train the world models to generate a set of time-varying task similarities using a domain-selective knowledge distillation loss. These similarities serve two purposes: (i) adaptively transferring the most useful source knowledge to facilitate dynamics learning, and (ii) learning to replay the most relevant source actions to guide the target policy. We demonstrate the advantages of Vid2Act over the action-free visual RL pretraining method in both Meta-World and DeepMind Control Suite.
Boosting Offline Reinforcement Learning with Action Preference Query
Abstract
Training practical agents usually involves offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs. In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase. However, even limited online interactions can be inaccessible or catastrophic for high-stakes scenarios like healthcare and autonomous driving. In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP). The main insight is that, compared to online fine-tuning, querying the preferences between pre-collected and learned actions can be equally or even more helpful to the erroneous estimate problem. By adaptively encouraging or suppressing policy constraint according to action preferences, OAP could distinguish overestimation from beneficial policy improvement and thus attains a more accurate evaluation of unseen data. Theoretically, we prove a lower bound of the behavior policy's performance improvement brought by OAP. Moreover, comprehensive experiments on the D4RL benchmark and state-of-the-art algorithms demonstrate that OAP yields higher (29% on average) scores, especially on challenging AntMaze tasks (98% higher).
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
Abstract
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework. Typical methods heavily rely on Region-of-Interest (RoI) operations to extract local features and complex post-processing steps to produce final predictions. To address these limitations, we propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. Specifically, using query embedding per text instance, TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing without sacrificing flexibility or simplicity. Additionally, we design an Adaptive Global aGgregation (AGG) module to transfer global features into sequential features for reading arbitrarily-shaped texts, which overcomes the sub-optimization problem of RoI operations. Furthermore, potential corpus information is utilized from weak annotations to full labels through mixed supervision, further improving text detection and end-to-end text spotting results. Extensive experiments on various bilingual (i.e., English and Chinese) benchmarks demonstrate the superiority of our method. Especially on TDA-ReCTS dataset, TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.
A Lightweight Method for Tackling Unknown Participation Probabilities in Federated Averaging
Authors: Shiqiang Wang, Mingyue Ji
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
In federated learning (FL), clients usually have diverse participation probabilities that are unknown a priori, which can significantly harm the performance of FL if not handled properly. Existing works aiming at addressing this problem are usually based on global variance reduction, which requires a substantial amount of additional memory in a multiplicative factor equal to the total number of clients. An important open problem is to find a lightweight method for FL in the presence of clients with unknown participation rates. In this paper, we address this problem by adapting the aggregation weights in federated averaging (FedAvg) based on the participation history of each client. We first show that, with heterogeneous participation probabilities, FedAvg with non-optimal aggregation weights can diverge from the optimal solution of the original FL objective, indicating the need of finding optimal aggregation weights. However, it is difficult to compute the optimal weights when the participation probabilities are unknown. To address this problem, we present a new algorithm called FedAU, which improves FedAvg by adaptively weighting the client updates based on online estimates of the optimal weights without knowing the probabilities of client participation. We provide a theoretical convergence analysis of FedAU using a novel methodology to connect the estimation error and convergence. Our theoretical results reveal important and interesting insights, while showing that FedAU converges to an optimal solution of the original objective and has desirable properties such as linear speedup. Our experimental results also verify the advantage of FedAU over baseline methods.
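To make the idea of history-based aggregation weights concrete, here is a minimal sketch. The abstract does not specify FedAU's estimator, so this example assumes the simplest stand-in: each client's weight is the reciprocal of its empirical participation frequency. The function names and the fallback weight of 1.0 for clients that have never participated are hypothetical choices for illustration only.

```python
def estimate_weights(history):
    """Estimate per-client aggregation weights from participation history.

    history[n] is a list of 0/1 flags, one per past round, for client n.
    Assumed estimator: rounds elapsed / rounds participated, i.e. the
    reciprocal of the empirical participation frequency.
    """
    weights = []
    for flags in history:
        count = sum(flags)
        # Fallback of 1.0 for clients never seen is an arbitrary assumption.
        weights.append(len(flags) / count if count else 1.0)
    return weights

def aggregate(global_model, updates, participated, weights):
    """Add the weighted updates of participating clients to the model."""
    n = len(updates)
    new_model = list(global_model)
    for i in range(n):
        if participated[i]:
            for j, delta in enumerate(updates[i]):
                new_model[j] += weights[i] * delta / n
    return new_model

history = [[1, 1, 1, 1], [1, 0, 0, 0]]   # client 1 joins one round in four
weights = estimate_weights(history)       # [1.0, 4.0]
print(aggregate([0.0], [[1.0], [1.0]], [True, True], weights))
```

Up-weighting rarely seen clients compensates for their lower chance of contributing in any given round, which is the intuition behind correcting FedAvg under heterogeneous participation.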
Change Diffusion: Change Detection Map Generation Based on Difference-Feature Guided DDPM
Abstract
Deep learning (DL) approaches based on pure CNN or Transformer networks have demonstrated promising results in bitemporal change detection (CD). However, their performance is limited by insufficient contextual information aggregation, as they struggle to fully capture the implicit contextual dependency relationships among feature maps at different levels. Additionally, researchers have utilized pre-trained denoising diffusion probabilistic models (DDPMs) for training lightweight CD classifiers. Nevertheless, training a DDPM to generate intricately detailed, multi-channel remote sensing images requires months of training time and a substantial volume of unlabeled remote sensing datasets, making it significantly more complex than generating a single-channel change map. To overcome these challenges, we propose a novel end-to-end DDPM-based model architecture called change-aware diffusion model (CADM), which can be trained quickly using a limited annotated dataset. Furthermore, we introduce dynamic difference conditional encoding to enhance step-wise regional attention in DDPM for bitemporal images in CD datasets. This method establishes state-adaptive conditions for each sampling step, emphasizing two main innovative points of our model: 1) its end-to-end nature and 2) difference conditional encoding. We evaluate CADM on four remote sensing CD tasks with different ground scenarios, including CDD, WHU, Levier, and GVLM. Experimental results demonstrate that CADM significantly outperforms state-of-the-art methods, indicating the generalization and effectiveness of the proposed model.
GaitGCI: Generative Counterfactual Intervention for Gait Recognition
Authors: Huanzhang Dou, Pengyi Zhang, Wei Su, Yunlong Yu, Yining Lin, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Gait is one of the most promising biometrics that aims to identify pedestrians from their walking patterns. However, prevailing methods are susceptible to confounders, resulting in the networks hardly focusing on the regions that reflect effective walking patterns. To address this fundamental problem in gait recognition, we propose a Generative Counterfactual Intervention framework, dubbed GaitGCI, consisting of Counterfactual Intervention Learning (CIL) and Diversity-Constrained Dynamic Convolution (DCDC). CIL eliminates the impacts of confounders by maximizing the likelihood difference between factual/counterfactual attention while DCDC adaptively generates sample-wise factual/counterfactual attention to efficiently perceive the sample-wise properties. With matrix decomposition and diversity constraint, DCDC guarantees the model to be efficient and effective. Extensive experiments indicate that proposed GaitGCI: 1) could effectively focus on the discriminative and interpretable regions that reflect gait pattern; 2) is model-agnostic and could be plugged into existing models to improve performance with nearly no extra cost; 3) efficiently achieves state-of-the-art performance on arbitrary scenarios (in-the-lab and in-the-wild).
MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition
Authors: Huanzhang Dou, Pengyi Zhang, Wei Su, Yunlong Yu, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Gait recognition, which aims at identifying individuals by their walking patterns, has recently drawn increasing research attention. However, gait recognition still suffers from the conflicts between the limited binary visual clues of the silhouette and numerous covariates with diverse scales, which brings challenges to the model's adaptiveness. In this paper, we address this conflict by developing a novel MetaGait that learns to learn an omni sample adaptive representation. Towards this goal, MetaGait injects meta-knowledge, which could guide the model to perceive sample-specific properties, into the calibration network of the attention mechanism to improve the adaptiveness from the omni-scale, omni-dimension, and omni-process perspectives. Specifically, we leverage the meta-knowledge across the entire process, where Meta Triple Attention and Meta Temporal Pooling are presented respectively to adaptively capture omni-scale dependency from spatial/channel/temporal dimensions simultaneously and to adaptively aggregate temporal information through integrating the merits of three complementary temporal aggregation methods. Extensive experiments demonstrate the state-of-the-art performance of the proposed MetaGait. On CASIA-B, we achieve rank-1 accuracy of 98.7%, 96.0%, and 89.3% under three conditions, respectively. On OU-MVLP, we achieve rank-1 accuracy of 92.4%.
The Unscented Kalman Filter for Nonlinear Parameter Identification of Adaptive Cruise Control Systems
Authors: Konstantinos Ampountolas
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
This paper develops and investigates a dual unscented Kalman filter (DUKF) for the joint nonlinear state and parameter identification of commercial adaptive cruise control (ACC) systems. Although the core functionality of stock ACC systems, including their proprietary control logic and parameters, is not publicly available, this work considers a car-following scenario with a human-driven vehicle (leader) and an ACC engaged ego vehicle (follower) that employs a constant time-headway policy (CTHP). The objective of the DUKF is to determine the CTHP parameters of the ACC by using real-time observations of space-gap and relative velocity from the vehicle's onboard sensors. Real-time parameter identification of stock ACC systems is essential for assessing their string stability, large-scale deployment on motorways, and impact on traffic flow and throughput. In this regard, $L_2$ and $L_\infty$ string stability conditions are considered. The observability rank condition for nonlinear systems is adopted to evaluate the ability of the proposed estimation scheme to estimate stock ACC system parameters using empirical data. The proposed filter is evaluated using empirical data collected from the onboard sensors of two 2019 SUV vehicles, namely Hyundai Nexo and SsangYong Rexton, equipped with stock ACC systems; and is compared with batch and recursive least-squares optimization. The set of ACC model parameters obtained from the proposed filter revealed that the commercially implemented ACC system of the considered vehicle (Hyundai Nexo) is neither $L_2$ nor $L_\infty$ string stable.
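For readers unfamiliar with constant time-headway policies, the following sketch shows one commonly used linear form, in which the commanded acceleration depends on the space-gap error and the relative velocity. The abstract does not give the model equations, so the specific law, the parameter names `alpha`, `beta`, and `tau`, and the forward-Euler step are assumptions for illustration rather than the paper's model.

```python
def cthp_acceleration(gap, ego_speed, relative_speed, alpha, beta, tau):
    """Assumed linear CTHP law.

    gap: space gap to the leader (m); ego_speed: follower speed (m/s);
    relative_speed: leader speed minus follower speed (m/s);
    tau: desired time headway (s); alpha, beta: feedback gains.
    """
    return alpha * (gap - tau * ego_speed) + beta * relative_speed

def simulate_follower(gap, ego_speed, leader_speeds, alpha, beta, tau, dt=0.1):
    """Forward-Euler simulation of the follower for a leader speed profile."""
    trajectory = []
    for leader_speed in leader_speeds:
        relative_speed = leader_speed - ego_speed
        accel = cthp_acceleration(gap, ego_speed, relative_speed, alpha, beta, tau)
        gap += relative_speed * dt
        ego_speed += accel * dt
        trajectory.append((gap, ego_speed))
    return trajectory

# At the equilibrium gap = tau * speed with equal speeds, nothing changes.
print(simulate_follower(30.0, 20.0, [20.0] * 3, alpha=0.1, beta=0.5, tau=1.5))
```

Under this assumed form, a parameter identification scheme would estimate `alpha`, `beta`, and `tau` from measured gap and relative-velocity time series.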
Instructive Feature Enhancement for Dichotomous Medical Image Segmentation
Authors: Lian Liu, Han Zhou, Jiongquan Chen, Sijing Liu, Wenlong Shi, Dong Ni, Deng-Ping Fan, Xin Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks have been widely applied in dichotomous medical image segmentation (DMIS) of many anatomical structures in several modalities, achieving promising performance. However, existing networks tend to rely on task-specific, heavy and complex designs to improve accuracy. They give little guidance on which feature channels would be more beneficial for segmentation, which may be why the performance and universality of these segmentation models are hindered. In this study, we propose an instructive feature enhancement approach, namely IFE, to adaptively select feature channels with rich texture cues and strong discriminability to enhance raw features based on local curvature or global information entropy criteria. Being plug-and-play and applicable for diverse DMIS tasks, IFE encourages the model to focus on texture-rich features which are especially important for the ambiguous and challenging boundary identification, simultaneously achieving simplicity, universality, and certain interpretability. To evaluate the proposed IFE, we constructed the first large-scale DMIS dataset Cosmos55k, which contains 55,023 images from 7 modalities and 26 anatomical structures. Extensive experiments show that IFE can improve the performance of classic segmentation networks across different anatomies and modalities with only slight modifications. Code is available at https://github.com/yezi-66/IFE
A modified combined active-set Newton method for solving phase-field fracture into the monolithic limit
Authors: Leon Maximilian Kolditz, Katrin Mang, Thomas Wick
Abstract
In this work, we examine a numerical phase-field fracture framework in which the crack irreversibility constraint is treated with a primal-dual active set method and a linearization is used in the degradation function to enhance the numerical stability. The first goal is to carefully derive our primal-dual active set formulation from a complementarity system; this formulation has been used in numerous studies in the literature, but a detailed mathematical derivation for phase-field fracture has not yet been given. Based on the latter, we formulate a modified combined active-set Newton approach that significantly reduces the computational cost in comparison to comparable prior algorithms for quasi-monolithic settings. For many practical problems, Newton converges fast, but active set needs many iterations, for which three different efficiency improvements are suggested in this paper. Afterwards, we design an iteration on the linearization in order to iterate the problem to the monolithic limit. Our new algorithms are implemented in the programming framework pfm-cracks [T. Heister, T. Wick; pfm-cracks: A parallel-adaptive framework for phase-field fracture propagation, Software Impacts, Vol. 6 (2020), 100045]. In the numerical examples, we conduct performance studies and investigate efficiency enhancements. The main emphasis is on the cost complexity while keeping the accuracy of numerical solutions and goal functionals. Our algorithmic suggestions are substantiated with the help of several benchmarks in two and three spatial dimensions. Therein, predictor-corrector adaptivity and parallel performance studies are explored as well.
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Authors: Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
We are interested in a novel task, namely low-resource text-to-talking avatar. Given only a few-minute-long talking person video with the audio track as the training data and arbitrary texts as the driving input, we aim to synthesize high-quality talking portrait videos corresponding to the input text. This task has broad application prospects in the digital human industry but has not been technically achieved yet due to two challenges: (1) It is challenging to mimic the timbre from out-of-domain audio for a traditional multi-speaker Text-to-Speech system. (2) It is hard to render high-fidelity and lip-synchronized talking avatars with limited training data. In this paper, we introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which (1) designs a generic zero-shot multi-speaker TTS model that well disentangles the text content, timbre, and prosody; and (2) embraces recent advances in neural rendering to achieve realistic audio-driven talking face video generation. With these designs, our method overcomes the aforementioned two challenges and generates identity-preserving speech and realistic talking person video. Experiments demonstrate that our method could synthesize realistic, identity-preserving, and audio-visual synchronized talking avatar videos.
dMAPAR-HMM: Reforming Traffic Model for Improving Performance Bound with Stochastic Network Calculus
Authors: Qingqing Yang, Xi Peng, Huiwen Yang, Gong Zhang, Bo Bai
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
A popular branch of stochastic network calculus (SNC) utilizes moment-generating functions (MGFs) to characterize arrivals and services, which enables end-to-end performance analysis. However, existing traffic models for SNC cannot effectively represent the complicated nature of real-world network traffic such as dramatic burstiness. To conquer this challenge, we propose an adaptive spatial-temporal traffic model: dMAPAR-HMM. Specifically, we model the temporal on-off switching process as a dual Markovian arrival process (dMAP) and the arrivals during the on phases as an autoregressive hidden Markov model (AR-HMM). The dMAPAR-HMM model fits in with the MGF-SNC analysis framework, unifies various state-of-the-art arrival models, and matches real-world data more closely. We perform extensive experiments with real-world traces under different network topologies and utilization levels. Experimental results show that dMAPAR-HMM significantly outperforms prevailing models in MGF-SNC.
Measuring User Experience of Adaptive User Interfaces using EEG: A Replication Study
Authors: Daniel Gaspar-Figueiredo, Silvia Abrahão, Emilio Insfrán, Jean Vanderdonckt
Abstract
Adaptive user interfaces have the advantage of being able to dynamically change their aspect and/or behaviour depending on the characteristics of the context of use, i.e. to improve user experience (UX). UX is an important quality factor that has been primarily evaluated with classical measures but to a lesser extent with physiological measures, such as emotion recognition, skin response, or brain activity. In a previous exploratory experiment involving users with different profiles and a wide range of ages, we analysed user experience in terms of cognitive load, engagement, attraction and memorisation when employing twenty graphical adaptive menus through the use of an Electroencephalogram (EEG) device. The results indicated that there were statistically significant differences for these four variables. However, we considered that it was necessary to confirm or reject these findings using a more homogeneous group of users. We conducted an operational internal replication study with 40 participants. We also investigated the potential correlation between EEG signals and the participants' user experience ratings, such as their preferences. The results of this experiment confirm that there are statistically significant differences between the EEG variables when the participants interact with the different adaptive menus. Moreover, there is a high correlation among the participants' UX ratings and the EEG signals, and a trend regarding performance has emerged from our analysis. These findings suggest that EEG signals could be used to evaluate UX. With regard to the menus studied, our results suggest that graphical menus with different structures and font types produce more differences in users' brain responses, while menus which use colours produce more similarities in users' brain responses. Several insights with which to improve users' experience of graphical adaptive menus are outlined.
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Abstract
The raw depth image captured by indoor depth sensors usually has an extensive range of missing depth values due to inherent limitations such as the inability to perceive transparent objects and the limited distance range. The incomplete depth map with missing values burdens many downstream vision tasks, and a rising number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for complementing large contiguous regions of missing depth values, which is common and critical in images captured in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. In the other branch, we propose an RGB-depth fusion CycleGAN to transfer the RGB image to the fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches, and we append a confidence fusion head to fuse the two outputs of the branches for the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves the depth completion performance, especially in a more realistic setting of indoor environments, with the help of our proposed pseudo depth maps in training.
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations
Authors: Torsten Krauß (1), Alexandra Dmitrienko (1) ((1) University of Würzburg)
Abstract
Federated Learning (FL) trains machine learning models on data distributed across multiple devices, avoiding data transfer to a central location. This improves privacy, reduces communication costs, and enhances model performance. However, FL is prone to poisoning attacks, which can be untargeted, aiming to reduce the model performance, or targeted (so-called backdoors), adding adversarial behavior that can be triggered with appropriately crafted inputs. Striving for stealthiness, backdoor attacks are harder to deal with. Mitigation techniques against poisoning attacks rely on monitoring certain metrics and filtering malicious model updates. However, previous works did not consider real-world adversaries and data distributions. To support our statement, we define a new notion of strong adaptive adversaries that can simultaneously adapt to multiple objectives and demonstrate through extensive tests that existing defense methods can be circumvented in this adversary model. We also demonstrate that existing defenses have limited effectiveness when no assumptions are made about underlying data distributions. To address realistic scenarios and adversary models, we propose Metric-Cascades (MESAS), a new defense that leverages multiple detection metrics simultaneously for the filtering of poisoned model updates. This approach forces adaptive attackers into a heavy multi-objective optimization problem, and our evaluation with nine backdoors and three datasets shows that even our strong adaptive attacker cannot evade MESAS's detection. We show that MESAS outperforms existing defenses in distinguishing backdoors from distortions originating from different data distributions within and across the clients. Overall, MESAS is the first defense that is robust against strong adaptive adversaries and is effective in real-world data scenarios while introducing a low overhead of 24.37s on average.
Buying Information for Stochastic Optimization
Authors: Mingchen Ma, Christos Tzamos
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Abstract
Stochastic optimization is one of the central problems in Machine Learning and Theoretical Computer Science. In the standard model, the algorithm is given a fixed distribution known in advance. In practice though, one may acquire at a cost extra information to make better decisions. In this paper, we study how to buy information for stochastic optimization and formulate this question as an online learning problem. Assuming the learner has an oracle for the original optimization problem, we design a $2$-competitive deterministic algorithm and an $e/(e-1)$-competitive randomized algorithm for buying information. We show that this ratio is tight as the problem is equivalent to a robust generalization of the ski-rental problem, which we call super-martingale stopping. We also consider an adaptive setting where the learner can choose to buy information after taking some actions for the underlying optimization problem. We focus on the classic optimization problem, Min-Sum Set Cover, where the goal is to quickly find an action that covers a given request drawn from a known distribution. We provide an $8$-competitive algorithm running in polynomial time that chooses actions and decides when to buy information about the underlying request.
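As background for the ski-rental connection mentioned above, the sketch below implements the classic break-even strategy for the textbook ski-rental problem (rent at unit cost per day, or buy once at a fixed price) and checks its factor-of-2 guarantee numerically. This is the standard textbook problem, not the paper's super-martingale stopping generalization or its algorithm; the function names are chosen here for illustration.

```python
def break_even_cost(buy_price, days):
    """Cost of renting for buy_price - 1 days, then buying on day buy_price."""
    if days < buy_price:
        return days                       # the season ended before buying
    return (buy_price - 1) + buy_price    # rentals paid plus the purchase

def optimal_cost(buy_price, days):
    """Offline optimum: knowing the season length, rent throughout or buy at once."""
    return min(days, buy_price)

# The break-even strategy never pays more than twice the offline optimum.
worst_ratio = max(
    break_even_cost(b, d) / optimal_cost(b, d)
    for b in range(1, 21)
    for d in range(1, 101)
)
print(worst_ratio)
```

The worst case occurs when the season ends right after the purchase, which is why no deterministic strategy can do better than a ratio approaching 2 in the classic setting.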
Constant Sequence Extension for Fast Search Using Weighted Hamming Distance
Authors: Zhenyu Weng, Huiping Zhuang, Haizhou Li, Zhiping Lin
Abstract
Representing visual data using compact binary codes is attracting increasing attention as binary codes are used as direct indices into hash table(s) for fast non-exhaustive search. Recent methods show that ranking binary codes using weighted Hamming distance (WHD) rather than Hamming distance (HD) by generating query-adaptive weights for each bit can better retrieve query-related items. However, search using WHD is slower than that using HD. One main challenge is that the complexity of extending a monotone increasing sequence using WHD to probe buckets in hash table(s) for existing methods is at least proportional to the square of the sequence length, while that using HD is proportional to the sequence length. To overcome this challenge, we propose a novel fast non-exhaustive search method using WHD. The key idea is to design a constant sequence extension algorithm to perform each sequence extension in constant computational complexity and the total complexity is proportional to the sequence length, which is justified by theoretical analysis. Experimental results show that our method is faster than other WHD-based search methods. Also, compared with the HD-based non-exhaustive search method, our method has comparable efficiency but retrieves more query-related items for the dataset of up to one billion items.
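The difference between the two distances can be shown with a short sketch. It assumes the simplest form of weighted Hamming distance, where each bit position has one non-negative mismatch weight supplied per query; the abstract does not define the exact weighting scheme, and the code below illustrates only the distance itself, not the paper's constant sequence extension algorithm.

```python
def hamming_distance(a, b):
    """Number of bit positions in which the integer codes a and b differ."""
    return bin(a ^ b).count("1")

def weighted_hamming_distance(a, b, weights):
    """Sum of per-bit weights over mismatching positions.

    weights[i] is the assumed query-specific cost of a mismatch at bit i,
    with bit 0 being the least significant bit.
    """
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)

query = 0b1010
weights = [1, 2, 3, 4]
candidates = [0b1011, 0b0010]   # both at Hamming distance 1 from the query
for code in candidates:
    print(hamming_distance(query, code),
          weighted_hamming_distance(query, code, weights))
```

Both candidates tie under plain Hamming distance, while the weighted variant separates them, which is what allows a finer ranking of retrieved items.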
Proximal Symmetric Non-negative Latent Factor Analysis: A Novel Approach to Highly-Accurate Representation of Undirected Weighted Networks
Authors: Yurong Zhong, Zhe Xie, Weiling Li, Xin Luo
Abstract
An Undirected Weighted Network (UWN) is commonly found in big data-related applications. Note that the information associated with such a network's nodes and edges can be expressed as a Symmetric, High-Dimensional and Incomplete (SHDI) matrix. However, existing models fail to model either its intrinsic symmetry or its low data density, resulting in low model scalability or representation learning ability. To address this issue, a Proximal Symmetric Nonnegative Latent-factor-analysis (PSNL) model is proposed. It incorporates a proximal term into a symmetry-aware and data density-oriented objective function for high representation accuracy. Then an adaptive Alternating Direction Method of Multipliers (ADMM)-based learning scheme is implemented through a Tree-structured Parzen Estimator (TPE) method for high computational efficiency. Empirical studies on four UWNs demonstrate that PSNL achieves higher accuracy gain than state-of-the-art models, as well as highly competitive computational efficiency.
YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection
Abstract
Accurate polyp detection is essential for assisting clinical rectal cancer diagnoses. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. Great efforts have been made to conduct video polyp detection through multi-frame temporal/spatial aggregation. However, unlike common fixed-camera video, the camera-moving scene in colonoscopy videos can cause rapid video jitters, leading to unstable training for existing video detection models. Additionally, the concealed nature of some polyps and the complex background environment further hinder the performance of existing video detectors. In this paper, we propose the YONA (You Only Need one Adjacent Reference-frame) method, an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with its adjacent reference frames according to their foreground similarity. For the background, YONA conducts background dynamic alignment guided by inter-frame difference to eliminate the invalid features produced by drastic spatial jitters. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground truth bounding box to improve the model's perception of polyp and background. Quantitative and qualitative experiments on three public challenging benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
DashQL -- Complete Analysis Workflows with SQL
Authors: André Kohn, Dominik Moritz, Thomas Neumann
Abstract
We present DashQL, a language that describes complete analysis workflows in self-contained scripts. DashQL combines SQL, the grammar of relational database systems, with a grammar of graphics in a grammar of analytics. It supports preparing and visualizing arbitrarily complex SQL statements in a single coherent language. The proximity to SQL facilitates holistic optimizations of analysis workflows covering data input, encoding, transformations, and visualizations. These optimizations use model and query metadata for visualization-driven aggregation, remote predicate pushdown, and adaptive materialization. We introduce the DashQL language as an extension of SQL and describe the efficient and interactive processing of text-based analysis workflows.
Soft Merging of Experts with Adaptive Routing
Authors: Mohammed Muqeeth, Haokun Liu, Colin Raffel
Abstract
Sparsely activated neural networks with conditional computation learn to route their inputs through different "expert" subnetworks, providing a form of modularity that densely activated models lack. Despite their possible benefits, models with learned routing often underperform their parameter-matched densely activated counterparts as well as models that use non-learned heuristic routing strategies. In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely activated models that use non-differentiable discrete routing decisions. To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters. By routing activations through a single merged expert, SMEAR does not incur a significant increase in computational costs and enables standard gradient-based training. We empirically validate that models using SMEAR outperform models that route based on metadata or learn sparse routing through gradient estimation. Furthermore, we provide qualitative analysis demonstrating that the experts learned via SMEAR exhibit a significant amount of specialization. All of the code used in our experiments is publicly available.
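The merging step described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes each expert is a bias-free linear layer and that the router is a linear layer followed by a softmax, and all names (`smear_forward`, `merge_experts`, the toy weights) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def merge_experts(experts: List[Matrix], probs: Vector) -> Matrix:
    """Weighted average of the experts' parameters (the 'merged' expert)."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(p * e[i][j] for p, e in zip(probs, experts)) for j in range(cols)]
        for i in range(rows)
    ]


def smear_forward(x: Vector, router: Matrix, experts: List[Matrix]) -> Tuple[Vector, Vector]:
    """Route x through a single merged expert; no discrete choice is made."""
    probs = softmax(matvec(router, x))      # assumed router: linear + softmax
    merged = merge_experts(experts, probs)  # one parameter set for this input
    return matvec(merged, x), probs         # a single expert-sized forward pass


# Toy values, for illustration only.
x = [1.0, 2.0]
router = [[0.5, -0.25], [-0.5, 0.25]]  # two experts, two input features
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
output, probs = smear_forward(x, router, experts)
print("routing probabilities:", probs)
print("output:", output)
```

Because every operation here is differentiable, the router can be trained with ordinary backpropagation. For purely linear experts the merged output coincides with the probability-weighted sum of the individual expert outputs; with nonlinear experts the two generally differ, and merging parameters keeps the cost close to that of a single expert.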
FAMO: Fast Adaptive Multitask Optimization
Authors: Bo Liu, Yihao Feng, Peter Stone, Qiang Liu
Abstract
One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe under-optimization of certain tasks. Previous approaches that manipulate task gradients for a more balanced loss decrease require storing and computing all task gradients (O(K) space and time where K is the number of tasks), limiting their use in large-scale scenarios. In this work, we introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way using O(1) space and time. We conduct an extensive set of experiments covering multi-task supervised and reinforcement learning problems. Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques while offering significant improvements in space and computational efficiency. Code is available at https://github.com/Cranial-XIX/FAMO.
Pivotuner: automatic real-time pure intonation and microtonal modulation
Abstract
Pivotuner is a VST3/AU MIDI effect plugin that automatically tunes note data in an adaptive pure intonation, in real time. Where previously pure intonation was out of reach for most musicians due to difficulty and impracticality, Pivotuner makes it easy and straightforward to achieve by using novel yet simple algorithms. This may lead to more widespread exploration of pure intonation by a larger and more diverse crowd of musicians! This paper includes a review of prior adaptive pure intonation systems, including Hermode Tuning/Kontakt Dynamic Pure Tuning and Just Intonation. The paper introduces the notion of an adaptive tuning center and how it serves as a flexible underlying concept for multiple tuning algorithms, as well as extensions that offer greater control for performers, including pitch and tuning center locking and resetting, and gradual interpolation between equal temperament and pure intonation. The paper then showcases some pieces that use Pivotuner effectively, discusses areas for future exploration within Pivotuner's feature set, and outlines plans for future development.
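As a rough illustration of tuning relative to a tuning center and of interpolating between equal temperament and pure intonation, the sketch below computes how far a note would be shifted from its equal-tempered pitch. It is not Pivotuner's algorithm: the 5-limit ratio table, the function names, and the linear blend in cents are all assumptions made for this example.

```python
import math

# Assumed 5-limit just-intonation ratios for the 12 pitch classes above the
# tuning center (one common choice; other ratio sets are equally valid).
JUST_RATIOS = [
    1 / 1, 16 / 15, 9 / 8, 6 / 5, 5 / 4, 4 / 3,
    45 / 32, 3 / 2, 8 / 5, 5 / 3, 9 / 5, 15 / 8,
]


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def tuning_offset_cents(midi_note: int, center_note: int, blend: float) -> float:
    """Offset from equal temperament, in cents, for midi_note.

    blend = 0.0 leaves the note in equal temperament; blend = 1.0 tunes it
    purely relative to center_note; values in between interpolate linearly.
    The center itself is assumed to stay at its equal-tempered pitch.
    """
    interval = (midi_note - center_note) % 12  # pitch class above the center
    pure_offset = cents(JUST_RATIOS[interval]) - 100.0 * interval
    return blend * pure_offset


# E above a tuning center of C (a major third), at three blend settings.
for blend in (0.0, 0.5, 1.0):
    print(blend, round(tuning_offset_cents(64, 60, blend), 2))
```

A pure major third (5/4) is roughly 13.7 cents narrower than its equal-tempered counterpart, so a full blend lowers the E by that amount, while intermediate blend values lower it proportionally less.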
Keyword: efficient
Synthesizing Affective Neurophysiological Signals Using Generative Models: A Review Paper
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Segregated FLS Processing Cores for V/STOL Autonomous Landing Guidance Assistant System using FPGA
DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation
Lumos in the Night Sky: AI-enabled Visual Tool for Exploring Night-Time Light Patterns
On the Parameterized Complexity of Computing $st$-Orientations with Few Transitive Edges
A Static Evaluation of Code Completion by Large Language Models
End-to-end Differentiable Clustering with Associative Memories
CONCORD: Clone-aware Contrastive Learning for Source Code
Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models
Construction d'un système de recommandation basé sur des contraintes via des graphes de connaissances [Building a Constraint-Based Recommender System Using Knowledge Graphs]
Generating Private Synthetic Data with Genetic Algorithms
Efficient automatic design of robots
Switching Autoregressive Low-rank Tensor Models
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents
Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate
A Robust Likelihood Model for Novelty Detection
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
A sketch-and-project method for solving the matrix equation AXB = C
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Learning Representations on the Unit Sphere: Application to Online Continual Learning
ColdNAS: Search to Modulate for User Cold-Start Recommendation
Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search
DVIS: Decoupled Video Instance Segmentation Framework
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
GaitGCI: Generative Counterfactual Intervention for Gait Recognition
A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch
Correlated Pseudorandomness from the Hardness of Quasi-Abelian Decoding
Complexity of Anchored Crossing Number and Crossing Number of Almost Planar Graphs
Distributed Flocking Control of Aerial Vehicles Based on a Markov Random Field
Adversarial Attacks and Defenses for Semantic Communication in Vehicular Metaverses
SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation
State Regularized Policy Optimization on Data with Dynamics Shift
Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach
BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs
A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics
FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering
Novel DeepONet architecture to predict stresses in elastoplastic structures with variable complex geometries and loads
Efficient Centrality Maximization with Rademacher Averages
Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Human-imperceptible, Machine-recognizable Images
YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection
ESL-SNNs: An Evolutionary Structure Learning Strategy for Spiking Neural Networks
DashQL -- Complete Analysis Workflows with SQL
Numerical solution of the Biot/elasticity interface problem using virtual element methods
Towards Memory-Efficient Training for Extremely Large Output Spaces -- Learning with 500k Labels on a Single Commodity GPU
GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model
Residual-based error bound for physics-informed neural networks
Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning
MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving Graphs
Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere
Faster real root decision algorithm for symmetric polynomials
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation
Fast Context Adaptation in Cost-Aware Continual Learning
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Keyword: faster
Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models
Accelerating Range Minimum Queries with Ray Tracing Cores
CoSiNES: Contrastive Siamese Network for Entity Standardization
G-CAME: Gaussian-Class Activation Mapping Explainer for Object Detectors
Rigorous Runtime Analysis of MOEA/D for Solving Multi-Objective Minimum Weight Base Problems
Machine learning in and out of equilibrium
BackpropTools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
Constant Sequence Extension for Fast Search Using Weighted Hamming Distance
Tight Complexity Bounds for Counting Generalized Dominating Sets in Bounded-Treewidth Graphs Part II: Hardness Results
Novel DeepONet architecture to predict stresses in elastoplastic structures with variable complex geometries and loads
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Keyword: mobile
Tracking Evolving labels using Cone based Oracles
Reconstructing human activities via coupling mobile phone data with location-based social networks
A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics
Towards Scalable Multi-View Reconstruction of Geometry and Materials
Keyword: pruning
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
ESL-SNNs: An Evolutionary Structure Learning Strategy for Spiking Neural Networks
Bayesian post-hoc regularization of random forests
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Utterance Classification with Logical Neural Network: Explainable AI for Mental Disorder Diagnosis
Keyword: diffusion
SwinRDM: Integrate SwinRNN with Diffusion Model towards High-Resolution and High-Quality Weather Forecasting
An Upwind Finite Difference Method to Singularly Perturbed Convection Diffusion Problems on a Shishkin Mesh
Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models
ISI-Mitigating Character Encoding for Molecular communications via Diffusion
DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views
Change Diffusion: Change Detection Map Generation Based on Difference-Feature Guided DDPM
Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process
DFormer: Diffusion-guided Transformer for Universal Image Segmentation
Logic Diffusion for Knowledge Graph Reasoning
Machine learning in and out of equilibrium
Towards Visual Foundational Models of Physical Scenes
Newly Formed Cities: an AI Curation
Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation
Emergent Correspondence from Image Diffusion
Keyword: adaptive
Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training
shs-nlp at RadSum23: Domain-Adaptive Pre-training of Instruction-tuned LLMs for Radiology Report Impression Generation
Vid2Act: Activate Offline Videos for Visual RL
Boosting Offline Reinforcement Learning with Action Preference Query
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
A Lightweight Method for Tackling Unknown Participation Probabilities in Federated Averaging
Change Diffusion: Change Detection Map Generation Based on Difference-Feature Guided DDPM
GaitGCI: Generative Counterfactual Intervention for Gait Recognition
MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition
The Unscented Kalman Filter for Nonlinear Parameter Identification of Adaptive Cruise Control Systems
Instructive Feature Enhancement for Dichotomous Medical Image Segmentation
A modified combined active-set Newton method for solving phase-field fracture into the monolithic limit
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
dMAPAR-HMM: Reforming Traffic Model for Improving Performance Bound with Stochastic Network Calculus
Measuring User Experience of Adaptive User Interfaces using EEG: A Replication Study
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations
Buying Information for Stochastic Optimization
Constant Sequence Extension for Fast Search Using Weighted Hamming Distance
Proximal Symmetric Non-negative Latent Factor Analysis: A Novel Approach to Highly-Accurate Representation of Undirected Weighted Networks
YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection
DashQL -- Complete Analysis Workflows with SQL
Soft Merging of Experts with Adaptive Routing
FAMO: Fast Adaptive Multitask Optimization
Pivotuner: automatic real-time pure intonation and microtonal modulation
Keyword: quantization
There is no result