Abstract
A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return. While a common consensus is that more expressive sequence models imply better performance, this paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an equally important role. Specifically, due to the fundamental stochasticity from the offline data-collection policies and the environment dynamics, highly non-trivial conditional/constrained generation is required to elicit rewarding actions. While it is still possible to approximate such queries, we observe that such crude estimates significantly undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern Tractable Probabilistic Models (TPMs) to bridge the gap between good sequence models and high expected returns at evaluation time. Empirically, Trifle achieves the most state-of-the-art scores in 9 Gym-MuJoCo benchmarks against strong baselines. Further, owing to its tractability, Trifle significantly outperforms prior approaches in stochastic environments and safe RL tasks (e.g. with action constraints) with minimum algorithmic modifications.
FairWASP: Fast and Optimal Fair Wasserstein Pre-processing
Authors: Zikai Xiong, Niccolò Dalmasso, Alan Mishler, Vamsi K. Potluru, Tucker Balch, Manuela Veloso
Abstract
Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments on synthetic datasets demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
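As a rough illustration of the fairness criterion mentioned above, the sketch below measures an empirical demographic parity gap on a reweighted dataset. It is not the FairWASP algorithm: the function names, the toy data, and the hand-picked integer weights are all assumptions made for illustration, and the weights are neither produced by an optimizer nor claimed to be Wasserstein-optimal.

```python
def weighted_positive_rate(labels, weights):
    """Weighted fraction of positive (label == 1) samples."""
    total = sum(weights)
    return sum(w * y for y, w in zip(labels, weights)) / total


def demographic_parity_gap(labels, groups, weights):
    """Largest absolute difference between a group's weighted positive rate
    and the overall weighted positive rate (one common empirical form of
    demographic parity; assumed here, not taken from the paper)."""
    overall = weighted_positive_rate(labels, weights)
    gaps = []
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        g_weights = [weights[i] for i in idx]
        if sum(g_weights) == 0:
            continue  # a group with zero total weight has no defined rate
        g_labels = [labels[i] for i in idx]
        gaps.append(abs(weighted_positive_rate(g_labels, g_weights) - overall))
    return max(gaps)


# Hypothetical toy dataset: group 0 is 3/4 positive, group 1 is 1/4 positive.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

uniform_weights = [1] * 8
# Hand-picked integer weights: weight 3 means "keep the sample and add two copies".
integer_weights = [1, 1, 1, 3, 3, 1, 1, 1]

gap_before = demographic_parity_gap(labels, groups, uniform_weights)
gap_after = demographic_parity_gap(labels, groups, integer_weights)
```

Because the weights are integers, the reweighted dataset can be materialized by repeating or dropping rows, which is why it can be passed to classifiers that do not accept sample weights.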
Stochastic Time-Optimal Trajectory Planning for Connected and Automated Vehicles in Mixed-Traffic Merging Scenarios
Authors: Viet-Anh Le, Behdad Chalaki, Filippos N. Tzortzoglou, Andreas A. Malikopoulos
Abstract
Addressing safe and efficient interaction between connected and automated vehicles (CAVs) and human-driven vehicles in a mixed-traffic environment has attracted considerable attention. In this paper, we develop a framework for stochastic time-optimal trajectory planning for coordinating multiple CAVs in mixed-traffic merging scenarios. We present a data-driven model, combining Newell's car-following model with Bayesian linear regression, for efficiently learning the driving behavior of human drivers online. Using the prediction model and uncertainty quantification, a stochastic time-optimal control problem is formulated to find robust trajectories for CAVs. We also integrate a replanning mechanism that determines when deriving new trajectories for CAVs is needed based on the accuracy of the Bayesian linear regression predictions. Finally, we demonstrate the performance of our proposed framework using a realistic simulation environment.
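The online learning step and the replanning trigger described above can be sketched with a scalar Bayesian linear regression. This is a minimal sketch under strong assumptions: a single weight, a zero-mean Gaussian prior, and a known noise variance. The class `ScalarBLR`, the function `needs_replan`, the threshold `k`, and the toy observations are hypothetical, and the coupling with Newell's car-following model is not reproduced.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScalarBLR:
    """Bayesian linear regression y = w * x + noise with a scalar weight.

    Prior: w ~ N(0, 1 / prior_precision). Noise: N(0, noise_var), assumed known.
    """
    prior_precision: float = 1.0
    noise_var: float = 0.25
    sum_xx: float = 0.0  # running sum of x^2
    sum_xy: float = 0.0  # running sum of x * y

    def update(self, x: float, y: float) -> None:
        """Absorb one observation; the posterior stays Gaussian (conjugacy)."""
        self.sum_xx += x * x
        self.sum_xy += x * y

    @property
    def posterior_precision(self) -> float:
        return self.prior_precision + self.sum_xx / self.noise_var

    @property
    def posterior_mean(self) -> float:
        return (self.sum_xy / self.noise_var) / self.posterior_precision

    def predict(self, x: float) -> Tuple[float, float]:
        """Predictive mean and variance at input x."""
        mean = self.posterior_mean * x
        var = self.noise_var + x * x / self.posterior_precision
        return mean, var


def needs_replan(observed: float, mean: float, var: float, k: float = 2.0) -> bool:
    """Hypothetical trigger: replan when the observation leaves the k-sigma band."""
    return abs(observed - mean) > k * var ** 0.5


model = ScalarBLR()
_, var_before = model.predict(2.0)
for x, y in [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]:
    model.update(x, y)
mean_after, var_after = model.predict(2.0)
```

The predictive variance shrinks as observations arrive, so a band-based trigger of this kind fires less often once the model has adapted to a driver.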
Rethinking the Cloudonomics of Efficient I/O for Data-Intensive Analytics Applications
Authors: Chunxu Tang, Yi Wang, Bin Fan, Beinan Wang, Shouwei Chen, Ziyue Qiu, Chen Liang, Jing Zhao, Yu Zhu, Mingmin Chen, Zhongting Hu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Abstract
This paper explores a prevailing trend in the industry: migrating data-intensive analytics applications from on-premises to cloud-native environments. We find that the unique cost models associated with cloud-based storage necessitate a more nuanced understanding of optimizing performance. Specifically, based on traces collected from Uber's Presto fleet in production, we argue that common I/O optimizations, such as table scan and filter, and broadcast join, may lead to unexpected costs when naively applied in the cloud. This is because traditional I/O optimizations mainly focus on improving throughput or latency in on-premises settings, without taking into account the monetary costs associated with storage API calls. In cloud environments, these costs can be significant, potentially involving billions of API calls per day just for Presto workloads at Uber scale. Presented as a case study, this paper serves as a starting point for further research to design efficient I/O strategies specifically tailored for data-intensive applications in cloud settings.
Consistent Video-to-Video Transfer Using Synthetic Dataset
Authors: Jiaxin Cheng, Tianjun Xiao, Tong He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Inspired by Instruct Pix2Pix's image transfer via editing instruction, we adapt this paradigm to the video domain. Extending the Prompt-to-Prompt to videos, we efficiently generate paired samples, each with an input video and its edited counterpart. Alongside this, we introduce the Long Video Sampling Correction during sampling, ensuring consistent long videos across batches. Our method surpasses current methods like Tune-A-Video, heralding substantial progress in text-based video-to-video editing and suggesting exciting avenues for further exploration and deployment.
DistDNAS: Search Efficient Feature Interactions within 2 Hours
Authors: Tunhou Zhang, Wei Wen, Igor Fedorov, Xi Liu, Buyun Zhang, Fangqiu Han, Wen-Yen Chen, Yiping Han, Feng Yan, Hai Li, Yiran Chen
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design requires extensive cost due to the sequential workflow on the large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and additional redundancy toward recommender models, leading to sub-optimal trade-offs in performance and serving cost. In this paper, we present DistDNAS as a neat solution to brew swift and efficient feature interaction design. DistDNAS proposes a supernet to incorporate interaction modules of varying orders and types as a search space. To optimize search efficiency, DistDNAS distributes the search and aggregates the choice of optimal interaction modules on varying data dates, achieving over 25x speed-up and reducing search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss to penalize the selection of redundant interaction modules, enhancing the efficiency of discovered feature interactions in serving. We extensively evaluate the best models crafted by DistDNAS on a 1TB Criteo Terabyte dataset. Experimental evaluations demonstrate 0.001 AUC improvement and 60% FLOPs saving over current state-of-the-art CTR models.
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Authors: Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei Jin, Joyce Ho, Carl Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Abstract
Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet, their direct deployment can lead to privacy issues and is constrained by resources. To address this challenge, we delve into synthetic clinical text generation using LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks, effectively aligning the distribution of real datasets and significantly enriching the diversity of generated training instances. We will publish our code and all the generated data at https://github.com/ritaranx/ClinGen.
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding
Abstract
The exploration of brain activity and its decoding from fMRI data has been a longstanding pursuit, driven by its potential applications in brain-computer interfaces, medical diagnostics, and virtual reality. Previous approaches have primarily focused on individual subject analysis, highlighting the need for a more universal and adaptable framework, which is the core motivation behind our work. In this work, we propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training, with a focus on addressing the challenges of varying fMRI data dimensions due to individual brain differences. Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving distinct brain activity patterns. We introduce a novel learning strategy tailored for pre-training 2D fMRI images, enhancing the quality of reconstruction. fMRI-PTE's adaptability with image generators enables the generation of well-represented fMRI features, facilitating various downstream tasks, including within-subject and cross-subject brain activity decoding. Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach. Extensive experiments validate and support our claims, offering a promising foundation for further research in this domain.
Towards Omni-supervised Referring Expression Segmentation
Authors: Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning, where the weak labels are not directly transformed into supervision signals but used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results show the clear merits of Omni-RES over the fully-supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES can help the base model achieve 100% fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets like Visual Genome to facilitate low-cost RES training, and achieves new SOTA performance of RES, e.g., 80.66 on RefCOCO.
AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification
Abstract
Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) substantially improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT on SEs, on the one hand, disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT needs to be executed on top of a DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs. The code for AdaSent is available.
Efficient Human-AI Coordination via Preparatory Language-based Convention
Authors: Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence. Existing methods for human-AI coordination typically train an agent to coordinate with a diverse set of policies or with human models fitted from real human data. However, the massively diverse styles of human behavior present obstacles for AI systems with constrained capacity, while high quality human data may not be readily available in real-world scenarios. In this study, we observe that prior to coordination, humans engage in communication to establish conventions that specify individual roles and actions, making their coordination proceed in an orderly manner. Building upon this observation, we propose employing the large language model (LLM) to develop an action plan (or equivalently, a convention) that effectively guides both human and AI. By inputting task requirements, human preferences, the number of agents, and other pertinent information into the LLM, it can generate a comprehensive convention that facilitates a clear understanding of tasks and responsibilities for all parties involved. Furthermore, we demonstrate that decomposing the convention formulation problem into sub-problems, handled sequentially in multiple new sessions with human feedback, yields a more efficient coordination convention. Experimental evaluations conducted in the Overcooked-AI environment, utilizing a human proxy model, highlight the superior performance of our proposed method compared to existing learning-based approaches. When coordinating with real humans, our method achieves better alignment with human preferences and an average performance improvement of 15% compared to the state-of-the-art.
A cost-benefit source-receptor framework for implementation of Blue-Green flood risk management
Authors: Christos Iliadis, Vassilis Glenis, Chris Kilsby
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
As floods are a major and growing source of risk in urban areas, there is a necessity to improve flood risk management frameworks and civil protection through planning interventions that modify surface flow pathways and introduce storage. Despite the complexity of densely urbanised areas, modern flood models can represent urban features and flow characteristics to help researchers, local authorities, and insurance companies to develop and improve efficient flood risk frameworks to achieve resilience in cities. A cost-benefit driven source-receptor flood risk framework is developed in this study to identify (1) locations contributing to surface flooding (sources), (2) buildings and locations at high flood risk (receptors), (3) the cost-benefit nexus between the source and the receptor, and finally (4) ways to mitigate flooding at the receptor by adding Blue-Green Infrastructure (BGI) in critical locations. The analysis is based on five steps to identify the source and the receptor in a study area based on the flood exposure of buildings, damages arising from flooding and available green spaces with the best potential to add sustainable and resilient solutions to reduce flooding. The framework was developed using the detailed hydrodynamic model CityCAT in a case study of the city centre of Newcastle upon Tyne, UK. The novelty of this analysis is that firstly, multiple storm magnitudes (i.e. small and large floods) are used combined with a method to locate the areas and the buildings at flood risk and a prioritized set of best places to add interventions upstream and downstream. Secondly, planning decisions are informed by considering the benefit from reduced damages to properties and the cost to construct resilient BGI options rather than a restricted hydraulic analysis considering only flood depths and storages in isolation from real-world economics.
NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks
Authors: Seokil Ham, Jungwuk Park, Dong-Jun Han, Jaekyun Moon
Abstract
While multi-exit neural networks are regarded as a promising solution for making efficient inference via early exits, combating adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all other exits concurrently. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge based on two key contributions. NEO-KD first resorts to neighbor knowledge distillation to guide the output of the adversarial examples to tend to the ensemble outputs of neighbor exits of clean data. NEO-KD also employs exit-wise orthogonal knowledge distillation for reducing adversarial transferability across different submodels. The result is a significantly improved robustness against adversarial attacks. Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks.
Untangling Graphs on Surfaces
Authors: Éric Colin de Verdière, Vincent Despré, Loïc Dubois
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
Abstract
Consider a graph drawn on a surface (for example, the plane minus a finite set of obstacle points), possibly with crossings. We provide an algorithm to decide whether such a drawing can be untangled, namely, if one can slide the vertices and edges of the graph on the surface (avoiding the obstacles) to remove all crossings; in other words, whether the drawing is homotopic to an embedding. While the problem boils down to planarity testing when the surface is the sphere or the disk (or equivalently the plane without any obstacle), the other cases have never been studied before, except when the input graph is a cycle, in an abundant literature in topology and more recently by Despré and Lazarus [SoCG 2017, J. ACM 2019]. Our algorithm runs in O(m + poly(g+b) n log n) time, where g >= 0 and b >= 0 are the genus and the number of boundary components of the input orientable surface S, and n is the size of the input graph drawing, lying on some fixed graph of size m cellularly embedded on S. We use various techniques from two-dimensional computational topology and from the theory of hyperbolic surfaces. Most notably, we introduce reducing triangulations, a novel discrete analog of hyperbolic surfaces in the spirit of systems of quads by Lazarus and Rivaud [FOCS 2012] and Erickson and Whittlesey [SODA 2013], which have the additional benefit that reduced paths are unique and stable upon reversal; they are likely of independent interest. Tailored data structures are needed to achieve certain homotopy tests efficiently on these triangulations. As a key subroutine, we rely on an algorithm to test the weak simplicity of a graph drawn on a surface by Akitaya, Fulek, and Tóth [SODA 2018, TALG 2019].
Abstract
Artificial Intelligence (AI) has achieved significant advancements in technology and research over several decades of development, and is widely used in many areas including computer vision, natural language processing, time-series analysis, speech synthesis, etc. During the age of deep learning, especially with the rise of Large Language Models, a large majority of researchers' attention is paid to pursuing new state-of-the-art (SOTA) results, resulting in ever-increasing model size and computational complexity. The need for high computing power brings higher carbon emissions and undermines research fairness by preventing small or medium-sized research institutions and companies with limited funding from participating in research. To tackle the challenges of computing resources and the environmental impact of AI, Green Computing has become a hot research topic. In this survey, we give a systematic overview of the technologies used in Green Computing. We propose the framework of Green Computing and divide it into four key components: (1) Measures of Greenness, (2) Energy-Efficient AI, (3) Energy-Efficient Computing Systems and (4) AI Use Cases for Sustainability. For each component, we discuss the research progress made and the commonly used techniques to optimize AI efficiency. We conclude that this new research direction has the potential to address the conflicts between resource constraints and AI development. We encourage more researchers to pay attention to this direction and make AI more environmentally friendly.
Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design
Abstract
Multi-cellular robot design aims to create robots comprised of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. Initially, this strategy seeks optimal coarse-grained robots and progressively refines them. To mitigate the challenge of determining the precise refinement juncture during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, the extensive empirical studies on various challenging tasks sourced from EvoGym show our approach's superior efficiency and generalization capability.
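For readers unfamiliar with the Cross-Entropy Method mentioned above, the following is a minimal sketch of the plain, one-dimensional Euclidean version. It is not HERD's refined variant and does not operate in hyperbolic space; the function name, hyperparameters, and toy objective are assumptions chosen only to show the sample, select elites, refit loop.

```python
import random
import statistics


def cross_entropy_method(objective, mean, std, iterations=30,
                         population=50, elite_frac=0.2, seed=0):
    """Maximize `objective` over a scalar by iteratively refitting a Gaussian.

    Each iteration samples candidates, keeps the top-scoring fraction
    (the elites), and refits the sampling mean and standard deviation to them.
    """
    rng = random.Random(seed)
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        samples = [rng.gauss(mean, std) for _ in range(population)]
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = statistics.fmean(elites)
        # Small floor keeps the sampling distribution from collapsing to zero width.
        std = statistics.stdev(elites) + 1e-6
    return mean


def toy_objective(x):
    """Hypothetical objective with a single maximum at x = 3."""
    return -(x - 3.0) ** 2


best = cross_entropy_method(toy_objective, mean=0.0, std=5.0)
```

The sampling distribution narrows around high-scoring candidates over iterations, which matches the abstract's description of concentrating the search on promising regions.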
Intriguing Properties of Data Attribution on Diffusion Models
Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
Abstract
Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.
Abstract
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical number of model parameters, which demands large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that makes the deployment of LLMs more efficient. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate the LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.
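To make the idea of INT4 weight-only quantization concrete, here is a minimal sketch of symmetric, group-wise, round-to-nearest quantization. The abstract does not specify the scheme used, so the grouping, the scale rule, and the function names below are assumptions for illustration and do not reflect the API of the linked repository.

```python
def quantize_int4(weights, group_size=4):
    """Quantize floats to signed 4-bit integers in [-8, 7], one scale per group."""
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid division by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.extend(max(-8, min(7, round(w / scale))) for w in group)
    return quantized, scales


def dequantize_int4(quantized, scales, group_size=4):
    """Recover approximate float weights from integers and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


# Hypothetical weights; two groups of four values each.
weights = [0.7, -0.32, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
q, scales = quantize_int4(weights)
restored = dequantize_int4(q, scales)
```

Only the weights are stored in 4 bits; activations stay in higher precision and the weights are rescaled at compute time, which is what "weight-only" refers to. Smaller groups reduce rounding error at the cost of storing more scales.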
Experimental Validation of a Grid-Aware Optimal Control of Hybrid AC/DC Microgrids
Authors: Willem Lambrichts, Jules Mace, Mario Paolone
Abstract
This paper presents the experimental validation of a grid-aware real-time control method for hybrid AC/DC microgrids. The optimal control leverages the voltage sensitivity coefficients (SCs) that are computed analytically using the closed-form expression proposed in the authors' previous work. The SCs are based on the unified power flow model for hybrid AC/DC grids that accounts for the AC grid, DC grid, and the Interfacing Converters (ICs), which can operate in different control modes, e.g. voltage or power control. The SCs are used to express the grid constraints in the optimal control problem in a fully linear way and, therefore, allow for second- to subsecond control actions. The validation of the model is performed on the hybrid AC/DC grid available at EPFL. The network consists of 18 AC nodes, 8 DC nodes, and 4 converters to interface the AC and DC networks. The network hosts multiple controllable and uncontrollable resources. The SC-based optimal control is validated in a generic experiment. It is shown that the real-time control is able to control the ICs optimally to redirect power through the DC grid, to avoid grid constraint violations while providing reactive power support to the upper layer AC grid. Furthermore, the computational time of the optimal control is analysed to validate its application in critical real-time applications.
Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle
Abstract
The Abstraction and Reasoning Corpus (ARC) is a challenging benchmark, introduced to foster AI research towards human-level intelligence. It is a collection of unique tasks about generating colored grids, specified by a few examples only. In contrast to the transformation-based programs of existing work, we introduce object-centric models that are in line with the natural programs produced by humans. Our models can not only perform predictions, but also provide joint descriptions for input/output pairs. The Minimum Description Length (MDL) principle is used to efficiently search the large model space. A diverse range of tasks are solved, and the learned models are similar to the natural programs. We demonstrate the generality of our approach by applying it to a different domain.
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Abstract
LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses. Importantly, LLaVA-Interactive goes beyond language prompts: visual prompts are enabled to align human intents in the interaction. The development of LLaVA-Interactive is extremely cost-efficient as the system combines three multimodal skills of pre-built AI models without additional model training: visual chat of LLaVA, image segmentation from SEEM, as well as image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promise of LLaVA-Interactive and to inspire future research in multimodal interactive systems.
Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators
Abstract
Convolutional Neural Networks (CNNs) are widely used in various domains. Recent advances in dataflow-based CNN accelerators have enabled CNN inference in resource-constrained edge devices. These dataflow accelerators utilize inherent data reuse of convolution layers to process CNN models efficiently. Concealing the architecture of CNN models is critical for privacy and security. This paper evaluates memory-based side-channel information to recover CNN architectures from dataflow-based CNN inference accelerators. The proposed attack exploits spatial and temporal data reuse of the dataflow mapping on CNN accelerators and architectural hints to recover the structure of CNN models. Experimental results demonstrate that our proposed side-channel attack can recover the structures of popular CNN models, namely Lenet, Alexnet, and VGGnet16.
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value
Authors: Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
Abstract
We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of some game, and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm, which solves a convex optimization problem with linear constraints and then performs random perturbation, to obtain a modification plan with a near-optimal cost.
Structure Learning with Adaptive Random Neighborhood Informed MCMC
Authors: Alberto Caron, Xitong Liang, Samuel Livingstone, Jim Griffin
Abstract
In this paper, we introduce a novel MCMC sampler, PARNI-DAG, for a fully-Bayesian approach to the problem of structure learning under observational data. Under the assumption of causal sufficiency, the algorithm allows for approximate sampling directly from the posterior distribution on Directed Acyclic Graphs (DAGs). PARNI-DAG performs efficient sampling of DAGs via locally informed, adaptive random neighborhood proposal that results in better mixing properties. In addition, to ensure better scalability with the number of nodes, we couple PARNI-DAG with a pre-tuning procedure of the sampler's parameters that exploits a skeleton graph derived through some constraint-based or scoring-based algorithms. Thanks to these novel features, PARNI-DAG quickly converges to high-probability regions and is less likely to get stuck in local modes in the presence of high correlation between nodes in high-dimensional settings. After introducing the technical novelties in PARNI-DAG, we empirically demonstrate its mixing efficiency and accuracy in learning DAG structures on a variety of experiments.
Understanding the Issues and Causes in WebAssembly Application Development: A Mining-based Study
Authors: Muhammad Waseem, Teerath Das, Aakash Ahmad, Peng Liang, Tommi Mikkonen
Abstract
WebAssembly (Wasm) is a binary instruction format designed for secure and efficient execution within sandboxed environments - predominantly web apps and browsers - to facilitate performance, security, and flexibility of web programming languages. In recent years, Wasm has gained significant attention from the academic research community and industrial development projects to engineer high-performance web applications. Despite the offered benefits, developers encounter a multitude of issues rooted in Wasm (e.g., faults, errors, failures) and are often unaware of the root causes that impact the development of web applications. Wasm developers require knowledge, documented as empirically rooted guidelines, patterns, documents, etc., that helps them to understand, analyse, and resolve these issues; such knowledge is currently lacking in existing research and practice. To this end, we conducted an empirical study that mines and documents practitioners' knowledge expressed as 385 issues from 12 open-source Wasm projects deployed on GitHub and 354 question-answer posts via Stack Overflow. Our study led to the first-of-its-kind taxonomies of issues faced by developers and their underlying causes in Wasm-based applications. Issues faced by developers arise from 'Infrastructure, Integration and Compatibility Aspects' (28.16%), 'Language Features and Documentation Errors' (18.00%), along with 'Code Implementation and Build failures' (13.83%). The results indicate that 'Syntactic and Semantic Errors' (25.77%), 'Configuration and Compatibility Constraints' (20.1%), and 'Operational Limitations' (12.98%) are the principal causes of these issues. The study provides a taxonomical classification of issues and their causes, offering empirically derived guidelines that can inform researchers and developers to systematically design, develop, and refactor Wasm-based applications.
Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving
Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves the final answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.
Decision Support Framework for Home Health Caregiver Allocation: A Case Study of HHC Agency in Tennessee, USA
Authors: Seyed Mohammad Ebrahim Sharifnia, Faezeh Bagheri, Rupy Sawhney, John E. Kobza, Enrique Macias De Anda, Mostafa Hajiaghaei-Keshteli, Michael Mirrielees
Abstract
Population aging is a global challenge, leading to increased demand for healthcare and social services for the elderly. Home Health Care (HHC) emerges as a vital solution, specifically designed to serve this population segment. Given the surging demand for HHC, it's essential to coordinate and regulate caregiver allocation efficiently. This is crucial for both budget-optimized planning and ensuring the delivery of high-quality care. This research addresses a key question faced by home health agencies (HHAs): "How can caregiver allocation be optimized, especially when caregivers prefer flexibility in their visiting sequences?". While earlier studies proposed rigid visiting sequences, our study introduces a decision support framework that allocates caregivers through a hybrid method that considers the flexibility in visiting sequences and aims to reduce travel mileage, increase the number of visits per planning period, and maintain the continuity of care - a critical metric for patient satisfaction. Utilizing data from an HHA in Tennessee, United States, our approach led to an impressive reduction in average travel mileage (up to 42% depending on discipline) without imposing restrictions on caregivers. Furthermore, the proposed framework is used for caregivers' supply analysis to provide valuable insights into caregiver resource management.
Abstract
We bound the smoothed running time of the FLIP algorithm for local Max-Cut as a function of $\alpha$, the arboricity of the input graph. We show that, with high probability, the following holds (where $n$ is the number of nodes and $\phi$ is the smoothing parameter): 1) When $\alpha = O(\sqrt{\log n})$, FLIP terminates in $\phi \cdot \mathrm{poly}(n)$ iterations. Prior to our results, the only graph families for which FLIP was known to achieve a smoothed polynomial running time were complete graphs and graphs with logarithmic maximum degree. 2) For arbitrary values of $\alpha$ we get a running time of $\phi n^{O(\frac{\alpha}{\log n} + \log \alpha)}$. This improves over the best known running time for general graphs of $\phi n^{O(\sqrt{ \log n })}$ for $\alpha = o(\log^{1.5} n)$. Specifically, when $\alpha = O(\log n)$ we get a significantly faster running time of $\phi n^{O(\log \log n)}$.
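For readers unfamiliar with FLIP, the following is a minimal sketch of the local search it refers to: start from any partition and keep moving a single node to the other side while doing so increases the cut weight. The function names, the edge-dictionary representation, and the random choice among improving nodes are assumptions made for illustration; the abstract does not specify a pivot rule, and the sketch does not model the smoothed (randomly perturbed) edge weights that the analysis is about.

```python
import random

def cut_weight(weights, side):
    """Total weight of edges whose endpoints lie on opposite sides."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])

def flip_gain(weights, adj, side, v):
    """Change in cut weight if node v alone switches sides."""
    gain = 0.0
    for u in adj[v]:
        w = weights[(min(u, v), max(u, v))]
        # An uncut edge (same side) becomes cut; a cut edge becomes uncut.
        gain += w if side[u] == side[v] else -w
    return gain

def flip_local_max_cut(n, weights, side=None, seed=0):
    """Run FLIP until no single-node move improves the cut.

    weights maps edges (u, v) with u < v to their weight (hypothetical format).
    Returns the locally optimal partition and the number of flips performed.
    """
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for (u, v) in weights:
        adj[u].append(v)
        adj[v].append(u)
    side = [rng.randint(0, 1) for _ in range(n)] if side is None else list(side)
    iterations = 0
    while True:
        improving = [v for v in range(n) if flip_gain(weights, adj, side, v) > 0]
        if not improving:
            return side, iterations
        # Pivot rule is an assumption: pick any improving node at random.
        side[rng.choice(improving)] ^= 1
        iterations += 1
```

Each flip strictly increases the cut weight, so the loop always terminates; the quantity bounded in the abstract is how many such flips can occur.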
Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations
Authors: Ryan Dorosh, Justin Allen, Zixuan He, Christopher Ninatanta, Jack Coleman, Jack Spieker, Ethan Tuck, Jordan Kurtz, Qin Zhang, Matthew D. Whiting, Jiecai Luo, Manoj Karkee, Ming Luo
Abstract
Tree fruit growers around the world are facing labor shortages for critical operations, including harvest and pruning. There is a great interest in developing robotic solutions for these labor-intensive tasks, but current efforts have been prohibitively costly, slow, or require a reconfiguration of the orchard in order to function. In this paper, we introduce an alternative approach to robotics using a novel and low-cost soft-growing robotic platform. Our platform features the ability to extend up to 1.2 m linearly at a maximum speed of 0.27 m/s. The soft-growing robotic arm can operate with a terminal payload of up to 1.4 kg (4.4 N), more than sufficient for carrying an apple. This platform decouples linear and steering motions to simplify path planning and the controller design for targeting. We anticipate our platform being relatively simple to maintain compared to rigid robotic arms. Herein we also describe and experimentally verify the platform's kinematic model, including the prediction of the relationship between the steering angle and the angular positions of the three steering motors. Information from the model enables the position controller to guide the end effector to the targeted positions faster and with higher stability than without this information. Overall, our research shows promise for using soft-growing robotic platforms in orchard operations.
Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method
Authors: Will Snyder, Irina Tezaur, Christopher Wentland
Abstract
Physics-informed neural networks (PINNs) are appealing data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs). Unlike traditional neural networks (NNs), which train only on solution data, a PINN incorporates a PDE's residual into its loss function and trains to minimize the said residual at a set of collocation points in the solution domain. This paper explores the use of the Schwarz alternating method as a means to couple PINNs with each other and with conventional numerical models (i.e., full order models, or FOMs, obtained via the finite element, finite difference or finite volume methods) following a decomposition of the physical domain. It is well-known that training a PINN can be difficult when the PDE solution has steep gradients. We investigate herein the use of domain decomposition and the Schwarz alternating method as a means to accelerate the PINN training phase. Within this context, we explore different approaches for imposing Dirichlet boundary conditions within each subdomain PINN: weakly through the loss and/or strongly through a solution transformation. As a numerical example, we consider the one-dimensional steady state advection-diffusion equation in the advection-dominated (high Peclet) regime. Our results suggest that the convergence of the Schwarz method is strongly linked to the choice of boundary condition implementation within the PINNs being coupled. Surprisingly, strong enforcement of the Schwarz boundary conditions does not always lead to a faster convergence of the method. While it is not clear from our preliminary study that the PINN-PINN coupling via the Schwarz alternating method accelerates PINN convergence in the advection-dominated regime, it reveals that PINN training can be improved substantially for Peclet numbers as high as 1e6 by performing a PINN-FOM coupling.
EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing
Authors: Bo Li, Qiang He, Feifei Chen, Lingjuan Lyu, Athman Bouguettaya, Yun Yang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Abstract
Mobile edge computing (MEC) enables web data caching in close geographic proximity to end users. Popular data can be cached on edge servers located less than hundreds of meters away from end users. This ensures bounded latency guarantees for various latency-sensitive web applications. However, transmitting a large volume of data out of the cloud onto many geographically-distributed web servers individually can be expensive. In addition, web content dissemination may be interrupted by various intentional and accidental events in the volatile MEC environment, which undermines dissemination efficiency and subsequently incurs extra transmission costs. To tackle the above challenges, we present a novel scheme named EdgeDis that coordinates data dissemination by distributed consensus among those servers. We analyze EdgeDis's validity theoretically and evaluate its performance experimentally. Results demonstrate that compared with baseline and state-of-the-art schemes, EdgeDis: 1) is 5.97x - 7.52x faster; 2) reduces dissemination costs by 48.21% to 91.87%; and 3) reduces performance loss caused by dissemination failures by up to 97.30% in time and 96.35% in costs.
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
Authors: Min Jae Jung, Seung Dae Han, Joohee Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in the low-data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN by introducing a Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities; the latter is a modified classification loss that accounts for the penalty on fake backgrounds as well as confusing categories on a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that the proposed RISF substantially outperforms the state-of-the-art approaches. The code will be available.
Gaze-based Learning from Demonstration In Surgical Robotics
Authors: A.E. Abdelaal, S.N. Zaman, P.Y Chen, T. Suzuki, J. Ingleton
Abstract
Surgical robotics is a rising field in medical technology and advanced robotics. Robot assisted surgery, or robotic surgery, allows surgeons to perform complicated surgical tasks with more precision, automation, and flexibility than is possible for traditional surgical approaches. The main type of robot assisted surgery is minimally invasive surgery, which could be automated and result in a faster healing time for the patient. The surgical robot we are particularly interested in is the da Vinci surgical system, which is developed and manufactured by Intuitive Surgical. In the current iteration of the system, the endoscopic camera arm on the da Vinci robot has to be manually controlled and calibrated by the surgeon during a surgical task, which interrupts the flow of the operation. The main goal of this capstone project is to automate the motion of the camera arm using a probabilistic model based on surgeon eye gaze data and da Vinci robot kinematic data.
Federated Topic Model and Model Pruning Based on Variational Autoencoder
Authors: Chengjie Ma, Yawen Li, Meiyu Liang, Ang Li
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Topic modeling has emerged as a valuable tool for discovering patterns and topics within large collections of documents. However, when cross-analysis involves multiple parties, data privacy becomes a critical concern. Federated topic modeling has been developed to address this issue, allowing multiple parties to jointly train models while protecting privacy. However, there are communication and performance challenges in the federated scenario. In order to solve the above problems, this paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and uses neural network model pruning to accelerate the model, where the client periodically sends the model neuron cumulative gradients and model weights to the server, and the server prunes the model. To address different requirements, two different methods are proposed to determine the model pruning rate. The first method involves slow pruning throughout the entire model training process, which has limited acceleration effect on the model training process, but can ensure that the pruned model achieves higher accuracy. This can significantly reduce the model inference time during the inference process. The second strategy is to quickly reach the target pruning rate in the early stage of model training in order to accelerate the model training speed, and then continue to train the model with a smaller model size after reaching the target pruning rate. This approach may lose more useful information but can complete the model training faster. Experimental results show that the federated topic model pruning based on the variational autoencoder proposed in this paper can greatly accelerate the model training speed while ensuring the model's performance.
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Authors: Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select only the highest quality pseudo-labels for training. The distilled model is 5.8 times faster with 51% fewer parameters, while performing to within 1% WER on out-of-distribution test data in a zero-shot transfer setting. Distil-Whisper maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio. Distil-Whisper is designed to be paired with Whisper for speculative decoding, yielding a 2 times speed-up while mathematically ensuring the same outputs as the original model. To facilitate further research in this domain, we make our training code, inference code and models publicly accessible.
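A WER-based filter of the kind mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes each example carries a reference transcript alongside the pseudo-label, and the field names `reference` and `pseudo_label` and the threshold `max_wer` are hypothetical, since the abstract gives neither the comparison target nor the cut-off used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / max(len(ref), 1)

def filter_pseudo_labels(examples, max_wer=0.1):
    """Keep examples whose pseudo-label is within max_wer of the reference.

    max_wer is a hypothetical threshold chosen for illustration.
    """
    return [
        ex for ex in examples
        if word_error_rate(ex["reference"], ex["pseudo_label"]) <= max_wer
    ]
```

For instance, an example whose pseudo-label matches its reference exactly has a WER of 0 and is always kept, while one with a single wrong word out of three has a WER of 1/3 and is dropped at the illustrative 0.1 threshold.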
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Authors: Runa Eschenhagen, Alexander Immer, Richard E. Turner, Frank Schneider, Philipp Hennig
Abstract
The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
Keyword: mobile
Assessing Mobile Application Privacy: A Quantitative Framework for Privacy Measurement
Authors: Joao Marono, Catarina Silva, Joao P. Barraca, Vitor Cunha, Paulo Salvador
Abstract
The proliferation of mobile applications and the subsequent sharing of personal data with service and application providers have given rise to substantial privacy concerns. Application marketplaces have introduced mechanisms to conform to regulations and provide individuals with control over their data. However, a notable absence persists regarding clear indications, labels or scores elucidating the privacy implications of these applications. In response to this challenge, this paper introduces a privacy quantification framework. The purpose of this framework is to systematically evaluate the level of privacy risk when using particular Android applications. The main goal is to provide individuals with qualitative labels to make informed decisions about their privacy. This work aims to contribute to a digital environment that prioritizes privacy, promotes informed decision-making, and endorses the incorporation of privacy-preserving design principles.
Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing
Authors: Kyle Brown, Dylan M. Asmar, Mac Schwager, Mykel J. Kochenderfer
Abstract
Mobile autonomous robots have the potential to revolutionize manufacturing processes. However, employing large robot fleets in manufacturing requires addressing challenges including collision-free movement in a shared workspace, effective multi-robot collaboration to manipulate and transport large payloads, complex task allocation due to coupled manufacturing processes, and spatial planning for parallel assembly and transportation of nested subassemblies. We propose a full algorithmic stack for large-scale multi-robot assembly planning that addresses these challenges and can synthesize construction plans for complex assemblies with thousands of parts in a matter of minutes. Our approach takes in a CAD-like product specification and automatically plans a full-stack assembly procedure for a group of robots to manufacture the product. We propose an algorithmic stack that comprises: (i) an iterative radial layout optimization procedure to define a global staging layout for the manufacturing facility, (ii) a graph-repair mixed-integer program formulation and a modified greedy task allocation algorithm to optimally allocate robots and robot sub-teams to assembly and transport tasks, (iii) a geometric heuristic and a hill-climbing algorithm to plan collaborative carrying configurations of robot sub-teams, and (iv) a distributed control policy that enables robots to execute the assembly motion plan collision-free. We also present an open-source multi-robot manufacturing simulator implemented in Julia as a resource to the research community, to test our algorithms and to facilitate multi-robot manufacturing research more broadly. Our empirical results demonstrate the scalability and effectiveness of our approach by generating plans to manufacture a LEGO model of a Saturn V launch vehicle with 1845 parts, 306 subassemblies, and 250 robots in under three minutes on a standard laptop computer.
EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing
Authors: Bo Li, Qiang He, Feifei Chen, Lingjuan Lyu, Athman Bouguettaya, Yun Yang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Abstract
Mobile edge computing (MEC) enables web data caching in close geographic proximity to end users. Popular data can be cached on edge servers located less than hundreds of meters away from end users. This ensures bounded latency guarantees for various latency-sensitive web applications. However, transmitting a large volume of data out of the cloud onto many geographically-distributed web servers individually can be expensive. In addition, web content dissemination may be interrupted by various intentional and accidental events in the volatile MEC environment, which undermines dissemination efficiency and subsequently incurs extra transmission costs. To tackle the above challenges, we present a novel scheme named EdgeDis that coordinates data dissemination by distributed consensus among those servers. We analyze EdgeDis's validity theoretically and evaluate its performance experimentally. Results demonstrate that compared with baseline and state-of-the-art schemes, EdgeDis: 1) is 5.97x - 7.52x faster; 2) reduces dissemination costs by 48.21% to 91.87%; and 3) reduces performance loss caused by dissemination failures by up to 97.30% in time and 96.35% in costs.
Keyword: pruning
Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations
Authors: Ryan Dorosh, Justin Allen, Zixuan He, Christopher Ninatanta, Jack Coleman, Jack Spieker, Ethan Tuck, Jordan Kurtz, Qin Zhang, Matthew D. Whiting, Jiecai Luo, Manoj Karkee, Ming Luo
Abstract
Tree fruit growers around the world are facing labor shortages for critical operations, including harvest and pruning. There is a great interest in developing robotic solutions for these labor-intensive tasks, but current efforts have been prohibitively costly, slow, or require a reconfiguration of the orchard in order to function. In this paper, we introduce an alternative approach to robotics using a novel and low-cost soft-growing robotic platform. Our platform features the ability to extend up to 1.2 m linearly at a maximum speed of 0.27 m/s. The soft-growing robotic arm can operate with a terminal payload of up to 1.4 kg (4.4 N), more than sufficient for carrying an apple. This platform decouples linear and steering motions to simplify path planning and the controller design for targeting. We anticipate our platform being relatively simple to maintain compared to rigid robotic arms. Herein we also describe and experimentally verify the platform's kinematic model, including the prediction of the relationship between the steering angle and the angular positions of the three steering motors. Information from the model enables the position controller to guide the end effector to the targeted positions faster and with higher stability than without this information. Overall, our research shows promise for using soft-growing robotic platforms in orchard operations.
Federated Topic Model and Model Pruning Based on Variational Autoencoder
Authors: Chengjie Ma, Yawen Li, Meiyu Liang, Ang Li
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Topic modeling has emerged as a valuable tool for discovering patterns and topics within large collections of documents. However, when cross-analysis involves multiple parties, data privacy becomes a critical concern. Federated topic modeling has been developed to address this issue, allowing multiple parties to jointly train models while protecting privacy. However, there are communication and performance challenges in the federated scenario. In order to solve the above problems, this paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and uses neural network model pruning to accelerate the model, where the client periodically sends the model neuron cumulative gradients and model weights to the server, and the server prunes the model. To address different requirements, two different methods are proposed to determine the model pruning rate. The first method involves slow pruning throughout the entire model training process, which has limited acceleration effect on the model training process, but can ensure that the pruned model achieves higher accuracy. This can significantly reduce the model inference time during the inference process. The second strategy is to quickly reach the target pruning rate in the early stage of model training in order to accelerate the model training speed, and then continue to train the model with a smaller model size after reaching the target pruning rate. This approach may lose more useful information but can complete the model training faster. Experimental results show that the federated topic model pruning based on the variational autoencoder proposed in this paper can greatly accelerate the model training speed while ensuring the model's performance.
LLMRec: Large Language Models with Graph Augmentation for Recommendation
Abstract
The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction edges, (ii) enhancing the understanding of item node attributes, and (iii) conducting user node profiling, intuitively from the natural language perspective. By employing these strategies, we address the challenges posed by sparse implicit feedback and low-quality side information in recommenders. Besides, to ensure the quality of the augmentation, we develop a denoised data robustification mechanism that includes techniques of noisy implicit feedback pruning and MAE-based feature enhancement that help refine the augmented data and improve its reliability. Furthermore, we provide theoretical analysis to support the effectiveness of LLMRec and clarify the benefits of our method in facilitating model optimization. Experimental results on benchmark datasets demonstrate the superiority of our LLM-based augmentation approach over state-of-the-art techniques. To ensure reproducibility, we have made our code and augmented data publicly available at: https://github.com/HKUDS/LLMRec.git
Keyword: diffusion
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion
Abstract
Recent progress in text-to-image (TTI) systems, such as StableDiffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of obtaining natural images for training a new machine learning classifier. However, in all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference, despite the images used for training appearing realistic. Examining this apparent incongruity in detail gives insight into the limitations of the underlying image generation processes. Through the lens of diversity in image creation vs. accuracy of what is created, we dissect the differences in semantic mismatches in what is modeled in synthetic vs. natural images. This will elucidate the roles of the image-language model, CLIP, and the image generation model, diffusion. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept. We further present surprising insights into the geometry of CLIP embeddings.
Convolution Quadrature for the quasilinear subdiffusion equation
Authors: Maria López Fernández, Łukasz Płociniczak
Abstract
We construct a Convolution Quadrature (CQ) scheme for the quasilinear subdiffusion equation and supply it with the fast and oblivious implementation. In particular we find a condition for the CQ to be admissible and discretize the spatial part of the equation with the Finite Element Method. We prove the unconditional stability and convergence of the scheme and find a bound on the error. As a passing result, we also obtain a discrete Gronwall inequality for the CQ, which is a crucial ingredient of our convergence proof based on the energy method. The paper is concluded with numerical examples verifying convergence and computation time reduction when using fast and oblivious quadrature.
Score Normalization for a Faster Diffusion Exponential Integrator Sampler
Abstract
Recently, Zhang et al. have proposed the Diffusion Exponential Integrator Sampler (DEIS) for fast generation of samples from Diffusion Models. It leverages the semi-linear nature of the probability flow ordinary differential equation (ODE) in order to greatly reduce integration error and improve generation quality at low numbers of function evaluations (NFEs). Key to this approach is the score function reparameterisation, which reduces the integration error incurred from using a fixed score function estimate over each integration step. The original authors use the default parameterisation used by models trained for noise prediction -- multiply the score by the standard deviation of the conditional forward noising distribution. We find that although the mean absolute value of this score parameterisation is close to constant for a large portion of the reverse sampling process, it changes rapidly at the end of sampling. As a simple fix, we propose to instead reparameterise the score (at inference) by dividing it by the average absolute value of previous score estimates at that time step collected from offline high NFE generations. We find that our score normalisation (DEIS-SN) consistently improves FID compared to vanilla DEIS, showing an FID improvement from 6.44 to 5.57 at 10 NFEs for our CIFAR-10 experiments. Our code is available at https://github.com/mtkresearch/Diffusion-DEIS-SN.
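The normalisation step described in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names, the list-of-lists data layout, and the `eps` guard are assumptions, and the sketch covers only the per-step statistic and the division, not how the normalised score enters the exponential integrator.

```python
from statistics import fmean


def mean_abs_per_step(offline_scores):
    """Average absolute score value per time step.

    offline_scores[run][step] is a flat list of floats holding one score
    estimate from an offline (high-NFE) generation. The layout is a
    hypothetical choice made for this sketch.
    """
    n_steps = len(offline_scores[0])
    stats = []
    for step in range(n_steps):
        pooled = [abs(v) for run in offline_scores for v in run[step]]
        stats.append(fmean(pooled))
    return stats


def normalise_score(score, step, stats, eps=1e-12):
    """Divide a score estimate by the stored statistic for its time step."""
    scale = max(stats[step], eps)  # eps guards against division by zero
    return [v / scale for v in score]


# Two offline runs, two time steps, two-dimensional scores.
offline = [
    [[1.0, -1.0], [4.0, -2.0]],
    [[3.0, -3.0], [2.0, -4.0]],
]
stats = mean_abs_per_step(offline)
normalised = normalise_score([6.0, -3.0], step=1, stats=stats)
```

The statistic is computed once offline and reused at inference, so the extra cost per sampling step is a single division.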
Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method
Authors: Will Snyder, Irina Tezaur, Christopher Wentland
Abstract
Physics-informed neural networks (PINNs) are appealing data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs). Unlike traditional neural networks (NNs), which train only on solution data, a PINN incorporates a PDE's residual into its loss function and trains to minimize the said residual at a set of collocation points in the solution domain. This paper explores the use of the Schwarz alternating method as a means to couple PINNs with each other and with conventional numerical models (i.e., full order models, or FOMs, obtained via the finite element, finite difference or finite volume methods) following a decomposition of the physical domain. It is well-known that training a PINN can be difficult when the PDE solution has steep gradients. We investigate herein the use of domain decomposition and the Schwarz alternating method as a means to accelerate the PINN training phase. Within this context, we explore different approaches for imposing Dirichlet boundary conditions within each subdomain PINN: weakly through the loss and/or strongly through a solution transformation. As a numerical example, we consider the one-dimensional steady state advection-diffusion equation in the advection-dominated (high Peclet) regime. Our results suggest that the convergence of the Schwarz method is strongly linked to the choice of boundary condition implementation within the PINNs being coupled. Surprisingly, strong enforcement of the Schwarz boundary conditions does not always lead to a faster convergence of the method. While it is not clear from our preliminary study that the PINN-PINN coupling via the Schwarz alternating method accelerates PINN convergence in the advection-dominated regime, it reveals that PINN training can be improved substantially for Peclet numbers as high as 1e6 by performing a PINN-FOM coupling.
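To make the Schwarz alternating iteration concrete, here is a toy sketch on a far simpler problem than the one in the abstract: the 1D Laplace equation u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1, split into two overlapping subdomains [0, b] and [a, 1]. Each subdomain is solved exactly (the solution is linear), so no PINN or FOM is involved; the problem, the overlap values, and the function name are all assumptions chosen for illustration.

```python
def schwarz_laplace_1d(a=0.4, b=0.6, iterations=50):
    """Alternating Schwarz for u'' = 0 on [0, 1], u(0) = 0, u(1) = 1.

    Subdomain 1 is [0, b] and subdomain 2 is [a, 1]; they overlap on [a, b].
    Returns the converged Dirichlet data (u at x = b, u at x = a).
    """
    if not 0.0 < a < b < 1.0:
        raise ValueError("need 0 < a < b < 1 so the subdomains overlap")
    g1 = 0.0  # current guess for u(b), the boundary value of subdomain 1
    g2 = 0.0  # current guess for u(a), the boundary value of subdomain 2
    for _ in range(iterations):
        # Solve on [0, b] with u(0) = 0, u(b) = g1: u1(x) = g1 * x / b.
        # Its value at x = a becomes the boundary data for subdomain 2.
        g2 = g1 * a / b
        # Solve on [a, 1] with u(a) = g2, u(1) = 1 (a straight line).
        # Its value at x = b becomes the boundary data for subdomain 1.
        g1 = g2 + (1.0 - g2) * (b - a) / (1.0 - a)
    return g1, g2


u_at_b, u_at_a = schwarz_laplace_1d()
```

The exact solution is u(x) = x, so the interface values should approach b and a. In this toy setting the error shrinks by a factor a(1 - b) / (b(1 - a)) per sweep, so a wider overlap converges faster.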
Space Narrative: Generating Images and 3D Scenes of Chinese Garden from Text using Deep Learning
Authors: Jiaxi Shi, Hao Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
The consistent mapping from poems to paintings is essential for the research and restoration of traditional Chinese gardens. But the lack of firsthand material is a great challenge to the reconstruction work. In this paper, we propose a method to generate garden paintings based on text descriptions using deep learning method. Our image-text pair dataset consists of more than one thousand Ming Dynasty Garden paintings and their inscriptions and postscripts. A latent text-to-image diffusion model learns the mapping from descriptive texts to garden paintings of the Ming Dynasty, and then the text description of Jichang Garden guides the model to generate new garden paintings. The cosine similarity between the guide text and the generated image is the evaluation criterion for the generated images. Our dataset is used to fine-tune the pre-trained diffusion model using Low-Rank Adaptation of Large Language Models (LoRA). We also transformed the generated images into a panorama and created a free-roam scene in Unity 3D. Our post-trained model is capable of generating garden images in the style of Ming Dynasty landscape paintings based on textual descriptions. The generated images are compatible with three-dimensional presentation in Unity 3D.
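The evaluation criterion mentioned in the abstract above, cosine similarity between a text and an image, can be sketched as follows. The abstract does not name the encoder that produces the embeddings, so this sketch assumes both inputs are already plain vectors from some shared text-image embedding space; the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings of a guide text and a generated image.
text_embedding = [1.0, 2.0, 3.0]
image_embedding = [2.0, 4.0, 6.0]
score = cosine_similarity(text_embedding, image_embedding)
```

The score depends only on direction, not magnitude, which is why the two parallel vectors in the usage example score as a perfect match.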
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to encourage the temporal consistency. However, in those works, temporal inconsistency issue may not be thoroughly solved, rendering the fidelity of generated videos limited. In this paper, we find the bottleneck lies in the unconstrained query tokens and propose a new zero-shot video-to-video translation framework, named LatentWarp. Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space to constrain the query tokens. Specifically, based on the optical flow obtained from the original video, we warp the generated latent features of last frame to align with the current frame during the denoising process. As a result, the corresponding regions across the adjacent frames can share closely-related query tokens and attention outputs, which can further improve latent-level consistency to enhance visual temporal coherence of generated videos. Extensive experiment results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
Structure-Preserving Time Discretization of Port-Hamiltonian Systems via Discrete Gradient Pairs
Abstract
We discuss structure-preserving time discretization for nonlinear port-Hamiltonian systems with state-dependent mass matrix. Such systems occur, for instance, in the context of structure-preserving nonlinear model order reduction for port-Hamiltonian systems and, in this context, structure-preserving time discretization is crucial for preserving some of the properties of the time-continuous reduced-order model. For this purpose, we introduce a new class of time discretization schemes which is based on so-called discrete gradient pairs and leads to an exact power balance on the time-discrete level. Moreover, for the special case of a pointwise symmetric and positive definite mass matrix, we present an explicit construction of a discrete gradient pair. Finally, we illustrate the theoretical findings by means of a numerical example, where the time-continuous system is a nonlinear reduced-order model for an advection-diffusion problem.
Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos
Authors: Divyanshu Mishra, He Zhao, Pramit Saha, Aris T. Papageorghiou, J. Alison Noble
Abstract
Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.
Abstract
We propose Diffusion Model Variational Inference (DMVI), a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI utilizes diffusion models as variational approximations to the true posterior distribution by deriving a novel bound to the marginal likelihood objective used in Bayesian modelling. DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model. We evaluate DMVI on a set of common Bayesian models and show that its posterior inferences are in general more accurate than those of contemporary methods used in PPLs while having a similar computational cost and requiring less manual tuning.
Intriguing Properties of Data Attribution on Diffusion Models
Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
Abstract
Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.
Controllable Music Production with Diffusion Models and Guidance Gradients
Authors: Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson
Abstract
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model.
De-Diffusion Makes Text a Strong Cross-Modal Interface
Abstract
We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is trained to transform an input image into text, which is then fed into the fixed text-to-image diffusion decoder to reconstruct the original input -- a process we term De-Diffusion. Experiments validate both the precision and comprehensiveness of De-Diffusion text representing images, such that it can be readily ingested by off-the-shelf text-to-image tools and LLMs for diverse multi-modal tasks. For example, a single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools, and also achieves a new state of the art on open-ended vision-language tasks by simply prompting large language models with few-shot examples.
Keyword: adaptive
Adaptive Control of Euler-Lagrange Systems under Time-varying State Constraints without a Priori Bounded Uncertainty
Abstract
In this article, a novel adaptive controller is designed for Euler-Lagrange systems under predefined time-varying state constraints. The proposed controller achieves this objective without a priori knowledge of system parameters and, crucially, of state-dependent uncertainties. The closed-loop stability is verified using the Lyapunov method, while the overall efficacy of the proposed scheme is verified using a simulated robotic arm compared to the state of the art.
ChipNeMo: Domain-Adapted LLMs for Chip Design
Authors: Mingjie Liu, Teo Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, Kishor Kunal, Xiaowei Li, Hao Liu, Stuart Oberman, Sujeet Omar, Sreedhar Pratty, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, Kaizhe Xu, Haoxing Ren
Abstract
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that there's still room for improvement between our current results and ideal outcomes. We believe that further investigation of domain-adapted LLM approaches will help close this gap in the future.
1DFormer: Learning 1D Landmark Representations via Transformer for Facial Landmark Tracking
Authors: Shi Yin, Shijie Huan, Defu Lian, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, Baocai Yin, Cong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not explored in depth the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks when tracking facial landmarks. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and the geometric patterns of landmarks via token communications in both temporal and spatial dimensions for facial landmark tracking. For temporal modeling, we propose a recurrent token mixing mechanism, an axis-landmark-positional embedding mechanism, as well as a confidence-enhanced multi-head attention mechanism to adaptively and robustly embed long-term landmark dynamics into their 1D representations; for structure modeling, we design intra-group and inter-group structure modeling mechanisms to encode the component-level as well as global-level facial structure patterns as a refinement for the 1D representations of landmarks through token communications in the spatial dimension via 1D convolutional layers. Experimental results on the 300VW and the TF databases show that 1DFormer successfully models the long-range sequential patterns as well as the inherent facial structures to learn informative 1D representations of landmark sequences, and achieves state-of-the-art performance on facial landmark tracking.
Semantic Representation Learning of Scientific Literature based on Adaptive Feature and Graph Neural Network
Abstract
Because most scientific literature data is unlabeled, unsupervised graph-based semantic representation learning becomes crucial. At the same time, in order to enrich the features of scientific literature, a learning method of semantic representation of scientific literature based on adaptive features and graph neural network is proposed. By introducing the adaptive feature method, the features of scientific literature are considered globally and locally. The graph attention mechanism is used to sum the features of scientific literature with citation relationship, and give each scientific literature different feature weights, so as to better express the correlation between the features of different scientific literature. In addition, an unsupervised graph neural network semantic representation learning method is proposed. By comparing the mutual information between the positive and negative local semantic representation of scientific literature and the global graph semantic representation in the potential space, the graph neural network can capture the local and global information, thus improving the learning ability of the semantic representation of scientific literature. The experimental results show that the proposed learning method of semantic representation of scientific literature based on adaptive feature and graph neural network is competitive on the basis of scientific literature classification, and has achieved good results.
Robust Graph Clustering via Meta Weighting for Noisy Graphs
Abstract
How can we find meaningful clusters in a graph robustly against noise edges? Graph clustering (i.e., dividing nodes into groups of similar ones) is a fundamental problem in graph analysis with applications in various fields. Recent studies have demonstrated that graph neural network (GNN) based approaches yield promising results for graph clustering. However, we observe that their performance degenerates significantly on graphs with noise edges, which are prevalent in practice. In this work, we propose MetaGC for robust GNN-based graph clustering. MetaGC employs a decomposable clustering loss function, which can be rephrased as a sum of losses over node pairs. We add a learnable weight to each node pair, and MetaGC adaptively adjusts the weights of node pairs using meta-weighting so that the weights of meaningful node pairs increase and the weights of less-meaningful ones (e.g., noise edges) decrease. We show empirically that MetaGC learns weights as intended and consequently outperforms the state-of-the-art GNN-based competitors, even when they are equipped with separate denoising schemes, on five real-world graphs under varying levels of noise. Our code and datasets are available at https://github.com/HyeonsooJo/MetaGC.
Adversarially Robust Distributed Count Tracking via Partial Differential Privacy
Abstract
We study the distributed tracking model, also known as distributed functional monitoring. This model involves $k$ sites each receiving a stream of items and communicating with the central server. The server's task is to track a function of all items received thus far continuously, with minimum communication cost. For count tracking, it is known that there is a $\sqrt{k}$ gap in communication between deterministic and randomized algorithms. However, existing randomized algorithms assume an "oblivious adversary" who constructs the entire input streams before the algorithm starts. Here we consider adaptive adversaries who can choose new items based on previous answers from the algorithm. Deterministic algorithms are trivially robust to adaptive adversaries, while randomized ones may not. Therefore, we investigate whether the $\sqrt{k}$ advantage of randomized algorithms is from randomness itself or the oblivious adversary assumption. We provide an affirmative answer to this question by giving a robust algorithm with optimal communication. Existing robustification techniques do not yield optimal bounds due to the inherent challenges of the distributed nature of the problem. To address this, we extend the differential privacy framework by introducing "partial differential privacy" and proving a new generalization theorem. This theorem may have broader applications beyond robust count tracking, making it of independent interest.
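For readers unfamiliar with the count-tracking setting, the following sketch simulates one standard deterministic baseline: each site reports its local count only when it has grown by more than a factor (1 + eps) since its last report, and the server sums the last reported values. This is not the robust randomized algorithm the abstract proposes; the class names, the `simulate` helper, and the round-robin stream are assumptions made for illustration.

```python
class Site:
    """One of the k sites; counts its own items and reports occasionally."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0
        self.last_reported = 0

    def receive_item(self):
        """Count one item; return the count if a report is due, else None."""
        self.count += 1
        if self.count > (1 + self.eps) * self.last_reported:
            self.last_reported = self.count
            return self.count
        return None


class Server:
    """Central server; keeps the last value reported by each site."""

    def __init__(self, k):
        self.reported = [0] * k
        self.messages = 0

    def update(self, site_id, value):
        self.reported[site_id] = value
        self.messages += 1

    def estimate(self):
        return sum(self.reported)


def simulate(stream, k, eps):
    """Feed a stream of site ids (one per arriving item) through the protocol."""
    sites = [Site(eps) for _ in range(k)]
    server = Server(k)
    for site_id in stream:
        report = sites[site_id].receive_item()
        if report is not None:
            server.update(site_id, report)
    return server


stream = [i % 4 for i in range(4000)]  # 4000 items spread evenly over 4 sites
server = simulate(stream, k=4, eps=0.1)
estimate = server.estimate()
```

Because every site satisfies last_reported <= count <= (1 + eps) * last_reported at all times, the server's estimate never exceeds the true count and undershoots it by at most a factor (1 + eps), while using far fewer messages than items. The protocol is deterministic, so an adaptive adversary gains nothing from observing its answers.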
Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems
Abstract
Sequential recommender systems (SRS) have gained widespread popularity in recommendation due to their ability to effectively capture dynamic user preferences. One default setting in the current SRS is to uniformly consider each historical behavior as a positive interaction. Actually, this setting has the potential to yield sub-optimal performance, as each item makes a distinct contribution to the user's interest. For example, purchased items should be given more importance than clicked ones. Hence, we propose a general automatic sampling framework, named AutoSAM, to non-uniformly treat historical behaviors. Specifically, AutoSAM augments the standard sequential recommendation architecture with an additional sampler layer to adaptively learn the skew distribution of the raw input, and then sample informative sub-sets to build more generalizable SRS. To overcome the challenges of non-differentiable sampling actions and also introduce multiple decision factors for sampling, we further introduce a novel reinforcement learning based method to guide the training of the sampler. We theoretically design multi-objective sampling rewards including Future Prediction and Sequence Perplexity, and then optimize the whole framework in an end-to-end manner by combining the policy gradient. We conduct extensive experiments on benchmark recommender models and four real-world datasets. The experimental results demonstrate the effectiveness of the proposed approach. We will make our code publicly available after the acceptance.
Augmenting deep neural networks with symbolic knowledge: Towards trustworthy and interpretable AI for education
Authors: Danial Hooshyar, Roger Azevedo, Yeongwook Yang
Abstract
Artificial neural networks (ANNs) have been shown to be amongst the most important artificial intelligence (AI) techniques in educational applications, providing adaptive educational services. However, their educational potential is limited in practice due to three major challenges: i) difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, and practitioners' knowledge) in their development, ii) learning and reflecting biases, and iii) lack of interpretability. Given the high-risk nature of education, the integration of educational knowledge into ANNs becomes crucial for developing AI applications that adhere to essential educational restrictions, and provide interpretability over the predictions. This research argues that the neural-symbolic family of AI has the potential to address the named challenges. To this end, it adapts a neural-symbolic AI framework and accordingly develops an approach called NSAI, that injects and extracts educational knowledge into and from deep neural networks, for modelling learners' computational thinking. Our findings reveal that the NSAI approach has better generalizability compared to deep neural networks trained merely on training data, as well as training data augmented by SMOTE and autoencoder methods. More importantly, unlike the other models, the NSAI approach prioritises robust representations that capture causal relationships between input features and output labels, ensuring safety in learning to avoid spurious correlations and control biases in training data. Furthermore, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings imply that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.
AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification
Abstract
Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) substantially improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT on SEs, on the one hand, disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT needs to be executed on top of a DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs. The code for AdaSent is available.
Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion
Authors: Zhanwen Liu, Nan Yang, Yang Wang, Yuke Li, Xiangmo Zhao, Fei-Yue Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection. Specifically, to mitigate the sparsity or blurriness issues arising from diverse motion states of traffic objects in fixed-interval event sampling methods, we propose the Reliable Structure Generation Network (RSGNet) to generate Speed Invariant Frames (SIF), ensuring the integrity and sharpness of object structures. Next, we design a novel Adaptive Feature Complement Module (AFCM) which guides the adaptive fusion of two modality features to compensate for the information loss in the images by perceiving the global lightness distribution of the images, thereby generating illumination-robust representations. Finally, considering the lack of large-scale and high-quality annotations in the existing event-based object detection datasets, we build a DSEC-Det dataset, which consists of 53 sequences with 63,931 images and more than 208,000 labels for 8 classes. Extensive experimental results demonstrate that our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95. Our code and dataset will be available at https://github.com/YN-Yang/SFNet.
Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation
Abstract
Vision Transformer (ViT) has demonstrated promising performance in computer vision tasks, comparable to state-of-the-art neural networks. Yet, this new type of deep neural network architecture is vulnerable to adversarial attacks limiting its capabilities in terms of robustness. This article presents a novel contribution aimed at further improving the accuracy and robustness of ViT, particularly in the face of adversarial attacks. We propose an augmentation technique called "Dynamic Scanning Augmentation" that leverages dynamic input sequences to adaptively focus on different patches, thereby maintaining performance and robustness. Our detailed investigations reveal that this adaptability to the input sequence induces significant changes in the attention mechanism of ViT, even for the same image. We introduce four variations of Dynamic Scanning Augmentation, outperforming ViT in terms of both robustness to adversarial attacks and accuracy against natural images, with one variant showing comparable results. By integrating our augmentation technique, we observe a substantial increase in ViT's robustness, improving it from $17\%$ to $92\%$ measured across different types of adversarial attacks. These findings, together with other comprehensive tests, indicate that Dynamic Scanning Augmentation enhances accuracy and robustness by promoting a more adaptive type of attention. In conclusion, this work contributes to the ongoing research on Vision Transformers by introducing Dynamic Scanning Augmentation as a technique for improving the accuracy and robustness of ViT. The observed results highlight the potential of this approach in advancing computer vision tasks and merit further exploration in future studies.
Structure Learning with Adaptive Random Neighborhood Informed MCMC
Authors: Alberto Caron, Xitong Liang, Samuel Livingstone, Jim Griffin
Abstract
In this paper, we introduce a novel MCMC sampler, PARNI-DAG, for a fully-Bayesian approach to the problem of structure learning under observational data. Under the assumption of causal sufficiency, the algorithm allows for approximate sampling directly from the posterior distribution on Directed Acyclic Graphs (DAGs). PARNI-DAG performs efficient sampling of DAGs via locally informed, adaptive random neighborhood proposal that results in better mixing properties. In addition, to ensure better scalability with the number of nodes, we couple PARNI-DAG with a pre-tuning procedure of the sampler's parameters that exploits a skeleton graph derived through some constraint-based or scoring-based algorithms. Thanks to these novel features, PARNI-DAG quickly converges to high-probability regions and is less likely to get stuck in local modes in the presence of high correlation between nodes in high-dimensional settings. After introducing the technical novelties in PARNI-DAG, we empirically demonstrate its mixing efficiency and accuracy in learning DAG structures on a variety of experiments.
Adaptive Threshold Selection for Set Membership State Estimation with Quantized Measurements
Authors: Marco Casini, Andrea Garulli, Antonio Vicino
Abstract
State estimation for discrete-time linear systems with quantized measurements is addressed. By exploiting the set-theoretic nature of the information provided by the quantizer, the problem is cast in the set membership estimation setting. Motivated by the possibility of suitably tuning the quantizer thresholds in sensor networks, the optimal design of adaptive quantizers is formulated in terms of the minimization of the radius of information associated with the state estimation problem. The optimal solution is derived for first-order systems and the result is exploited to design adaptive quantizers for generic systems, minimizing the size of the feasible output signal set. Then, the minimum number of sensor thresholds for which the adaptive quantizers guarantee asymptotic boundedness of the state estimation uncertainty is established. Threshold adaptation mechanisms based on several types of outer approximations of the feasible state set are also proposed. The effectiveness of the designed adaptive quantizers is demonstrated on numerical tests involving a specific case study and randomly generated systems, highlighting the trade-off between the resulting estimation uncertainty and the computational burden required by recursive set approximations.
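As a rough illustration of set membership estimation with a quantized measurement, the sketch below tracks an interval that is guaranteed to contain the state of a scalar system. Everything in it is an assumption made for illustration and is not taken from the paper: the model x_next = a * x + w with |w| <= w_bound, a noise-free measurement y = x, and a simple heuristic that spreads the thresholds uniformly over the predicted interval (this is not the optimal threshold design the abstract refers to). All function names are hypothetical.

```python
from math import inf
from typing import List, Tuple

Interval = Tuple[float, float]


def predict(interval: Interval, a: float, w_bound: float) -> Interval:
    """Propagate the feasible interval through x_next = a * x + w, |w| <= w_bound."""
    ends = (a * interval[0], a * interval[1])
    return min(ends) - w_bound, max(ends) + w_bound


def uniform_thresholds(interval: Interval, n: int) -> List[float]:
    """Heuristic (assumed): n thresholds splitting the interval into n + 1 equal cells."""
    lo, hi = interval
    return [lo + (hi - lo) * i / (n + 1) for i in range(1, n + 1)]


def quantizer_cell(y: float, thresholds: List[float]) -> Interval:
    """Return the quantizer cell [lo, hi) that contains the measured value y."""
    lo, hi = -inf, inf
    for t in sorted(thresholds):
        if y >= t:
            lo = t
        else:
            hi = t
            break
    return lo, hi


def update(interval: Interval, cell: Interval) -> Interval:
    """Intersect the predicted interval with the quantizer cell."""
    lo, hi = max(interval[0], cell[0]), min(interval[1], cell[1])
    if lo > hi:
        raise ValueError("empty feasible set: model assumptions violated")
    return lo, hi


def step(interval: Interval, y: float, a: float, w_bound: float, n: int) -> Interval:
    """One prediction and update cycle with thresholds re-centred on the prediction."""
    pred = predict(interval, a, w_bound)
    cell = quantizer_cell(y, uniform_thresholds(pred, n))
    return update(pred, cell)
```

In this toy setting the interval width obeys width_next <= (|a| * width + 2 * w_bound) / (n + 1), so the uncertainty stays bounded whenever n + 1 > |a|. That bound is a property of this simplified heuristic only and should not be read as the result established in the paper.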
Keyword: quantization
The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks
Abstract
In this paper, we investigate tracking control of heterogeneous multi-agent systems under Denial-of-Service (DoS) attacks and state quantization. Dynamic quantized mechanisms are designed for inter-follower communication and leader-follower communication. Zooming-in and zooming-out factors, as well as data rates of both mechanisms that prevent quantizer saturation, are provided. Our results show that by tuning the inter-follower quantized controller, one cannot improve the resilience beyond a level determined by the data rate of leader-follower quantized communication, i.e., the ceiling effect. Otherwise, overflow of the followers' state quantizer can occur. On the other hand, if one selects a "large" data rate for leader-follower quantized communication, then the inter-follower quantized communication determines the resilience, and further increasing the data rate for leader-follower quantized communication cannot improve the resilience, i.e., the bottleneck effect. Simulation examples are provided to justify the results of our paper.
Efficient LLM Inference on CPUs
Abstract
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical number of model parameters, which demands large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that makes the deployment of LLMs more efficient. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.
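To make the idea of INT4 weight-only quantization concrete, here is a minimal pure-Python sketch of symmetric per-group quantization to the signed 4-bit range [-8, 7]. It is an illustrative assumption, not the flow or the kernels from the linked repository: the function names, the group size, and the choice of symmetric scaling are all hypothetical, and activations are left untouched, as "weight-only" suggests.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Quantize weights to integers in [-8, 7], with one float scale per group."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q.append(max(-8, min(7, round(w / scale))))
    return q, scales


def dequantize_int4(q: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from 4-bit integers and group scales."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]
```

With this scheme each weight needs 4 bits plus a share of its group's scale, and the reconstruction error per weight is at most half of that group's scale. Smaller groups reduce the error at the cost of storing more scales.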
Keyword: efficient
Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
FairWASP: Fast and Optimal Fair Wasserstein Pre-processing
Stochastic Time-Optimal Trajectory Planning for Connected and Automated Vehicles in Mixed-Traffic Merging Scenarios
Rethinking the Cloudonomics of Efficient I/O for Data-Intensive Analytics Applications
Consistent Video-to-Video Transfer Using Synthetic Dataset
DistDNAS: Search Efficient Feature Interactions within 2 Hours
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding
Towards Omni-supervised Referring Expression Segmentation
AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification
Efficient Human-AI Coordination via Preparatory Language-based Convention
A cost-benefit source-receptor framework for implementation of Blue-Green flood risk management
NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks
Untangling Graphs on Surfaces
On the Opportunities of Green Computing: A Survey
Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design
Intriguing Properties of Data Attribution on Diffusion Models
Efficient LLM Inference on CPUs
Experimental Validation of a Grid-Aware Optimal Control of Hybrid AC/DC Microgrids
Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value
Structure Learning with Adaptive Random Neighborhood Informed MCMC
Understanding the Issues and Causes in WebAssembly Application Development: A Mining-based Study
Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving
Decision Support Framework for Home Health Caregiver Allocation: A Case Study of HHC Agency in Tennessee, USA
Keyword: faster
Local Max-Cut on Sparse Graphs
Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations
Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method
EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
Gaze-based Learning from Demonstration In Surgical Robotics
Federated Topic Model and Model Pruning Based on Variational Autoencoder
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Keyword: mobile
Assessing Mobile Application Privacy: A Quantitative Framework for Privacy Measurement
Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing
EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing
Keyword: pruning
Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations
Federated Topic Model and Model Pruning Based on Variational Autoencoder
LLMRec: Large Language Models with Graph Augmentation for Recommendation
Keyword: diffusion
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion
Convolution Quadrature for the quasilinear subdiffusion equation
Score Normalization for a Faster Diffusion Exponential Integrator Sampler
Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method
Space Narrative: Generating Images and 3D Scenes of Chinese Garden from Text using Deep Learning
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
Structure-Preserving Time Discretization of Port-Hamiltonian Systems via Discrete Gradient Pairs
Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos
Diffusion models for probabilistic programming
Intriguing Properties of Data Attribution on Diffusion Models
Controllable Music Production with Diffusion Models and Guidance Gradients
De-Diffusion Makes Text a Strong Cross-Modal Interface
Keyword: adaptive
Adaptive Control of Euler-Lagrange Systems under Time-varying State Constraints without a Priori Bounded Uncertainty
ChipNeMo: Domain-Adapted LLMs for Chip Design
1DFormer: Learning 1D Landmark Representations via Transformer for Facial Landmark Tracking
Semantic Representation Learning of Scientific Literature based on Adaptive Feature and Graph Neural Network
Robust Graph Clustering via Meta Weighting for Noisy Graphs
Adversarially Robust Distributed Count Tracking via Partial Differential Privacy
Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems
Augmenting deep neural networks with symbolic knowledge: Towards trustworthy and interpretable AI for education
AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification
Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion
Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation
Structure Learning with Adaptive Random Neighborhood Informed MCMC
Adaptive Threshold Selection for Set Membership State Estimation with Quantized Measurements
Keyword: quantization
The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks
Efficient LLM Inference on CPUs