Abstract
Efficiently capturing complex spatiotemporal representations from large-scale unlabeled traffic data remains a challenging task. To address this challenge, this work employs advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture spatial-temporal dependencies and realize graph-level contrasting. To further discriminate between individual nodes during negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that a predictor built upon the STS-CCL contrastive learning model achieves superior performance over existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled examples and for other spatiotemporal tasks with data scarcity issues.
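To make the contrastive setup concrete, here is a minimal, generic sketch (my illustration, not the authors' STS-CCL code) of graph-view augmentation plus an InfoNCE-style objective of the kind such models optimize; the `encoder`, the dropout rates, and the dense-adjacency representation are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def edge_dropout(adj: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Basic structural augmentation: randomly drop edges from a dense adjacency."""
    mask = (torch.rand_like(adj) > drop_prob).float()
    return adj * mask

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """InfoNCE loss between two views; matching rows are positives, all others negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                # pairwise similarities
    labels = torch.arange(z1.size(0))         # the i-th row matches the i-th column
    return F.cross_entropy(logits, labels)

# Usage sketch: encode a basic view and a strong view, then contrast them.
# encoder = ...  # any spatiotemporal graph encoder producing (batch, dim) embeddings
# loss = info_nce(encoder(x, edge_dropout(adj, 0.1)), encoder(x, edge_dropout(adj, 0.3)))
```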
ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking
Abstract
The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object tracking and segmentation. In this study, we convert the bounding boxes to masks in reference frames with the help of the Segment Anything Model (SAM) and Alpha-Refine, and then propagate the masks to the current frame, transforming the task from Video Object Tracking (VOT) to Video Object Segmentation (VOS). Furthermore, we introduce MSDeAOT, a variant of the AOT series that incorporates transformers at multiple feature scales. MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8. As a testament to the effectiveness of our design, we achieved first place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.
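A minimal sketch of the box-to-mask step using the public Segment Anything API (the checkpoint path and call pattern are generic placeholders; the authors' full pipeline also uses Alpha-Refine before SAM):

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (the checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def box_to_mask(image_rgb: np.ndarray, box_xyxy: np.ndarray) -> np.ndarray:
    """Convert a bounding-box prompt in a reference frame into a binary mask."""
    predictor.set_image(image_rgb)
    masks, scores, _ = predictor.predict(box=box_xyxy, multimask_output=False)
    return masks[0]  # (H, W) boolean mask, then propagated by the VOS model
```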
Secure-by-Construction Synthesis for Control Systems
Authors: Bingzhuo Zhong, Siyuan Liu, Marco Caccamo, Majid Zamani
Abstract
In this paper, we present the synthesis of secure-by-construction controllers that address safety and security properties simultaneously in cyber-physical systems. Our focus is on studying a specific security property called opacity, which characterizes the system's ability to maintain plausible deniability of its secret behavior in the presence of an intruder. These controllers are synthesized based on a concept of so-called (augmented) control barrier functions, which we introduce and discuss in detail. We propose conditions that facilitate the construction of the desired (augmented) control barrier functions and their corresponding secure-by-construction controllers. To compute these functions, we propose an iterative scheme that leverages sum-of-squares programming techniques. This approach enables efficient computation of these functions, particularly for polynomial systems. Moreover, we demonstrate the flexibility of our approach by incorporating user-defined cost functions into the construction of secure-by-construction controllers. Finally, we validate the effectiveness of our results through two case studies, illustrating the practical applicability and benefits of our proposed approach.
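For readers unfamiliar with barrier certificates, a plain (non-augmented) control barrier function $B$ for a system $x^+ = f(x,u)$ with initial set $X_0$ and unsafe set $X_u$ roughly satisfies conditions of the following form (a simplified sketch; the augmented functions handling opacity in the paper are more involved):

```latex
% Sketch of standard discrete-time CBF conditions (non-augmented case).
B(x) \le 0 \quad \forall x \in X_0, \qquad
B(x) > 0 \quad \forall x \in X_u, \qquad
\forall x\ \exists u:\ B(f(x,u)) - B(x) \le 0.
```

A sum-of-squares program then searches for a polynomial $B$ by certifying each inequality with sum-of-squares multipliers.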
Successful Combination of Database Search and Snowballing for Identification of Primary Studies in Systematic Literature Studies
Abstract
Background: A good search strategy is essential for a successful systematic literature study. Historically, database searches have been the norm, later complemented with snowball searches. Our conjecture is that we can perform even better searches by combining the two search approaches, referred to as a hybrid search strategy. Objective: Our main objective was to compare and evaluate a hybrid search strategy. Furthermore, we compared some alternative hybrid search strategies to assess whether it was possible to identify more cost-efficient ways of searching for relevant primary studies. Method: To compare and evaluate the hybrid search strategy, we replicated an SLR on industry-academia collaboration in software engineering. The original SLR used a more traditional approach to searching for relevant articles, while the replication was conducted using a hybrid search strategy. Results: In our evaluation, the hybrid search strategy was superior in identifying relevant primary studies. It identified 30 percent more primary studies, and even more when focusing only on peer-reviewed articles. To embrace individual viewpoints when assessing research articles and minimise the risk of missing primary studies, we introduced two new concepts, wild cards and borderline articles, for conducting systematic literature studies. Conclusions: The hybrid search strategy is a strong contender for use when conducting systematic literature studies. Furthermore, alternative hybrid search strategies may be viable if selected wisely in relation to the start set for snowballing. Finally, the two new concepts were judged essential to cater for different individual judgements and to minimise the risk of excluding primary studies that ought to be included.
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Authors: Yuwei Bao, Barrett Martin Lattimer, Joyce Chai
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes, and to learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words not only as an information filtration process, but also as a representation-symbol mapping. This procedure involves neither a fixed vocabulary size nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.
Only Pick Once -- Multi-Object Picking Algorithms for Picking Exact Number of Objects Efficiently
Abstract
Picking up multiple objects at once is a grasping skill that makes a human worker efficient in many domains. This paper presents a system that picks a requested number of objects by only picking once (OPO). The proposed Only-Pick-Once System (OPOS) contains several graph-based algorithms that convert the layout of objects into a graph, cluster nodes in the graph, and rank and select candidate clusters based on their topology. OPOS also has a multi-object picking predictor based on a convolutional neural network for estimating how many objects would be picked up with a given gripper location and orientation. This paper presents four evaluation metrics and three protocols to evaluate the proposed OPOS. The results show OPOS achieves very high success rates for two and three objects when only picking once, and that it is two to three times more efficient than single-object picking. The results also show OPOS can generalize to objects of unseen sizes and shapes.
Scaling In-Context Demonstrations with Structured Attention
Authors: Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capabilities for in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning that removes unnecessary dependencies between individual demonstrations and makes the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that SAICL achieves comparable or better performance than full attention while obtaining up to a 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline which processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations with continued performance gains as it scales.
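A toy sketch of the kind of attention mask this design implies (my construction, not the released implementation): demonstration tokens attend only within their own demonstration, while test-input tokens attend to everything, which removes cross-demonstration dependencies and makes the result order-invariant.

```python
import torch

def saicl_style_mask(demo_lengths: list[int], query_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend). Each demonstration is an
    independent block; the query attends to all demonstrations and itself."""
    total = sum(demo_lengths) + query_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in demo_lengths:                      # block-diagonal demo-to-demo attention
        mask[start:start + n, start:start + n] = True
        start += n
    mask[start:, :] = True                      # query tokens attend everywhere
    return mask

mask = saicl_style_mask([3, 4], query_len=2)    # 9x9 mask for two demos + a query
```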
Incremental Nonlinear Dynamic Inversion based Optical Flow Control for Flying Robots: An Efficient Data-driven Approach
Abstract
This paper presents a novel approach to optical flow control of Micro Air Vehicles (MAVs). The task is challenging due to the nonlinearity of optical flow observables. Our proposed Incremental Nonlinear Dynamic Inversion (INDI) control scheme incorporates an efficient data-driven method to address the nonlinearity. It directly estimates the inverse of the time-varying control effectiveness in real time, eliminating the constant-effectiveness assumption and avoiding the high computation of traditional INDI. This approach effectively handles the fast-changing system dynamics commonly encountered in optical flow control, particularly height-dependent changes. We demonstrate the robustness and efficiency of the proposed control scheme in numerical simulations and in real-world flight tests: multiple landings of an MAV on a static flat surface with various tracking setpoints, and hovering and landings on moving and undulating surfaces. Despite the presence of noisy optical flow estimates and the lateral and vertical movement of the landing surfaces, the MAV is able to successfully track or land on the surface, with height and vertical velocity decaying exponentially at almost the same time, as desired.
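A minimal discrete-time sketch of the incremental update at the heart of INDI (the scalar form and the running estimator below are my illustrative assumptions, not the paper's exact formulation): the command is incremented by the estimated inverse control effectiveness times the error between the desired and measured output derivative.

```python
class IncrementalController:
    """Scalar INDI-style update: u_k = u_{k-1} + g_inv * (nu - y_dot_measured),
    where g_inv is a running, data-driven estimate of the inverse effectiveness."""
    def __init__(self, g_inv_init: float = 1.0, lr: float = 0.1):
        self.u_prev = 0.0
        self.g_inv = g_inv_init
        self.lr = lr

    def step(self, nu: float, y_dot_meas: float,
             du_prev: float, dy_dot_prev: float) -> float:
        # Update the inverse-effectiveness estimate from the last increment pair.
        if abs(dy_dot_prev) > 1e-6:
            self.g_inv += self.lr * (du_prev / dy_dot_prev - self.g_inv)
        u = self.u_prev + self.g_inv * (nu - y_dot_meas)
        self.u_prev = u
        return u
```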
TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations
Abstract
Accommodating all the weights on-chip for large-scale NNs remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on top of high-efficiency SRAM-CIM for weight storage, eliminating off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability as the number of SL-ReRAMs increases, and from limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAM-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. Clustered n-selector-n-ReRAM (cluster-nSnR) cells are employed for reliable weight restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x enhanced energy efficiency compared to baseline SRAM-CIM and ReRAM-CIM designs, respectively.
On efficient linear and fully decoupled finite difference method for wormhole propagation with heat transmission process on staggered grids
Abstract
In this paper, we construct an efficient linear and fully decoupled finite difference scheme for wormhole propagation with heat transmission process on staggered grids, which only requires solving a sequence of linear elliptic equations at each time step. We first derive the positivity preserving properties for the discrete porosity and its difference quotient in time, and then obtain optimal error estimates for the velocity, pressure, concentration, porosity and temperature in different norms rigorously and carefully by establishing several auxiliary lemmas for the highly coupled nonlinear system. Numerical experiments in two- and three-dimensional cases are provided to verify our theoretical results and illustrate the capabilities of the constructed method.
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Authors: Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu
Abstract
Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite their generality, next-word prediction is often not an efficient formulation for many of these tasks, demanding an extreme scale of model parameters (tens or hundreds of billions) and sometimes yielding suboptimal performance. In practice, it is often desirable to build more efficient models -- despite being less versatile, they still apply to a substantial subset of problems, delivering on-par or even superior performance with much smaller model sizes. In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. Given a pair of texts, the model measures the degree of alignment between their information. We instantiate an alignment model (Align) through lightweight finetuning of RoBERTa (355M parameters) using 5.9M examples from 28 datasets. Despite its compact size, extensive experiments show the model's efficiency and strong performance: (1) On over 20 datasets spanning the aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models that have around 2x or 10x more parameters; the single unified model also outperforms task-specific models finetuned on individual datasets; (2) When applied to evaluate factual consistency of language generation on 23 datasets, our model improves over various baselines, including the much larger GPT-3.5 (ChatGPT) and sometimes even GPT-4; (3) The lightweight model can also serve as an add-on component for LLMs such as GPT-3.5 in question answering tasks, improving the average exact match (EM) score by 17.94 and the F1 score by 15.05 through identifying unanswerable questions.
Dynamic Multi-time Scale User Admission and Resource Allocation for Semantic Extraction in MEC Systems
Authors: Yuanpeng Zheng, Tiankui Zhang, Jonathan Loo
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
This paper investigates semantic extraction task-oriented dynamic multi-time-scale user admission and resource allocation in mobile edge computing (MEC) systems. Amid the prevalence of artificial intelligence applications in various industries, the offloading of semantic extraction tasks, which mainly consist of convolutional neural networks for computer vision, poses a great challenge for communication bandwidth and computing capacity allocation in MEC systems. Considering the stochastic nature of semantic extraction tasks, we formulate a stochastic optimization problem by modeling the dynamic arrival of tasks in the temporal domain. We jointly optimize the system revenue and cost, represented as long-term user admission and short-term resource allocation, respectively. To handle the proposed stochastic optimization problem, we decompose it into short-time-scale subproblems and a long-time-scale subproblem using the Lyapunov optimization technique. The short-time-scale optimization variables of resource allocation, including user association, bandwidth allocation, and computing capacity allocation, are then obtained in closed form. The user admission optimization on the long time scale is solved by a heuristic iteration method. On this basis, the multi-time-scale user admission and resource allocation algorithm is proposed for dynamic semantic extraction task computing in MEC systems. Simulation results demonstrate that, compared with the benchmarks, the proposed algorithm efficiently improves user admission and resource allocation performance and achieves a flexible trade-off between system revenue and cost at multiple time scales while accounting for semantic extraction tasks.
Large Language Models Empowered Autonomous Edge AI for Connected Intelligence
Authors: Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.
Shortest Beer Path Queries based on Graph Decomposition
Abstract
Given a directed edge-weighted graph $G=(V, E)$ with beer vertices $B\subseteq V$, a beer path between two vertices $u$ and $v$ is a path between $u$ and $v$ that visits at least one beer vertex in $B$, and the beer distance between two vertices is the length of the shortest beer path between them. We consider \emph{indexing problems} on beer paths: a graph is given a priori, and we construct some data structures (called indexes) for the graph. Later, given two vertices, we find the beer distance or a beer path between them using the data structures. For such a scheme, efficient algorithms using indexes for beer distance and beer path queries have been proposed for outerplanar graphs and interval graphs. For example, Bacic et al. (2021) present indexes of size $O(n)$ for outerplanar graphs and an algorithm using them that answers the beer distance between two given vertices in $O(\alpha(n))$ time, where $\alpha(\cdot)$ is the inverse Ackermann function; this performance is shown to be optimal. This paper proposes indexing data structures and algorithms for beer path queries on general graphs based on two types of graph decomposition: the tree decomposition and the triconnected component decomposition. We propose indexes of size $O(m+nr^2)$ based on the triconnected component decomposition, where $r$ is the size of the largest triconnected component. For a given query $u,v\in V$, our algorithm using the indexes can output the beer distance in query time $O(\alpha(m))$. In particular, our indexing data structures and algorithms achieve optimal performance (in both space and query time) for series-parallel graphs, a wider class than outerplanar graphs.
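For intuition, beer distances in a general weighted digraph can be computed without any index by running Dijkstra on a standard two-layer product graph, where the layer records whether a beer vertex has been visited; the sketch below is this folklore construction, not the paper's decomposition-based index.

```python
import heapq

def beer_distance(n, edges, beer, s, t):
    """edges: list of directed (u, v, w); beer: set of beer vertices.
    State (v, b): currently at vertex v, b = 1 iff a beer vertex was visited."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    INF = float("inf")
    dist = [[INF, INF] for _ in range(n)]
    b0 = 1 if s in beer else 0
    dist[s][b0] = 0
    pq = [(0, s, b0)]
    while pq:
        d, u, b = heapq.heappop(pq)
        if d > dist[u][b]:
            continue
        for v, w in adj[u]:
            nb = 1 if (b or v in beer) else 0
            if d + w < dist[v][nb]:
                dist[v][nb] = d + w
                heapq.heappush(pq, (d + w, v, nb))
    return dist[t][1]  # shortest s-t path visiting at least one beer vertex
```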
What Should Data Science Education Do with Large Language Models?
Authors: Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes and, as a result, reshape the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data wrangling, and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skillsets among students, such as LLM-informed creativity, critical thinking, and AI-guided programming. LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources, and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it is crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs with fostering complementary human expertise and innovation. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.
Evaluating raw waveforms with deep learning frameworks for speech emotion recognition
Abstract
Speech emotion recognition is a challenging task in the speech processing field. For this reason, the feature extraction process plays a crucial role in representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks without any feature extraction stage, for the recognition of emotions on six different datasets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, the performance of traditional feature extraction techniques, namely mel-scale spectrogram and mel-frequency cepstral coefficients, is evaluated in combination with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, and random forest models are evaluated as machine learning algorithms, while majority voting and stacking are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, a comparison with state-of-the-art studies is carried out. Based on the experimental results, the CNN model surpasses existing approaches with 95.86% accuracy on the TESS+RAVDESS dataset using raw audio files, thus setting a new state of the art. In speaker-independent audio classification, the proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, and 85.76% on SAVEE with the CNN model.
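A minimal sketch of a raw-waveform classifier of the kind described (the layer sizes and six-class output are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RawAudioCNN(nn.Module):
    """1-D CNN that consumes raw waveforms (batch, 1, samples) directly,
    with no hand-crafted spectrogram or MFCC front end."""
    def __init__(self, num_emotions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_emotions)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(waveform).squeeze(-1))

logits = RawAudioCNN()(torch.randn(8, 1, 16000))  # one second of 16 kHz audio
```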
Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation
Abstract
Tractography that traces the peak directions extracted from the fiber orientation distribution (FOD) suffers from ambiguous spatial correspondences between diffusion directions and fiber geometry, and is thus prone to producing erroneous tracks while missing true positive connections. Peak-based tractography methods reconstruct streamlines 'locally', in a 'single to single' manner, and therefore lack global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function, using a higher-order streamline differential equation that reconstructs streamline bundles in a 'cluster to cluster' manner. A unified framework for any higher-order streamline differential equation is presented to describe fiber bundles with disjoint streamlines defined on the diffusion tensor vector field. At the global level, the tractography process is simplified to the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing an energy optimization model, and tractogram bundle information is introduced to provide anatomical priors, characterizing the relations between BTD and the diffusion tensor vector field under this prior guidance. Experiments are performed on simulated Hough, Sine, and Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) for qualitative and quantitative evaluation. The results demonstrate that our approach can reconstruct complex global fiber bundles directly. BTD reduces error deviation and accumulation at the local level and shows better results in reconstructing long-range, twisting, and large fanning tracts.
Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting
Abstract
Zero-shot cross-domain slot filling aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules, and suffer from poor generalization capability or robustness. In this paper, we propose a generative zero-shot prompt learning framework for cross-domain slot filling that improves both generalization and robustness over previous work. Besides, we introduce a novel inverse prompting strategy to distinguish different slot types and avoid the multiple-prediction problem, and an efficient prompt-tuning strategy that boosts performance while training only a small number of prompt parameters. Experiments and analysis demonstrate the effectiveness of our proposed framework, especially the large improvement (+13.44% F1) on unseen slots.
Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation
Abstract
Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
TDLE: 2-D LiDAR Exploration With Hierarchical Planning Using Regional Division
Authors: Xuyang Zhao, Chengpu Yu, Erpei Xu, Yixuan Liu
Abstract
Exploration systems are critical for enhancing the autonomy of robots. Due to the unpredictability of the future planning space, existing methods either adopt an inefficient greedy strategy or require a lot of resources to obtain a global solution. In this work, we address the challenge of obtaining global exploration routes with minimal computing resources. A hierarchical planning framework dynamically divides the planning space into subregions and arranges their orders to provide global guidance for exploration. Indicators that are compatible with the subregion order are used to choose specific exploration targets, thereby considering estimates of spatial structure and extending the planning space to unknown regions. Extensive simulations and field tests demonstrate the efficacy of our method in comparison to existing 2D LiDAR-based approaches. Our code has been made public for further investigation.
Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization
Authors: Anh Mai, Matteo Brucato, Azza Abouzied, Peter J. Haas, Alexandra Meliou
Abstract
A package query returns a package -- a multiset of tuples -- that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as KD-Tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
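To make the ILP equivalence concrete, here is a toy package query expressed with the open-source PuLP modeler (the schema and bounds are invented for illustration, and the paper's system replaces such off-the-shelf solvers with its customized ones): pick a multiset of meals maximizing protein subject to a calorie budget.

```python
import pulp

# Toy relation: (protein, calories) per tuple.
tuples = [(30, 400), (22, 350), (15, 200), (40, 700)]

prob = pulp.LpProblem("package_query", pulp.LpMaximize)
# x[i] = how many copies of tuple i the package contains (multiset semantics).
x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=3, cat="Integer")
     for i in range(len(tuples))]

prob += pulp.lpSum(p * xi for (p, _), xi in zip(tuples, x))          # maximize protein
prob += pulp.lpSum(c * xi for (_, c), xi in zip(tuples, x)) <= 1500  # calorie budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
package = {i: int(xi.value()) for i, xi in enumerate(x) if xi.value() > 0}
```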
Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain
Authors: Marc Zeller, Thomas Waschulzik, Reiner Schmid, Claus Bahlmann
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.
MomentDiff: Generative Video Moment Retrieval from Random to Real
Authors: Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description. To achieve this goal, we provide a generative diffusion-based framework called MomentDiff, which simulates a typical human retrieval process from random browsing to gradual localization. Specifically, we first diffuse the real span to random noise, and learn to denoise the random noise to the original span with the guidance of similarity between text and video. This allows the model to learn a mapping from arbitrary random locations to real moments, enabling the ability to locate segments from random initialization. Once trained, MomentDiff could sample random temporal segments as initial guesses and iteratively refine them to generate an accurate temporal boundary. Different from discriminative works (e.g., based on learnable proposals or queries), MomentDiff with random initialized spans could resist the temporal location biases from datasets. To evaluate the influence of the temporal location biases, we propose two anti-bias datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. The experimental results demonstrate that our efficient framework consistently outperforms state-of-the-art methods on three public benchmarks, and exhibits better generalization and robustness on the proposed anti-bias datasets. The code, model, and anti-bias evaluation datasets are available at https://github.com/IMCCretrieval/MomentDiff.
Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight
Abstract
This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations
Authors: Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.
A Simple $(1-\epsilon)$-Approximation Semi-Streaming Algorithm for Maximum (Weighted) Matching
Authors: Sepehr Assadi
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
We present a simple semi-streaming algorithm for $(1-\epsilon)$-approximation of bipartite matching in $O(\log(n)/\epsilon)$ passes. This matches the performance of state-of-the-art "$\epsilon$-efficient" algorithms, while being considerably simpler. The algorithm relies on a "white-box" application of the multiplicative weight update method with a self-contained primal-dual analysis that can be of independent interest. To showcase this, we use the same ideas, alongside standard tools from matching theory, to present an equally simple semi-streaming algorithm for $(1-\epsilon)$-approximation of weighted matchings in general (not necessarily bipartite) graphs, again in $O(\log(n)/\epsilon)$ passes.
Abstract
Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous privacy-preserving clustering focused on identifying concentrations of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges for finding accurate separators: identifying separators that are large gaps between clusters instead of small gaps within a cluster and, to efficiently spend the privacy budget, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: for a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{|D|})$. Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.
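A generic sketch of the Exponential Mechanism that DPM builds on (the gap-width utility below is a stand-in; DPM's actual scoring of separators is more elaborate): candidates are sampled with probability proportional to exp(eps * utility / (2 * sensitivity)).

```python
import numpy as np

def exponential_mechanism(candidates, utility, eps, sensitivity=1.0, rng=None):
    """Sample one candidate with prob. proportional to exp(eps*u/(2*sensitivity))."""
    rng = rng or np.random.default_rng()
    scores = np.array([utility(c) for c in candidates], dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy use: prefer split points lying in wide low-density gaps of 1-D data.
data = np.sort(np.concatenate([np.random.normal(0, 1, 500),
                               np.random.normal(10, 1, 500)]))
gaps = list(range(len(data) - 1))
split = exponential_mechanism(gaps, lambda i: data[i + 1] - data[i], eps=1.0)
```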
Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution
Authors: Yuting Lu, Lingtong Min, Binglu Wang, Le Zheng, Xiaoxu Wang, Yongqiang Zhao, Teng Long
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Remote sensing image super-resolution (RSISR) plays a vital role in enhancing spatial details and improving the quality of satellite imagery. Recently, Transformer-based models have shown competitive performance in RSISR. To mitigate the quadratic computational complexity resulting from global self-attention, various methods constrain attention to a local window, enhancing its efficiency. Consequently, the receptive fields in a single attention layer are inadequate, leading to insufficient context modeling. Furthermore, while most transformer-based approaches reuse shallow features through skip connections, relying solely on these connections treats shallow and deep features equally, impeding the model's ability to characterize them. To address these issues, we propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR. Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features across stages. The model incorporates cross-spatial pixel integration attention (CSPIA) to introduce contextual information into a local window, while cross-stage feature fusion attention (CSFFA) adaptively fuses features from the previous stage to improve feature expression in line with the requirements of the current stage. We conducted comprehensive experiments on multiple benchmark datasets, demonstrating the superior performance of our proposed SPIFFNet in terms of both quantitative metrics and visual quality when compared to state-of-the-art methods.
Efficient Semiring-Weighted Earley Parsing
Authors: Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner
Subjects: Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Formal Languages and Automata Theory (cs.FL)
Abstract
This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O (N^3|G||R|)$, which is unworkable for the large grammars that arise in natural language processing, to $O (N^3|G|)$, which matches the runtime of CKY on a binarized version of the grammar $G$. Here $N$ is the length of the sentence, $|R|$ is the number of productions in $G$, and $|G|$ is the total length of those productions. We also provide a version that achieves runtime of $O (N^3|M|)$ with $|M| \leq |G|$ when the grammar is represented compactly as a single finite-state automaton $M$ (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes. We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.
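For orientation, here is a minimal unweighted Earley recognizer (no speed-ups, no semiring weights; the grammar encoding is an illustrative choice) implementing the predict/scan/complete deduction rules the paper systematizes:

```python
def earley_recognize(grammar: dict, start: str, tokens: list) -> bool:
    """grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols); a symbol is a nonterminal iff it is a key of `grammar`.
    Chart items are (lhs, rhs, dot, origin)."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        changed = True
        while changed:                                 # predict/complete to fixpoint
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                 # PREDICT
                        for prod in grammar[sym]:
                            if (sym, prod, 0, i) not in chart[i]:
                                chart[i].add((sym, prod, 0, i)); changed = True
                    elif i < n and tokens[i] == sym:   # SCAN
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                  # COMPLETE
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs and \
                           (l2, r2, d2 + 1, o2) not in chart[i]:
                            chart[i].add((l2, r2, d2 + 1, o2)); changed = True
    return any(it[0] == start and it[2] == len(it[1]) and it[3] == 0
               for it in chart[n])

g = {"S": [("S", "+", "a"), ("a",)]}                   # left-recursive toy grammar
print(earley_recognize(g, "S", ["a", "+", "a"]))       # True
```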
Improving Retrieval-Augmented Large Language Models via Data Importance Learning
Authors: Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial-time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further propose an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).
Lyapunov function search method for analysis of nonlinear systems stability using genetic algorithm
Abstract
This paper considers a wide class of smooth continuous dynamic nonlinear systems (control objects) with a measurable state vector. The problem is to find a special function (a Lyapunov function) which, in the framework of the second Lyapunov method, guarantees asymptotic stability for the above class of nonlinear systems. It is well known that the search for a Lyapunov function is the "cornerstone" of mathematical stability theory. Methods for selecting or finding a Lyapunov function to analyze the stability of closed-loop linear stationary systems, as well as of nonlinear objects with explicit linear dynamic and nonlinear static parts, have been well studied (see the works of Lurie, Yakubovich, Popov, and many others). However, universal approaches to the search for a Lyapunov function for a more general class of nonlinear systems have not yet been identified. There is a large variety of methods for finding a Lyapunov function for nonlinear systems, but they all operate within constraints imposed on the structure of the control object. In this paper we propose another approach, which gives specialists in automatic control theory a new tool for finding Lyapunov functions for the stability analysis of smooth continuous dynamic nonlinear systems with a measurable state vector. The essence of the proposed approach is to represent a candidate function as a sum of nonlinear terms, each an element of the object's state vector raised to a positive power and multiplied by an unknown coefficient. The unknown coefficients are then selected using a genetic algorithm, which should endow the function with all the conditions required of a Lyapunov function (in the framework of the second Lyapunov method).
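A toy sketch of this idea (the system, the fixed term structure, and the GA details below are my illustrative assumptions, not the paper's algorithm): candidates V(x) = sum_i c_i * x_i^{p_i} are scored by how often V > 0 and dV/dt < 0 hold on sampled states, and the coefficients evolve by selection and mutation.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy asymptotically stable system x' = f(x)."""
    return np.array([-x[0] + x[1], -x[0] - x[1] ** 3])

POWERS = np.array([2, 2])          # V(x) = c0*x0^2 + c1*x1^2 (even powers)
samples = rng.uniform(-2, 2, size=(200, 2))

def fitness(c):
    """Fraction of sample states satisfying the Lyapunov conditions."""
    ok = 0
    for x in samples:
        V = np.sum(c * x ** POWERS)
        dV = np.sum(c * POWERS * x ** (POWERS - 1) * f(x))  # grad(V) . f(x)
        ok += (V > 0) and (dV < 0)
    return ok / len(samples)

pop = rng.uniform(0.1, 5.0, size=(40, 2))         # initial coefficient population
for _ in range(30):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]        # selection: keep the best 10
    children = parents[rng.integers(0, 10, 30)] + rng.normal(0, 0.2, (30, 2))
    pop = np.vstack([parents, np.abs(children)])   # mutation; keep coefficients > 0

best = pop[np.argmax([fitness(c) for c in pop])]   # e.g. recovers V ~ x0^2 + x1^2
```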
Art Authentication with Vision Transformers
Authors: Ludovica Schaerf, Carina Popovici, Eric Postma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state of the art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines whether the superiority of Vision Transformers extends to art authentication, thus improving the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performance of Swin Transformers with that of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that consists only of imitations, we find the Swin Transformer to be superior to EfficientNet, achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain
Authors: Aryo Gema, Luke Daines, Pasquale Minervini, Beatrice Alex
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% AUROC score in the large-scale multilabel classification tasks, such as diagnoses and procedures classification.
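A minimal sketch of attaching a LoRA adapter with the Hugging Face peft library, in the spirit of the PEFT setup described (the model name, rank, and target modules are illustrative placeholders, not the paper's configuration):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; the paper adapts LLaMA using MIMIC-IV clinical notes.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

config = LoraConfig(
    r=16,                                   # low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)       # backbone weights stay frozen
model.print_trainable_parameters()          # only the small adapter is trainable
```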
Origin-Destination Travel Time Oracle for Map-based Services
Authors: Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen
Abstract
Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, and trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when estimating travel times for future queries. We propose a novel two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT) that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories: given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.
Generalizing Backpropagation for Gradient-Based Interpretability
Authors: Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell
Abstract
Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
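As a toy illustration of the semiring view (my example, not the paper's implementation): replacing (+, ×) with (max, ×) in the backpropagation recurrence yields the highest-weighted path through the gradient graph instead of the total gradient.

```python
# Each edge (parent -> node, w) carries a local partial derivative w. Standard
# backprop aggregates path products with (+, *); swapping in the (max, *)
# semiring instead extracts the single highest-weighted path.
def path_aggregate(edges, order, source, plus, times, zero, one):
    """edges: dict node -> list of (parent, local_grad); order: topological."""
    val = {n: zero for n in order}
    val[source] = one
    for n in order:
        for parent, w in edges.get(n, []):
            val[n] = plus(val[n], times(val[parent], w))
    return val

edges = {"h": [("x", 2.0)], "y": [("h", 0.5), ("x", 0.1)]}
order = ["x", "h", "y"]
grad = path_aggregate(edges, order, "x",
                      lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
best = path_aggregate(edges, order, "x",
                      max, lambda a, b: a * b, 0.0, 1.0)
print(grad["y"], best["y"])  # total gradient 1.1 vs. best single path 1.0
```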
Learning Constrained Corner Node Trajectories of a Tether Net System for Space Debris Capture
Abstract
The earth's orbit is becoming increasingly crowded with debris that poses significant safety risks to the operation of existing and new spacecraft and satellites. The active tether-net system, which consists of a flexible net with maneuverable corner nodes launched from a small autonomous spacecraft, is a promising solution for capturing and disposing of such space debris. The requirement of autonomous operation and the need to generalize over debris scenarios with different rotational rates make the capture process significantly challenging. The space debris could rotate about multiple axes, which, along with sensing/estimation and actuation uncertainties, calls for a robust, generalizable approach to guiding the net launch and flight - one that can guarantee robust capture. This paper proposes a decentralized actuation system combined with reinforcement learning for planning and controlling this tether-net system. In this new system, four microsatellites with cold-gas thrusters act as the corner nodes of the net and can thus help control or correct the flight of the net after launch. The microsatellites pull the net to complete the task of approaching and capturing the space debris. The proposed method uses an RL framework that integrates proximal policy optimization to find the optimal solution based on the dynamics simulation of the net and the microsatellites performed in Vortex Studio. The RL framework finds an optimal trajectory that is both fuel-efficient and ensures a desired level of capture quality.
Querying Data Exchange Settings Beyond Positive Queries
Authors: Marco Calautti, Sergio Greco, Cristian Molinaro, Irina Trubitsyna
Abstract
Data exchange, the problem of transferring data from a source schema to a target schema, has been studied for several years. The semantics of answering positive queries over the target schema was defined in early work, but little attention has been paid to more general queries. A few proposals of semantics for more general queries exist, but they either do not properly extend the standard semantics under positive queries, giving rise to counterintuitive answers, or they make query answering undecidable even for the most important data exchange settings, e.g., with weakly-acyclic dependencies. The goal of this paper is to provide a new semantics for data exchange that is able to deal with general queries. At the same time, we want our semantics to coincide with the classical one when focusing on positive queries, and to not trade off too much in terms of complexity of query answering. We show that query answering is undecidable in general under the new semantics, but it is coNP-complete when the dependencies are weakly-acyclic. Moreover, in the latter case, we show that exact answers under our semantics can be computed by means of logic programs with choice, thus exploiting existing efficient systems. For more efficient computations, we also show that our semantics allows for the construction of a representative target instance, similar in spirit to a universal solution, that can be exploited for computing approximate answers in polynomial time. Under consideration in Theory and Practice of Logic Programming (TPLP).
OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models
Authors: Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
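As a rough illustration of the underlying idea (a generic PyTorch sketch of delta tuning, not OpenDelta's actual API), one can attach a low-rank delta module to a frozen backbone without editing the backbone's code:

```python
# A generic illustration of delta tuning: attach a small low-rank
# "delta module" to each frozen linear layer by wrapping it, leaving
# the backbone code itself untouched.
import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the backbone weights
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # start as an exact no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

def inject_deltas(model: nn.Module, rank: int = 8):
    """Replace every nn.Linear in `model` with a delta-wrapped copy."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, LowRankDelta(child, rank))
        else:
            inject_deltas(child, rank)

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
inject_deltas(backbone)
trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(trainable)  # only the delta parameters remain trainable
```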
Efficient Domain Adaptation of Sentence Embeddings using Adapters
Authors: Tim Schopf, Dennis Schneider, Florian Matthes
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Sentence embeddings enable us to capture the semantic similarity of short texts. Most sentence embedding models are trained for general semantic textual similarity (STS) tasks. Therefore, to use sentence embeddings in a particular domain, the model must be adapted to it in order to achieve good results. Usually, this is done by fine-tuning the entire sentence embedding model for the domain of interest. While this approach yields state-of-the-art results, all of the model's weights are updated during fine-tuning, making this method resource-intensive. Therefore, instead of fine-tuning entire sentence embedding models for each target domain individually, we propose to train lightweight adapters. These domain-specific adapters do not require fine-tuning all underlying sentence embedding model parameters. Instead, we only train a small number of additional parameters while keeping the weights of the underlying sentence embedding model fixed. Training domain-specific adapters allows the same base model to be used throughout, with only the domain-specific adapters exchanged to adapt sentence embeddings to a specific domain. We show that using adapters for parameter-efficient domain adaptation of sentence embeddings yields competitive performance within 1% of a domain-adapted, entirely fine-tuned sentence embedding model while training only approximately 3.6% of the parameters.
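A minimal sketch of this setup (the architecture and dimensions are our assumptions, not the authors' code) keeps one frozen encoder and swaps small per-domain adapters:

```python
# A minimal sketch: one frozen sentence encoder shared across domains,
# with a small bottleneck adapter per domain that can be swapped in.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
        )

    def forward(self, x):
        return x + self.net(x)             # residual keeps the base embedding

class AdaptedEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int, domains):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():    # base model stays fixed
            p.requires_grad = False
        self.adapters = nn.ModuleDict(
            {d: BottleneckAdapter(dim) for d in domains}
        )

    def forward(self, x, domain: str):
        return self.adapters[domain](self.encoder(x))

# Stand-in encoder; in practice this would be a pretrained model.
encoder = nn.Linear(128, 128)
model = AdaptedEncoder(encoder, dim=128, domains=["medical", "legal"])
emb = model(torch.randn(4, 128), domain="medical")
print(emb.shape)  # torch.Size([4, 128])
```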
LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search
Abstract
Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize the search by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods, but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6\% on ImageNet under mobile constraints, along with best-in-class Kendall-Tau, architectural diversity, and search space size.
JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery
Abstract
Schema discovery is an important aspect of working with data in formats such as JSON. Unlike relational databases, JSON data sets often do not have associated structural information. Consumers of such datasets are often left to browse through data in an attempt to observe commonalities in structure across documents in order to construct suitable code for data processing. However, this process is time-consuming and error-prone. Existing distributed approaches to mining schemas present a significant usability advantage as they provide useful metadata for large data sources. However, depending on the data source, ad hoc queries for estimating other properties to help with crafting an efficient data pipeline can be expensive. We propose JSONoid, a distributed schema discovery process augmented with additional metadata in the form of monoid data structures that are easily maintainable in a distributed setting. JSONoid subsumes several existing approaches to distributed schema discovery with similar performance. Our approach also adds significant and useful information about data values to discovered schemas with linear scalability.
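The monoid idea can be illustrated with a toy example (ours, not JSONoid's implementation): a per-property summary with an identity element and an associative combine, so partial schemas from data partitions can be merged in any order:

```python
# A toy monoid for schema discovery: per-property statistics with an
# identity element and an associative combine(), so partial schemas
# from different data partitions merge in any order or tree shape.
from dataclasses import dataclass

@dataclass
class NumberSchema:
    count: int = 0                       # identity: the empty summary
    min_val: float = float("inf")
    max_val: float = float("-inf")

    def add(self, x: float) -> "NumberSchema":
        return NumberSchema(self.count + 1, min(self.min_val, x),
                            max(self.max_val, x))

    def combine(self, other: "NumberSchema") -> "NumberSchema":
        # Associative merge: enables distributed tree aggregation.
        return NumberSchema(self.count + other.count,
                            min(self.min_val, other.min_val),
                            max(self.max_val, other.max_val))

# Two "workers" summarize disjoint partitions, then merge.
part1 = NumberSchema()
for x in [3.0, 7.5]:
    part1 = part1.add(x)
part2 = NumberSchema()
for x in [1.2, 9.9]:
    part2 = part2.add(x)
print(part1.combine(part2))  # count=4, min_val=1.2, max_val=9.9
```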
Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance
Authors: Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Order execution is a fundamental task in quantitative finance, aiming at finishing the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that considers practical constraints. Specifically, we treat every agent as an individual operator trading one specific order, while communicating with the other agents and collaborating to maximize the overall profits. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol that lets the agents communicate their intended actions with each other and refine them accordingly. The protocol is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets illustrate superior performance, with significantly better collaboration effectiveness achieved by our method.
Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks
Authors: Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Abstract
Wi-Fi Direct is a promising technology for the support of device-to-device (D2D) communications on commercial mobile devices. However, the standard as it stands is not sufficient to support the real deployment of networking solutions entirely based on D2D, such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes which cannot communicate with each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of nodes' stability and power levels. We evaluate the protocol's performance by simulating three reference scenarios with different mobility models, geographical areas and numbers of nodes. Simulations are also supported by experimental results related to the evaluation, in a real testbed, of the involved context parameters. We compare WFD-GM with state-of-the-art solutions and show that it performs significantly better than a baseline approach in scenarios with medium/low mobility, and is comparable in the case of high mobility, without introducing additional overhead.
VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering
Abstract
We present the Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answering (KBQA) system that integrates humans into the loop to edit and debug knowledge base (KB) queries. VisKoP not only provides a neural program induction module, which converts natural language questions into the knowledge oriented programming language (KoPL), but also maps KoPL programs onto graphical elements. KoPL programs can be edited with simple graphical operators, such as dragging to add knowledge operators and slot filling to designate operator arguments. Moreover, VisKoP provides auto-completion for its knowledge base schema, and users can easily debug a KoPL program by checking its intermediate results. To facilitate practical KBQA on a million-entity-level KB, we design a highly efficient KoPL execution engine for the back-end. Experimental results show that VisKoP is highly efficient and that user interaction can fix a large portion of wrong KoPL programs to acquire the correct answer. The VisKoP online demo https://demoviskop.xlore.cn (stable release of this paper) and https://viskop.xlore.cn (beta release with new features), the highly efficient KoPL engine https://pypi.org/project/kopl-engine, and a screencast video https://youtu.be/zAbJtxFPTXo are now publicly available.
Risk-Averse Trajectory Optimization via Sample Average Approximation
Authors: Thomas Lew, Riccardo Bonalli, Marco Pavone
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Trajectory optimization under uncertainty underpins a wide range of applications in robotics. However, existing methods are limited in terms of reasoning about sources of epistemic and aleatoric uncertainty, space and time correlations, nonlinear dynamics, and non-convex constraints. In this work, we first introduce a continuous-time planning formulation with an average-value-at-risk constraint over the entire planning horizon. Then, we propose a sample-based approximation that unlocks an efficient, general-purpose, and time-consistent algorithm for risk-averse trajectory optimization. We prove that the method is asymptotically optimal and derive finite-sample error bounds. Simulations demonstrate the high speed and reliability of the approach on problems with stochasticity in nonlinear dynamics, obstacle fields, interactions, and terrain parameters.
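As a hedged sketch of the sample-average idea (a generic empirical average-value-at-risk estimator in the Rockafellar-Uryasev form, not the paper's full algorithm):

```python
# Sample-average approximation of an average-value-at-risk (CVaR)
# constraint from Monte Carlo samples, using the Rockafellar-Uryasev
# form  CVaR_a(Z) = min_t  t + E[(Z - t)^+] / (1 - a).
import numpy as np

def empirical_cvar(samples: np.ndarray, alpha: float) -> float:
    """Sample-average approximation of CVaR at level alpha."""
    t = np.quantile(samples, alpha)               # VaR estimate
    return t + np.mean(np.maximum(samples - t, 0.0)) / (1.0 - alpha)

rng = np.random.default_rng(0)
# Hypothetical sampled constraint values g(x, xi_i) for one candidate
# trajectory x under M draws of the uncertainty xi (negative = safe).
costs = rng.normal(loc=-1.0, scale=0.3, size=1000)
cvar = empirical_cvar(costs, alpha=0.95)
print(cvar <= 0.0)  # True: the risk-averse constraint CVaR_0.95 <= 0 holds
```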
LEO: Learning Efficient Orderings for Multiobjective Binary Decision Diagrams
Abstract
Approaches based on binary decision diagrams (BDDs) have recently achieved state-of-the-art results for multiobjective integer programming problems. The variable ordering used in constructing BDDs can have a significant impact on their size and on the quality of bounds derived from relaxed or restricted BDDs for single-objective optimization problems. We first showcase a similar impact of variable ordering on the Pareto frontier (PF) enumeration time for the multiobjective knapsack problem, suggesting the need for deriving variable ordering methods that improve the scalability of the multiobjective BDD approach. To that end, we derive a novel parameter configuration space based on variable scoring functions which are linear in a small set of interpretable and easy-to-compute variable features. We show how the configuration space can be efficiently explored using black-box optimization, circumventing the curse of dimensionality (in the number of variables and objectives) and finding good orderings that reduce the PF enumeration time. However, black-box optimization approaches incur a computational overhead that outweighs the reduction in time due to good variable ordering. To alleviate this issue, we propose LEO, a supervised learning approach for finding efficient variable orderings that reduce the enumeration time. Experiments on benchmark sets from the knapsack problem with 3-7 objectives and up to 80 variables show that LEO is ~30-300% and ~10-200% faster at PF enumeration than common ordering strategies and algorithm configuration. Our code and instances are available at https://github.com/khalil-research/leo.
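A minimal sketch of the scoring idea (the feature names and weights here are hypothetical, not the learned configuration):

```python
# Each variable gets a score that is linear in a few cheap features,
# and the BDD layer ordering simply sorts variables by that score.
import numpy as np

# Rows = variables of a knapsack instance; columns = interpretable
# features, e.g. mean profit across objectives and item weight.
features = np.array([
    [10.0, 4.0],
    [ 6.0, 2.0],
    [ 8.0, 5.0],
])
weights = np.array([1.0, -0.5])          # one point in the config space

scores = features @ weights              # linear scoring function
ordering = np.argsort(-scores)           # best-scored variable first
print(ordering)                          # [0 2 1] -> BDD layer order
```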
Markov Persuasion Processes with Endogenous Agent Beliefs
Authors: Krishnamurthy Iyer, Haifeng Xu, You Zu
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Abstract
We consider a dynamic Bayesian persuasion setting where a single long-lived sender persuades a stream of ``short-lived'' agents (receivers) by sharing information about a payoff-relevant state. The state transitions are Markovian, and the sender seeks to maximize the long-run average reward by committing to a (possibly history-dependent) signaling mechanism. While most previous studies of Markov persuasion consider exogenous agent beliefs that are independent of the chain, we study a more natural variant with endogenous agent beliefs that depend on the chain's realized history. A key challenge in analyzing such settings is to model the agents' partial knowledge about the history information. We analyze a Markov persuasion process (MPP) under various information models that differ in the amount of information the receivers have about the history of the process. Specifically, we formulate a general partial-information model where each receiver observes the history with an $\ell$ period lag. Our technical contributions start with analyzing two benchmark models, i.e., the full-history information model and the no-history information model. We establish an ordering of the sender's payoff as a function of the informativeness of the agents' information model (with no-history as the least informative), and develop efficient algorithms to compute optimal solutions for these two benchmarks. For general $\ell$, we present the technical challenges in finding an optimal signaling mechanism, where even determining the right dependency on the history becomes difficult. To bypass the difficulties, we use a robustness framework to design a "simple" \emph{history-independent} signaling mechanism that approximately achieves the optimal payoff when $\ell$ is reasonably large.
Keyword: faster
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Authors: Luciano Del Corro, Allie Del Giorno, Sahaj Agarwal, Bin Yu, Ahmed Awadallah, Subhabrata Mukherjee
Abstract
Autoregressive large language models (LLMs) have made remarkable progress in various natural language generation tasks. However, they incur high computation cost and latency resulting from the autoregressive token-by-token generation. To address this issue, several approaches have been proposed to reduce computational cost using early-exit strategies. These strategies enable faster text generation using reduced computation without applying the full computation graph to each token. While existing token-level early exit methods show promising results for online inference, they cannot be readily applied for batch inferencing and Key-Value caching. This is because they have to wait until the last token in a batch exits before they can stop computing. This severely limits the practical application of such techniques. In this paper, we propose a simple and effective token-level early exit method, SkipDecode, designed to work seamlessly with batch inferencing and KV caching. It overcomes prior constraints by setting up a singular exit point for every token in a batch at each sequence position. It also guarantees a monotonic decrease in exit points, thereby eliminating the need to recompute KV Caches for preceding tokens. Rather than terminating computation prematurely as in prior works, our approach bypasses lower to middle layers, devoting most of the computational resources to upper layers, allowing later tokens to benefit from the compute expenditure by earlier tokens. Our experimental results show that SkipDecode can obtain 2x to 5x inference speedups with negligible regression across a variety of tasks. This is achieved using OPT models of 1.3 billion and 6.7 billion parameters, all the while being directly compatible with batching and KV caching optimization techniques.
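A simplified sketch of the exit-point schedule as we read it (the decay function and layer counts are illustrative, not the released implementation):

```python
# Every token at sequence position t in a batch shares one exit layer,
# and the schedule decays monotonically with t, so KV caches written
# for earlier positions never need to be recomputed.
def exit_layer(pos: int, max_pos: int, min_layer: int, max_layer: int) -> int:
    """Linearly decaying, batch-uniform exit point for position `pos`."""
    frac = min(pos / max_pos, 1.0)
    return max_layer - round(frac * (max_layer - min_layer))

num_layers, min_exit = 24, 12
schedule = [exit_layer(t, max_pos=512, min_layer=min_exit,
                       max_layer=num_layers) for t in range(0, 513, 128)]
print(schedule)                     # [24, 21, 18, 15, 12] -- never increases
assert all(a >= b for a, b in zip(schedule, schedule[1:]))
```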
Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization
Authors: Anh Mai, Matteo Brucato, Azza Abouzied, Peter J. Haas, Alexandra Meliou
Abstract
A package query returns a package -- a multiset of tuples -- that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as KD-Tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
Global q-superlinear convergence of the infinite-dimensional Newton's method for the regularized p-Stokes equations
Abstract
The motion of glaciers can be simulated with the p-Stokes equations. We present an algorithm that solves these equations faster than the Picard iteration. We do so by proving q-superlinear global convergence of the infinite-dimensional Newton's method with Armijo step sizes to the solution of these equations. We only have to add an arbitrarily small diffusion term for this convergence result. We also consider approximations of exact step sizes. Exact step sizes are possible because we reformulate the problem as the minimization of a convex functional. Next, we prove that the additional diffusion term causes only minor differences in the solution compared to the original p-Stokes equations. Finally, we test our algorithms on a reformulation of the experiment ISMIP-HOM B. In the experiment, the approximation of exact step sizes, for both the Picard iteration and Newton's method, is superior to the plain Picard iteration. Also, Newton's method with Armijo step sizes converges faster than the Picard iteration. However, the accuracy reached by Newton's method with Armijo step sizes depends more strongly on the resolution of the domain.
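The two ingredients named above, Newton's method with Armijo backtracking and a small added convexifying term, can be sketched on a toy convex functional (a model problem, not the p-Stokes discretization):

```python
# Newton's method with Armijo backtracking line search on a small
# convex model problem.
import numpy as np

def newton_armijo(f, grad, hess, x, sigma=1e-4, beta=0.5, tol=1e-10):
    for _ in range(100):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)       # Newton direction
        t = 1.0
        # Armijo condition: sufficient decrease along d.
        while f(x + t * d) > f(x) + sigma * t * g @ d:
            t *= beta
        x = x + t * d
    return x

# Convex test functional: f(x) = ||x||^4/4 + ||x||^2/2. The small
# quadratic term plays the role of the added diffusion term, keeping
# the Hessian uniformly positive definite.
f = lambda x: 0.25 * (x @ x) ** 2 + 0.5 * (x @ x)
grad = lambda x: (x @ x) * x + x
hess = lambda x: (x @ x) * np.eye(2) + 2.0 * np.outer(x, x) + np.eye(2)
print(newton_armijo(f, grad, hess, np.array([2.0, -1.5])))  # -> ~[0, 0]
```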
EffLiFe: Efficient Light Field Generation via Hierarchical Sparse Gradient Descent
Authors: Yijie Deng, Lei Han, Tianpeng Lin, Lin Li, Jinzhi Zhang, Lu Fang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field generation from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field generation while maintaining rendering quality. Based on this insight, we introduce EffLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse view images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further sparsely optimized by focusing only on important MPI gradients in a few iterations. Nevertheless, relying solely on optimization can lead to artifacts at occlusion boundaries. Therefore, we propose an occlusion-aware iterative refinement module that removes visual artifacts in occluded regions by iteratively filtering the input. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivering better performance (about 2 dB higher in PSNR) compared to other online approaches.
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Abstract
Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles, from the vision and language modality perspectives, to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood
LEO: Learning Efficient Orderings for Multiobjective Binary Decision Diagrams
Abstract
Approaches based on binary decision diagrams (BDDs) have recently achieved state-of-the-art results for multiobjective integer programming problems. The variable ordering used in constructing BDDs can have a significant impact on their size and on the quality of bounds derived from relaxed or restricted BDDs for single-objective optimization problems. We first showcase a similar impact of variable ordering on the Pareto frontier (PF) enumeration time for the multiobjective knapsack problem, suggesting the need for deriving variable ordering methods that improve the scalability of the multiobjective BDD approach. To that end, we derive a novel parameter configuration space based on variable scoring functions which are linear in a small set of interpretable and easy-to-compute variable features. We show how the configuration space can be efficiently explored using black-box optimization, circumventing the curse of dimensionality (in the number of variables and objectives) and finding good orderings that reduce the PF enumeration time. However, black-box optimization approaches incur a computational overhead that outweighs the reduction in time due to good variable ordering. To alleviate this issue, we propose LEO, a supervised learning approach for finding efficient variable orderings that reduce the enumeration time. Experiments on benchmark sets from the knapsack problem with 3-7 objectives and up to 80 variables show that LEO is ~30-300% and ~10-200% faster at PF enumeration than common ordering strategies and algorithm configuration. Our code and instances are available at https://github.com/khalil-research/leo.
Keyword: mobile
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
Authors: Man Fai Wong, Shangxin Guo, Ching Nam Hang, Siu Wai Ho, Chee Wei Tan
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development, ultimately empowering developers with advanced coding assistance and streamlining the software development process.
FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout
Authors: Irene Wang, Prashant J. Nair, Divya Mahajan
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
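A simplified reading of the thresholding idea (our sketch from the abstract, not the FLuID code): rank neurons by update magnitude and hand stragglers only the fast-changing slice:

```python
# Keep only the neurons whose recent weight updates exceed a
# threshold, and send that sub-model to a straggler device.
import numpy as np

def invariant_dropout_mask(w_new: np.ndarray, w_old: np.ndarray,
                           keep_frac: float) -> np.ndarray:
    """Boolean mask over output neurons, ranked by update magnitude."""
    update = np.abs(w_new - w_old).sum(axis=1)   # per-neuron update size
    k = max(1, int(keep_frac * len(update)))
    threshold = np.sort(update)[-k]              # k-th largest update
    return update >= threshold

rng = np.random.default_rng(1)
w_old = rng.normal(size=(8, 16))                 # one layer, 8 neurons
w_new = w_old + rng.normal(scale=0.01, size=(8, 16))
w_new[:2] += 0.5                                 # two neurons changed a lot
mask = invariant_dropout_mask(w_new, w_old, keep_frac=0.5)
print(mask)               # the fast-changing neurons survive the dropout
print(w_new[mask].shape)  # straggler trains only this slice: (4, 16)
```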
Computing Offloading and Semantic Compression for Intelligent Computing Tasks in MEC Systems
Authors: Yuanpeng Zheng, Tiankui Zhang, Rong Huang, Yapeng Wang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
This paper investigates intelligent-computing-task-oriented computing offloading and semantic compression in mobile edge computing (MEC) systems. With the popularity of intelligent applications in various industries, terminals increasingly need to offload intelligent computing tasks with complex demands to MEC servers for computing, which is a great challenge for bandwidth and computing capacity allocation in MEC systems. Considering the accuracy requirement of intelligent computing tasks, we formulate an optimization problem of computing offloading and semantic compression. We jointly optimize computing accuracy and task delay, which together represent the system utility. To solve the proposed optimization problem, we decompose it into a computing capacity allocation subproblem and a compression offloading subproblem and obtain solutions through convex optimization and successive convex approximation. The offloading decisions, computing capacity and compression ratio are thereby obtained in closed form. We then design the computing offloading and semantic compression algorithm for intelligent computing tasks in MEC systems. Simulation results show that our algorithm converges quickly and achieves better performance and resource utilization efficiency than the benchmarks as the total number of users and the computing capacity vary.
Dynamic Multi-time Scale User Admission and Resource Allocation for Semantic Extraction in MEC Systems
Authors: Yuanpeng Zheng, Tiankui Zhang, Jonathan Loo
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
This paper investigates semantic-extraction-task-oriented dynamic multi-time-scale user admission and resource allocation in mobile edge computing (MEC) systems. Amid the prevalence of artificial intelligence applications in various industries, the offloading of semantic extraction tasks, which are mainly composed of convolutional neural networks for computer vision, poses a great challenge for communication bandwidth and computing capacity allocation in MEC systems. Considering the stochastic nature of the semantic extraction tasks, we formulate a stochastic optimization problem by modeling it as the dynamic arrival of tasks in the temporal domain. We jointly optimize the system revenue and cost, represented as user admission in the long term and resource allocation in the short term, respectively. To handle the proposed stochastic optimization problem, we decompose it into short-time-scale subproblems and a long-time-scale subproblem by using the Lyapunov optimization technique. After that, the short-time-scale optimization variables of resource allocation, including user association, bandwidth allocation, and computing capacity allocation, are obtained in closed form. The user admission optimization on the long time scale is solved by a heuristic iteration method. Then, the multi-time-scale user admission and resource allocation algorithm is proposed for dynamic semantic extraction task computing in MEC systems. Simulation results demonstrate that, compared with the benchmarks, the proposed algorithm improves the performance of user admission and resource allocation efficiently and achieves a flexible trade-off between system revenue and cost at multiple time scales while accounting for semantic extraction tasks.
Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing
Authors: Usman Muhammad, Md Ziaul Hoque, Mourad Oussalah, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Face presentation attacks, also known as spoofing attacks, pose a significant threat to biometric systems that rely on facial recognition, such as access control systems, mobile payments, and identity verification systems. To prevent spoofing, several video-based methods have been presented in the literature that analyze facial motion in successive video frames. However, estimating the motion between adjacent frames is a challenging task and incurs high computational cost. In this paper, we reformulate the face anti-spoofing task as a motion prediction problem and introduce a deep ensemble learning model with a frame skipping mechanism. The proposed frame skipping is based on a uniform sampling approach where the original video is divided into fixed-size video clips. In this way, every nth frame of each clip is selected to ensure that the temporal patterns can easily be perceived during the training of three different recurrent neural networks (RNNs). Motivated by the performance of the individual RNNs, a meta-model is developed to improve the overall recognition performance by combining their predictions. Extensive experiments were conducted on four datasets, and state-of-the-art performance is reported for MSU-MFSD (3.12\%), Replay-Attack (11.19\%), and OULU-NPU (12.23\%) using the half total error rate (HTER) in the most challenging cross-dataset test scenario.
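The sampling scheme is easy to sketch (the clip size and n below are hypothetical):

```python
# Split the video into fixed-size clips and keep every nth frame of
# each clip, yielding a sparse but temporally uniform frame sample.
def skip_frames(num_frames: int, clip_size: int, n: int):
    """Return the indices of the frames kept for training."""
    kept = []
    for start in range(0, num_frames, clip_size):
        clip = range(start, min(start + clip_size, num_frames))
        kept.extend(clip[::n])           # every nth frame of this clip
    return kept

print(skip_frames(num_frames=20, clip_size=8, n=4))
# [0, 4, 8, 12, 16] -- a sparse, temporally uniform sample of the video
```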
Towards accurate instance segmentation in large-scale LiDAR point clouds
Abstract
Panoptic segmentation is the combination of semantic and instance segmentation: assign the points in a 3D point cloud to semantic categories and partition them into distinct object instances. It has many obvious applications for outdoor scene understanding, from city mapping to forest management. Existing methods struggle to segment nearby instances of the same semantic category, like adjacent pieces of street furniture or neighbouring trees, which limits their usability for inventory- or management-type applications that rely on object instances. This study explores the steps of the panoptic segmentation pipeline concerned with clustering points into object instances, with the goal to alleviate that bottleneck. We find that a carefully designed clustering strategy, which leverages multiple types of learned point embeddings, significantly improves instance segmentation. Experiments on the NPM3D urban mobile mapping dataset and the FOR-instance forest dataset demonstrate the effectiveness and versatility of the proposed strategy.
Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Authors: Georg Rutishauser, Francesco Conti, Luca Benini
Abstract
Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data
Authors: Mattia Giovanni Campana, Franca Delmastro, Elena Pagani
Abstract
Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area, and their early detection is a potentially effective instrument to counteract the pandemic. The efficacy of this solution mainly depends on the performance of the AI algorithms applied to the collected data and their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present an experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of the two main approaches of transfer learning in the considered scenario: feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNET, and L\textsuperscript{3}-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). The results clearly show the advantages of L\textsuperscript{3}-Net in all the experimental settings, as it outperforms the other solutions by 12.3\% in terms of Precision-Recall AUC as a feature extractor and by 10\% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6\% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.
UAV Swarms for Joint Data Ferrying and Dynamic Cell Coverage via Optimal Transport Descent and Quadratic Assignment
Authors: Kai Cui, Lars Baumgärtner, Burak Yilmaz, Mengguang Li, Christian Fabian, Benjamin Becker, Lin Xiang, Maximilian Bauer, Heinz Koeppl
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Both data ferrying with disruption-tolerant networking (DTN) and mobile cellular base stations constitute important techniques for UAV-aided communication in situations of crises where standard communication infrastructure is unavailable. For optimal use of a limited number of UAVs, we propose providing both DTN and a cellular base station on each UAV. Here, DTN is used for large amounts of low-priority data, while capacity-constrained cell coverage remains reserved for emergency calls or command and control. We optimize cell coverage via a novel optimal transport-based formulation using alternating minimization, while for data ferrying we periodically deliver data between dynamic clusters by solving quadratic assignment problems. In our evaluation, we consider different scenarios with varying mobility models and a wide range of flight patterns. Overall, we tractably achieve optimal cell coverage under quality-of-service costs with DTN-based data ferrying, enabling large-scale deployment of UAV swarms for crisis communication.
Role Engine Implementation for a Continuous and Collaborative Multi-Robot System
Authors: Behzad Akbari, Zikai Wang, Haibin Zhu, Lucas Wan, Ryan Adderson, Ya-Jun Pan
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
In situations involving teams of diverse robots, assigning appropriate roles to each robot and evaluating their performance is crucial. These roles define the specific characteristics of a robot within a given context. The stream of actions exhibited by a robot based on its assigned role is referred to as the process role. Our research addresses the depiction of process roles using a multivariate probabilistic function. The main aim of this study is to develop a role engine for collaborative multi-robot systems and to optimize the behavior of the robots. The role engine is designed to assign suitable roles to each robot, generate approximately optimal process roles, update them over time, and identify instances of robot malfunction or trigger replanning when necessary. The environment considered is dynamic, involving obstacles and other agents. The role engine operates in a hybrid manner, with central initiation and decentralized action, and assigns unlabeled roles to agents. We employ the Gaussian Process (GP) inference method to optimize process roles based on local constraints and constraints related to other agents. Furthermore, we propose an innovative approach that utilizes the environment's skeleton to address initialization and feasibility evaluation challenges. We successfully demonstrate the proposed approach's feasibility and efficiency through simulation studies and real-world experiments involving diverse mobile robots.
LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search
Abstract
Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize the search by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods, but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6\% on ImageNet under mobile constraints, along with best-in-class Kendall-Tau, architectural diversity, and search space size.
Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks
Authors: Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Abstract
Wi-Fi Direct is a promising technology for the support of device-to-device (D2D) communications on commercial mobile devices. However, the standard as it stands is not sufficient to support the real deployment of networking solutions entirely based on D2D, such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes which cannot communicate with each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of nodes' stability and power levels. We evaluate the protocol's performance by simulating three reference scenarios with different mobility models, geographical areas and numbers of nodes. Simulations are also supported by experimental results related to the evaluation, in a real testbed, of the involved context parameters. We compare WFD-GM with state-of-the-art solutions and show that it performs significantly better than a baseline approach in scenarios with medium/low mobility, and is comparable in the case of high mobility, without introducing additional overhead.
Keyword: pruning
Finding Favourite Tuples on Data Streams with Provably Few Comparisons
Abstract
One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely adopted approach is to ask the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively receiving user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that the memory size and the number of requested comparisons are in the worst case logarithmic in $n$, where $n$ is the number of all tuples. An important variant of the problem, which can help to reduce human error in comparisons, is to allow users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the theoretical guarantees of our method can be maintained for this important problem variant. In addition, we show how to enhance existing pruning techniques in the literature by leveraging powerful tools from mathematical programming. Finally, we systematically evaluate all proposed algorithms over both synthetic and real-life datasets, examine their scalability, and demonstrate their superior performance over existing methods.
Pruning vs Quantization: Which is Better?
Authors: Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort
Abstract
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison, training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
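A toy version of this kind of comparison (Gaussian weights and equal nominal compression; not the paper's exact protocol) already hints at the conclusion:

```python
# Compare the reconstruction error of magnitude pruning and uniform
# quantization at the same nominal 4x compression on random weights.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100_000)

# 4x compression two ways: keep 25% of weights, or use 8 of 32 bits.
keep = int(0.25 * w.size)
pruned = np.where(np.abs(w) >= np.sort(np.abs(w))[-keep], w, 0.0)

bits = 8
scale = np.abs(w).max() / (2 ** (bits - 1) - 1)   # symmetric uniform grid
quantized = np.round(w / scale) * scale

print("pruning MSE:     ", np.mean((w - pruned) ** 2))
print("quantization MSE:", np.mean((w - quantized) ** 2))
# On this toy distribution quantization loses far less signal,
# in line with the paper's overall conclusion.
```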
Improving Retrieval-Augmented Large Language Models via Data Importance Learning
Authors: Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further propose an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).
Keyword: diffusion
Diffusion Models for Computational Design at the Example of Floor Plans
Abstract
AI image generators based on diffusion models have recently been widely discussed for their capability to create images from simple text prompts. For practical use in civil engineering, however, they need to be able to create specific construction plans for given constraints. Within this paper we explore the capabilities of such diffusion-based AI generators for computational design, using floor plans as an example, and identify their current limitations. We explain how diffusion models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve the validity of generated floor plans from 6% to 90%, as well as query performance, for different examples. We identify shortcomings, derive future research challenges for these models, and discuss the need to combine diffusion models with building information modelling. With this, we provide key insights into the current state and future directions of diffusion models in civil engineering.
A design theory for transparency of information privacy practices
Abstract
The rising diffusion of information systems (IS) throughout society poses an increasingly serious threat to privacy as a social value. One approach to alleviating this threat is to establish transparency of information privacy practices (TIPP) so that consumers can better understand how their information is processed. However, the design of transparency artifacts (e.g., privacy notices) has clearly not followed this approach, given the ever-increasing volume of information processing. Hence, consumers face a situation where they cannot see the 'forest for the trees' when aiming to ascertain whether information processing meets their privacy expectations. A key problem is that overly comprehensive information presentation results in information overload and is thus counterproductive for establishing TIPP. We depart from the extant design logic of transparency artifacts and develop a theoretical foundation (TIPP theory) for transparency artifact designs useful for establishing TIPP from the perspective of privacy as a social value. We present TIPP theory in two parts to capture the sociotechnical interplay. The first part translates abstract knowledge on the IS artifact and privacy into a description of the social subsystems of transparency artifacts, and the second part conveys prescriptive design knowledge in the form of a corresponding IS design theory. TIPP theory establishes a bridge from the complexity of the privacy concept to a metadesign for transparency artifacts that is useful for establishing TIPP in any IS. In essence, transparency artifacts must accomplish more than offering comprehensive information; they must also be adaptive to the current information needs of consumers.
Applying a Color Palette with Local Control using Diffusion Models
Authors: Vaibhav Vavilala, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We demonstrate two novel editing procedures in the context of fantasy card art. Palette transfer applies a specified reference palette to a given card. For fantasy art, the desired change in palette can be very large, leading to huge changes in the "look" of the art. We demonstrate that a pipeline of vector quantization; matching; and "vector dequantization" (using a diffusion model) produces successful extreme palette transfers. Segment control allows an artist to move one or more image segments, and to optionally specify the desired color of the result. The combination of these two types of edit yields valuable workflows, including: move a segment, then recolor; recolor, then force some segments to take a prescribed color. We demonstrate our methods on the challenging Yu-Gi-Oh card art dataset.
Towards Symmetry-Aware Generation of Periodic Materials
Abstract
We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.
Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
Authors: TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.
Single Image LDR to HDR Conversion using Conditional Diffusion
Abstract
Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.
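The classifier-free guidance step used by such conditional DDPMs can be sketched generically (the shapes, stand-in model, and guidance weight are illustrative, not the authors' network):

```python
# Classifier-free guidance: blend conditional and unconditional noise
# predictions from the same model at each denoising step.
import torch

def cfg_noise_estimate(model, x_t, t, cond, guidance_scale: float):
    """Guided noise estimate for one reverse-diffusion step."""
    eps_cond = model(x_t, t, cond)           # conditioned on the LDR input
    eps_uncond = model(x_t, t, None)         # condition dropped
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Stand-in "model" so the sketch runs end to end.
def toy_model(x_t, t, cond):
    shift = 0.0 if cond is None else cond.mean()
    return x_t * 0.1 + shift

x_t = torch.randn(1, 3, 64, 64)              # noisy HDR latent at step t
cond = torch.randn(1, 3, 64, 64)             # LDR conditioning image
eps = cfg_noise_estimate(toy_model, x_t, t=10, cond=cond, guidance_scale=3.0)
print(eps.shape)                             # torch.Size([1, 3, 64, 64])
```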
Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation
Abstract
Tractography traces the peak directions extracted from the fiber orientation distribution (FOD), but it suffers from ambiguous spatial correspondences between diffusion directions and fiber geometry, making it prone to producing erroneous tracks while missing true positive connections. Peaks-based tractography methods reconstruct streamlines "locally" in a single-to-single manner, and thus lack global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function, using a higher-order streamline differential equation that reconstructs streamline bundles in a cluster-to-cluster manner. A unified framework for any higher-order streamline differential equation is presented to describe fiber bundles with disjoint streamlines defined on the diffusion tensor vector field. At the global level, the tractography process is simplified to the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing an energy optimization model, which characterizes the relations between BTD and the diffusion tensor vector field under anatomical priors provided by tractogram bundle information. Experiments are performed on simulated Hough, Sine, and Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) for qualitative and quantitative evaluation. The results demonstrate that our approach can directly reconstruct complex global fiber bundles. BTD reduces error deviation and accumulation at the local level and yields better reconstructions of long-range, twisting, and large fanning tracts.
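For context on the "higher-order" terminology: conventional peak-based tracking integrates a first-order ODE dx/ds = v(x) one streamline at a time, as in the following RK4 tracer over a generic direction field v (illustrative only; BTD replaces this local, single-to-single scheme with a cluster-to-cluster, higher-order formulation):

    import numpy as np

    def trace_streamline(v, x0, step=0.5, n_steps=200):
        # Classic RK4 integration of dx/ds = v(x); v maps a 3D point
        # to a local fiber direction.
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(n_steps):
            x = xs[-1]
            k1 = v(x)
            k2 = v(x + 0.5 * step * k1)
            k3 = v(x + 0.5 * step * k2)
            k4 = v(x + step * k3)
            xs.append(x + (step / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
        return np.stack(xs)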
A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task
Abstract
In recent years, large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved conspicuous success in many fields, such as image recognition and generation. Although they succeed in their original applications, it remains unclear whether these foundation models can be applied to other downstream tasks. In this paper, we conduct a short survey of current methods for discriminative dense recognition tasks that are built on pretrained foundation models. We also provide preliminary experimental analysis of an existing open-vocabulary segmentation method based on Stable Diffusion, which indicates that the current way of deploying diffusion models for segmentation is not optimal. We aim to provide insights for future research on adopting foundation models for downstream tasks.
MomentDiff: Generative Video Moment Retrieval from Random to Real
Authors: Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description. To achieve this goal, we provide a generative diffusion-based framework called MomentDiff, which simulates a typical human retrieval process from random browsing to gradual localization. Specifically, we first diffuse the real span to random noise, and learn to denoise the random noise to the original span with the guidance of similarity between text and video. This allows the model to learn a mapping from arbitrary random locations to real moments, enabling it to locate segments from random initialization. Once trained, MomentDiff can sample random temporal segments as initial guesses and iteratively refine them to generate an accurate temporal boundary. Unlike discriminative works (e.g., those based on learnable proposals or queries), MomentDiff with randomly initialized spans can resist the temporal location biases of datasets. To evaluate the influence of temporal location biases, we propose two anti-bias datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. The experimental results demonstrate that our efficient framework consistently outperforms state-of-the-art methods on three public benchmarks, and exhibits better generalization and robustness on the proposed anti-bias datasets. The code, model, and anti-bias evaluation datasets are available at https://github.com/IMCCretrieval/MomentDiff.
Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications
Authors: Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Dylan Campbell, Jaskirat Singh, Tianyu Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper begins with a description of methods for estimating probability density functions for images, reflecting the observation that such data are usually constrained to lie in restricted regions of the high-dimensional image space: not every pattern of pixels is an image. It is common to say that images lie on a lower-dimensional manifold in the high-dimensional space. However, although images may lie on such lower-dimensional manifolds, it is not the case that all points on the manifold have an equal probability of being images. Images are unevenly distributed on the manifold, and our task is to devise ways to model this distribution as a probability distribution. In pursuing this goal, we consider generative models that are popular in the AI and computer vision communities. For our purposes, generative/probabilistic models should have the properties of 1) sample generation: it should be possible to sample from this distribution according to the modelled density function, and 2) probability computation: given a previously unseen sample from the dataset of interest, one should be able to compute the probability of the sample, at least up to a normalising constant. To this end, we investigate the use of methods such as normalising flows and diffusion models. We then show that such probabilistic descriptions can be used to construct defences against adversarial attacks. In addition to describing the manifold in terms of density, we also consider how semantic interpretations can be used to describe points on the manifold. To this end, we consider an emergent language framework which makes use of variational encoders to produce a disentangled representation of points that reside on a given manifold. Trajectories between points on a manifold can then be described in terms of evolving semantic descriptions.
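The "probability computation" property is exactly what normalising flows offer through the change-of-variables formula, log p_X(x) = log p_Z(f(x)) + log|det J_f(x)|. A minimal sketch, assuming a flow that returns both the latent code and the log-determinant:

    import torch

    def flow_log_prob(x, flow, base):
        # Change of variables: exact log-density up to the base distribution.
        z, log_det = flow(x)                  # z = f(x), log|det J_f(x)|
        return base.log_prob(z).sum(dim=-1) + log_det

    # e.g. base = torch.distributions.Normal(torch.zeros(d), torch.ones(d))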
Numerical Methods with Coordinate Transforms for Efficient Brownian Dynamics Simulations
Authors: Dominic Phillips, Charles Matthews, Benedict Leimkuhler
Abstract
Many stochastic processes in the physical and biological sciences can be modelled using Brownian dynamics with multiplicative noise. However, numerical integrators for these processes can lose accuracy or even fail to converge when the diffusion term is configuration-dependent. One remedy is to construct a transform to a constant-diffusion process and sample the transformed process instead. In this work, we explain how coordinate-based and time-rescaling-based transforms can be used either individually or in combination to map a general class of variable-diffusion Brownian motion processes into constant-diffusion ones. The transforms are invertible, thus allowing recovery of the original dynamics. We motivate our methodology using examples in one dimension before then considering multivariate diffusion processes. We illustrate the benefits of the transforms through numerical simulations, demonstrating how the right combination of integrator and transform can improve computational efficiency and the order of convergence to the invariant distribution. Notably, the transforms that we derive are applicable to a class of multibody, anisotropic Stokes-Einstein diffusion that has applications in biophysical modelling.
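In one dimension, the coordinate transform is the classical Lamperti map y = F(x) with F'(x) = 1/sigma(x), under which the transformed process has unit diffusion. A toy sketch for sigma(x) = 1 + x^2, where F = arctan (an illustration of the idea, not the paper's integrators):

    import numpy as np

    def simulate_transformed(a, x0=0.0, dt=1e-3, n=10_000, seed=0):
        # Integrate Y = arctan(X) for dX = a(X) dt + (1 + X^2) dW.
        # Ito's formula gives dY = (a(x)/sigma(x) - sigma'(x)/2) dt + dW,
        # so plain Euler-Maruyama applies with constant (unit) noise.
        rng = np.random.default_rng(seed)
        y = np.arctan(x0)
        for _ in range(n):
            x = np.tan(y)
            drift_y = a(x) / (1.0 + x**2) - x   # sigma'(x)/2 = x
            y += drift_y * dt + np.sqrt(dt) * rng.standard_normal()
        return np.tan(y)   # invert the transform to recover X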
Global q-superlinear convergence of the infinite-dimensional Newton's method for the regularized p-Stokes equations
Abstract
The motion of glaciers can be simulated with the p-Stokes equations. We present an algorithm that solves these equations faster than the Picard iteration. We do so by proving q-superlinear global convergence of the infinite-dimensional Newton's method with Armijo step sizes to the solution of these equations; only an arbitrarily small diffusion term has to be added for this convergence result. We also consider approximations of exact step sizes, which are possible because we reformulate the problem as minimizing a convex functional. Next, we prove that the additional diffusion term causes only minor differences in the solution compared to the original p-Stokes equations. Finally, we test our algorithms on a reformulation of the experiment ISMIP-HOM B. In the experiment, both the Picard iteration and Newton's method with approximated exact step sizes outperform the plain Picard iteration, and Newton's method with Armijo step sizes converges faster than the Picard iteration. However, the accuracy reached by Newton's method with Armijo step sizes depends more strongly on the resolution of the domain.
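For readers unfamiliar with the line search: Armijo damping globalizes Newton's method by shrinking the step until a sufficient-decrease condition holds. A generic finite-dimensional sketch of the scheme (the paper proves convergence in the infinite-dimensional setting):

    import numpy as np

    def newton_armijo(f, grad, hess, x, tol=1e-10, beta=0.5, c=1e-4, max_iter=50):
        # Damped Newton for min f(x): take the Newton direction, then
        # backtrack until the Armijo sufficient-decrease condition holds.
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            d = np.linalg.solve(hess(x), -g)
            t = 1.0
            while f(x + t * d) > f(x) + c * t * (g @ d):
                t *= beta
            x = x + t * d
        return x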
On the Cultural Gap in Text-to-Image Generation
Authors: Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, Zhaopeng Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
One challenge in text-to-image (T2I) generation is the inadvertent reflection of cultural gaps present in the training data, i.e., the disparity in generated image quality when the cultural elements of the input text are rare in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is then used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, and that the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).
A computational framework for pharmaco-mechanical interactions in arterial walls using parallel monolithic domain decomposition methods
Authors: Daniel Balzani, Alexander Heinlein, Axel Klawonn, Jascha Knepper, Sharan Nurani Ramesh, Oliver Rheinbach, Lea Sassmannshausen, Klemens Uhlmann
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
A computational framework is presented to numerically simulate the effects of antihypertensive drugs, in particular calcium channel blockers, on the mechanical response of arterial walls. A stretch-dependent smooth muscle model by Uhlmann and Balzani is modified to describe the interaction of pharmacological drugs and the inhibition of smooth muscle activation. The coupled deformation-diffusion problem is then solved using the finite element software FEDDLib and overlapping Schwarz preconditioners from the Trilinos package FROSch. These preconditioners include highly scalable parallel GDSW (generalized Dryja-Smith-Widlund) and RDSW (reduced GDSW) preconditioners. Simulation results show the expected increase in the lumen diameter of an idealized artery due to the drug-induced reduction of smooth muscle contraction, as well as a decrease in the rate of arterial contraction in the presence of calcium channel blockers. Strong and weak parallel scalability of the resulting computational implementation are also analyzed.
Origin-Destination Travel Time Oracle for Map-based Services
Authors: Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen
Abstract
Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT) that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.
How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models
Abstract
Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding unauthorized data usage during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique content to the images, such as stealthy image warping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorized the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that illegally used the unauthorized data. Our experiments on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usage.
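As a toy stand-in for the injected signal, consider a pixel-scale sinusoidal warp: barely visible to a viewer, but consistent enough for a diffusion model to memorize during training (the paper's actual warping functions are not specified here; this is purely illustrative):

    import numpy as np

    def stealthy_warp(img, amp=2.0, freq=4):
        # Shift each row horizontally by a few pixels following a sine wave;
        # works for (H, W) or (H, W, C) arrays.
        h = img.shape[0]
        rows = np.arange(h)
        shifts = np.round(amp * np.sin(2 * np.pi * freq * rows / h)).astype(int)
        out = np.empty_like(img)
        for r, s in zip(rows, shifts):
            out[r] = np.roll(img[r], s, axis=0)
        return out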
IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model
Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generating complete 360-degree panoramas from narrow field-of-view images is an ongoing research problem, as omnidirectional RGB data is not readily available. Existing GAN-based approaches face some barriers to achieving higher-quality output and generalize poorly over different mask types. In this paper, we present our 360-degree indoor RGB panorama outpainting model using latent diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, but works surprisingly well to outpaint normal depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in panorama wraparound consistency. Results show that our IPO-LDM not only significantly outperforms state-of-the-art methods on RGB panorama outpainting, but can also produce multiple and diverse well-structured results for different types of masks.
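The progressive-rotation trick exploits the horizontal periodicity of a 360-degree panorama: rolling the latent along its width at each denoising step moves the seam, so no fixed border is ever under-constrained. A schematic sketch of the idea (names and scheduling are assumptions, not the authors' code):

    import numpy as np

    def rotate_latent(latent, step, total_steps):
        # Roll the panorama latent along its width by a step-dependent amount.
        w = latent.shape[-1]
        shift = (step * w) // total_steps
        return np.roll(latent, shift, axis=-1)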
Keyword: adaptive
STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting
Authors: Lincan Li, Kaixiang Yang, Fengji Luo, Jichao Bi
Abstract
Efficiently capturing complex spatiotemporal representations from large-scale unlabeled traffic data remains a challenging task. In view of this dilemma, this work employs advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture decent spatial-temporal dependencies and realize graph-level contrasting. To further discriminate node individuals in negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon the STS-CCL contrastive learning model achieves superior performance over existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled data, and for other spatiotemporal tasks with data scarcity issues.
TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers
Authors: Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Abstract
Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from a node's current state ($t$) and previous context (over timestamps [$t-1, t-l$], where $l$ is the length of the context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of "novelty" as measured by TEA plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for graphs with a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at various stages of the graph topology evolution.
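The two projection layers follow the usual Graph2Gauss-style parameterization: one head for the mean, one for a positivity-constrained scale. A minimal sketch, with the layer shapes and the elu-plus-one trick as assumptions:

    import torch.nn as nn
    import torch.nn.functional as F

    class GaussianHead(nn.Module):
        # Maps transformer node states to a diagonal Gaussian embedding.
        def __init__(self, d_model, d_embed):
            super().__init__()
            self.mu = nn.Linear(d_model, d_embed)
            self.sigma_raw = nn.Linear(d_model, d_embed)

        def forward(self, h):
            # elu(x) + 1 keeps sigma strictly positive and smooth near zero
            return self.mu(h), F.elu(self.sigma_raw(h)) + 1.0 + 1e-6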
FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout
Authors: Irene Wang, Prashant J. Nair, Divya Mahajan
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
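One toy reading of update-threshold sub-model extraction: rank each layer's output neurons by how much their incoming weights changed in the last round, and let stragglers drop the most invariant (least-changed) ones. A sketch of that selection (not the FLuID implementation):

    import torch

    def invariant_dropout_mask(w_prev, w_curr, keep_ratio=0.7):
        # Per-neuron update magnitude over a layer's weight matrix
        # (out_features x in_features); keep the most-changed neurons.
        delta = (w_curr - w_prev).abs().sum(dim=1)
        k = max(1, int(keep_ratio * delta.numel()))
        mask = torch.zeros_like(delta, dtype=torch.bool)
        mask[delta.topk(k).indices] = True
        return mask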
Convergence of Communications, Control, and Machine Learning for Secure and Autonomous Vehicle Navigation
Abstract
Connected and autonomous vehicles (CAVs) can reduce human errors in traffic accidents, increase road efficiency, and execute various tasks ranging from delivery to smart city surveillance. Reaping these benefits requires CAVs to autonomously navigate to target destinations. To this end, each CAV's navigation controller must leverage the information collected by sensors and wireless systems for decision-making on longitudinal and lateral movements. However, enabling autonomous navigation for CAVs requires a convergent integration of communication, control, and learning systems. The goal of this article is to explicitly expose the challenges related to this convergence and propose solutions to address them in two major use cases: Uncoordinated and coordinated CAVs. In particular, challenges related to the navigation of uncoordinated CAVs include stable path tracking, robust control against cyber-physical attacks, and adaptive navigation controller design. Meanwhile, when multiple CAVs coordinate their movements during navigation, fundamental problems such as stable formation, fast collaborative learning, and distributed intrusion detection are analyzed. For both cases, solutions using the convergence of communication theory, control theory, and machine learning are proposed to enable effective and secure CAV navigation. Preliminary simulation results are provided to show the merits of proposed solutions.
A design theory for transparency of information privacy practices
Abstract
The rising diffusion of information systems (IS) throughout society poses an increasingly serious threat to privacy as a social value. One approach to alleviating this threat is to establish transparency of information privacy practices (TIPP) so that consumers can better understand how their information is processed. However, the design of transparency artifacts (e.g., privacy notices) has clearly not followed this approach, given the ever-increasing volume of information processing. Hence, consumers face a situation where they cannot see the 'forest for the trees' when aiming to ascertain whether information processing meets their privacy expectations. A key problem is that overly comprehensive information presentation results in information overload and is thus counterproductive for establishing TIPP. We depart from the extant design logic of transparency artifacts and develop a theoretical foundation (TIPP theory) for transparency artifact designs useful for establishing TIPP from the perspective of privacy as a social value. We present TIPP theory in two parts to capture the sociotechnical interplay. The first part translates abstract knowledge on the IS artifact and privacy into a description of the social subsystems of transparency artifacts, and the second part conveys prescriptive design knowledge in the form of a corresponding IS design theory. TIPP theory establishes a bridge from the complexity of the privacy concept to a metadesign for transparency artifacts that is useful for establishing TIPP in any IS. In essence, transparency artifacts must accomplish more than offering comprehensive information; they must also be adaptive to the current information needs of consumers.
CFSum: A Coarse-to-Fine Contribution Network for Multimodal Summarization
Abstract
Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive conditions under which visual modalities are useful. Therefore, we propose a novel Coarse-to-Fine contribution network for multimodal Summarization (CFSum) to consider different contributions of images for summarization. First, to eliminate the interference of useless images, we propose a pre-filter module to abandon useless images. Second, to make accurate use of useful images, we propose two levels of visual complement modules, word level and phrase level. Specifically, image contributions are calculated and are adopted to guide the attention of both textual and visual modalities. Experimental results have shown that CFSum significantly outperforms multiple strong baselines on the standard benchmark. Furthermore, the analysis verifies that useful images can even help generate non-visual words which are implicitly represented in the image.
When Does Confidence-Based Cascade Deferral Suffice?
Abstract
Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
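The baseline the paper analyzes is simple enough to state in a few lines: defer to the next model only when the current model's maximum softmax probability falls below a threshold. A minimal two-model sketch:

    import numpy as np

    def cascade_predict(x, small_model, large_model, threshold=0.9):
        # Confidence-based deferral: trust the cheap model when it is
        # confident, otherwise invoke the expensive one.
        probs = small_model(x)            # softmax probabilities
        if probs.max() >= threshold:
            return int(np.argmax(probs))
        return int(np.argmax(large_model(x)))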
Semi-supervised Domain Adaptive Medical Image Segmentation through Consistency Regularized Disentangled Contrastive Learning
Authors: Hritam Basak, Zhaozheng Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Although unsupervised domain adaptation (UDA) methods are a promising direction for alleviating domain shift, they fall short of their supervised counterparts. In this work, we investigate the relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation, where access to a few labeled target samples can substantially improve adaptation performance. Specifically, we propose a two-stage training process. First, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) objective along with a pixel-level feature consistency constraint. The proposed CL enforces the encoder to learn discriminative, content-specific but domain-invariant semantics on a global scale from the source and target images, whereas consistency regularization enforces the mining of local pixel-level information by maintaining spatial sensitivity. This pre-trained encoder, along with a decoder, is further fine-tuned for the downstream task (i.e., pixel-level segmentation) in a semi-supervised setting. Furthermore, we experimentally validate that our proposed method can easily be extended to UDA settings, adding to the superiority of the proposed strategy. Upon evaluation on two domain adaptive image segmentation tasks, our proposed method outperforms SoTA methods in both SSDA and UDA settings. Code is available at https://github.com/hritam-98/GFDA-disentangled
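For reference, the global contrastive stage in such pipelines is typically built on an InfoNCE-style objective between two views; a standard sketch (the paper's domain-content disentangled variant adds structure beyond this):

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, tau=0.1):
        # Two-view contrastive loss: matched rows are positives,
        # all other pairs in the batch act as negatives.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / tau
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)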
Enhancing LLM with Evolutionary Fine Tuning for News Summary Generation
Authors: Le Xiao, Xiaolin Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
News summary generation is an important task in the field of intelligence analysis that can provide accurate and comprehensive information to help people better understand and respond to complex real-world events. However, traditional news summary generation methods face several challenges: they are limited by the model itself and the amount of training data, and they are affected by text noise, making it difficult to generate reliable information accurately. In this paper, we propose a new paradigm for news summary generation using an LLM with powerful natural language understanding and generative capabilities. We use the LLM to extract multiple structured event patterns from the events contained in news paragraphs, evolve the event-pattern population with a genetic algorithm, and select the most adaptive event pattern to input into the LLM to generate news summaries. A News Summary Generator (NSG) is designed to select and evolve the event-pattern populations and generate news summaries. The experimental results show that the news summary generator is able to generate accurate and reliable news summaries with some generalization ability.
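The evolutionary step is a standard genetic-algorithm loop; a generic sketch in which the pattern encoding, fitness function, and crossover/mutation operators are all placeholders for whatever NSG actually uses:

    import random

    def evolve(population, fitness, crossover, mutate,
               n_gens=20, cx_rate=0.7, mut_rate=0.1):
        # Evolve a population of event patterns: select the fitter half,
        # recombine and mutate to refill, and return the best survivor.
        for _ in range(n_gens):
            parents = sorted(population, key=fitness, reverse=True)
            parents = parents[: max(2, len(parents) // 2)]
            children = []
            while len(children) < len(population):
                a, b = random.sample(parents, 2)
                child = crossover(a, b) if random.random() < cx_rate else a
                if random.random() < mut_rate:
                    child = mutate(child)
                children.append(child)
            population = children
        return max(population, key=fitness)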
In Time and Space: Towards Usable Adaptive Control for Assistive Robotic Arms
Authors: Max Pascher, Kirill Kronhardt, Felix Ferdinand Goldau, Udo Frese, Jens Gerken
Abstract
Robotic solutions, in particular robotic arms, are becoming more frequently deployed for close collaboration with humans, for example in manufacturing or domestic care environments. These robotic arms require the user to control several Degrees-of-Freedom (DoFs) to perform tasks, primarily involving grasping and manipulating objects. Standard input devices predominantly have two DoFs, requiring time-consuming and cognitively demanding mode switches to select individual DoFs. Contemporary Adaptive DoF Mapping Controls (ADMCs) have shown to decrease the necessary number of mode switches but were up to now not able to significantly reduce the perceived workload. Users still bear the mental workload of incorporating abstract mode switching into their workflow. We address this by providing feed-forward multimodal feedback using updated recommendations of ADMC, allowing users to visually compare the current and the suggested mapping in real-time. We contrast the effectiveness of two new approaches that a) continuously recommend updated DoF combinations or b) use discrete thresholds between current robot movements and new recommendations. Both are compared in a Virtual Reality (VR) in-person study against a classic control method. Significant results for lowered task completion time, fewer mode switches, and reduced perceived workload conclusively establish that in combination with feedforward, ADMC methods can indeed outperform classic mode switching. A lack of apparent quantitative differences between Continuous and Threshold reveals the importance of user-centered customization options. Including these implications in the development process will improve usability, which is essential for successfully implementing robotic technologies with high user acceptance.
Finding Favourite Tuples on Data Streams with Provably Few Comparisons
Abstract
One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely-adopted way is by asking the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively receiving user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that the memory size and the number of requested comparisons are in the worst case logarithmic in $n$, where $n$ is the number of all tuples. An important variant of the problem, which can help to reduce human error in comparisons, is to allow users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the theoretical guarantees of our method can be maintained for this important problem variant. In addition, we show how to enhance existing pruning techniques in the literature by leveraging powerful tools from mathematical programming. Finally, we systematically evaluate all proposed algorithms over both synthetic and real-life datasets, examine their scalability, and demonstrate their superior performance over existing methods.
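To see what the logarithmic comparison bound buys, contrast it with the naive single-pass baseline, which already achieves O(1) memory but spends one comparison per tuple; the paper's algorithm drives the comparison count far lower. A sketch of the baseline only:

    def stream_best(tuples, prefer):
        # Keep a running champion; prefer(a, b) asks the user for the
        # better of two tuples. Uses n - 1 comparisons over the stream.
        best = None
        for t in tuples:
            if best is None or prefer(t, best) == t:
                best = t
        return best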
Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution
Authors: Yuting Lu, Lingtong Min, Binglu Wang, Le Zheng, Xiaoxu Wang, Yongqiang Zhao, Teng Long
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Remote sensing image super-resolution (RSISR) plays a vital role in enhancing spatial detials and improving the quality of satellite imagery. Recently, Transformer-based models have shown competitive performance in RSISR. To mitigate the quadratic computational complexity resulting from global self-attention, various methods constrain attention to a local window, enhancing its efficiency. Consequently, the receptive fields in a single attention layer are inadequate, leading to insufficient context modeling. Furthermore, while most transform-based approaches reuse shallow features through skip connections, relying solely on these connections treats shallow and deep features equally, impeding the model's ability to characterize them. To address these issues, we propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR. Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages. The model incorporates cross-spatial pixel integration attention (CSPIA) to introduce contextual information into a local window, while cross-stage feature fusion attention (CSFFA) adaptively fuses features from the previous stage to improve feature expression in line with the requirements of the current stage. We conducted comprehensive experiments on multiple benchmark datasets, demonstrating the superior performance of our proposed SPIFFNet in terms of both quantitative metrics and visual quality when compared to state-of-the-art methods.
Keyword: quantization
Applying a Color Palette with Local Control using Diffusion Models
Authors: Vaibhav Vavilala, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We demonstrate two novel editing procedures in the context of fantasy card art. Palette transfer applies a specified reference palette to a given card. For fantasy art, the desired change in palette can be very large, leading to huge changes in the "look" of the art. We demonstrate that a pipeline of vector quantization, matching, and "vector dequantization" (using a diffusion model) produces successful extreme palette transfers. Segment control allows an artist to move one or more image segments and, optionally, to specify the desired color of the result. The combination of these two types of edit yields valuable workflows: move a segment, then recolor; or recolor, then force some segments to take a prescribed color. We demonstrate our methods on the challenging Yu-Gi-Oh card art dataset.
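The first two stages of the pipeline might look like the following: k-means vector quantization of the image colors, then snapping each cluster center to its nearest reference color (the diffusion-based "vector dequantization" stage is omitted; this is a sketch, not the authors' code):

    import numpy as np
    from sklearn.cluster import KMeans

    def match_palette(img, ref_palette, k=16):
        # Quantize to k colors, then map each center to the closest
        # color in the (m x 3) reference palette.
        pixels = img.reshape(-1, 3).astype(float)
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)
        d = ((km.cluster_centers_[:, None] - ref_palette[None]) ** 2).sum(-1)
        mapped = ref_palette[d.argmin(axis=1)]
        return mapped[km.labels_].reshape(img.shape).astype(np.uint8)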
Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Authors: Georg Rutishauser, Francesco Conti, Luca Benini
Abstract
Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
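Mixed-precision search assigns a bit width per layer; the per-layer quantizer that such a search scores is itself simple. A uniform symmetric fake-quantizer sketch (an assumed form, not the paper's exact quantizer):

    import numpy as np

    def fake_quantize(w, bits):
        # Round to a 2^bits-level symmetric grid, then map back to floats
        # so the induced error can be inspected at full precision.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale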
Pruning vs Quantization: Which is Better?
Authors: Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort
Abstract
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning; only in some scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
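The flavor of the comparison can be reproduced in a few lines: measure the reconstruction error of magnitude pruning versus uniform quantization on the same tensor at roughly comparable compression. A toy version of the kind of experiment the paper formalizes:

    import numpy as np

    def magnitude_prune(w, sparsity):
        # Zero out the smallest-magnitude fraction of weights
        thresh = np.quantile(np.abs(w), sparsity)
        return np.where(np.abs(w) >= thresh, w, 0.0)

    def uniform_quantize(w, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal(4096)
    print(np.linalg.norm(w - magnitude_prune(w, 0.75)))   # 75% sparsity
    print(np.linalg.norm(w - uniform_quantize(w, 8)))     # 8-bit grid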