New submissions for Wed, 3 May 23

Keyword: efficient

Two-phase Dual COPOD Method for Anomaly Detection in Industrial Control System

Authors: Emmanuel Aboah Boateng, Jerry Bruce
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00982
Pdf link: https://arxiv.org/pdf/2305.00982
Abstract Critical infrastructures like water treatment facilities and power plants depend on industrial control systems (ICS) for monitoring and control, making them vulnerable to cyber attacks and system malfunctions. Traditional ICS anomaly detection methods lack transparency and interpretability, which makes it difficult for practitioners to understand and trust the results. This paper proposes a two-phase dual Copula-based Outlier Detection (COPOD) method that addresses these challenges. The first phase removes unwanted outliers using an empirical cumulative distribution algorithm, and the second phase develops two parallel COPOD models based on the output data of phase 1. The method is based on empirical distribution functions, parameter-free, and provides interpretability by quantifying each feature's contribution to an anomaly. The method is also computationally and memory-efficient, suitable for low- and high-dimensional datasets. Experimental results demonstrate superior performance in terms of F1-score and recall on three open-source ICS datasets, enabling real-time ICS anomaly detection.
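The idea of scoring outliers from empirical distribution functions, with a per-feature contribution, can be illustrated with a small sketch. This is a simplified, rank-based tail-probability score and not the paper's two-phase dual COPOD method; the function names `ecdf_tail_contributions` and `outlier_scores` are hypothetical.

```python
import math

def ecdf_tail_contributions(data):
    """Return, per sample and per feature, -log of the smaller empirical tail probability."""
    n = len(data)
    d = len(data[0])
    contributions = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in data]
        for i, x in enumerate(column):
            left = sum(1 for v in column if v <= x) / n   # empirical P(X <= x)
            right = sum(1 for v in column if v >= x) / n  # empirical P(X >= x)
            # A small tail probability means the value is extreme in this feature.
            contributions[i][j] = -math.log(min(left, right))
    return contributions

def outlier_scores(data):
    """Sum the per-feature contributions into one score per sample."""
    contributions = ecdf_tail_contributions(data)
    return [sum(row) for row in contributions], contributions
```

Because each sample's score is a plain sum over features, the entries of `contributions` show directly which feature made a sample look anomalous. The score depends only on ranks, so no distribution parameters need to be chosen.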
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2305.01024
Pdf link: https://arxiv.org/pdf/2305.01024
Abstract General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of $160\% \sim 183.5\%$ and $148.55\% \sim 165.12\%$ for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to $41.40\%$.
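Algorithm-based fault tolerance for matrix multiplication is classically built on checksums: the row sums and column sums of C = AB can be predicted from A and B alone, so a corrupted entry shows up as one mismatching row and one mismatching column. The sketch below shows that classic checksum scheme on the CPU in plain Python; it is an assumption that the paper's GPU kernels follow the same principle, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def abft_check_and_correct(a, b, c, tol=1e-9):
    """Detect and repair a single corrupted entry of c = a @ b in place.

    Returns None if c is consistent, or the (row, col) that was repaired.
    """
    m, n, inner = len(c), len(c[0]), len(b)
    # Expected row sums of C are A @ (B e); expected column sums are (e^T A) @ B.
    b_row_sums = [sum(row) for row in b]
    expected_row = [sum(a[i][k] * b_row_sums[k] for k in range(inner)) for i in range(m)]
    a_col_sums = [sum(a[i][k] for i in range(m)) for k in range(inner)]
    expected_col = [sum(a_col_sums[k] * b[k][j] for k in range(inner)) for j in range(n)]
    bad_rows = [i for i in range(m) if abs(sum(c[i]) - expected_row[i]) > tol]
    bad_cols = [j for j in range(n)
                if abs(sum(c[i][j] for i in range(m)) - expected_col[j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row-sum discrepancy equals the error in the single bad entry.
        c[i][j] += expected_row[i] - sum(c[i])
        return (i, j)
    raise ValueError("cannot locate a single error with these checksums")
```

The checksum work is two matrix-vector products, which is small next to the full product; that gap is what makes low-overhead online checking plausible.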
Hardware implementation of digital memcomputing on small-size FPGAs
Authors: Dyk Chung Nguyen, Yuan-Hang Zhang, Massimiliano Di Ventra, Yuriy V. Pershin
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2305.01061
Pdf link: https://arxiv.org/pdf/2305.01061
Abstract Memcomputing is a novel computing paradigm beyond the von-Neumann one. Its digital version is designed for the efficient solution of combinatorial optimization problems, which emerge in various fields of science and technology. Previously, the performance of digital memcomputing machines (DMMs) was demonstrated using software simulations of their ordinary differential equations. Here, we present the first hardware realization of a DMM algorithm on a low-cost FPGA board. In this demonstration, we have implemented a Boolean satisfiability problem solver. To optimize the use of hardware resources, the algorithm was partially parallelized. The scalability of the present implementation is explored and our FPGA-based results are compared to those obtained using a python code running on a traditional (von-Neumann) computer, showing one to two orders of magnitude speed-up in time to solution. This initial small-scale implementation is projected to state-of-the-art FPGA boards anticipating further advantages of the hardware realization of DMMs over their software emulation.
Robust Communication Complexity of Matching: EDCS Achieves 5/6 Approximation
Authors: Amir Azarmehr, Soheil Behnezhad
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.01070
Pdf link: https://arxiv.org/pdf/2305.01070
Abstract We study the robust communication complexity of maximum matching. Edges of an arbitrary $n$-vertex graph $G$ are randomly partitioned between Alice and Bob independently and uniformly. Alice has to send a single message to Bob such that Bob can find an (approximate) maximum matching of the whole graph $G$. We specifically study the best approximation ratio achievable via protocols where Alice communicates only $\widetilde{O}(n)$ bits to Bob. There has been a growing interest in the robust communication model due to its connections to the random-order streaming model. An algorithm of Assadi and Behnezhad [ICALP'21] implies a $(2/3+\epsilon_0 \sim .667)$-approximation for a small constant $0 < \epsilon_0 < 10^{-18}$, which remains the best-known approximation for general graphs. For bipartite graphs, Assadi and Behnezhad [Random'21] improved the approximation to .716 albeit with a computationally inefficient (i.e., exponential time) protocol. In this paper, we study a natural and efficient protocol implied by a random-order streaming algorithm of Bernstein [ICALP'20] which is based on edge-degree constrained subgraphs (EDCS) [Bernstein and Stein; ICALP'15]. The result of Bernstein immediately implies that this protocol achieves an (almost) $(2/3 \sim .666)$-approximation in the robust communication model. We present a new analysis, proving that it achieves a much better (almost) $(5/6 \sim .833)$-approximation. This significantly improves previous approximations both for general and bipartite graphs. We also prove that our analysis of Bernstein's protocol is tight.
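The abstract does not spell out what an EDCS is. Under the usual definition from the cited Bernstein and Stein line of work, a subgraph H of G is an EDCS with parameters beta and beta_minus if every edge in H has edge-degree (the sum of its endpoints' degrees in H) at most beta, and every edge of G missing from H has edge-degree at least beta_minus. A minimal checker, with the hypothetical name `is_edcs` and assuming `sub_edges` is a subset of `graph_edges`:

```python
from collections import Counter

def is_edcs(graph_edges, sub_edges, beta, beta_minus):
    """Check the two edge-degree conditions of an EDCS for subgraph H of G."""
    sub = {frozenset(edge) for edge in sub_edges}
    degree = Counter()
    for edge in sub:
        for vertex in edge:
            degree[vertex] += 1  # degree of each vertex inside H
    for u, v in graph_edges:
        edge_degree = degree[u] + degree[v]
        if frozenset((u, v)) in sub:
            if edge_degree > beta:        # edges of H must not be too crowded
                return False
        elif edge_degree < beta_minus:    # missing edges must touch enough of H
            return False
    return True
```

Since every vertex of H has degree at most beta, H has at most n * beta / 2 edges, which is how a sparse message can stand in for Alice's whole edge set.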
Fast Path Planning Through Large Collections of Safe Boxes
Authors: Tobia Marcucci, Parth Nobel, Russ Tedrake, Stephen Boyd
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01072
Pdf link: https://arxiv.org/pdf/2305.01072
Abstract We present a fast algorithm for the design of smooth paths (or trajectories) that are constrained to lie in a collection of axis-aligned boxes. We consider the case where the number of these safe boxes is large, and basic preprocessing of them (such as finding their intersections) can be done offline. At runtime we quickly generate a smooth path between given initial and terminal positions. Our algorithm designs trajectories that are guaranteed to be safe at all times, and it detects infeasibility whenever such a trajectory does not exist. Our algorithm is based on two subproblems that we can solve very efficiently: finding a shortest path in a weighted graph, and solving (multiple) convex optimal control problems. We demonstrate the proposed path planner on large-scale numerical examples, and we provide an efficient open-source software implementation, fastpathplanning.
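The first of the two subproblems, a shortest path in a weighted graph, can be sketched as follows. The sketch assumes a graph with one vertex per box, an edge between every pair of intersecting boxes, and the distance between box centers as the edge weight; these modelling choices and all function names are illustrative assumptions, not the construction used in fastpathplanning.

```python
import heapq
import math

def boxes_intersect(a, b):
    """Boxes are (lower_corner, upper_corner); they meet if they overlap on every axis."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))

def center(box):
    lo, hi = box
    return [(l + h) / 2 for l, h in zip(lo, hi)]

def build_graph(boxes):
    """Offline step: connect intersecting boxes, weighted by center distance."""
    graph = {i: [] for i in range(len(boxes))}
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_intersect(boxes[i], boxes[j]):
                weight = math.dist(center(boxes[i]), center(boxes[j]))
                graph[i].append((j, weight))
                graph[j].append((i, weight))
    return graph

def shortest_box_sequence(graph, start, goal):
    """Dijkstra over the box graph; returns a list of box indices or None if unreachable."""
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            path = [u]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return None
```

A `None` result means the start and goal boxes lie in different connected components, which corresponds to the infeasibility case. A smooth trajectory through the returned box sequence would then come from the convex optimal control step, which is not shown here.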
Design and Evaluation of a Bioinspired Tendon-Driven 3D-Printed Robotic Eye with Active Vision Capabilities
Authors: Hamid Osooli, Mohsen Irani Rahaghi, S. Reza Ahmadzadeh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01076
Pdf link: https://arxiv.org/pdf/2305.01076
Abstract The field of robotics has seen significant advancements in recent years, particularly in the development of humanoid robots. One area of research that has yet to be fully explored is the design of robotic eyes. In this paper, we propose a computer-aided 3D design scheme for a robotic eye that incorporates realistic appearance, natural movements, and efficient actuation. The proposed design utilizes a tendon-driven actuation mechanism, which offers a broad range of motion capabilities. The use of the minimum number of servos for actuation, one for each agonist-antagonist pair of muscles, makes the proposed design highly efficient. Compared to existing ones in the same class, our designed robotic eye comprises aesthetic and realistic features. We evaluate the robot's performance using a vision-based controller, which demonstrates the effectiveness of the proposed design in achieving natural movement, and efficient actuation. The experiment code, toolbox, and printable 3D sketches of our design have been open-sourced.
An Update-intensive LSM-based R-tree Index
Authors: Jaewoo Shin, Jianguo Wang, Walid G. Aref
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.01087
Pdf link: https://arxiv.org/pdf/2305.01087
Abstract Many applications require update-intensive workloads on spatial objects, e.g., social-network services and shared-riding services that track moving objects. By buffering insert and delete operations in memory, the Log Structured Merge Tree (LSM) has been used widely in various systems because of its ability to handle write-heavy workloads. While the focus on LSM has been on key-value stores and their optimizations, there is a need to study how to efficiently support LSM-based secondary indexes (e.g., location-based indexes) as modern, heterogeneous data necessitates the use of secondary indexes. In this paper, we investigate the augmentation of a main-memory-based memo structure into an LSM secondary index structure to handle update-intensive workloads efficiently. We conduct this study in the context of an R-tree-based secondary index. In particular, we introduce the LSM RUM-tree that demonstrates the use of an Update Memo in an LSM-based R-tree to enhance the performance of the R-tree's insert, delete, update, and search operations. The LSM RUM-tree introduces new strategies to control the size of the Update Memo to make sure it always fits in memory for high performance. The Update Memo is a light-weight in-memory structure that is suitable for handling update-intensive workloads without introducing significant overhead. Experimental results using real spatial data demonstrate that the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query processing over existing LSM R-tree implementations.
RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models
Authors: Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.01146
Pdf link: https://arxiv.org/pdf/2305.01146
Abstract We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, and clinical text) and via prompting (zero-shot, in-context learning) or parameter-efficient fine-tuning (prefix tuning, LoRA). Our results on the MIMIC-III dataset consistently demonstrate best performance by maximally adapting to the task via pretraining on clinical text and parameter-efficient fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.
Unbounded Differentially Private Quantile and Maximum Estimation
Authors: David Durfee
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.01177
Pdf link: https://arxiv.org/pdf/2305.01177
Abstract In this work we consider the problem of differentially private computation of quantiles for the data, especially the highest quantiles such as maximum, but with an unbounded range for the dataset. We show that this can be done efficiently through a simple invocation of $\texttt{AboveThreshold}$, a subroutine that is iteratively called in the fundamental Sparse Vector Technique, even when there is no upper bound on the data. In particular, we show that this procedure can give more accurate and robust estimates on the highest quantiles with applications towards clipping that is essential for differentially private sum and mean estimation. In addition, we show how two invocations can handle the fully unbounded data setting. Within our study, we show that an improved analysis of $\texttt{AboveThreshold}$ can improve the privacy guarantees for the widely used Sparse Vector Technique that is of independent interest. We give a more general characterization of privacy loss for $\texttt{AboveThreshold}$ which we immediately apply to our method for improved privacy guarantees. Our algorithm only requires one $O(n)$ pass through the data, which can be unsorted, and each subsequent query takes $O(1)$ time. We empirically compare our unbounded algorithm with the state-of-the-art algorithms in the bounded setting. For inner quantiles, we find that our method often performs better on non-synthetic datasets. For the maximal quantiles, which we apply to differentially private sum computation, we find that our method performs significantly better.
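For orientation, the textbook form of AboveThreshold adds Laplace noise once to the threshold and freshly to each sensitivity-1 query, and stops at the first noisy query that reaches the noisy threshold. The sketch below uses the standard noise scales 2/epsilon and 4/epsilon, not the improved analysis from the paper. The query sequence (counts of points below geometrically growing bounds, assuming non-negative data) is a hypothetical illustration of how an unbounded range can be searched; it is not the paper's single-pass algorithm.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def above_threshold(labelled_queries, threshold, epsilon, rng, max_queries=1000):
    """Return the label of the first query whose noisy answer reaches the noisy threshold.

    labelled_queries yields (label, answer) pairs; each answer must have sensitivity 1.
    """
    noisy_threshold = threshold + laplace(2.0 / epsilon, rng)
    for step, (label, answer) in enumerate(labelled_queries):
        if step >= max_queries:
            break  # data-independent cap so the loop always ends
        if answer + laplace(4.0 / epsilon, rng) >= noisy_threshold:
            return label
    return None

def count_queries(data, base=2.0):
    """Yield (bound, number of points <= bound) for bounds 1, base, base**2, ..."""
    bound = 1.0
    while True:
        yield bound, sum(1 for x in data if x <= bound)
        bound *= base
```

Calling `above_threshold(count_queries(data), threshold=len(data), epsilon=1.0, rng=random.Random())` returns a bound that, up to noise, covers the data, without any upper bound being supplied in advance. Such a bound could then serve as a clipping value for a private sum.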
LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar
Authors: Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.01190
Pdf link: https://arxiv.org/pdf/2305.01190
Abstract Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, enabling our method to get rid of expression and tracking issues. To achieve this, we leverage a latent head NeRF to learn the person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes are learned to be 3D-aware while faithfully capturing the high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression code learned in shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that our LatentAvatar is able to capture challenging expressions and the subtle movement of teeth and even eyeballs, which outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.
Exploration of Unranked Items in Safe Online Learning to Re-Rank
Authors: Hiroaki Shiino, Kaito Ariu, Kenshi Abe, Togashi Riku
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01202
Pdf link: https://arxiv.org/pdf/2305.01202
Abstract Bandit algorithms for online learning to rank (OLTR) problems often aim to maximize long-term revenue by utilizing user feedback. From a practical point of view, however, such algorithms have a high risk of hurting user experience due to their aggressive exploration. Thus, there has been a rising demand for safe exploration in recent years. One approach to safe exploration is to gradually enhance the quality of an original ranking that is already guaranteed acceptable quality. In this paper, we propose a safe OLTR algorithm that efficiently exchanges one of the items in the current ranking with an item outside the ranking (i.e., an unranked item) to perform exploration. We select an unranked item optimistically to explore based on Kullback-Leibler upper confidence bounds (KL-UCB) and safely re-rank the items including the selected one. Through experiments, we demonstrate that the proposed algorithm improves long-term regret from baselines without any safety violation.
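The optimistic selection step relies on the KL-UCB index. For Bernoulli feedback (for example, click or no click), the index of an item is the largest q such that pulls * KL(mean, q) stays within an exploration budget. The sketch below uses the common simplified budget log(t) and finds q by bisection; the budget choice and the function names are assumptions, not details taken from the paper.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iterations=50):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    low, high = mean, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if bernoulli_kl(mean, mid) <= budget:
            low = mid   # mid is still plausible, move the lower end up
        else:
            high = mid  # mid is too optimistic, move the upper end down
    return low
```

An unranked item to explore would then be the one with the largest index among items outside the current ranking. Items observed only a few times get a wide interval and therefore a high index.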
Chronosymbolic Learning: Efficient CHC Solving with Symbolic Reasoning and Inductive Learning
Authors: Ziyan Luo, Xujie Si
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2305.01206
Pdf link: https://arxiv.org/pdf/2305.01206
Abstract Solving Constrained Horn Clauses (CHCs) is a fundamental challenge behind a wide range of verification and analysis tasks. Data-driven approaches show great promise in improving CHC solving without the painstaking manual effort of creating and tuning various heuristics. However, a large performance gap exists between data-driven CHC solvers and symbolic reasoning-based solvers. In this work, we develop a simple but effective framework, "Chronosymbolic Learning", which unifies symbolic information and numerical data points to solve a CHC system efficiently. We also present a simple instance of Chronosymbolic Learning with a data-driven learner and a BMC-styled reasoner. Despite its great simplicity, experimental results show the efficacy and robustness of our tool. It outperforms state-of-the-art CHC solvers on a dataset consisting of 288 benchmarks, including many instances with non-linear integer arithmetics.
Rate-Compatible Polar Codes for Automorphism Ensemble Decoding
Authors: Marvin Geiselhart, Jannis Clausius, Stephan ten Brink
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2305.01214
Pdf link: https://arxiv.org/pdf/2305.01214
Abstract Recently, automorphism ensemble decoding (AED) has drawn research interest as a more computationally efficient alternative to successive cancellation list (SCL) decoding of polar codes. Although AED has demonstrated superior performance for specific code parameters, a flexible code design that can accommodate varying code rates does not yet exist. This work proposes a theoretical framework for constructing rate-compatible polar codes with a prescribed automorphism group, which is a key requirement for AED. We first prove that a one-bit granular sequence with useful automorphisms cannot exist. However, by allowing larger steps in the code dimension, flexible code sequences can be constructed. An explicit synthetic channel ranking based on the $\beta$-expansion is then proposed to ensure that all constructed codes possess the desired symmetries. Simulation results, covering a broad range of code dimensions and blocklengths, show a performance comparable to that of 5G polar codes under cyclic redundancy check (CRC)-aided SCL decoding, however, with lower complexity.
Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models
Authors: Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01219
Pdf link: https://arxiv.org/pdf/2305.01219
Abstract The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publicly available at https://github.com/shuaizhao95/Prompt_attack.
Updatable Learned Indexes Meet Disk-Resident DBMS -- From Evaluations to Design Choices
Authors: Hai Lan, Zhifeng Bao, J. Shane Culpepper, Renata Borovica-Gajic
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2305.01237
Pdf link: https://arxiv.org/pdf/2305.01237
Abstract Although many updatable learned indexes have been proposed in recent years, whether they can outperform traditional approaches on disk remains unknown. In this study, we revisit and implement four state-of-the-art updatable learned indexes on disk, and compare them against the B+-tree under a wide range of settings. Through our evaluation, we make some key observations: 1) Overall, the B+-tree performs well across a range of workload types and datasets. 2) A learned index could outperform B+-tree or other learned indexes on disk for a specific workload. For example, PGM achieves the best performance in write-only workloads while LIPP significantly outperforms others in lookup-only workloads. We further conduct a detailed performance analysis to reveal the strengths and weaknesses of these learned indexes on disk. Moreover, we summarize the observed common shortcomings in five categories and propose four design principles to guide future design of on-disk, updatable learned indexes: (1) reducing the index's tree height, (2) better data structures to lower operation overheads, (3) improving the efficiency of scan operations, and (4) more efficient storage layout.
Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators
Authors: Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01291
Pdf link: https://arxiv.org/pdf/2305.01291
Abstract Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing, by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.
Higher-Order GFDM for Linear Elliptic Operators
Authors: Heinrich Kraus, Jörg Kuhnert, Pratik Suchde
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.01320
Pdf link: https://arxiv.org/pdf/2305.01320
Abstract We present a novel approach of discretizing diffusion operators of the form $\nabla\cdot(\lambda\nabla u)$ in the context of meshfree generalized finite difference methods. Our ansatz uses properties of derived operators and combines the discrete Laplace operator with reconstruction functions approximating the diffusion coefficient $\lambda$. Provided that the reconstructions are of a sufficiently high order, we prove that the order of accuracy of the discrete Laplace operator transfers to the derived diffusion operator. We show that the new discrete diffusion operator inherits the diagonal dominance property of the discrete Laplace operator and fulfills enrichment properties. Our numerical results for elliptic and parabolic partial differential equations show that even low-order reconstructions preserve the order of the underlying discrete Laplace operator for sufficiently smooth diffusion coefficients. In experiments, we demonstrate the applicability of the new discrete diffusion operator to interface problems with point clouds not aligning to the interface and numerically prove first-order convergence.
Guaranteeing Envy-Freeness under Generalized Assignment Constraints
Authors: Siddharth Barman, Arindam Khan, Sudarshan Shyam, K. V. N. Sreenivas
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2305.01339
Pdf link: https://arxiv.org/pdf/2305.01339
Abstract We study fair division of goods under the broad class of generalized assignment constraints. In this constraint framework, the sizes and values of the goods are agent-specific, and one needs to allocate the goods among the agents fairly while further ensuring that each agent receives a bundle of total size at most the corresponding budget of the agent. Since, in such a constraint setting, it may not always be feasible to partition all the goods among the agents, we conform -- as in recent works -- to the construct of charity to designate the set of unassigned goods. For this allocation framework, we obtain existential and computational guarantees for envy-free (appropriately defined) allocation of divisible and indivisible goods, respectively, among agents with individual, additive valuations for the goods. We deem allocations to be fair by evaluating envy only with respect to feasible subsets. In particular, an allocation is said to be feasibly envy-free (FEF) iff each agent prefers its bundle over every (budget) feasible subset within any other agent's bundle (and within the charity). The current work establishes that, for divisible goods, FEF allocations are guaranteed to exist and can be computed efficiently under generalized assignment constraints. In the context of indivisible goods, FEF allocations do not necessarily exist, and hence, we consider the fairness notion of feasible envy-freeness up to any good (FEFx). We show that, under generalized assignment constraints, an FEFx allocation of indivisible goods always exists. In fact, our FEFx result resolves open problems posed in prior works. Further, for indivisible goods and under generalized assignment constraints, we provide a pseudo-polynomial time algorithm for computing FEFx allocations, and a fully polynomial-time approximation scheme (FPTAS) for computing approximate FEFx allocations.
Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces
Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.01341
Pdf link: https://arxiv.org/pdf/2305.01341
Abstract Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI problem becomes more complicated, significantly limiting its potential gains. In this paper, we conceive a multi-cell FD networking scheme by deploying a reconfigurable intelligent surface (RIS) at the cell boundary to configure the radio environment proactively. To achieve the full potential of the system, we aim to maximize the sum rate (SR) of multiple cells by jointly optimizing the transmit precoding (TPC) matrices at FD base stations (BSs) and users and the phase shift matrix at RIS. Since the original problem is non-convex, we reformulate and decouple it into a pair of subproblems by utilizing the relationship between the SR and minimum mean square error (MMSE). The optimal solutions of TPC matrices are obtained in closed form, while both complex circle manifold (CCM) and successive convex approximation (SCA) based algorithms are developed to resolve the phase shift matrix suboptimally. Our simulation results show that introducing an RIS into an FD networking system not only improves the overall SR significantly but also enhances the cell edge performance prominently. More importantly, we validate that the RIS deployment with optimized phase shifts can reduce the requirement for SIC and the number of BS antennas, which further reduces the hardware cost and power consumption, especially with a sufficient number of reflecting elements. As a result, the utilization of an RIS enables the originally cumbersome FD networking system to become efficient and practical.
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Authors: Daqian Shao, Marta Kwiatkowska
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2305.01381
Pdf link: https://arxiv.org/pdf/2305.01381
Abstract Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing
Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01387
Pdf link: https://arxiv.org/pdf/2305.01387
Abstract Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.
Infrastructural Requirements and Regulatory Challenges of a Sustainable Urban Air Mobility Ecosystem
Authors: Árpád Takács, Tamás Haidegger
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01398
Pdf link: https://arxiv.org/pdf/2305.01398
Abstract The United Nations has long put on the discussion agenda the sustainability challenges of urbanization, which have both direct and indirect effects on future regulation strategies. Undoubtedly, most initiatives target better quality of life, improved access to services & goods and environment protection. As commercial aerial urban transportation may become a feasible research goal in the near future, the connection possibilities between cities and regions scale up. It is expected that the growing number of vertical takeoff & landing vehicles used for passenger and goods transportation will change the infrastructure of the cities, and will have a significant effect on the cityscapes as well. In addition to the widely discussed regulatory and safety issues, the introduction of elevated traffic also raises environmental concerns, which influence the existing and required service and control infrastructure, and thus significantly affect sustainability. This paper provides a narrated overview of the most common aspects of safety, licensing and regulations for passenger vertical takeoff & landing vehicles, and highlights the most important aspects of infrastructure planning, design and operation, which should be taken into account to maintain and efficiently operate this new way of transportation, leading to a sustainable urban air mobility ecosystem.
Get Back Here: Robust Imitation by Return-to-Distribution Planning
Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01400
Pdf link: https://arxiv.org/pdf/2305.01400
Abstract We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show robustness of the learned policy to different initial state distributions and noisy dynamics.
Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters
Authors: Vojtech Neuman, Miloslav Capek, Lukas Jelinek, Anu Lehtovuori, Ville Viikari
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.01416
Pdf link: https://arxiv.org/pdf/2305.01416
Abstract This paper introduces a theory for assessing and optimizing the multiple-input-multiple-output performance of multi-port cluster antennas in terms of efficiency, channel correlation, and power distribution. A method based on a convex optimization of feeding coefficients is extended with additional constraints allowing the user to control a ratio between the power radiated by the clusters. The formulation of the problem makes it possible to simultaneously optimize total efficiency and channel correlation with a fixed ratio between power radiated by the clusters, thus examining a trade-off between these parameters. It is shown that channel correlation, total efficiency, and allocation of radiated power are mutually conflicting parameters. The trade-offs are shown and discussed. The theory is demonstrated on a four-element antenna array and on a mobile terminal antenna.
An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots
Authors: Ke Qiu, Jingyu Zhang, Danying Sun, Rong Xiong, Haojian Lu, Yue Wang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01458
Pdf link: https://arxiv.org/pdf/2305.01458
Abstract Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking and planning tasks. In this paper, we propose an efficient multi-solution solver to address the inverse kinematics problem of 3-section constant-curvature robots by bridging both the theoretical reduction and numerical correction. We derive analytical conditions to simplify the original problem into a one-dimensional problem. Further, the equivalence of the two problems is formalised. In addition, we introduce an approximation with bounded error so that the one dimension becomes traversable while the remaining parameters analytically solvable. With the theoretical results, the global search and numerical correction are employed to implement the solver. The experiments validate the better efficiency and higher success rate of our solver than the numerical methods when one solution is required, and demonstrate the ability of obtaining multiple solutions with optimal path planning in a space with obstacles.
An Efficient Quadratic Interpolation Scheme for a Third-Order Cell-Centered Finite-Volume Method on Tetrahedral Grids
Authors: Hiroaki Nishikawa, Jeffery A. White
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2305.01466
Pdf link: https://arxiv.org/pdf/2305.01466
Abstract In this paper, we propose an efficient quadratic interpolation formula utilizing solution gradients computed and stored at nodes and demonstrate its application to a third-order cell-centered finite-volume discretization on tetrahedral grids. The proposed quadratic formula is constructed based on an efficient formula of computing a projected derivative. It is efficient in that it completely eliminates the need to compute and store second derivatives of solution variables or any other quantities, which are typically required in upgrading a second-order cell-centered unstructured-grid finite-volume discretization to third-order accuracy. Moreover, a high-order flux quadrature formula, as required for third-order accuracy, can also be simplified by utilizing the efficient projected-derivative formula, resulting in a numerical flux at a face centroid plus a curvature correction not involving second derivatives of the flux. Similarly, a source term can be integrated over a cell to high-order in the form of a source term evaluated at the cell centroid plus a curvature correction, again, not requiring second derivatives of the source term. The discretization is defined as an approximation to an integral form of a conservation law but the numerical solution is defined as a point value at a cell center, leading to another feature that there is no need to compute and store geometric moments for a quadratic polynomial to preserve a cell average. Third-order accuracy and improved second-order accuracy are demonstrated and investigated for simple but illustrative test cases in three dimensions.
Stochastic Contextual Bandits with Graph-based Contexts
Authors: Jittat Fakcharoenphol, Chayutpong Prompak
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.01470
Pdf link: https://arxiv.org/pdf/2305.01470
Abstract We naturally generalize the on-line graph prediction problem to a version of stochastic contextual bandit problems where contexts are vertices in a graph and the structure of the graph provides information on the similarity of contexts. More specifically, we are given a graph $G=(V,E)$, whose vertex set $V$ represents contexts with unknown vertex label $y$. In our stochastic contextual bandit setting, vertices with the same label share the same reward distribution. The standard notion of instance difficulty in graph label prediction is the cutsize $f$, defined to be the number of edges whose endpoints have different labels. For line graphs and trees we present an algorithm with regret bound of $\tilde{O}(T^{2/3}K^{1/3}f^{1/3})$ where $K$ is the number of arms. Our algorithm relies on the optimal stochastic bandit algorithm by Zimmert and Seldin [AISTAT'19, JMLR'21]. When the best arm outperforms the other arms, the regret improves to $\tilde{O}(\sqrt{KT\cdot f})$. The regret bound in the latter case is comparable to other optimal contextual bandit results in more general cases, but our algorithm is easy to analyze, runs very efficiently, and does not require an i.i.d. assumption on the input context sequence. The algorithm also works with general graphs using a standard random spanning tree reduction.
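The cutsize $f$ used above as the difficulty measure is simple to compute once labels are known. A minimal sketch follows, assuming the graph is given as an edge list and the labels as a dictionary; the function name `cutsize` and the example path graph are illustrative and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def cutsize(edges: Iterable[Edge], labels: Dict[Hashable, Hashable]) -> int:
    """Count the edges whose two endpoints carry different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


# Illustrative path graph 0-1-2-3-4 whose label changes once, between 2 and 3.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
path_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

print(cutsize(path_edges, path_labels))  # 1
```

A small cutsize means the labels are nearly constant along the graph, which is the regime in which the regret bounds stated in the abstract are strongest.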
Efficient Sensitivity Analysis for Parametric Robust Markov Chains
Authors: Thom Badings, Sebastian Junges, Ahmadreza Marandi, Ufuk Topcu, Nils Jansen
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.01473
Pdf link: https://arxiv.org/pdf/2305.01473
Abstract We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.
Building Reliable Budget-Based Binary-State Networks
Authors: Wei-Chang Yeh
Subjects: Networking and Internet Architecture (cs.NI); Probability (math.PR); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2305.01488
Pdf link: https://arxiv.org/pdf/2305.01488
Abstract Everyday life is driven by various networks, such as supply chains for distributing raw materials, semi-finished goods, and final products; Internet of Things (IoT) for connecting and exchanging data; utility networks for transmitting fuel, power, water, electricity, and 4G/5G; and social networks for sharing information and connections. The binary-state network is a basic network, where the state of each component is either success or failure, i.e., the binary-state. Network reliability plays an important role in evaluating the performance of network planning, design, and management. Because more networks are currently being set up in the real world, their reliability needs to be ensured. It is necessary to build a reliable network within a limited budget. However, existing studies are focused on the budget limit for each minimal path (MP) in networks without considering the total budget of the entire network. We propose a novel concept to consider how to build a more reliable binary-state network under the budget limit. In addition, we propose an algorithm based on the binary-addition-tree algorithm (BAT) and stepwise vectors to solve the problem efficiently.
BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms
Authors: Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2305.01519
Pdf link: https://arxiv.org/pdf/2305.01519
Abstract As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high throughput and low latency at the same time. Such edge platforms with multiple DNN models pose new challenges for scheduler designs. First, each request may have different service level objectives (SLOs) to improve quality of service (QoS). Second, the edge platforms should be able to efficiently schedule multiple heterogeneous DNN models so that system utilization can be improved. To meet these two goals, this paper proposes BCEdge, a novel learning-based scheduling framework that applies adaptive batching and concurrent execution of DNN inference services on edge platforms. We define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning (DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of concurrent models. Our prototype implemented on different edge platforms shows that the proposed BCEdge enhances utility by up to 37.6% on average, compared to state-of-the-art solutions, while satisfying SLOs.
Unlocking the Power of Representations in Long-term Novelty-based Exploration
Authors: Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.01521
Pdf link: https://arxiv.org/pdf/2305.01521
Abstract We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction and which, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
Faster 0-1-Knapsack via Near-Convex Min-Plus-Convolution
Authors: Karl Bringmann, Alejandro Cassis
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.01593
Pdf link: https://arxiv.org/pdf/2305.01593
Abstract We revisit the classic 0-1-Knapsack problem, in which we are given $n$ items with their weights and profits as well as a weight budget $W$, and the goal is to find a subset of items of total weight at most $W$ that maximizes the total profit. We study pseudopolynomial-time algorithms parameterized by the largest profit of any item $p_{\max}$, and the largest weight of any item $w_{\max}$. Our main results are algorithms for 0-1-Knapsack running in time $\tilde{O}(n\,w_{\max}\,p_{\max}^{2/3})$ and $\tilde{O}(n\,p_{\max}\,w_{\max}^{2/3})$, improving upon an algorithm in time $O(n\,p_{\max}\,w_{\max})$ by Pisinger [J. Algorithms '99]. In the regime $p_{\max} \approx w_{\max} \approx n$ (and $W \approx \mathrm{OPT} \approx n^2$) our algorithms are the first to break the cubic barrier $n^3$. To obtain our result, we give an efficient algorithm to compute the min-plus convolution of near-convex functions. More precisely, we say that a function $f \colon [n] \mapsto \mathbf{Z}$ is $\Delta$-near convex with $\Delta \geq 1$, if there is a convex function $\breve{f}$ such that $\breve{f}(i) \leq f(i) \leq \breve{f}(i) + \Delta$ for every $i$. We design an algorithm computing the min-plus convolution of two $\Delta$-near convex functions in time $\tilde{O}(n\Delta)$. This tool can replace the usage of the prediction technique of Bateni, Hajiaghayi, Seddighin and Stein [STOC '18] in all applications we are aware of, and we believe it has wider applicability.
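For readers unfamiliar with the operation, the min-plus convolution of two sequences $f$ and $g$ is $h(k) = \min_{i+j=k} \big(f(i) + g(j)\big)$. The sketch below is only the textbook quadratic-time baseline, included to make the definition concrete; it is not the $\tilde{O}(n\Delta)$ algorithm of the paper, and the function name and sample inputs are illustrative.

```python
from typing import List


def min_plus_convolution(f: List[int], g: List[int]) -> List[float]:
    """Naive O(len(f) * len(g)) min-plus convolution: h[k] = min over i+j=k of f[i] + g[j]."""
    h: List[float] = [float("inf")] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if fi + gj < h[i + j]:
                h[i + j] = fi + gj
    return h


# Illustrative inputs: f is convex, g is not.
f = [0, 1, 4, 9]
g = [0, 2, 3]
print(min_plus_convolution(f, g))  # [0, 1, 3, 4, 7, 12]
```

The double loop makes the quadratic cost explicit; the result in the abstract concerns avoiding that cost when both inputs are within an additive $\Delta$ of a convex function.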
Augmented Electronic Ising Machine as an Effective SAT Solver
Authors: Anshujit Sharma, Matthew Burns, Andrew Hahn, Michael Huang
Subjects: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2305.01623
Pdf link: https://arxiv.org/pdf/2305.01623
Abstract With the slowdown of improvement in conventional von Neumann systems, increasing attention is paid to novel paradigms such as Ising machines. They take a very different approach to NP-complete optimization problems. Ising machines have shown great potential in solving binary optimization problems like MaxCut. In this paper, we present an analysis of these systems in satisfiability (SAT) problems. We demonstrate that, in the case of 3-SAT, a basic architecture fails to produce meaningful acceleration, thanks in no small part to the relentless progress made in conventional SAT solvers. Nevertheless, careful analysis attributes part of the failure to the lack of two important components: cubic interactions and efficient randomization heuristics. To overcome these limitations, we add proper architectural support for cubic interaction on a state-of-the-art Ising machine. More importantly, we propose a novel semantic-aware annealing schedule that makes the search-space navigation much more efficient than existing annealing heuristics. With experimental analyses, we show that such an Augmented Ising Machine for SAT (AIMS) outperforms state-of-the-art software-based, GPU-based and conventional hardware SAT solvers by orders of magnitude. We also demonstrate AIMS to be relatively robust against device variation and noise.
Sequence Modeling with Multiresolution Convolutional Memory
Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.01638
Pdf link: https://arxiv.org/pdf/2305.01638
Abstract Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most a $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
Key-Locked Rank One Editing for Text-to-Image Personalization
Authors: Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.01644
Pdf link: https://arxiv.org/pdf/2305.01644
Abstract Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, allowing to portray personalized object interactions in unprecedented ways, even in one-shot settings.
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Authors: Junmo Kang, Wei Xu, Alan Ritter
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.01645
Pdf link: https://arxiv.org/pdf/2305.01645
Abstract Fine-tuning large models is highly effective; however, inference using these models can be expensive and produces carbon emissions. Knowledge distillation has been shown to be a practical solution to reduce inference costs, but the distillation process itself requires significant computational resources. Rather than buying or renting GPUs to fine-tune, then distill a large model, an NLP practitioner who needs a compact model might also choose to simply allocate an available budget to hire annotators and manually label additional fine-tuning data. In this paper, we investigate how to most efficiently use a fixed budget to build a compact model. Through our extensive experiments on six diverse NLP tasks, we find that distilling from T5-XXL (11B) to T5-Small (60M) is almost always a cost-efficient option compared to annotating more data to directly train a compact model (T5-Small (60M)). We further demonstrate that the optimal amount of distillation that maximizes utility varies across different budgetary scenarios.
Keyword: faster

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2305.01024
Pdf link: https://arxiv.org/pdf/2305.01024
Abstract General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89\% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of $160\% \sim 183.5\%$ and $148.55\% \sim 165.12\%$ for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to $41.40\%$.
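The "algorithm-based fault tolerance" mentioned in this abstract is commonly built on checksums of the operand matrices: the column sums of $C = AB$ must equal $(e^{T}A)B$ and the row sums must equal $A(Be)$, where $e$ is the all-ones vector. The following is a minimal CPU-side sketch of that classic checksum idea, assuming integer matrices (so exact comparison is safe) and at most one corrupted entry; it is not the paper's fused GPU kernel, and all names are illustrative.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def locate_single_error(a: Matrix, b: Matrix, c: Matrix) -> Optional[Tuple[int, int, int]]:
    """Return (row, col, delta) for a single corrupted entry of c, or None if c passes."""
    n, inner, m = len(a), len(b), len(b[0])
    col_sums_a = [sum(a[i][k] for i in range(n)) for k in range(inner)]  # e^T A
    row_sums_b = [sum(b[k][j] for j in range(m)) for k in range(inner)]  # B e
    expected_col = [sum(col_sums_a[k] * b[k][j] for k in range(inner)) for j in range(m)]
    expected_row = [sum(a[i][k] * row_sums_b[k] for k in range(inner)) for i in range(n)]
    bad_rows = [i for i in range(n) if sum(c[i]) != expected_row[i]]
    bad_cols = [j for j in range(m) if sum(c[i][j] for i in range(n)) != expected_col[j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        return i, j, sum(c[i]) - expected_row[i]
    raise ValueError("more than one corrupted entry; outside the scope of this sketch")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
c[1][0] += 7  # inject a single fault
row, col, delta = locate_single_error(a, b, c)
c[row][col] -= delta  # correct it in place
print(row, col, delta, c == matmul(a, b))  # 1 0 7 True
```

The checksum vectors cost only matrix-vector work, which is why schemes of this kind can add little overhead relative to the $O(n^3)$ product itself; floating-point implementations would need a tolerance in place of the exact comparisons used here.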
Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems
Authors: Kevin Zeng, Michael D. Graham
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
Arxiv link: https://arxiv.org/abs/2305.01090
Pdf link: https://arxiv.org/pdf/2305.01090
Abstract While many phenomena in physics and engineering are formally high-dimensional, their long-time dynamics often live on a lower-dimensional manifold. The present work introduces an autoencoder framework that combines implicit regularization with internal linear layers and $L_2$ regularization (weight decay) to automatically estimate the underlying dimensionality of a data set, produce an orthogonal manifold coordinate system, and provide the mapping functions between the ambient space and manifold space, allowing for out-of-sample projections. We validate our framework's ability to estimate the manifold dimension for a series of datasets from dynamical systems of varying complexities and compare to other state-of-the-art estimators. We analyze the training dynamics of the network to glean insight into the mechanism of low-rank learning and find that collectively each of the implicit regularizing layers compound the low-rank representation and even self-correct during training. Analysis of gradient descent dynamics for this architecture in the linear case reveals the role of the internal linear layers in leading to faster decay of a "collective weight variable" incorporating all layers, and the role of weight decay in breaking degeneracies and thus driving convergence along directions in which no decay would occur in its absence. We show that this framework can be naturally extended for applications of state-space modeling and forecasting by generating a data-driven dynamic model of a spatiotemporally chaotic partial differential equation using only the manifold coordinates. Finally, we demonstrate that our framework is robust to hyperparameter choices.
Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images
Authors: Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, Guodong Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.01183
Pdf link: https://arxiv.org/pdf/2305.01183
Abstract For ore particle size detection, obtaining a sizable amount of high-quality labeled ore data is time-consuming and expensive. General object detection methods often suffer from severe over-fitting with scarce labeled data. Despite their ability to eliminate over-fitting, existing few-shot object detectors encounter drawbacks such as slow detection speed and high memory requirements, making them difficult to implement in a real-world deployment scenario. To this end, we propose a lightweight and effective few-shot detector to achieve competitive performance with general object detection with only a few samples for ore images. First, the proposed support feature mining block characterizes the importance of location information in support features. Next, the relationship guidance block makes full use of support features to guide the generation of accurate candidate proposals. Finally, the dual-scale semantic aggregation module retrieves detailed features at different resolutions to contribute to the prediction process. Experimental results show that our method consistently outperforms existing few-shot detectors by a clear margin on all metrics. Moreover, our method achieves the smallest model size of 19MB as well as being competitive at 50 FPS detection speed compared with general object detectors. The source code is available at https://github.com/MVME-HBUT/Faster-OreFSDet.
Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.01203
Pdf link: https://arxiv.org/pdf/2305.01203
Abstract Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.
The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
Authors: Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01604
Pdf link: https://arxiv.org/pdf/2305.01604
Abstract We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
Keyword: mobile

Development of IoT Smart Greenhouse System for Hydroponic Gardens
Authors: Arcel Christian H. Austria, John Simon Fabros, Kurt Russel G. Sumilang, Jocelyn Bernardino, Anabella C. Doctor
Subjects: Systems and Control (eess.SY); Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.01189
Pdf link: https://arxiv.org/pdf/2305.01189
Abstract This study focused on the development of a smart greenhouse system for hydroponic gardens that adopts the Internet of Things and is monitored through a mobile application, as one solution to the negative effects of the world's booming population, the continual shrinking of arable land, and the drastic effects of climate change on the environment. To achieve the goal of the study, the researchers built a working hydroponic greenhouse system with fully developing plants and automated monitoring of the water pH level, light, water temperature, greenhouse temperature, and humidity, linked to ThingSpeak. The developed SMART Greenhouse monitoring system was tested and evaluated to confirm its reliability, functions, and usability under ISO 9126 evaluation criteria. The respondents, who included casual plant owners and experts in hydroponic gardening, were able to test and evaluate the prototype and the mobile application used to monitor the parameters, with readings of 7.77 for pH level, 83 for light, 27.94 deg C for water temperature, 27 deg C for greenhouse temperature, and 75% for humidity. Both software and hardware received a descriptive rating of Very Good, with a mean of 4.06, which means that the developed technology is useful and recommended. The SMART Greenhouse System for Hydroponic Garden serves as an alternative tool, solution, and innovation technique to address food shortages due to climate change, land shortages, and low farming environments. The proponents highly suggest using solar energy for the pump power, improving the prototype wiring, using a higher-end Arduino model to support more sensors and devices for a larger set of collected data, enclosing the device to ensure safety, and updating the mobile application with bug fixes and an e-manual of the whole system.
HuNavSim: A ROS 2 Human Navigation Simulator for Benchmarking Human-Aware Robot Navigation
Authors: Noé Pérez-Higueras, Roberto Otero, Fernando Caballero, Luis Merino
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01303
Pdf link: https://arxiv.org/pdf/2305.01303
Abstract This work presents the Human Navigation Simulator (HuNavSim), a novel open-source tool for the simulation of different human-agent navigation behaviors in scenarios with mobile robots. The tool, the first programmed under the ROS 2 framework, can be employed along with different well-known robotics simulators like Gazebo. The main goal is to ease the development and evaluation of human-aware robot navigation systems in simulation. Besides a general human-navigation model, HuNavSim includes, as a novelty, a rich set of individual and realistic human navigation behaviors and a complete set of metrics for social navigation benchmarking.
Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces
Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.01341
Pdf link: https://arxiv.org/pdf/2305.01341
Abstract Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI problem becomes more complicated, significantly limiting its potential gains. In this paper, we conceive a multi-cell FD networking scheme by deploying a reconfigurable intelligent surface (RIS) at the cell boundary to configure the radio environment proactively. To achieve the full potential of the system, we aim to maximize the sum rate (SR) of multiple cells by jointly optimizing the transmit precoding (TPC) matrices at FD base stations (BSs) and users and the phase shift matrix at RIS. Since the original problem is non-convex, we reformulate and decouple it into a pair of subproblems by utilizing the relationship between the SR and minimum mean square error (MMSE). The optimal solutions of TPC matrices are obtained in closed form, while both complex circle manifold (CCM) and successive convex approximation (SCA) based algorithms are developed to resolve the phase shift matrix suboptimally. Our simulation results show that introducing an RIS into an FD networking system not only improves the overall SR significantly but also enhances the cell edge performance prominently. More importantly, we validate that the RIS deployment with optimized phase shifts can reduce the requirement for SIC and the number of BS antennas, which further reduces the hardware cost and power consumption, especially with a sufficient number of reflecting elements. As a result, the utilization of an RIS enables the originally cumbersome FD networking system to become efficient and practical.
Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing
Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01387
Pdf link: https://arxiv.org/pdf/2305.01387
Abstract Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.
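Two generic building blocks named in the abstract above, weight-based (magnitude) pruning and the Laplace mechanism, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: how Fed-LTP actually combines them, which quantities it perturbs, and its sensitivity and privacy parameters are not given in the abstract, so the function names, the flat weight list, and the numeric values below are all hypothetical.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights."""
    n_keep = len(weights) - int(len(weights) * sparsity)
    # Indices ordered from largest to smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to a scalar."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise

# Hypothetical example values, chosen only for illustration.
weights = [0.1, -2.0, 0.5, 3.0, -0.05, 1.0]
mask = magnitude_mask(weights, sparsity=0.5)     # keep the 3 largest magnitudes
pruned = [w * m for w, m in zip(weights, mask)]  # zero out the pruned weights
noisy_metric = laplace_mechanism(0.8, sensitivity=0.01, epsilon=1.0,
                                 rng=random.Random(0))
```

The mask is what a sparse-to-sparse training loop would reuse to keep pruned weights at zero; the noisy scalar stands in for a privately released validation metric.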
A Mobile Quad-Arm Robot ARMS: Wheel-Legged Tripedal Mobility and Quad-Arm Manipulation
Authors: Hisayoshi Muramatsu, Keigo Kitagawa, Jun Watanabe, Ryohei Hisashiki
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01406
Pdf link: https://arxiv.org/pdf/2305.01406
Abstract This letter proposes a mobile quad-arm robot: ARMS that unifies wheel-legged tripedal mobility, wheeled mobility, and quad-arm manipulation. The four arms have different mechanics and are designed to be general-purpose arms to enable the wheel-legged hybrid mobilities and manipulation. The three-degree-of-freedom (DOF) front arm has an active wheel, which is used for wheel-legged tripedal walking and wheel driving with passive wheels attached to the torso. The three-DOF rear arms are series elastic arms, which are used for wheel-legged tripedal walking, object grasping, and manipulation. The two-DOF upper arm is used for manipulation only; its position and orientation are determined by coordinating all arms. Each motor is controlled by an angle controller and trajectory modification with angle, angular velocity, angular acceleration, and torque constraints. ARMS was experimentally validated on the basis of the following four tasks: wheel-legged walking, wheel-driving, wheel-driving with grasping, and carrying a bag.
Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters
Authors: Vojtech Neuman, Miloslav Capek, Lukas Jelinek, Anu Lehtovuori, Ville Viikari
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.01416
Pdf link: https://arxiv.org/pdf/2305.01416
Abstract This paper introduces a theory for assessing and optimizing the multiple-input-multiple-output performance of multi-port cluster antennas in terms of efficiency, channel correlation, and power distribution. A method based on a convex optimization of feeding coefficients is extended with additional constraints allowing the user to control a ratio between the power radiated by the clusters. The formulation of the problem makes it possible to simultaneously optimize total efficiency and channel correlation with a fixed ratio between power radiated by the clusters, thus examining a trade-off between these parameters. It is shown that channel correlation, total efficiency, and allocation of radiated power are mutually conflicting parameters. The trade-offs are shown and discussed. The theory is demonstrated on a four-element antenna array and on a mobile terminal antenna.
On the Collaborative Object Transportation Using Leader Follower Approach
Authors: Sumanta Ghosh, Subhajit Nath, Sarvesh Sortee, Lokesh Kumar, Titas Bera
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01614
Pdf link: https://arxiv.org/pdf/2305.01614
Abstract In this paper we address the multi-agent collaborative object transportation problem in a partially known environment with obstacles under a specified goal condition. We propose a leader-follower approach for two mobile manipulators collaboratively transporting an object along specified desired trajectories. The proposed approach treats the mobile manipulation system as two independent subsystems, a mobile platform and a manipulator arm, and uses their kinematics models for trajectory tracking. In this work we consider a mobile platform subject to non-holonomic constraints, with a manipulator carrying a rigid load. The desired trajectories of the end points of the load are obtained from a Probabilistic RoadMap-based planning approach. Our method combines a Proportional Navigation Guidance-based approach with a proposed Stop-and-Sync algorithm to reach sufficiently close to the desired trajectory; the deviation due to the non-holonomic constraints is compensated by the manipulator arm. A leader-follower approach for computing the inverse kinematics solution for the position of the end-effector of the manipulator arm is proposed to maintain the load rigidity. Further, we compare the proposed approach with other approaches to analyse the efficacy of our algorithm.
Keyword: pruning

Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.01203
Pdf link: https://arxiv.org/pdf/2305.01203
Abstract Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.
Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing
Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01387
Pdf link: https://arxiv.org/pdf/2305.01387
Abstract Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.
Keyword: voxel

There is no result

Keyword: lidar

A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics
Authors: Kyle Harlow, Hyesu Jang, Timothy D. Barfoot, Ayoung Kim, Christoffer Heckman
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01135
Pdf link: https://arxiv.org/pdf/2305.01135
Abstract We survey the current state of millimeter-wave (mmWave) radar applications in robotics with a focus on unique capabilities, and discuss future opportunities based on the state of the art. Frequency Modulated Continuous Wave (FMCW) mmWave radars operating in the 76-81 GHz range are an appealing alternative to lidars, cameras and other sensors operating in the near visual spectrum. Radar has been made more widely available in new packaging classes that are more convenient for robotics, and its longer wavelengths have the ability to bypass visual clutter such as fog, dust, and smoke. We begin by covering radar principles as they relate to robotics. We then review the relevant new research across a broad spectrum of robotics applications beginning with motion estimation, localization, and mapping. We then cover object detection and classification, and then close with an analysis of current datasets and calibration techniques that provide entry points into radar research.
Safe Autonomous Driving in Adverse Weather: Sensor Evaluation and Performance Monitoring
Authors: Fatih Sezgin, Daniel Vriesman, Dagmar Steinhauser, Robert Lugner, Thomas Brandmeier
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01336
Pdf link: https://arxiv.org/pdf/2305.01336
Abstract The vehicle's perception sensors radar, lidar and camera, which must work continuously and without restriction, especially with regard to automated/autonomous driving, can lose performance due to unfavourable weather conditions. This paper analyzes the sensor signals of these three sensor technologies under rain and fog as well as day and night. A data set of a driving test vehicle as an object target under different weather conditions was recorded in a controlled environment with adjustable, defined, and reproducible weather conditions. Based on the sensor performance evaluation, a method has been developed to detect sensor degradation, including determining the affected data areas and estimating how severe they are. Through this sensor monitoring, measures can be taken in subsequent algorithms to reduce the influences or to take them into account in safety and assistance systems to avoid malfunctions.
FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow
Authors: Wenchao Ding, Jieru Zhao, Yubin Chu, Haihui Huang, Tong Qin, Chunjing Xu, Yuxiang Guan, Zhongxue Gan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01622
Pdf link: https://arxiv.org/pdf/2305.01622
Abstract There is extensive literature on perceiving road structures by fusing various sensor inputs such as lidar point clouds and camera images using deep neural nets. Leveraging the latest advances in neural architectures (such as transformers) and bird-eye-view (BEV) representation, the road cognition accuracy keeps improving. However, how to cognize the "road" for automated vehicles where there are no well-defined "roads" remains an open problem. For example, how to find paths inside intersections without HD maps is hard since there is neither an explicit definition for "roads" nor explicit features such as lane markings. The idea of this paper comes from a proverb: it becomes a way when people walk on it. Although there are no "roads" from sensor readings, there are "roads" from tracks of other vehicles. In this paper, we propose FlowMap, a path generation framework for automated vehicles based on traffic flows. FlowMap is built by extending our previous work RoadMap, a light-weight semantic map, with an additional traffic flow layer. A path generation algorithm on traffic flow fields (TFFs) is proposed to generate human-like paths. The proposed framework is validated using real-world driving data and is amenable to generating paths for super complicated intersections without using HD maps.
Neural LiDAR Fields for Novel View Synthesis
Authors: Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.01643
Pdf link: https://arxiv.org/pdf/2305.01643
Abstract We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. NFL combines the rendering power of neural fields with a detailed, physically motivated model of the LiDAR sensing process, thus enabling it to accurately reproduce key sensor behaviors like beam divergence, secondary returns, and ray dropping. We evaluate NFL on synthetic and real LiDAR scans and show that it outperforms explicit reconstruct-then-simulate methods as well as other NeRF-style methods on LiDAR novel view synthesis task. Moreover, we show that the improved realism of the synthesized views narrows the domain gap to real scans and translates to better registration and semantic segmentation performance.
Keyword: diffusion

In-Context Learning Unlocked for Diffusion Models
Authors: Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.01115
Pdf link: https://arxiv.org/pdf/2305.01115
Abstract We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our model automatically understands the underlying task and performs the same task on a new query image following the text guidance. To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input. The diffusion model is trained jointly over six different tasks using these prompts. The resulting Prompt Diffusion model is the first diffusion-based vision-language foundation model capable of in-context learning. It demonstrates high-quality in-context generation on the trained tasks and generalizes effectively to new, unseen vision tasks with their respective prompts. Our model also shows compelling text-guided image editing results. Our framework, with code publicly available at https://github.com/Zhendong-Wang/Prompt-Diffusion, aims to facilitate research into in-context learning for computer vision.
Geometric Latent Diffusion Models for 3D Molecule Generation
Authors: Minkai Xu, Alexander Powers, Ron Dror, Stefano Ermon, Jure Leskovec
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2305.01140
Pdf link: https://arxiv.org/pdf/2305.01140
Abstract Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries and advancing foundational science problems such as molecule design. Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). GeoLDM is the first latent DM model for the molecular geometry domain, composed of autoencoders encoding structures into continuous latent codes and DMs operating in the latent space. Our key innovation is that for modeling the 3D molecular geometries, we capture its critical roto-translational equivariance constraints by building a point-structured latent space with both invariant scalars and equivariant tensors. Extensive experiments demonstrate that GeoLDM can consistently achieve better performance on multiple molecule generation benchmarks, with up to 7% improvement for the valid percentage of large biomolecules. Results also demonstrate GeoLDM's higher capacity for controllable generation thanks to the latent modeling. Code is provided at https://github.com/MinkaiXu/GeoLDM.
Solving Inverse Problems with Score-Based Generative Priors learned from Noisy Data
Authors: Asad Aali, Marius Arvinte, Sidharth Kumar, Jonathan I. Tamir
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.01166
Pdf link: https://arxiv.org/pdf/2305.01166
Abstract We present SURE-Score: an approach for learning score-based generative models using training samples corrupted by additive Gaussian noise. When a large training set of clean samples is available, solving inverse problems via score-based (diffusion) generative models trained on the underlying fully-sampled data distribution has recently been shown to outperform end-to-end supervised deep learning. In practice, such a large collection of training data may be prohibitively expensive to acquire in the first place. In this work, we present an approach for approximately learning a score-based generative model of the clean distribution, from noisy training data. We formulate and justify a novel loss function that leverages Stein's unbiased risk estimate to jointly denoise the data and learn the score function via denoising score matching, while using only the noisy samples. We demonstrate the generality of SURE-Score by learning priors and applying posterior sampling to ill-posed inverse problems in two practical applications from different domains: compressive wireless multiple-input multiple-output channel estimation and accelerated 2D multi-coil magnetic resonance imaging reconstruction, where we demonstrate competitive reconstruction performance when learning at signal-to-noise ratio values of 0 and 10 dB, respectively.
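For readers unfamiliar with the two standard results this abstract builds on, they can be stated compactly. The block below gives the textbook forms only; the paper's actual loss function is not spelled out in the abstract and may weight or combine these terms differently. The symbols $y$, $x$, $n$, $\sigma$, $N$, and $f$ are notation assumed here, not taken from the abstract.

```latex
% Assumed setting: y = x + n, with n ~ N(0, \sigma^2 I_N), and f a weakly differentiable denoiser.

% Stein's unbiased risk estimate: the denoising error can be estimated from noisy data alone.
\mathbb{E}\,\lVert f(y) - x \rVert_2^2
  = \mathbb{E}\left[ \lVert f(y) - y \rVert_2^2 - N\sigma^2
      + 2\sigma^2\, \nabla_y \cdot f(y) \right]

% Tweedie's formula: the posterior mean is tied to the score of the noisy distribution.
\mathbb{E}[x \mid y] = y + \sigma^2\, \nabla_y \log p(y)
```

The first identity explains why a denoiser can be trained without clean samples; the second links such a denoiser to the score function at noise level $\sigma$, which is the quantity a score-based prior needs.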
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01257
Pdf link: https://arxiv.org/pdf/2305.01257
Abstract We introduce DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image. The context image can be, for example, the user's own image for virtual try-on of clothes from the e-commerce catalog on themselves, the user's room image for virtual try-on of a piece of furniture from the e-commerce catalog in their room, etc. As opposed to previous augmented-reality (AR)-based virtual try-on methods, DreamPaint neither uses nor requires 3D modeling of either the e-commerce product or the user context. Instead, it directly uses 2D images of the product as available in the product catalog database, and a 2D picture of the context, for example taken with the user's phone camera. The method relies on few-shot fine-tuning of a pre-trained diffusion model with the masked latents (e.g., Masked DreamBooth) of the catalog images per item, whose weights are then loaded on a pre-trained inpainting module that is capable of preserving the characteristics of the context image. DreamPaint preserves both the product image and the context (environment/user) image without requiring text guidance to describe the missing part (product/context). DreamPaint can also intelligently infer the best 3D angle of the product to place at the desired location on the user context, even if that angle was previously unseen in the product's reference 2D images. We compare our results against both text-guided and image-guided inpainting modules and show that DreamPaint yields superior performance in both subjective human study and quantitative metrics.
Long-Term Rhythmic Video Soundtracker
Authors: Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2305.01319
Pdf link: https://arxiv.org/pdf/2305.01319
Abstract We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, leading to the incompetence of generative flexibility and complexity. Other methods directly generating video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model to perform waveform synthesis. Furthermore, a series of context-aware conditioning encoders are proposed to take temporal information into consideration for a long-term generation. Notably, we extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating. To perform comprehensive evaluations, we establish a benchmark for rhythmic video soundtracks including the pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Codes are available at https://github.com/OpenGVLab/LORIS.
Higher-Order GFDM for Linear Elliptic Operators
Authors: Heinrich Kraus, Jörg Kuhnert, Pratik Suchde
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.01320
Pdf link: https://arxiv.org/pdf/2305.01320
Abstract We present a novel approach to discretizing diffusion operators of the form $\nabla\cdot(\lambda\nabla u)$ in the context of meshfree generalized finite difference methods. Our ansatz uses properties of derived operators and combines the discrete Laplace operator with reconstruction functions approximating the diffusion coefficient $\lambda$. Provided that the reconstructions are of a sufficiently high order, we prove that the order of accuracy of the discrete Laplace operator transfers to the derived diffusion operator. We show that the new discrete diffusion operator inherits the diagonal dominance property of the discrete Laplace operator and fulfills enrichment properties. Our numerical results for elliptic and parabolic partial differential equations show that even low-order reconstructions preserve the order of the underlying discrete Laplace operator for sufficiently smooth diffusion coefficients. In experiments, we demonstrate the applicability of the new discrete diffusion operator to interface problems with point clouds not aligning to the interface and numerically demonstrate first-order convergence.
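One way to see how a diffusion operator can be expressed purely through Laplacians is the product-rule identity below. It is a standard identity that holds for sufficiently smooth $\lambda$ and $u$; whether it is exactly the derived-operator relation used in the paper is an assumption, since the abstract does not state the ansatz explicitly.

```latex
% Product rule for the Laplacian of \lambda u:
\Delta(\lambda u) = \lambda\,\Delta u + 2\,\nabla\lambda\cdot\nabla u + u\,\Delta\lambda

% Solving for the mixed term and substituting into the expanded diffusion operator:
\nabla\cdot(\lambda\nabla u)
  = \lambda\,\Delta u + \nabla\lambda\cdot\nabla u
  = \tfrac{1}{2}\left[ \Delta(\lambda u) + \lambda\,\Delta u - u\,\Delta\lambda \right]
```

Every term on the right-hand side involves only a Laplacian, so a single discrete Laplace operator applied to $\lambda u$, $u$, and $\lambda$ is enough to approximate the diffusion operator.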
Adopting AI: How Familiarity Breeds Both Trust and Contempt
Authors: Michael C. Horowitz, Lauren Kahn, Julia Macdonald, Jacquelyn Schneider
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2305.01405
Pdf link: https://arxiv.org/pdf/2305.01405
Abstract Despite pronouncements about the inevitable diffusion of artificial intelligence and autonomous technologies, in practice it is human behavior, not technology in a vacuum, that dictates how technology seeps into -- and changes -- societies. In order to better understand how human preferences shape technological adoption and the spread of AI-enabled autonomous technologies, we look at representative adult samples of US public opinion in 2018 and 2020 on the use of four types of autonomous technologies: vehicles, surgery, weapons, and cyber defense. By focusing on these four diverse uses of AI-enabled autonomy that span transportation, medicine, and national security, we exploit the inherent variation between these AI-enabled autonomous use cases. We find that those with familiarity and expertise with AI and similar technologies were more likely to support all of the autonomous applications we tested (except weapons) than those with a limited understanding of the technology. Individuals that had already delegated the act of driving by using ride-share apps were also more positive about autonomous vehicles. However, familiarity cut both ways; individuals are also less likely to support AI-enabled technologies when applied directly to their life, especially if technology automates tasks they are already familiar with operating. Finally, opposition to AI-enabled military applications has slightly increased over time.
ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation
Authors: Zehao Zhu, Jiashun Wang, Yuzhe Qin, Deqing Sun, Varun Jampani, Xiaolong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01618
Pdf link: https://arxiv.org/pdf/2305.01618
Abstract We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation. We first collect a dataset using visual teleoperation, where the human operator can directly play within a physical simulator to manipulate the articulated objects. We record the data and obtain free and accurate annotations on object poses and contact information from the simulator. Our system only requires an iPhone to record human hand motion, which can be easily scaled up and largely lowers the costs of data and annotation collection. With this data, we learn 3D interaction priors including a discriminator (in a GAN) capturing the distribution of how object parts are arranged, and a diffusion model which generates the contact regions on articulated objects, guiding the hand pose estimation. Such structural and contact priors can easily transfer to real-world data with barely any domain gap. By using our data and learned priors, our method significantly improves the performance on joint hand and articulated object pose estimation over the existing state-of-the-art methods. The project is available at https://zehaozhu.github.io/ContactArt/ .
Keyword: dynamic

Attention-based Spatial-Temporal Graph Neural ODE for Traffic Prediction
Authors: Weiheng Zhong, Hadi Meidani, Jane Macfarlane
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00985
Pdf link: https://arxiv.org/pdf/2305.00985
Abstract Traffic forecasting is an important issue in intelligent traffic systems (ITS). Graph neural networks (GNNs) are effective deep learning models to capture the complex spatio-temporal dependency of traffic data, achieving ideal prediction performance. In this paper, we propose attention-based graph neural ODE (ASTGODE) that explicitly learns the dynamics of the traffic system, which makes the prediction of our machine learning model more explainable. Our model aggregates traffic patterns of different periods and has satisfactory performance on two real-world traffic data sets. The results show that our model achieves the highest accuracy of the root mean square error metric among all the existing GNN models in our experiments.
Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces
Authors: Jhonny Mertz, Ingrid Nunes
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2305.01039
Pdf link: https://arxiv.org/pdf/2305.01039
Abstract Monitoring software systems at runtime is key for understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Despite the usefulness of collecting system data, it may significantly impact the system execution by delaying response times and competing with system resources. The typical approach to cope with this is to filter portions of the system to be monitored and to sample data. Although these approaches are a step towards achieving a desired trade-off between the amount of collected information and the impact on the system performance, they focus on collecting data of a particular type or may capture a sample that does not correspond to the actual system behavior. In response, we propose an adaptive runtime monitoring process to dynamically adapt the sampling rate while monitoring software systems. It includes algorithms with statistical foundations to improve the representativeness of collected samples without compromising the system performance. Our evaluation targets five applications of a widely used benchmark. It shows that the error (RMSE) of the samples collected with our approach is 9-54% lower than the main alternative strategy (sampling rate inversely proportional to the throughput), with 1-6% higher performance impact.
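For orientation, the baseline strategy named in the abstract (a sampling rate inversely proportional to throughput) might look like the minimal sketch below. This is not the authors' adaptive process. The function name, the `budget` parameter (a target number of sampled requests per second), and the clamping bounds are hypothetical.

```python
def inverse_throughput_rate(throughput, budget=100.0, min_rate=0.01, max_rate=1.0):
    """Sampling probability that shrinks as throughput (requests/s) grows.

    `budget`, `min_rate`, and `max_rate` are illustrative assumptions.
    """
    if throughput <= 0:
        # Nothing to throttle: sample everything that does arrive.
        return max_rate
    # Keep roughly `budget` sampled requests per second, clamped to [min_rate, max_rate].
    return max(min_rate, min(max_rate, budget / throughput))

low_load = inverse_throughput_rate(50.0)         # under budget, sample everything
high_load = inverse_throughput_rate(1000.0)      # ten times the budget
extreme_load = inverse_throughput_rate(1e6)      # clamped at the minimum rate
```

A rule of this kind limits monitoring overhead under load, but it ignores whether the collected sample still reflects actual system behavior, which is the gap the abstract says the proposed approach targets.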
Right HTML, Wrong JSON: Challenges in Replaying Archived Webpages Built with Client-Side Rendering
Authors: Michele C. Weigle, Michael L. Nelson, Sawood Alam, Mark Graham
Subjects: Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2305.01071
Pdf link: https://arxiv.org/pdf/2305.01071
Abstract Many web sites are transitioning how they construct their pages. The conventional model is where the content is embedded server-side in the HTML and returned to the client in an HTTP response. Increasingly, sites are moving to a model where the initial HTTP response contains only an HTML skeleton plus JavaScript that makes API calls to a variety of servers for the content (typically in JSON format), and then builds out the DOM client-side, more easily allowing for periodically refreshing the content in a page and allowing dynamic modification of the content. This client-side rendering, now predominant in social media platforms such as Twitter and Instagram, is also being adopted by news outlets, such as CNN.com. When conventional web archiving techniques, such as crawling with Heritrix, are applied to pages that render their content client-side, the JSON responses can become out of sync with the HTML page in which it is to be embedded, resulting in temporal violations on replay. Because the violative JSON is not directly observable in the page (i.e., in the same manner a violative embedded image is), the temporal violations can be difficult to detect. We describe how the top level CNN.com page has used client-side rendering since April 2015 and the impact this has had on web archives. Between April 24, 2015 and July 21, 2016, we found almost 15,000 mementos with a temporal violation of more than 2 days between the base CNN.com HTML and the JSON responses used to deliver the content under the main story. One way to mitigate this problem is to use browser-based crawling instead of conventional crawlers like Heritrix, but browser-based crawling is currently much slower than non-browser-based tools such as Heritrix.
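The 2-day threshold mentioned above can be pictured as a simple comparison of capture times. The sketch below is only an illustration: the function name and the example timestamps are hypothetical, and the abstract does not describe how the authors actually extracted capture times from the archive.

```python
from datetime import datetime, timedelta

def temporal_violation(html_captured, json_captured, tolerance=timedelta(days=2)):
    """Return the capture-time gap if it exceeds `tolerance`, otherwise None."""
    gap = abs(json_captured - html_captured)
    return gap if gap > tolerance else None

# Hypothetical capture times for a base HTML page and the JSON it embeds.
html_time = datetime(2015, 4, 24, 12, 0, 0)
json_time = datetime(2015, 4, 28, 6, 30, 0)

gap = temporal_violation(html_time, json_time)
no_gap = temporal_violation(html_time, html_time + timedelta(hours=5))
```

Here `gap` is a `timedelta` of 3 days, 18 hours, 30 minutes, while `no_gap` is `None` because the five-hour difference is within tolerance.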
Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems
Authors: Kevin Zeng, Michael D. Graham
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
Arxiv link: https://arxiv.org/abs/2305.01090
Pdf link: https://arxiv.org/pdf/2305.01090
Abstract While many phenomena in physics and engineering are formally high-dimensional, their long-time dynamics often live on a lower-dimensional manifold. The present work introduces an autoencoder framework that combines implicit regularization with internal linear layers and $L_2$ regularization (weight decay) to automatically estimate the underlying dimensionality of a data set, produce an orthogonal manifold coordinate system, and provide the mapping functions between the ambient space and manifold space, allowing for out-of-sample projections. We validate our framework's ability to estimate the manifold dimension for a series of datasets from dynamical systems of varying complexities and compare to other state-of-the-art estimators. We analyze the training dynamics of the network to glean insight into the mechanism of low-rank learning and find that collectively each of the implicit regularizing layers compound the low-rank representation and even self-correct during training. Analysis of gradient descent dynamics for this architecture in the linear case reveals the role of the internal linear layers in leading to faster decay of a "collective weight variable" incorporating all layers, and the role of weight decay in breaking degeneracies and thus driving convergence along directions in which no decay would occur in its absence. We show that this framework can be naturally extended for applications of state-space modeling and forecasting by generating a data-driven dynamic model of a spatiotemporally chaotic partial differential equation using only the manifold coordinates. Finally, we demonstrate that our framework is robust to hyperparameter choices.
Learning Controllable Adaptive Simulation for Multi-resolution Physics
Authors: Tailin Wu, Takashi Maruyama, Qingqing Zhao, Gordon Wetzstein, Jure Leskovec
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2305.01122
Pdf link: https://arxiv.org/pdf/2305.01122
Abstract Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-resolution dynamics: a small fraction of the system is extremely dynamic, and requires very fine-grained resolution, while a majority of the system is changing slowly and can be modeled by coarser spatial scales. Typical learning-based surrogate models use a uniform spatial scale, which needs to resolve to the finest required scale and can waste a large amount of compute to achieve the required accuracy. In this work, we introduce Learning controllable Adaptive simulation for Multi-resolution Physics (LAMP) as the first full deep learning-based surrogate model that jointly learns the evolution model and optimizes appropriate spatial resolutions that devote more compute to the highly dynamic regions. LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening. We introduce learning techniques that optimize LAMP with a weighted sum of error and computational cost as the objective, allowing LAMP to adapt to varying relative importance of error vs. computation tradeoff at inference time. We evaluate our method in a 1D benchmark of nonlinear PDEs and a challenging 2D mesh-based simulation. We demonstrate that our LAMP outperforms state-of-the-art deep learning surrogate models, and can adaptively trade off computation to improve long-term prediction error: it achieves an average of 33.7% error reduction for 1D nonlinear PDEs, and outperforms MeshGraphNets + classical Adaptive Mesh Refinement (AMR) in 2D mesh-based simulations. Project website with data and code can be found at: this http URL
Analysis of different temporal graph neural network configurations on dynamic graphs
Authors: Rishu Verma, Ashmita Bhattacharya, Sai Naveen Katla
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2305.01128
Pdf link: https://arxiv.org/pdf/2305.01128
Abstract In recent years, there has been an increasing interest in the use of graph neural networks (GNNs) for analyzing dynamic graphs, which are graphs that evolve over time. However, there is still a lack of understanding of how different temporal graph neural network (TGNs) configurations can impact the accuracy of predictions on dynamic graphs. Moreover, the hunt for benchmark datasets for these TGNs models is still ongoing. Up until recently, Pytorch Geometric Temporal came up with a few benchmark datasets but most of these datasets have not been analyzed with different TGN models to establish the state-of-the-art. Therefore, this project aims to address this gap in the literature by performing a qualitative analysis of spatial-temporal dependence structure learning on dynamic graphs, as well as a comparative study of the effectiveness of selected TGNs on node and edge prediction tasks. Additionally, an extensive ablation study will be conducted on different variants of the best-performing TGN to identify the key factors contributing to its performance. By achieving these objectives, this project will provide valuable insights into the design and optimization of TGNs for dynamic graph analysis, with potential applications in areas such as disease spread prediction, social network analysis, traffic prediction, and more. Moreover, an attempt is made to convert snapshot-based data to the event-based dataset and make it compatible with the SOTA model namely TGN to perform node regression task.
PGrad: Learning Principal Gradients For Domain Generalization
Authors: Zhe Wang, Jake Grigsby, Yanjun Qi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01134
Pdf link: https://arxiv.org/pdf/2305.01134
Abstract Machine learning models fail to perform when facing out-of-distribution (OOD) domains, a challenging task known as domain generalization (DG). In this work, we develop a novel DG training strategy, we call PGrad, to learn a robust gradient direction, improving models' generalization ability on unseen domains. The proposed gradient aggregates the principal directions of a sampled roll-out optimization trajectory that measures the training dynamics across all training domains. PGrad's gradient design forces the DG training to ignore domain-dependent noise signals and updates all training domains with a robust direction covering main components of parameter dynamics. We further improve PGrad via bijection-based computational refinement and directional plus length-based calibrations. Our theoretical proof connects PGrad to the spectral analysis of Hessian in training neural networks. Experiments on DomainBed and WILDS benchmarks demonstrate that our approach effectively enables robust DG optimization and leads to smoothly decreased loss curves. Empirically, PGrad achieves competitive results across seven datasets, demonstrating its efficacy across both synthetic and real-world distributional shifts. Code is available at https://github.com/QData/PGrad.
Ripple Knowledge Graph Convolutional Networks For Recommendation Systems
Authors: Chen Li, Yang Cao, Ye Zhu, Debo Cheng, Chengyuan Li, Yasuhiko Morimoto
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01147
Pdf link: https://arxiv.org/pdf/2305.01147
Abstract Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model's interpretability and accuracy. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and makes a recommendation of suitable items. It combines knowledge graphs on both the item side and user side to enrich their representations to maximize the utilization of the abundant information in knowledge graphs. RKGCN is able to offer more personalized and relevant recommendations in three different scenarios. The experimental results show the superior effectiveness of our model over 5 baseline models on three real-world datasets including movies, books, and music.
Early Classifying Multimodal Sequences
Authors: Alexander Cao, Jean Utke, Diego Klabjan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01151
Pdf link: https://arxiv.org/pdf/2305.01151
Abstract Often pieces of information are received sequentially over time. When did one collect enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems that have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand into early classifying multimodal sequences by combining existing methods. We show our new method yields experimental AUC advantages of up to 8.7%.
Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.01203
Pdf link: https://arxiv.org/pdf/2305.01203
Abstract Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.
Structure Aware Incremental Learning with Personalized Imitation Weights for Recommender Systems
Authors: Yuening Wang, Yingxue Zhang, Antonios Valkanas, Ruiming Tang, Chen Ma, Jianye Hao, Mark Coates
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.01204
Pdf link: https://arxiv.org/pdf/2305.01204
Abstract Recommender systems now consume large-scale data and play a significant role in improving user experience. Graph Neural Networks (GNNs) have emerged as one of the most effective recommender system models because they model the rich relational information. The ever-growing volume of data can make training GNNs prohibitively expensive. To address this, previous attempts propose to train the GNN models incrementally as new data blocks arrive. Feature and structure knowledge distillation techniques have been explored to allow the GNN model to train in a fast incremental fashion while alleviating the catastrophic forgetting problem. However, preserving the same amount of the historical information for all users is sub-optimal since it fails to take into account the dynamics of each user's change of preferences. For the users whose interests shift substantially, retaining too much of the old knowledge can overly constrain the model, preventing it from quickly adapting to the users' novel interests. In contrast, for users who have static preferences, model performance can benefit greatly from preserving as much of the user's long-term preferences as possible. In this work, we propose a novel training strategy that adaptively learns personalized imitation weights for each user to balance the contribution from the recent data and the amount of knowledge to be distilled from previous time periods. We demonstrate the effectiveness of learning imitation weights via a comparison on five diverse datasets for three state-of-the-art structure distillation based recommender systems. The performance shows consistent improvement over competitive incremental learning techniques.
Dynamic Scheduling for Federated Edge Learning with Streaming Data
Authors: Chung-Hsuan Hu, Zheng Chen, Erik G. Larsson
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.01238
Pdf link: https://arxiv.org/pdf/2305.01238
Abstract In this work, we consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints. Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration. We formulate a stochastic network optimization problem for designing a dynamic scheduling policy that maximizes the time-average data importance from scheduled user sets subject to energy consumption and latency constraints. Our proposed algorithm based on the Lyapunov optimization framework outperforms alternative methods without considering time-varying data importance, especially when the generation of training data shows strong temporal correlation.
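To give a feel for how Lyapunov optimization turns a long-term energy constraint into per-round decisions, here is a generic drift-plus-penalty sketch. It is not the authors' algorithm: the function name, the trade-off weight `V`, the fixed cap `k` on scheduled devices (standing in for the latency constraint), and the per-round `energy_budget` are all assumptions.

```python
import numpy as np

def schedule_step(importance, energy_cost, queues, energy_budget, V=1.0, k=2):
    """One drift-plus-penalty step: pick up to k devices, then update virtual queues."""
    importance = np.asarray(importance, dtype=float)
    energy_cost = np.asarray(energy_cost, dtype=float)
    queues = np.asarray(queues, dtype=float)

    # Score trades data importance (weighted by V) against queue-weighted energy use.
    score = V * importance - queues * energy_cost
    ranked = np.argsort(-score)[:k]
    chosen = [int(i) for i in ranked if score[i] > 0]

    # Only scheduled devices spend energy this round.
    spent = np.zeros_like(queues)
    spent[chosen] = energy_cost[chosen]

    # A virtual queue grows when a device spends more than its per-round budget.
    new_queues = np.maximum(queues + spent - energy_budget, 0.0)
    return chosen, new_queues

# Round 1: empty queues, so the two most important devices are scheduled.
chosen_1, queues_1 = schedule_step([1.0, 0.5, 0.2], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.5)

# Round 2: device 0 has a large energy backlog and is skipped despite its importance.
chosen_2, queues_2 = schedule_step([1.0, 0.5, 0.2], [1.0, 1.0, 1.0], [2.0, 0.0, 0.0], 0.5)
```

The virtual queue acts as a running record of energy overspend, so a device that has recently consumed more than its budget is deprioritized until the backlog drains.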
Sim2real and Digital Twins in Autonomous Driving: A Survey
Authors: Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Long Chen
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01263
Pdf link: https://arxiv.org/pdf/2305.01263
Abstract Safety and cost are two important concerns for the development of autonomous driving technologies. From the academic research to commercial applications of autonomous driving vehicles, sufficient simulation and real world testing are required. In general, a large scale of testing in simulation environment is conducted and then the learned driving knowledge is transferred to the real world, so how to adapt driving knowledge learned in simulation to reality becomes a critical issue. However, the virtual simulation world differs from the real world in many aspects such as lighting, textures, vehicle dynamics, and agents' behaviors, etc., which makes it difficult to bridge the gap between the virtual and real worlds. This gap is commonly referred to as the reality gap (RG). In recent years, researchers have explored various approaches to address the reality gap issue, which can be broadly classified into two categories: transferring knowledge from simulation to reality (sim2real) and learning in digital twins (DTs). In this paper, we consider the solutions through the sim2real and DTs technologies, and review important applications and innovations in the field of autonomous driving. Meanwhile, we show the state-of-the-arts from the views of algorithms, models, and simulators, and elaborate the development process from sim2real to DTs. The presentation also illustrates the far-reaching effects of the development of sim2real and DTs in autonomous driving.
Exploring vision transformer layer choosing for semantic segmentation
Authors: Fangjian Lin, Yizhe Ma, Shengwei Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.01279
Pdf link: https://arxiv.org/pdf/2305.01279
Abstract Extensive work has demonstrated the effectiveness of Vision Transformers. The plain Vision Transformer tends to obtain multi-scale features by selecting fixed layers, or the last layer of features aiming to achieve higher performance in dense prediction tasks. However, this selection is often based on manual operation. And different samples often exhibit different features at different layers (e.g., edge, structure, texture, detail, etc.). This requires us to seek a dynamic adaptive fusion method to filter different layer features. In this paper, unlike previous encoder and decoder work, we design a neck network for adaptive fusion and feature selection, called ViTController. We validate the effectiveness of our method on different datasets and models and surpass previous state-of-the-art methods. Finally, our method can also be used as a plug-in module and inserted into different networks.
Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators
Authors: Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01291
Pdf link: https://arxiv.org/pdf/2305.01291
Abstract Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing, by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.
Validation of massively-parallel adaptive testing using dynamic control matching
Authors: Schaun Wheeler
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2305.01334
Pdf link: https://arxiv.org/pdf/2305.01334
Abstract A/B testing is a widely-used paradigm within marketing optimization because it promises identification of causal effects and because it is implemented out of the box in most messaging delivery software platforms. Modern businesses, however, often run many A/B/n tests at the same time and in parallel, and package many content variations into the same messages, not all of which are part of an explicit test. Whether as the result of many teams testing at the same time, or as part of a more sophisticated reinforcement learning (RL) approach that continuously adapts tests and test condition assignment based on previous results, dynamic parallel testing cannot be evaluated the same way traditional A/B tests are evaluated. This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation, using a matched-synthetic control group that adapts alongside the tests.
Physics-Informed Learning Using Hamiltonian Neural Networks with Output Error Noise Models
Authors: Sarvin Moradi, Nick Jaensson, Roland Tóth, Maarten Schoukens
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01338
Pdf link: https://arxiv.org/pdf/2305.01338
Abstract In order to make data-driven models of physical systems interpretable and reliable, it is essential to include prior physical knowledge in the modeling framework. Hamiltonian Neural Networks (HNNs) implement Hamiltonian theory in deep learning and form a comprehensive framework for modeling autonomous energy-conservative systems. Despite being suitable to estimate a wide range of physical system behavior from data, classical HNNs are restricted to systems without inputs and require noiseless state measurements and information on the derivative of the state to be available. To address these challenges, this paper introduces an Output Error Hamiltonian Neural Network (OE-HNN) modeling approach for physical systems with inputs and noisy state measurements. Furthermore, it does not require the state derivatives to be known. Instead, the OE-HNN utilizes an ODE-solver embedded in the training process, which enables the OE-HNN to learn the dynamics from noisy state measurements. In addition, extending HNNs based on the generalized Hamiltonian theory makes it possible to include external inputs in the framework, which is important for engineering applications. We demonstrate via simulation examples that the proposed OE-HNNs result in superior modeling performance compared to classical HNNs.
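The core idea of rolling a Hamiltonian through an ODE solver can be sketched without any neural network. In the example below a fixed harmonic-oscillator Hamiltonian stands in for the learned one, Hamilton's equations $\dot q = \partial H/\partial p$, $\dot p = -\partial H/\partial q$ are evaluated with finite differences, and a classical RK4 step plays the role of the embedded solver. The choice of system, step size, and solver are assumptions; external inputs and noisy measurements are not modeled here.

```python
import math

def hamiltonian(q, p):
    """Unit-mass, unit-stiffness oscillator; a stand-in for a learned H(q, p)."""
    return 0.5 * p**2 + 0.5 * q**2

def vector_field(q, p, eps=1e-6):
    """Hamilton's equations, with partial derivatives taken by central differences."""
    dH_dq = (hamiltonian(q + eps, p) - hamiltonian(q - eps, p)) / (2 * eps)
    dH_dp = (hamiltonian(q, p + eps) - hamiltonian(q, p - eps)) / (2 * eps)
    return dH_dp, -dH_dq

def rk4_step(q, p, dt):
    """One classical fourth-order Runge-Kutta step of the Hamiltonian flow."""
    k1q, k1p = vector_field(q, p)
    k2q, k2p = vector_field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = vector_field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = vector_field(q + dt * k3q, p + dt * k3p)
    q_next = q + dt / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q)
    p_next = p + dt / 6.0 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q_next, p_next

# Simulate 1000 steps of size 0.01 from (q, p) = (1, 0).
q, p = 1.0, 0.0
initial_energy = hamiltonian(q, p)
for _ in range(1000):
    q, p = rk4_step(q, p, 0.01)

energy_drift = abs(hamiltonian(q, p) - initial_energy)
position_error = abs(q - math.cos(10.0))  # exact solution is q(t) = cos(t)
```

In an output-error setting, the simulated trajectory rather than a state derivative would be compared against the measurements, which is why no derivative data is needed.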
A Quadtree for Hyperbolic Space
Authors: Sándor Kisfaludi-Bak, Geert van Wordragen
Subjects: Computational Geometry (cs.CG)
Arxiv link: https://arxiv.org/abs/2305.01356
Pdf link: https://arxiv.org/pdf/2305.01356
Abstract We propose a data structure in d-dimensional hyperbolic space that can be considered a natural counterpart to quadtrees in Euclidean spaces. Based on this data structure we propose a so-called L-order for hyperbolic point sets, which is an extension of the Z-order defined in Euclidean spaces. We demonstrate the usefulness of our hyperbolic quadtree data structure by giving an algorithm for constant-approximate closest pair and dynamic constant-approximate nearest neighbours in hyperbolic space of constant dimension d.
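For readers unfamiliar with the Euclidean Z-order that the L-order extends, the sketch below computes a 2D Z-order (Morton) key by interleaving coordinate bits and sorts points by it. It covers only the Euclidean case with non-negative integer coordinates; the function name and the sample points are illustrative, and nothing here reflects the hyperbolic construction in the paper.

```python
def z_order_key(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Hypothetical integer grid points, sorted along the Z-shaped space-filling curve.
points = [(3, 5), (0, 0), (1, 1), (2, 0), (0, 3)]
ordered = sorted(points, key=lambda pt: z_order_key(*pt))
```

Points that share a quadtree cell share a key prefix, so sorting by the key keeps them contiguous, which is the property that makes such orders useful for proximity queries.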
Diddy: a Python toolbox for infinite discrete dynamical systems
Authors: Ville Salo, Ilkka Törmä
Subjects: Mathematical Software (cs.MS); Discrete Mathematics (cs.DM); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2305.01375
Pdf link: https://arxiv.org/pdf/2305.01375
Abstract We introduce Diddy, a collection of Python scripts for analyzing infinite discrete dynamical systems. The main focus is on generalized multidimensional shifts of finite type (SFTs). We show how Diddy can be used to easily define SFTs and cellular automata, and analyze their basic properties. We also showcase how to verify or rediscover some results from coding theory and cellular automata theory.
Get Back Here: Robust Imitation by Return-to-Distribution Planning
Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01400
Pdf link: https://arxiv.org/pdf/2305.01400
Abstract We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show robustness of the learned policy to different initial state distributions and noisy dynamics.
Absolute integrability of Mercer kernels is only sufficient for RKHS stability
Authors: Mauro Bisiacco, Gianluigi Pillonetto
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01411
Pdf link: https://arxiv.org/pdf/2305.01411
Abstract Reproducing kernel Hilbert spaces (RKHSs) are special Hilbert spaces in one-to-one correspondence with positive definite maps called kernels. They are widely employed in machine learning to reconstruct unknown functions from sparse and noisy data. In the last two decades, a subclass known as stable RKHSs has been also introduced in the setting of linear system identification. Stable RKHSs contain only absolutely integrable impulse responses over the positive real line. Hence, they can be adopted as hypothesis spaces to estimate linear, time-invariant and BIBO stable dynamic systems from input-output data. Necessary and sufficient conditions for RKHS stability are available in the literature and it is known that kernel absolute integrability implies stability. Working in discrete-time, in a recent work we have proved that this latter condition is only sufficient. Working in continuous-time, it is the purpose of this note to prove that the same result holds also for Mercer kernels.
Borinot: an agile torque-controlled robot for hybrid flying and contact loco-manipulation (workshop version)
Authors: Josep Marti-Saumell, Joan Sola, Angel Santamaria-Navarro, Hugo Duarte
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.01423
Pdf link: https://arxiv.org/pdf/2305.01423
Abstract This paper introduces Borinot, an open-source flying robotic platform designed to perform hybrid agile locomotion and manipulation. This platform features a compact and powerful hexarotor that can be outfitted with torque-actuated extremities of diverse architecture, allowing for whole-body dynamic control. As a result, Borinot can perform agile tasks such as aggressive or acrobatic maneuvers with the participation of the whole-body dynamics. The extremities attached to Borinot can be utilized in various ways; during contact, they can be used as legs to create contact-based locomotion, or as arms to manipulate objects. In free flight, they can be used as tails to contribute to dynamics, mimicking the movements of many animals. This allows for any hybridization of these dynamic modes, like the jump-flight of chickens and locusts, making Borinot an ideal open-source platform for research on hybrid aerial-contact agile motion. To demonstrate the key capabilities of Borinot, we have fitted a planar 2DoF arm and implemented whole-body torque-level model-predictive control. The result is a capable and adaptable platform that, we believe, opens up new avenues of research in the field of agile robotics.
Mixed-Integer Optimal Control via Reinforcement Learning: A Case Study on Hybrid Vehicle Energy Management
Authors: Jinming Xu, Yuan Lin
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01461
Pdf link: https://arxiv.org/pdf/2305.01461
Abstract Many optimal control problems require the simultaneous output of continuous and discrete control variables. Such problems are usually formulated as mixed-integer optimal control (MIOC) problems, which are challenging to solve due to the complexity of the solution space. Numerical methods such as branch-and-bound are computationally expensive and unsuitable for real-time control. This paper proposes a novel continuous-discrete reinforcement learning (CDRL) algorithm, twin delayed deep deterministic actor-Q (TD3AQ), for MIOC problems. TD3AQ combines the advantages of both actor-critic and Q-learning methods, and can handle the continuous and discrete action spaces simultaneously. The proposed algorithm is evaluated on a hybrid electric vehicle (HEV) energy management problem, where real-time control of the continuous variable engine torque and discrete variable gear ratio is essential to maximize fuel economy while satisfying driving constraints. Simulation results on different drive cycles show that TD3AQ can achieve near-optimal solutions compared to dynamic programming (DP) and outperforms the state-of-the-art discrete RL algorithm Rainbow, which is adopted for MIOC by discretizing continuous actions into a finite set of discrete values.
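The simultaneous handling of continuous and discrete actions described in the abstract above can be illustrated with a small sketch. It assumes one common structure for such hybrid agents: an actor produces the continuous action, and a Q-function conditioned on the state and that action scores each discrete choice. The linear "networks", the function name `select_hybrid_action`, and all sizes are hypothetical and are not taken from the TD3AQ paper.

```python
import numpy as np

def select_hybrid_action(state, actor_w, q_w, limit=1.0):
    """Return (continuous_action, discrete_index) for one state.

    Assumed structure, not the paper's exact architecture.
    """
    # Actor: a tanh-squashed linear map gives a bounded continuous action.
    cont = limit * np.tanh(actor_w @ state)
    # Critic: one Q-value per discrete choice, given state and continuous action.
    q_values = q_w @ np.concatenate([state, cont])
    # Greedy pick over the discrete choices.
    return cont, int(np.argmax(q_values))

rng = np.random.default_rng(0)
state = rng.normal(size=4)
actor_w = rng.normal(size=(1, 4))   # 1 continuous action (e.g. engine torque)
q_w = rng.normal(size=(5, 4 + 1))   # 5 discrete choices (e.g. gear ratios)
cont, gear = select_hybrid_action(state, actor_w, q_w)
```

In a trained agent both maps would be neural networks and exploration noise would be added; the sketch only shows how one forward pass can yield both action types.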
H2 optimal model reduction on general domains
Authors: Alessandro Borghi, Tobias Breiten
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.01511
Pdf link: https://arxiv.org/pdf/2305.01511
Abstract Optimal model reduction for large-scale linear dynamical systems is studied. In contrast to most existing works, the systems under consideration are not required to be stable in either discrete or continuous time. As a consequence, the underlying rational transfer functions are allowed to have poles in general domains in the complex plane. In particular, this covers the case of specific conservative partial differential equations such as the linear Schrödinger and the undamped linear wave equation with spectra on the imaginary axis. By an appropriate modification of the classical continuous-time Hardy space $\mathcal{H}_2$, a new $\mathcal{H}_2$-like optimal model reduction problem is introduced and first-order optimality conditions are derived. As in the classical $\mathcal{H}_2$ case, these conditions exhibit a rational Hermite interpolation structure for which an iterative model reduction algorithm is proposed. Numerical examples demonstrate the effectiveness of the new method.
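For orientation, the classical stable continuous-time case that the abstract generalizes has well-known first-order conditions (often attributed to Meier and Luenberger). The sketch below covers only that classical single-input single-output case and assumes the reduced transfer function $\hat{H}$ of order $r$ has simple poles $\hat{\lambda}_1, \dots, \hat{\lambda}_r$; the notation is illustrative and is not taken from the paper, whose modified conditions on general domains may differ.

```latex
\hat{H}(-\hat{\lambda}_i) = H(-\hat{\lambda}_i), \qquad
\hat{H}'(-\hat{\lambda}_i) = H'(-\hat{\lambda}_i), \qquad i = 1, \dots, r.
```

The reduced transfer function thus matches the full one, together with its first derivative, at the mirror images of the reduced poles. This is the Hermite interpolation structure the abstract refers to.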
Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing
Authors: Yunpeng Weng, Xing Tang, Liang Chen, Xiuqiang He
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.01514
Pdf link: https://arxiv.org/pdf/2305.01514
Abstract Multi-task learning for various real-world applications usually involves tasks with logical sequential dependence. For example, in online marketing, the cascade behavior pattern of impression $\rightarrow$ click $\rightarrow$ conversion is usually modeled as multiple tasks in a multi-task manner, where the sequential dependence between tasks is simply connected with an explicitly defined function or implicitly transferred information in current works. These methods alleviate the data sparsity problem for long-path sequential tasks as the positive feedback becomes sparser along with the task sequence. However, error accumulation and negative transfer can become a severe problem for downstream tasks. In particular, at the beginning of training, the parameters of the former tasks have not yet converged, and thus the information transferred to downstream tasks is negative. In this paper, we propose a prior information merged model (PIMM), which explicitly models the logical dependence among tasks with a novel prior information merged (PIM) module for multiple sequential dependence task learning in a curriculum manner. Specifically, the PIM randomly selects the true label information or the prior task prediction with a soft sampling strategy to transfer to the downstream task during training. Following an easy-to-difficult curriculum paradigm, we dynamically adjust the sampling probability to ensure that the downstream task receives effective information as training proceeds. The offline experimental results on both public and product datasets verify that PIMM outperforms state-of-the-art baselines. Moreover, we deploy PIMM in a large-scale FinTech platform, and the online experiments also demonstrate its effectiveness.
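The soft sampling idea in the abstract above, where the downstream task receives either the true label or the upstream prediction with a probability that changes over training, can be sketched as follows. The linear decay schedule and the names `label_probability` and `merge_prior_information` are assumptions made for illustration; the abstract does not specify the actual schedule.

```python
import numpy as np

def label_probability(step, total_steps):
    """Probability of passing the true label downstream (assumed linear decay)."""
    return max(0.0, 1.0 - step / total_steps)

def merge_prior_information(true_label, prior_pred, step, total_steps, rng):
    """Per example, pass on either the true label or the prior task's prediction."""
    p = label_probability(step, total_steps)
    # Bernoulli mask: True means "use the ground-truth label for this example".
    use_label = rng.random(true_label.shape) < p
    return np.where(use_label, true_label, prior_pred)

rng = np.random.default_rng(0)
labels = np.array([1.0, 0.0, 1.0, 0.0])
preds = np.array([0.7, 0.2, 0.4, 0.1])
early = merge_prior_information(labels, preds, step=0, total_steps=100, rng=rng)
late = merge_prior_information(labels, preds, step=100, total_steps=100, rng=rng)
```

Under this assumed schedule, `early` equals the true labels (the easy setting) and `late` equals the upstream predictions (the setting used at inference, when labels are unavailable).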
Unlocking the Power of Representations in Long-term Novelty-based Exploration
Authors: Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.01521
Pdf link: https://arxiv.org/pdf/2305.01521
Abstract We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction and which, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
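Estimating visitation counts per cluster of embeddings, as the abstract above describes, can be illustrated with a deliberately simplified sketch. The fixed radius, the running-mean center update, and the `1/sqrt(count)` bonus are assumptions chosen for brevity; they are not RECODE's actual update rules, which also handle nonstationarity and a bounded memory.

```python
import numpy as np

class ClusterCounter:
    """Toy online clustering with per-cluster visit counts (hypothetical helper)."""

    def __init__(self, radius):
        self.radius = radius
        self.centers = []
        self.counts = []

    def update(self, embedding):
        """Assign the embedding to a cluster, bump its count, return a novelty bonus."""
        embedding = np.asarray(embedding, dtype=float)
        if self.centers:
            dists = [np.linalg.norm(embedding - c) for c in self.centers]
            i = int(np.argmin(dists))
            if dists[i] <= self.radius:
                self.counts[i] += 1
                # Move the center toward the new point (running mean).
                self.centers[i] += (embedding - self.centers[i]) / self.counts[i]
                return 1.0 / np.sqrt(self.counts[i])
        # No cluster is close enough: start a new one with count 1.
        self.centers.append(embedding.copy())
        self.counts.append(1)
        return 1.0

counter = ClusterCounter(radius=0.5)
b1 = counter.update([0.0, 0.0])   # first visit, new cluster
b2 = counter.update([0.1, 0.0])   # falls into the same cluster, smaller bonus
b3 = counter.update([5.0, 5.0])   # far away, new cluster
```

Frequently visited regions accumulate counts and therefore earn a shrinking bonus, while unseen regions keep the full bonus.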
FlexEdge: Digital Twin-Enabled Task Offloading for UAV-Aided Vehicular Edge Computing
Authors: Bin Li, Wancheng Xie, Yinghui Ye, Lei Liu, Zesong Fei
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.01536
Pdf link: https://arxiv.org/pdf/2305.01536
Abstract Integrating unmanned aerial vehicles (UAVs) into vehicular networks has shown high potential for supporting intensive computing tasks. In this paper, we study digital twin-driven vehicular edge computing networks for adaptive computing resource management, where a UAV named FlexEdge acts as a flying server. In particular, we first formulate an energy consumption minimization problem by jointly optimizing UAV trajectory and computation resources under practical constraints. To address this challenging problem, we then model the computation offloading process as a Markov decision process and propose a deep reinforcement learning-based proximal policy optimization algorithm to dynamically learn the computation offloading strategy and trajectory design policy. Numerical results indicate that our proposed algorithm converges quickly and significantly reduces the system energy consumption.
Teaching data-driven control: from linear design to adaptive control with throttle valves
Authors: Emmanuel Witrant, Ioan Doré Landau, Marie-Pierre Vaillant
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.01567
Pdf link: https://arxiv.org/pdf/2305.01567
Abstract Electric throttle valves represent a challenge for control design, as their dynamics involve strong nonlinearities, characterized by an asymmetric hysteresis. Experiments on multiple valves also reveal a large variability in the characteristics of each valve and erratic steady-state behaviors, which impair classical model-based control strategies. Nevertheless, local data-driven linear models can be obtained and simple proportional-integral (PI) controllers, tuned individually for each valve with the appropriate data set, provide good tracking performance. As these controllers cannot be transposed from one valve to another, a robust strategy and an adaptive controller (using identification in closed-loop and controller re-design) may be necessary to propose a general method. This work aims at promoting control education on a simple yet challenging process, going from frequency analysis and linear design to an adaptive control method implemented with an online recursive algorithm.
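As a rough illustration of the PI-on-a-local-linear-model step mentioned in the abstract above, the sketch below closes a discrete-time PI loop around a first-order model `y[k+1] = a*y[k] + b*u[k]`. The model, its coefficients, and the gains are hypothetical values chosen for a stable demonstration; they are not identified from any valve data and ignore the hysteresis the abstract describes.

```python
def simulate_pi(a, b, kp, ki, reference, steps):
    """Simulate a discrete PI controller on the model y[k+1] = a*y[k] + b*u[k]."""
    y, integral, history = 0.0, 0.0, []
    for _ in range(steps):
        error = reference - y
        integral += error                  # accumulate the tracking error
        u = kp * error + ki * integral     # PI control law
        y = a * y + b * u                  # one step of the assumed linear model
        history.append(y)
    return history

# Hypothetical local model and gains, chosen so the closed loop is stable.
trajectory = simulate_pi(a=0.9, b=0.1, kp=1.0, ki=0.2, reference=1.0, steps=200)
```

The integral term removes the steady-state error, so the output settles at the reference. Gains tuned for one `(a, b)` pair need not work for another, which is the transferability issue the abstract raises.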
H2CGL: Modeling Dynamics of Citation Network for Impact Prediction
Authors: Guoxiu He, Zhikai Xue, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01572
Pdf link: https://arxiv.org/pdf/2305.01572
Abstract The potential impact of a paper is often quantified by how many citations it will receive. However, most commonly used models may underestimate the influence of newly published papers over time, and fail to encapsulate these citation-network dynamics in the graph. In this study, we construct hierarchical and heterogeneous graphs for target papers with an annual perspective. The constructed graphs can record the annual dynamics of target papers' scientific context information. Then, a novel graph neural network, Hierarchical and Heterogeneous Contrastive Graph Learning Model (H2CGL), is proposed to incorporate heterogeneity and dynamics of the citation network. H2CGL separately aggregates the heterogeneous information for each year and prioritizes the highly-cited papers and relationships among references, citations, and the target paper. It then employs a weighted GIN to capture dynamics between heterogeneous subgraphs over years. Moreover, it leverages contrastive learning to make the graph representations more sensitive to potential citations. In particular, co-cited or co-citing papers of the target paper with a large citation gap are taken as hard negative samples, while randomly dropping low-cited papers generates positive samples. Extensive experimental results on two scholarly datasets demonstrate that the proposed H2CGL significantly outperforms a series of baseline approaches for both previously and freshly published papers. Additional analyses highlight the significance of the proposed modules. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/H2CGL).
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Authors: Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.01610
Pdf link: https://arxiv.org/pdf/2305.01610
Abstract Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how high-level human-interpretable features are represented within the internal neuron activations of LLMs. We train $k$-sparse linear classifiers (probes) on these internal activations to predict the presence of features in the input; by varying the value of $k$ we study the sparsity of learned representations and how this varies with model scale. With $k=1$, we localize individual neurons which are highly relevant for a particular feature, and perform a number of case studies to illustrate general properties of LLMs. In particular, we show that early layers make use of sparse combinations of neurons to represent many features in superposition, that middle layers have seemingly dedicated neurons to represent higher-level contextual features, and that increasing scale causes representational sparsity to increase on average, but there are multiple types of scaling dynamics. In all, we probe for over 100 unique features comprising 10 different categories in 7 different models spanning 70 million to 6.9 billion parameters.
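A $k$-sparse linear probe, as described in the abstract above, can be sketched in a few lines. This version assumes a simple heuristic (keep the $k$ neurons with the largest difference in class-mean activation, then fit logistic regression on them); the paper may select neurons differently. The synthetic activations, with the feature planted in neuron 3, and the function names are hypothetical.

```python
import numpy as np

def top_k_neurons(acts, labels, k):
    """Rank neurons by absolute difference in class-mean activation."""
    diff = np.abs(acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]

def fit_sparse_probe(acts, labels, k, lr=0.1, steps=500):
    """Fit logistic regression restricted to the k selected neurons."""
    idx = top_k_neurons(acts, labels, k)
    mu = acts[:, idx].mean(axis=0)
    x = acts[:, idx] - mu                  # center for faster convergence
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - labels
        w -= lr * x.T @ grad / len(labels)
        b -= lr * grad.mean()
    return idx, w, b - w @ mu              # fold the centering into the bias

def probe_accuracy(acts, labels, idx, w, b):
    preds = (acts[:, idx] @ w + b) > 0
    return float((preds == (labels == 1)).mean())

# Synthetic "activations": 16 neurons, feature planted in neuron 3.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
acts = rng.normal(size=(400, 16))
acts[:, 3] += 4.0 * labels
idx, w, b = fit_sparse_probe(acts, labels, k=1)
accuracy = probe_accuracy(acts, labels, idx, w, b)
```

With `k=1` the probe localizes a single neuron; raising `k` and watching how accuracy changes gives a rough picture of how sparsely a feature is represented.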
AutoColor: Learned Light Power Control for Multi-Color Holograms
Authors: Yicheng Zhan, Koray Kavaklı, Hakan Urey, Qi Sun, Kaan Akşit
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.01611
Pdf link: https://arxiv.org/pdf/2305.01611
Abstract Multi-color holograms rely on simultaneous illumination from multiple light sources. These multi-color holograms could utilize light sources better than conventional single-color holograms and can improve the dynamic range of holographic displays. In this letter, we introduce AutoColor, the first learned method for estimating the optimal light source powers required for illuminating multi-color holograms. For this purpose, we establish the first multi-color hologram dataset using synthetic images and their depth information. We generate these synthetic images using a trending pipeline combining generative, large language, and monocular depth estimation models. Finally, we train our learned model using our dataset and experimentally demonstrate that AutoColor significantly decreases the number of steps required to optimize multi-color holograms from $>1000$ to $70$ iteration steps without compromising image quality.
Key-Locked Rank One Editing for Text-to-Image Personalization
Authors: Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.01644
Pdf link: https://arxiv.org/pdf/2305.01644
Abstract Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, making it possible to portray personalized object interactions in unprecedented ways, even in one-shot settings.

Keyword: efficient

Two-phase Dual COPOD Method for Anomaly Detection in Industrial Control System

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Hardware implementation of digital memcomputing on small-size FPGAs

Robust Communication Complexity of Matching: EDCS Achieves 5/6 Approximation

Fast Path Planning Through Large Collections of Safe Boxes

Design and Evaluation of a Bioinspired Tendon-Driven 3D-Printed Robotic Eye with Active Vision Capabilities

An Update-intensive LSM-based R-tree Index

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

Unbounded Differentially Private Quantile and Maximum Estimation

LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar

Exploration of Unranked Items in Safe Online Learning to Re-Rank

Chronosymbolic Learning: Efficient CHC Solving with Symbolic Reasoning and Inductive Learning

Rate-Compatible Polar Codes for Automorphism Ensemble Decoding

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

Updatable Learned Indexes Meet Disk-Resident DBMS -- From Evaluations to Design Choices

Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

Higher-Order GFDM for Linear Elliptic Operators

Guaranteeing Envy-Freeness under Generalized Assignment Constraints

Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

Infrastructural Requirements and Regulatory Challenges of a Sustainable Urban Air Mobility Ecosystem

Get Back Here: Robust Imitation by Return-to-Distribution Planning

Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters

An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots

An Efficient Quadratic Interpolation Scheme for a Third-Order Cell-Centered Finite-Volume Method on Tetrahedral Grids

Stochastic Contextual Bandits with Graph-based Contexts

Efficient Sensitivity Analysis for Parametric Robust Markov Chains

Building Reliable Budget-Based Binary-State Networks

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

Unlocking the Power of Representations in Long-term Novelty-based Exploration

Faster 0-1-Knapsack via Near-Convex Min-Plus-Convolution

Augmented Electronic Ising Machine as an Effective SAT Solver

Sequence Modeling with Multiresolution Convolutional Memory

Key-Locked Rank One Editing for Text-to-Image Personalization

Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models

Keyword: faster

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems

Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

Keyword: mobile

Development of IoT Smart Greenhouse System for Hydroponic Gardens

HuNavSim: A ROS 2 Human Navigation Simulator for Benchmarking Human-Aware Robot Navigation

Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

A Mobile Quad-Arm Robot ARMS: Wheel-Legged Tripedal Mobility and Quad-Arm Manipulation

Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters

On the Collaborative Object Transportation Using Leader Follower Approach

Keyword: pruning

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

Keyword: voxel

Keyword: lidar

A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics

Safe Autonomous Driving in Adverse Weather: Sensor Evaluation and Performance Monitoring

FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow

Neural LiDAR Fields for Novel View Synthesis

Keyword: diffusion

In-Context Learning Unlocked for Diffusion Models

Geometric Latent Diffusion Models for 3D Molecule Generation

Solving Inverse Problems with Score-Based Generative Priors learned from Noisy Data

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling

Long-Term Rhythmic Video Soundtracker

Higher-Order GFDM for Linear Elliptic Operators

Adopting AI: How Familiarity Breeds Both Trust and Contempt

ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation

Keyword: dynamic

Attention-based Spatial-Temporal Graph Neural ODE for Traffic Prediction

Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces

Right HTML, Wrong JSON: Challenges in Replaying Archived Webpages Built with Client-Side Rendering

Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems

Learning Controllable Adaptive Simulation for Multi-resolution Physics

Analysis of different temporal graph neural network configurations on dynamic graphs

PGrad: Learning Principal Gradients For Domain Generalization

Ripple Knowledge Graph Convolutional Networks For Recommendation Systems