Abstract
Recently, several studies have proposed frameworks for Quantum Federated Learning (QFL). For instance, the Google TensorFlow Quantum (TFQ) and TensorFlow Federated (TFF) libraries have been deployed to realize QFL. However, most developers are not yet familiar with Quantum Computing (QC) libraries and frameworks. A Domain-Specific Modeling Language (DSML) that provides an abstraction layer over the underlying QC and Federated Learning (FL) libraries would therefore be beneficial. It could enable practitioners to carry out software development and data science tasks efficiently while deploying the state of the art in Quantum Machine Learning (QML). In this position paper, we propose extending existing domain-specific Model-Driven Engineering (MDE) tools for Machine Learning (ML) enabled systems, such as MontiAnna, ML-Quadrat, and GreyCat, to support QFL.
CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention
Authors: Zhiqiang Nie, Jiankun Zhao, Qicheng Li, Yong Qin
Abstract
Predicting the State-of-Health (SoH) of lithium-ion batteries is a fundamental task of battery management systems on electric vehicles. It aims at estimating future SoH based on historical aging data. Most existing deep learning methods rely on filter-based feature extractors (e.g., CNN or Kalman filters) and recurrent time sequence models. Though efficient, they generally ignore cyclic features and the domain gap between training and testing batteries. To address this problem, we present CyFormer, a transformer-based cyclic time sequence model for SoH prediction. Instead of the conventional CNN-RNN structure, we adopt an encoder-decoder architecture. In the encoder, row-wise and column-wise attention blocks effectively capture intra-cycle and inter-cycle connections and extract cyclic features. In the decoder, the SoH queries cross-attend to these features to form the final predictions. We further utilize a transfer learning strategy to narrow the domain gap between the training and testing sets; specifically, we use fine-tuning to shift the model to a target working condition. Finally, we prune the model for efficiency. Experiments show that our method attains an MAE of 0.75\% using only 10\% of the data of a testing battery for fine-tuning, surpassing prior methods by a large margin. Effective and robust, our method provides a potential solution for a broad range of cyclic time sequence prediction tasks.
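As a rough illustration of the row-wise and column-wise attention described above, the following sketch applies axial attention over a (cycles x steps) feature grid. The module layout, shapes, and the use of PyTorch's MultiheadAttention are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AxialAttentionBlock(nn.Module):
    """Row-wise attention mixes steps within one cycle (intra-cycle);
    column-wise attention mixes the same step across cycles (inter-cycle)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, s, d = x.shape                           # (batch, cycles, steps, dim)
        rows = x.reshape(b * c, s, d)                  # attend within each cycle
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, c, s, d)
        cols = x.transpose(1, 2).reshape(b * s, c, d)  # attend across cycles
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, s, c, d).transpose(1, 2)
```

Residual connections and feed-forward sublayers, present in a full transformer block, are omitted for brevity.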
Schottky Barrier MOSFET Enabled Ultra-Low Power Real-Time Neuron for Neuromorphic Computing
Abstract
Energy-efficient real-time synapses and neurons are essential to enable large-scale neuromorphic computing. In this paper, we propose and demonstrate a Schottky-Barrier MOSFET-based ultra-low power voltage-controlled current source that enables real-time neurons for neuromorphic computing. The Schottky-Barrier MOSFET is fabricated on a silicon-on-insulator platform with polycrystalline silicon as the channel and nickel/platinum as the source/drain. The poly-Si and nickel form back-to-back Schottky junctions, enabling the ultra-low ON current required for energy-efficient neurons.
Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness
Authors: Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel
Abstract
Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California's Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of its multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to the relatively higher costs of non-English-language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation.
LIMIT: Learning Interfaces to Maximize Information Transfer
Abstract
Robots can use auditory, visual, or haptic interfaces to convey information to human users. The way these interfaces select signals is typically pre-defined by the designer: for instance, a haptic wristband might vibrate when the robot is moving and squeeze when the robot stops. But different people interpret the same signals in different ways, so that what makes sense to one person might be confusing or unintuitive to another. In this paper we introduce a unified algorithmic formalism for learning co-adaptive interfaces from scratch. Our method does not need to know the human's task (i.e., what the human is using these signals for). Instead, our insight is that interpretable interfaces should select signals that maximize correlation between the human's actions and the information the interface is trying to convey. Applying this insight we develop LIMIT: Learning Interfaces to Maximize Information Transfer. LIMIT optimizes a tractable, real-time proxy of information gain in continuous spaces. The first time a person works with our system the signals may appear random; but over repeated interactions the interface learns a one-to-one mapping between displayed signals and human responses. Our resulting approach is both personalized to the current user and not tied to any specific interface modality. We compare LIMIT to state-of-the-art baselines across controlled simulations, an online survey, and an in-person user study with auditory, visual, and haptic interfaces. Overall, our results suggest that LIMIT learns interfaces that enable users to complete the task more quickly and efficiently, and users subjectively prefer LIMIT to the alternatives. See videos here: https://youtu.be/IvQ3TM1_2fA.
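As a toy, discrete-space illustration of the insight above (LIMIT itself optimizes a real-time proxy of information gain in continuous spaces), the sketch below estimates the empirical mutual information between displayed signals and human responses from interaction history; an interface that has converged to a one-to-one mapping maximizes this quantity.

```python
import numpy as np

def empirical_mutual_information(signals, responses, n_signals, n_responses):
    """Estimate I(signal; response) from logged (signal, response) pairs."""
    joint = np.zeros((n_signals, n_responses))
    for s, r in zip(signals, responses):
        joint[s, r] += 1.0
    joint /= joint.sum()
    ps = joint.sum(axis=1, keepdims=True)  # marginal over signals
    pr = joint.sum(axis=0, keepdims=True)  # marginal over responses
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (ps @ pr)[nz])).sum())

# A one-to-one signal-to-response mapping yields the maximal score (log 3 here):
print(empirical_mutual_information([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3, 3))
```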
GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Authors: Asim Waheed, Vasisht Duddu, N. Asokan
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Graph neural networks (GNNs) have emerged as a state-of-the-art approach to model and draw inferences from large scale graph-structured data in various application settings such as social networking. The primary goal of a GNN is to learn an embedding for each graph node in a dataset that encodes both the node features and the local graph structure around the node. Embeddings generated by a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs are prone to model extraction attacks. Model extraction attacks and defenses have been explored extensively in other non-graph settings. While detecting or preventing model extraction appears to be difficult, deterring it via effective ownership verification techniques offers a potential defense. In non-graph settings, fingerprinting models, or the data used to build them, has been shown to be a promising approach toward ownership verification. We present GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target model and a suspect model, can reliably determine whether the suspect model was trained independently of the target model or is a surrogate of the target model obtained via model extraction. We show that GrOVe can distinguish between surrogate and independent models even when the independent model uses the same training dataset and architecture as the original target model. Using six benchmark datasets and three model architectures, we show that GrOVe consistently achieves low false-positive and false-negative rates. We demonstrate that GrOVe is robust against known fingerprint evasion techniques while remaining computationally efficient.
Traversing combinatorial 0/1-polytopes via optimization
Authors: Arturo Merino, Torsten Mütze
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Abstract
In this paper, we present a new framework that exploits combinatorial optimization for efficiently generating a large variety of combinatorial objects based on graphs, matroids, posets and polytopes. Our method relies on a simple and versatile algorithm for computing a Hamilton path on the skeleton of any 0/1-polytope ${\rm conv}(X)$, where $X\subseteq \{0,1\}^n$. The algorithm uses as a black box any algorithm that solves a variant of the classical linear optimization problem $\min\{w\cdot x\mid x\in X\}$, and the resulting delay, i.e., the running time per visited vertex on the Hamilton path, is only by a factor of $\log n$ larger than the running time of the optimization algorithm. When $X$ encodes a particular class of combinatorial objects, then traversing the skeleton of the polytope ${\rm conv}(X)$ along a Hamilton path corresponds to listing the combinatorial objects by local change operations, i.e., we obtain Gray code listings. As concrete results of our general framework, we obtain efficient algorithms for generating all ($c$-optimal) bases in a matroid; ($c$-optimal) spanning trees, forests, ($c$-optimal) matchings in a general graph; ($c$-optimal) vertex covers, ($c$-optimal) stable sets in a bipartite graph; as well as ($c$-optimal) antichains and ideals of a poset. The delay and space required by these algorithms are polynomial in the size of the matroid, graph, or poset, respectively, and these listings correspond to Hamilton paths on the corresponding combinatorial polytopes. We also obtain an $O(t_{\rm LP} \log n)$ delay algorithm for the vertex enumeration problem on 0/1-polytopes $\{x\in\mathbb{R}^n\mid Ax\leq b\}$, where $A\in \mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, and $t_{\rm LP}$ is the time needed to solve the linear program $\min\{w\cdot x\mid Ax\leq b\}$. This improves upon the 25-year-old $O(t_{\rm LP}\,n)$ delay algorithm of Bussieck and L\"ubbecke.
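To make the black-box assumption concrete, the toy sketch below shows the only interface the framework needs: a linear optimization oracle over $X\subseteq \{0,1\}^n$. In the paper, $X$ is implicit (bases, spanning trees, matchings, ...) and the oracle is the corresponding combinatorial algorithm; the explicit vertex set and brute-force solver here are our stand-ins.

```python
from itertools import product

def linear_opt_oracle(X, w):
    """Solve min{ w . x : x in X } by brute force (stand-in for a real solver)."""
    return min(X, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))

# Toy X: characteristic vectors of the bases of the uniform matroid U(2, 3).
X = [x for x in product([0, 1], repeat=3) if sum(x) == 2]
print(linear_opt_oracle(X, [3.0, 1.0, 2.0]))  # -> (0, 1, 1)
```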
Diagnosing applications' I/O behavior through system call observability
Authors: Tânia Esteves, Ricardo Macedo, Rui Oliveira, João Paulo
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Performance (cs.PF)
Abstract
We present DIO, a generic tool for observing inefficient and erroneous I/O interactions between applications and in-kernel storage systems that lead to performance, dependability, and correctness issues. DIO facilitates the analysis and enables near real-time visualization of complex I/O patterns for data-intensive applications generating millions of storage requests. This is achieved by non-intrusively intercepting system calls, enriching collected data with relevant context, and providing timely analysis and visualization for traced events. We demonstrate its usefulness by analyzing two production-level applications. Results show that DIO enables diagnosing resource contention in multi-threaded I/O that leads to high tail latency and erroneous file accesses that cause data loss.
Energy-Efficient Lane Changes Planning and Control for Connected Autonomous Vehicles on Urban Roads
Authors: Eunhyek Joa, Hotae Lee, Eric Yongkeun Choi, Francesco Borrelli
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
This paper presents a novel energy-efficient motion planning algorithm for Connected Autonomous Vehicles (CAVs) on urban roads. The approach consists of two components: a decision-making algorithm and an optimization-based trajectory planner. The decision-making algorithm leverages Signal Phase and Timing (SPaT) information from connected traffic lights to select a lane with the aim of reducing energy consumption. The algorithm is based on a heuristic rule learned from human driving data. The optimization-based trajectory planner generates a safe, smooth, and energy-efficient trajectory toward the selected lane. The proposed strategy is experimentally evaluated in a Vehicle-in-the-Loop (VIL) setting, where a real test vehicle receives SPaT information from both actual and virtual traffic lights and autonomously drives on a testing site, while the surrounding vehicles are simulated. The results demonstrate that the use of SPaT information in autonomous driving leads to improved energy efficiency, with the proposed strategy reducing energy consumption by 37.1% compared to a lane-keeping algorithm.
Graph Sparsification by Approximate Matrix Multiplication
Authors: Neophytos Charalambides, Alfred O. Hero III
Subjects: Numerical Analysis (math.NA); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Signal Processing (eess.SP); Spectral Theory (math.SP)
Abstract
Graphs arising in statistical problems, signal processing, large networks, combinatorial optimization, and data analysis are often dense, which causes both computational and storage bottlenecks. One way of \textit{sparsifying} a \textit{weighted} graph, keeping the same vertices as the original graph while reducing the number of edges, is through \textit{spectral sparsification}. We study this problem through the perspective of Randomized Numerical Linear Algebra (RandNLA). Specifically, we utilize randomized matrix multiplication to give a clean and simple analysis of how sampling according to edge weights gives a spectral approximation to graph Laplacians. Through the $CR$-MM algorithm, we attain a simple and computationally efficient sparsifier whose resulting Laplacian estimate is unbiased and of minimum variance. Furthermore, we define a new notion of \textit{additive spectral sparsifiers}, which has not been considered in the literature.
Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions
Abstract
Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this paper, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide a theoretical analysis that proves the continuity of the proposed CBF. Empirically, we show that the proposed CBF is continuously differentiable, and the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka-Emika Research~3 (FR3) robot manipulator both in simulation and experiment.
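In display form, one natural instantiation of this construction reads as follows; the offset convention, the linear class-$\mathcal{K}$ function, and the control-affine dynamics $\dot{x}=f(x)+g(x)u$ are our assumptions rather than the paper's exact choices:

$$h(x) = \alpha(x) - (1+\varepsilon), \qquad u^{*}(x) = \arg\min_{u}\, \|u-u_{\mathrm{nom}}(x)\|^{2} \;\;\text{s.t.}\;\; \nabla h(x)^{\top}\big(f(x)+g(x)u\big) \ge -\gamma\, h(x),$$

where $\alpha(x)$ is the minimum uniform scaling factor between the two convex shapes, $\varepsilon>0$ is the offset, and $h(x)\ge 0$ certifies separation. The Jacobian of the scaling factor supplies $\nabla h(x)$, and the safety constraint is a single linear inequality in $u$, which keeps the resulting quadratic program fast enough for real-time control.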
Revisiting Block-Diagonal SDP Relaxations for the Clique Number of the Paley Graphs
Authors: Vladimir A. Kobzar, Krishnan Mody
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO); Number Theory (math.NT); Optimization and Control (math.OC)
Abstract
This work addresses the block-diagonal semidefinite program (SDP) relaxations for the clique number of the Paley graphs. Computing the size of the largest clique (the clique number) of a graph is a classic NP-hard problem; a Paley graph is a deterministic graph where two vertices are connected if their difference is a quadratic residue modulo certain prime powers. Improving the upper bound on the Paley graph clique number for odd prime powers is an open problem in combinatorics. Moreover, since quadratic residues exhibit pseudorandom properties, Paley graphs are related to the construction of deterministic restricted isometries, an open problem in compressed sensing and sparse recovery. Recent work provides evidence that the current upper bounds can be improved by sum-of-squares (SOS) relaxations. In particular, the bounds given by the SOS relaxations of degree 4 (SOS-4) grow asymptotically at an order smaller than the square root of the prime. However, computing SOS-4 becomes intractable for large graphs. Gvozdenovic et al. introduced a more computationally efficient block-diagonal hierarchy of SDPs that refines the SOS hierarchy. They computed the values of these SDPs of degrees 2 and 3 (L2 and L3, respectively) for the Paley graph clique numbers associated with primes p less than or equal to 809. These values bound from above the values of the corresponding SOS-4 and SOS-6 relaxations, respectively. We revisit these computations and determine the values of the L2 relaxation for larger p. Our results provide additional numerical evidence that the L2 relaxations, and therefore also the SOS-4 relaxations, grow asymptotically at an order smaller than the square root of p.
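For concreteness, Paley graphs are straightforward to construct; the sketch below builds one for a prime $p \equiv 1 \pmod 4$ (the congruence makes the relation 'difference is a quadratic residue' symmetric):

```python
def paley_graph(p):
    """Edges of the Paley graph on Z_p, for a prime p = 1 (mod 4)."""
    residues = {pow(x, 2, p) for x in range(1, p)}  # nonzero quadratic residues
    return {(u, v) for u in range(p) for v in range(u + 1, p)
            if (v - u) % p in residues}

edges = paley_graph(13)
print(len(edges))  # p(p-1)/4 = 39; every vertex has degree (p-1)/2 = 6
```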
Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud
Authors: Aniket Murhekar, David Arbour, Tung Mai, Anup Rao
Abstract
Several cloud-based applications, such as cloud gaming, rent servers to execute jobs which arrive in an online fashion. Each job has a resource demand and must be dispatched to a cloud server with enough resources to execute it; the job departs after completion. Under the `pay-as-you-go' billing model, the server rental cost is proportional to the total time that servers are actively running jobs. The problem of efficiently allocating a sequence of online jobs to servers without exceeding the resource capacity of any server while minimizing total server usage time can be modelled as a variant of the dynamic bin packing problem (DBP), called MinUsageTime DBP. In this work, we initiate the study of the problem with multi-dimensional resource demands (e.g. CPU/GPU usage, memory requirement, bandwidth usage, etc.), called MinUsageTime Dynamic Vector Bin Packing (DVBP). We study the competitive ratio (CR) of Any Fit packing algorithms for this problem. We show almost-tight bounds on the CR of three specific Any Fit packing algorithms, namely First Fit, Next Fit, and Move To Front. We prove that the CR of Move To Front is at most $(2\mu+1)d +1$, where $\mu$ is the ratio of the max/min item durations. For $d=1$, this significantly improves the previously known upper bound of $6\mu+7$ (Kamali & Lopez-Ortiz, 2015). We then prove the CR of First Fit and Next Fit are bounded by $(\mu+2)d+1$ and $2\mu d+1$, respectively. Next, we prove a lower bound of $(\mu+1)d$ on the CR of any Any Fit packing algorithm, an improved lower bound of $2\mu d$ for Next Fit, and a lower bound of $2\mu$ for Move To Front in the 1-D case. All our bounds improve or match the best-known bounds for the 1-D case. Finally, we experimentally study the average-case performance of these algorithms on randomly generated synthetic data, and observe that Move To Front outperforms other Any Fit packing algorithms.
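The Move To Front rule analyzed above admits a compact sketch (our illustration with unit capacities per dimension, not the paper's code):

```python
def move_to_front(events, d):
    """events: ("arrive"/"depart", job_id, demand) triples in time order,
    where demand is a tuple of d values in [0, 1]."""
    servers = []      # each server is the list of demands currently running on it
    placement = {}    # job_id -> hosting server
    for kind, job_id, demand in events:
        if kind == "arrive":
            for i, srv in enumerate(servers):
                used = [sum(dem[k] for dem in srv) for k in range(d)]
                if all(used[k] + demand[k] <= 1.0 for k in range(d)):
                    srv.append(demand)
                    servers.insert(0, servers.pop(i))   # move the bin to the front
                    placement[job_id] = srv
                    break
            else:                                       # no server fits: rent a new one
                servers.insert(0, [demand])
                placement[job_id] = servers[0]
        else:
            placement.pop(job_id).remove(demand)        # job departs, freeing resources
    return servers

# In the full cost model a server's rental clock runs while it hosts any job;
# First Fit is obtained by simply deleting the move-to-front line above.
```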
An Ethereum-compatible blockchain that explicates and ensures design-level safety properties for smart contracts
Authors: Nikolaj Bjørner, Shuo Chen, Yang Chen, Zhongxin Guo, Peng Liu, Nanqing Luo
Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)
Abstract
Smart contracts are crucial elements of decentralized technologies, but they face significant obstacles to trustworthiness due to security bugs and trapdoors. To address the core issue, we propose a technology that enables programmers to focus on design-level properties rather than specific low-level attack patterns. Our proposed technology, called Theorem-Carrying-Transaction (TCT), combines the benefits of runtime checking and symbolic proof. Under the TCT protocol, every transaction must carry a theorem that proves its adherence to the safety properties in the invoked contracts, and the blockchain checks the proof before executing the transaction. The unique design of TCT ensures that the theorems are provable and checkable in an efficient manner. We believe that TCT holds great promise for enabling provably secure smart contracts in the future. As such, we call for collaboration toward this vision.
Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU
Authors: Luk Burchard, Max Xiaohang Zhao, Johannes Langguth, Aydın Buluç, Giulia Guidi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Genomics (q-bio.GN)
Abstract
Dedicated accelerator hardware has become essential for processing AI-based workloads, leading to the rise of novel accelerator architectures. Furthermore, fundamental differences in memory architecture and parallelism have made these accelerators targets for scientific computing. The sequence alignment problem is fundamental in bioinformatics; we have implemented the $X$-Drop algorithm, a heuristic method for pairwise alignment that reduces search space, on the Graphcore Intelligence Processor Unit (IPU) accelerator. The $X$-Drop algorithm has an irregular computational pattern, which makes it difficult to accelerate due to load balancing. Here, we introduce a graph-based partitioning and queue-based batch system to improve load balancing. Our implementation achieves $10\times$ speedup over a state-of-the-art GPU implementation and up to $4.65\times$ compared to CPU. In addition, we introduce a memory-restricted $X$-Drop algorithm that reduces memory footprint by $55\times$ and efficiently uses the IPU's limited low-latency SRAM. This optimization further improves the strong scaling performance by $3.6\times$.
Continuous Versatile Jumping Using Learned Action Residuals
Abstract
Jumping is essential for legged robots to traverse difficult terrain. In this work, we propose a hierarchical framework that combines optimal control and reinforcement learning to learn continuous jumping motions for quadrupedal robots. The core of our framework is a stance controller, which combines a manually designed acceleration controller with a learned residual policy. While the acceleration controller warm-starts the policy for efficient training, the trained policy overcomes the limitations of the acceleration controller and improves jumping stability. In addition, a low-level whole-body controller converts the body pose commands from the stance controller into motor commands. After training in simulation, our framework can be deployed directly to the real robot and perform versatile, continuous jumping motions, including omni-directional jumps up to 50 cm high and 60 cm forward, and jump-turns up to 90 degrees. Please visit our website for more results: https://sites.google.com/view/learning-to-jump.
A Voice Disease Detection Method Based on MFCCs and Shallow CNN
Authors: Xiaoping Xie, Hao Cai, Can Li, Fei Ding
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend with important practical value. Among voice diseases, common conditions that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. We cooperated with Xiangya Hospital of Central South University to collect voice samples from sixty-one different patients. Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to describe the voice in the form of data. An innovative model combining MFCC parameters with a single-convolution-layer CNN is proposed for fast calculation and classification. The highest accuracy we achieved was 92%, well ahead of the original research results. We also use the Advanced Voice Function Assessment Databases (AVFAD) to evaluate the generalization ability of the proposed method, which achieved an accuracy rate of 98%. Experiments on clinical and standard datasets show that, for the pathological detection of voice diseases, our method greatly improves accuracy and computational efficiency.
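A minimal sketch of this pipeline follows; the file name, MFCC dimension, and network hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
import librosa
import torch
import torch.nn as nn

# Extract MFCC features from a voice recording (stand-in file name).
y, sr = librosa.load("voice_sample.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # (13, frames)
x = torch.tensor(mfcc, dtype=torch.float32)[None, None]  # (1, 1, 13, frames)

# Shallow CNN: a single convolution layer followed by pooling and a linear head.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 4),  # spasmodic dysphonia, paralysis, nodule, polyp
)
logits = model(x)      # class scores for the four hoarseness-causing conditions
```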
InversOS: Efficient Control-Flow Protection for AArch64 Applications with Privilege Inversion
Authors: Zhuojia Shen, John Criswell
Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
Abstract
With the increasing popularity of AArch64 processors in general-purpose computing, securing software running on AArch64 systems against control-flow hijacking attacks has become critical to secure computation. Shadow stacks keep shadow copies of function return addresses and, when protected from illegal modifications and coupled with forward-edge control-flow integrity, form an effective and proven defense against such attacks. However, AArch64 lacks native support for write-protected shadow stacks, while software alternatives either incur prohibitive performance overhead or provide weak security guarantees. We present InversOS, the first hardware-assisted write-protected shadow stacks for AArch64 user-space applications, utilizing commonly available features of AArch64 to achieve efficient intra-address space isolation (called Privilege Inversion) required to protect shadow stacks. Privilege Inversion adopts unconventional design choices that run protected applications in kernel mode and mark operating system (OS) kernel memory as user-accessible; InversOS therefore uses a novel combination of OS kernel modifications, compiler transformations, and another AArch64 feature to ensure the safety of doing so and to support legacy applications. We show that InversOS is secure by design, effective against various control-flow hijacking attacks, and performant on selected benchmarks and applications (incurring overheads of 7.0% on LMBench, 7.1% on SPEC CPU 2017, and 3.0% on the Nginx web server).
Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
Authors: Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
Abstract
Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.
A Survey on Biomedical Text Summarization with Pre-trained Language Model
Abstract
The exponential growth of biomedical texts, such as biomedical literature and electronic health records (EHRs), poses a significant challenge for clinicians and researchers seeking to access clinical information efficiently. To address the problem, biomedical text summarization has been proposed to support clinical information retrieval and management, aiming at generating concise summaries that distill key information from single or multiple biomedical documents. In recent years, pre-trained language models (PLMs) have become the de facto standard for various natural language processing tasks in the general domain. Most recently, PLMs have been further investigated in the biomedical field and have brought new insights into the biomedical text summarization task. In this paper, we systematically summarize recent advances that explore PLMs for biomedical text summarization, to help understand recent progress, challenges, and future directions. We categorize PLM-based approaches according to how they utilize PLMs and what PLMs they use. We then review available datasets, recent approaches, and evaluation metrics of the task. We finally discuss existing challenges and promising future directions. To facilitate the research community, we line up open resources, including available datasets, recent approaches, codes, evaluation metrics, and the leaderboard, in a public project: https://github.com/KenZLuo/Biomedical-Text-Summarization-Survey/tree/master.
Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services
Abstract
Aiming at achieving artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution for caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters that are computation- and memory-intensive for edge servers during loading and execution. In this article, we investigate edge PFM serving problems for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to evaluate the freshness and relevance between examples in demonstrations and executing tasks. Finally, we propose a least-context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.
Abstract
The paper introduces two-player connectivity games played on finite bipartite graphs. Algorithms that solve these connectivity games can be used as subroutines for solving M\"uller games, a well-established class of games in model checking and verification. In connectivity games, the objective of one of the players is to visit every node of the game graph infinitely often. The first contribution of this paper is our proof that solving connectivity games can be reduced to the incremental strongly connected component maintenance (ISCCM) problem, an important problem in graph algorithms and data structures. The second contribution is that we non-trivially adapt two known algorithms for the ISCCM problem to provide two efficient algorithms that solve the connectivity games problem. Finally, based on the techniques developed, we recast Horn's polynomial-time algorithm that solves explicitly given M\"uller games and provide an alternative proof of its correctness. Our algorithms are more efficient than Horn's algorithm, and our solution for connectivity games is used as a subroutine in the recast algorithm.
Large-scale Dynamic Network Representation via Tensor Ring Decomposition
Abstract
Large-scale Dynamic Networks (LDNs) are becoming increasingly important in the Internet age. Their dynamic nature captures the evolution of the network structure and how edge weights change over time, posing unique challenges for data analysis and modeling. A Latent Factorization of Tensors (LFT) model facilitates efficient representation learning for an LDN, but existing LFT models are mostly based on Canonical Polyadic Factorization (CPF). This work therefore proposes a model based on Tensor Ring (TR) decomposition for efficient representation learning for an LDN. Specifically, we incorporate the principle of single latent factor-dependent, non-negative, and multiplicative update (SLF-NMU) into the TR decomposition model, and analyze the particular bias form of TR decomposition. Experimental studies on two real LDNs demonstrate that the proposed method achieves higher accuracy than existing models.
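For reference, for a third-order tensor (e.g., node $\times$ node $\times$ time slices of an LDN), the standard TR decomposition represents each entry as the trace of a product of core slices:

$$\mathcal{X}(i,j,k) = \operatorname{Tr}\big(\mathbf{G}_{1}(i)\,\mathbf{G}_{2}(j)\,\mathbf{G}_{3}(k)\big),$$

where $\mathbf{G}_{l}(\cdot)$ is an $r_{l}\times r_{l+1}$ slice of the $l$-th core tensor and $r_{4}=r_{1}$ closes the ring; CPF is the special case in which all cores are diagonal. The formula above is the standard TR form, while the SLF-NMU-based learning of the cores is this paper's contribution.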
Neuromorphic computing for attitude estimation onboard quadrotors
Authors: Stein Stroobants, Julien Dupeyroux, Guido C.H.E. de Croon
Abstract
Compelling evidence has been given for the high energy efficiency and update rates of neuromorphic processors, with performance beyond what standard Von Neumann architectures can achieve. Such promising features could be advantageous in critical embedded systems, especially in robotics. To date, the constraints inherent in robots (e.g., size and weight, battery autonomy, available sensors, computing resources, processing time, etc.), and particularly in aerial vehicles, severely hamper the performance of fully-autonomous on-board control, including sensor processing and state estimation. In this work, we propose a spiking neural network (SNN) capable of estimating the pitch and roll angles of a quadrotor in highly dynamic movements from 6-degree of freedom Inertial Measurement Unit (IMU) data. With only 150 neurons and a limited training dataset obtained using a quadrotor in a real world setup, the network shows competitive results as compared to state-of-the-art, non-neuromorphic attitude estimators. The proposed architecture was successfully tested on the Loihi neuromorphic processor on-board a quadrotor to estimate the attitude when flying. Our results show the robustness of neuromorphic attitude estimation and pave the way towards energy-efficient, fully autonomous control of quadrotors with dedicated neuromorphic computing systems.
Implicit representation priors meet Riemannian geometry for Bayesian robotic grasping
Authors: Norman Marlier, Julien Gustin, Gilles Louppe, Olivier Brüls
Abstract
Robotic grasping in highly noisy environments presents complex challenges, especially with limited prior knowledge about the scene. In particular, identifying good grasping poses with Bayesian inference becomes difficult due to two reasons: i) generating data from uninformative priors proves to be inefficient, and ii) the posterior often entails a complex distribution defined on a Riemannian manifold. In this study, we explore the use of implicit representations to construct scene-dependent priors, thereby enabling the application of efficient simulation-based Bayesian inference algorithms for determining successful grasp poses in unstructured environments. Results from both simulation and physical benchmarks showcase the high success rate and promising potential of this approach.
Revisiting the Role of Similarity and Dissimilarity in Best Counter Argument Retrieval
Abstract
This paper studies the task of retrieving the best counter-argument for a given input argument. Following the definition that the best counter-argument addresses the same aspects as the input argument while taking the opposite stance, we aim to develop an efficient and effective model for scoring counter-arguments based on similarity and dissimilarity metrics. We first conduct an experimental study on the effectiveness of available scoring methods, including traditional Learning-To-Rank (LTR) and recent neural scoring models. We then propose Bipolar-encoder, a novel BERT-based model that learns an optimal representation for simultaneous similarity and dissimilarity. Experimental results show that our proposed method achieves an accuracy@1 of 88.9\%, outperforming other baselines by a large margin. When combined with an appropriate caching technique, Bipolar-encoder is comparably efficient at prediction time.
DILI: A Distribution-Driven Learned Index
Authors: Pengfei Li, Hua Lu, Rong Zhu, Bolin Ding, Long Yang, Gang Pan
Abstract
Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), where a concise and computation-efficient linear regression model is used for each node. An internal node's key range is equally divided by its child nodes such that a key search enjoys perfect model prediction accuracy to find the relevant leaf node. A leaf node uses machine learning models to generate searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we build DILI in a top-down manner, individualizing the fanouts for internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors of key search time. Moreover, we design flexible algorithms for DILI to efficiently insert and delete keys and automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms the state-of-the-art alternatives on different kinds of workloads.
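The two node designs admit a compact sketch (our illustration of the ideas above, not the DILI code): internal-node routing is exact arithmetic because the key range is equally divided, and a leaf corrects its linear prediction with a short local search.

```python
class InternalNode:
    def __init__(self, lo, hi, children):
        self.lo, self.hi, self.children = lo, hi, children

    def child_for(self, key):
        # Equal-width child ranges make lookup a single multiplication: no misprediction.
        f = len(self.children)
        i = int((key - self.lo) / (self.hi - self.lo) * f)
        return self.children[min(max(i, 0), f - 1)]

class LeafNode:
    def __init__(self, keys):          # keys: sorted list
        self.keys, n = keys, len(keys)
        mean_k, mean_p = sum(keys) / n, (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys) or 1.0
        self.a = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys)) / var
        self.b = mean_p - self.a * mean_k  # least-squares fit: position ~ a*key + b

    def search(self, key):
        pos = min(max(int(self.a * key + self.b), 0), len(self.keys) - 1)
        while pos > 0 and self.keys[pos] > key:            # correct the prediction
            pos -= 1
        while pos < len(self.keys) - 1 and self.keys[pos] < key:
            pos += 1
        return pos if self.keys[pos] == key else None
```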
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/
Motion-state Alignment for Video Semantic Segmentation
Authors: Jinming Su, Ruihong Yin, Shuaibin Zhang, Junfeng Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, video semantic segmentation has made great progress with advanced deep neural networks. However, there still exist two main challenges, i.e., information inconsistency and computation cost. To deal with these two difficulties, we propose a novel motion-state alignment framework for video semantic segmentation to keep both motion and state consistency. In the framework, we first construct a motion alignment branch armed with an efficient decoupled transformer to capture dynamic semantics, guaranteeing region-level temporal consistency. Then, a state alignment branch composed of a stage transformer is designed to enrich feature spaces for the current frame to extract static semantics and achieve pixel-level state consistency. Next, by a semantic assignment mechanism, the region descriptor of each semantic category is gained from dynamic semantics and linked with pixel descriptors from static semantics. Benefiting from the alignment of these two kinds of effective information, the proposed method picks up dynamic and static semantics in a targeted way, so that video semantic regions are consistently segmented to obtain precise locations with low computational complexity. Extensive experiments on the Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods and validate the effectiveness of the motion-state alignment framework.
Contact Tracing over Uncertain Indoor Positioning Data (Extended Version)
Authors: Tiantian Liu, Huan Li, Hua Lu, Muhammad Aamir Cheema, Harry Kai-Ho Chan
Abstract
Pandemics often cause dramatic losses of human lives and impact our societies in many aspects such as public health, tourism, and economy. To contain the spread of an epidemic like COVID-19, efficient and effective contact tracing is important, especially in indoor venues where the risk of infection is higher. In this work, we formulate and study a novel query called Indoor Contact Query (ICQ) over raw, uncertain indoor positioning data that digitalizes people's movements indoors. Given a query object o, e.g., a person confirmed to be a virus carrier, an ICQ analyzes uncertain indoor positioning data to find objects that most likely had close contact with o for a long period of time. To process ICQ, we propose a set of techniques. First, we design an enhanced indoor graph model to organize different types of data necessary for ICQ. Second, for indoor moving objects, we devise methods to determine uncertain regions and to derive positioning samples missing in the raw data. Third, we propose a query processing framework with a close contact determination method, a search algorithm, and acceleration strategies. We conduct extensive experiments on synthetic and real datasets to evaluate our proposals. The results demonstrate the efficiency and effectiveness of our proposals.
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The current transformation towards smart manufacturing has led to a growing demand for human-robot collaboration (HRC) in the manufacturing process. Perceiving and understanding the human co-worker's behaviour introduces challenges for collaborative robots to efficiently and effectively perform tasks in unstructured and dynamic environments. Integrating recent data-driven machine vision capabilities into HRC systems is a logical next step in addressing these challenges. However, in these cases, off-the-shelf components struggle due to generalisation limitations. Real-world evaluation is required in order to fully appreciate the maturity and robustness of these approaches. Furthermore, understanding the pure-vision aspects is a crucial first step before combining multiple modalities in order to understand the limitations. In this paper, we propose GoferBot, a novel vision-based semantic HRC system for a real-world assembly task. It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience. GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
Abstract
Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty of information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates them. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven to be powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. First, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Second, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus, simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise source proximity degrees as supervised signals to generate coarse-grained source predictions. This efficiently initializes the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time in extensive experiments.
Revisiting Fast Fourier multiplication algorithms on quotient rings
Abstract
This work formalizes efficient Fast Fourier-based multiplication algorithms for polynomials in quotient rings such as $\mathbb{Z}_{m}[x]/\langle x^{n}-a\rangle$, with $n$ a power of 2 and $m$ an integer that is not necessarily prime. We also present a meticulous study of the necessary and/or sufficient conditions required for the applicability of these multiplication algorithms. This paper allows us to unify the different approaches to the problem of efficiently computing the product of two polynomials in these quotient rings.
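As a baseline for what these Fast Fourier algorithms accelerate, multiplication in $\mathbb{Z}_{m}[x]/\langle x^{n}-a\rangle$ can be written directly as schoolbook convolution followed by the reduction $x^{n}\equiv a$; the quadratic-time sketch below is useful as a reference against which any fast variant can be checked.

```python
def polymul_quotient(f, g, n, a, m):
    """Multiply coefficient lists f, g (length n) in Z_m[x]/<x^n - a>."""
    res = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                res[k] = (res[k] + fi * gj) % m
            else:
                res[k - n] = (res[k - n] + fi * gj * a) % m  # fold back via x^n = a
    return res

# Negacyclic example (a = -1): (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2 = -5 + 10x
print(polymul_quotient([1, 2], [3, 4], n=2, a=-1, m=17))  # -> [12, 10]
```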
Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition
Authors: Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract
This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models by using a simple, yet efficient, method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of test data. We also extend and evaluate the CATT approach in streaming applications.
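The mining step can be sketched in a few lines (our illustration, with stand-in embeddings; the real phrase embeddings come from the context encoder's latent space):

```python
import faiss
import numpy as np

d = 256                                                     # embedding size (assumed)
phrase_emb = np.random.randn(100_000, d).astype("float32")  # stand-in embeddings
faiss.normalize_L2(phrase_emb)

index = faiss.IndexFlatIP(d)   # exact inner-product search; an IVF index would
index.add(phrase_emb)          # make it approximate, as in ANN search

queries = phrase_emb[:8]                     # reference queries from a batch
_, neighbors = index.search(queries, 16)     # 16 most similar phrases per query
hard_negatives = neighbors[:, 1:]            # drop rank 0 (the phrase itself)
# hard_negatives then join random and ground-truth phrases in the context list.
```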
Romanization-based Large-scale Adaptation of Multilingual Language Models
Authors: Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP. However, their large-scale deployment to many languages, besides pretraining data scarcity, is also hindered by the increase in vocabulary size and limitations in their parameter budget. In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale. In particular, we explore the UROMAN transliteration tool, which provides mappings from UTF-8 to Latin characters for all the writing systems, enabling inexpensive romanization for virtually any language. We first focus on establishing how UROMAN compares against other language-specific and manually curated transliterators for adapting multilingual PLMs. We then study and compare a plethora of data- and parameter-efficient strategies for adapting the mPLMs to romanized and non-romanized corpora of 14 diverse low-resource languages. Our results reveal that UROMAN-based transliteration can offer strong performance for many languages, with particular gains achieved in the most challenging setups: on languages with unseen scripts and with limited training data without any vocabulary augmentation. Further analyses reveal that an improved tokenizer based on romanized data can even outperform non-transliteration-based methods in the majority of languages.
Differentiable Genetic Programming for High-dimensional Symbolic Regression
Abstract
Symbolic regression (SR) is the process of discovering hidden relationships in data in the form of mathematical expressions, which is considered an effective way to reach interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing trees. In this paper, we propose a differentiable approach named DGP to construct GP trees for high-dimensional SR for the first time. Specifically, a new data structure called a differentiable symbolic tree is proposed to relax the discrete structure to be continuous, so that a gradient-based optimizer can be applied for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by the above relaxation and obtain valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape from local optima and find globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance, and is thus capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against state-of-the-art methods based on both GP and deep neural networks. The results reveal that DGP outperforms these peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on synthetic SR problems, the proposed DGP method also achieves the best recovery rate, even at different noise levels. We believe this work can facilitate SR as a powerful alternative for interpretable ML on a broader range of real-world problems.
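A toy sketch of the relaxation idea (our illustration, not the authors' DGP implementation): give each tree node a learnable logit vector over candidate operators and output the softmax-weighted mixture of the operators applied to its children, so operator choice becomes continuous and trainable by gradient descent.

```python
import torch

ops = [torch.add, torch.mul, lambda a, b: torch.sin(a), lambda a, b: a]

def soft_node(theta, left, right):
    """Differentiable node: softmax-weighted mixture of candidate operators."""
    w = torch.softmax(theta, dim=0)
    return sum(wi * op(left, right) for wi, op in zip(w, ops))

x = torch.linspace(-1.0, 1.0, 64)
theta = torch.zeros(len(ops), requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.1)
for _ in range(200):
    pred = soft_node(theta, x, x)               # a one-node "tree" on inputs (x, x)
    loss = ((pred - torch.sin(x)) ** 2).mean()  # target expression: sin(x)
    opt.zero_grad()
    loss.backward()
    opt.step()
# Discretizing (argmax or sampling over softmax(theta)) recovers a valid symbolic
# operator, which is the role played by the paper's sampling method.
```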
Coefficient Synthesis for Threshold Automata
Authors: A. R. Balasubramanian
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Threshold automata are a formalism for modeling fault-tolerant distributed algorithms. The main feature of threshold automata is the notion of a threshold guard, which allows us to compare the number of received messages with the total number of different types of processes. In this paper, we consider the coefficient synthesis problem for threshold automata, in which we are given a sketch of a threshold automaton (with the constants in the threshold guards left unspecified) and a specification and we want to synthesize a set of constants which when plugged into the sketch, gives a threshold automaton satisfying the specification. Our main result is that this problem is undecidable, even when the specification is a coverability specification and the underlying sketch is acyclic.
Quantum Annealing for Single Image Super-Resolution
Authors: Han Yao Choong, Suryansh Kumar, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
This paper proposes a quantum computing-based algorithm to solve the single image super-resolution (SISR) problem. One of the well-known classical approaches to SISR relies on the well-established patch-wise sparse modeling of the problem. Yet, the current state of affairs in this field is that deep neural networks (DNNs) have demonstrated far superior results than traditional approaches. Nevertheless, quantum computing is expected to become increasingly prominent for machine learning problems soon. As a result, in this work, we take the opportunity to perform an early exploration of applying a quantum computing algorithm to this important image enhancement problem, i.e., SISR. Among the two paradigms of quantum computing, namely universal gate quantum computing and adiabatic quantum computing (AQC), the latter has been successfully applied to practical computer vision problems, in which quantum parallelism has been exploited to solve combinatorial optimization efficiently. This work demonstrates formulating quantum SISR as a sparse coding optimization problem, which is solved using quantum annealers accessed via the D-Wave Leap platform. The proposed AQC-based algorithm is demonstrated to achieve an improved speed-up over a classical analogue while maintaining comparable SISR accuracy.
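The mapping to the annealer can be sketched as follows; the binary-code restriction and the sparsity penalty $\lambda$ are our assumptions for illustration. With a fixed dictionary $D$ and a binary code $b\in\{0,1\}^{K}$, patch-wise sparse coding becomes

$$\min_{b\in\{0,1\}^{K}} \|y - Db\|_{2}^{2} + \lambda\,\mathbf{1}^{\top}b \;=\; \min_{b}\; b^{\top}\big(D^{\top}D\big)b - 2\,y^{\top}D\,b + \lambda\,\mathbf{1}^{\top}b + \|y\|_{2}^{2},$$

a quadratic unconstrained binary optimization (QUBO): since $b_{i}^{2}=b_{i}$, the linear terms fold into the diagonal of the QUBO matrix, which is exactly the input format a quantum annealer such as D-Wave accepts.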
Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks
Authors: Ping Gong, Yuxin Ma, Cheng Li, Xiaosong Ma, Sam H. Noh
Abstract
In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods, using either raw data or record files. The preliminary results show that data preprocessing is a clear bottleneck, even with the most efficient software and hardware configuration enabled by NVIDIA DALI, a highly optimized data preprocessing library. Second, we identify the potential causes, exercise a variety of optimization methods, and present their pros and cons. We hope this work will shed light on the new co-design of the "data storage and loading pipeline" and the "training framework", and on flexible resource configurations between them, so that the resources can be fully exploited and performance can be maximized.
Multitenant Containers as a Service (CaaS) for Clouds and Edge Clouds
Authors: Berat Can Senel, Maxime Mouchet, Justin Cappos, Olivier Fourmaux, Timur Friedman, Rick McGeer
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Cloud computing, offering on-demand access to computing resources through the Internet under the pay-as-you-go model, has marked the last decade with its three main service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The lightweight nature of containers compared to virtual machines has led to the rapid uptake of a fourth model in recent years, Containers as a Service (CaaS), which falls between IaaS and PaaS in terms of control abstraction. However, when CaaS is offered to multiple independent users, or tenants, a multi-instance approach is used, in which each tenant receives its own separate cluster; this reimposes significant overhead due to employing virtual machines for isolation. If CaaS is to be offered not just in the cloud, but also at the edge cloud, where resources are limited, another solution is required. We introduce a native CaaS multitenancy framework, meaning that tenants share a cluster, which is more efficient than the one-tenant-per-cluster model. Whenever there are shared resources, isolation of multitenant workloads is an issue; such workloads can be isolated by Kata Containers today. In addition, our framework respects application requirements that demand complete isolation and a fully customized environment. Node-level slicing empowers tenants to programmatically reserve isolated subclusters where they can choose the container runtime that suits application needs. The framework is publicly available as liberally-licensed, free, open-source software that extends Kubernetes, the de facto standard container orchestration system. It is in production use within the EdgeNet testbed for researchers.
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
Authors: Dingwen Kong, Lin F. Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires too much feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we address this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that incorporate human-in-the-loop feedback to specify rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of the task at some state-action pairs. Afterwards, the algorithm is guaranteed to provide a nearly optimal policy for the task with high probability. We show that, even in the presence of random noise in the feedback, the algorithm takes only $\widetilde{O}(H \dim_R^2)$ queries on the reward function to provide an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_R$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms must query the reward function for at least $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environmental transition.
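The following toy sketch illustrates the active reward-querying idea in the simplest setting we could think of: a Bayesian linear reward model over d-dimensional state-action features (d playing the role of $\dim_R$), where the learner queries only the pairs with the highest posterior predictive variance. All quantities are synthetic, and this is an illustration of the idea rather than the paper's algorithm.

    # Toy sketch of active reward learning: query the human teacher only about
    # the state-action pairs whose predicted reward is most uncertain.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n_candidates, n_queries, noise = 8, 500, 20, 0.1
    w_true = rng.standard_normal(d)                 # unknown "human" reward
    Phi = rng.standard_normal((n_candidates, d))    # features of explored (s, a) pairs

    A = np.eye(d)          # posterior precision (standard normal prior on weights)
    b = np.zeros(d)        # precision-weighted mean accumulator
    for _ in range(n_queries):
        cov = np.linalg.inv(A)
        # Predictive variance phi^T cov phi for each candidate pair.
        var = np.einsum("nd,dk,nk->n", Phi, cov, Phi)
        i = int(np.argmax(var))                     # most informative query
        r = Phi[i] @ w_true + noise * rng.standard_normal()  # human feedback
        A += np.outer(Phi[i], Phi[i]) / noise**2
        b += Phi[i] * r / noise**2

    w_hat = np.linalg.solve(A, b)
    print("reward estimation error:", np.linalg.norm(w_hat - w_true))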
Generative modeling of living cells with SO(3)-equivariant implicit neural representations
Authors: David Wiesner, Julian Suk, Sven Dummer, Tereza Nečasová, Vladimír Ulman, David Svoboda, Jelmer M. Wolterink
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
Abstract
Data-driven cell tracking and segmentation methods in biomedical imaging require diverse and information-rich training data. In cases where the number of training samples is limited, synthetic computer-generated data sets can be used to improve these methods. This requires the synthesis of cell shapes as well as corresponding microscopy images using generative models. To synthesize realistic living cell shapes, the shape representation used by the generative model should be able to accurately represent fine details and changes in topology, which are common in cells. These requirements are not met by 3D voxel masks, which are restricted in resolution, and polygon meshes, which do not easily model processes like cell growth and mitosis. In this work, we propose to represent living cell shapes as level sets of signed distance functions (SDFs) which are estimated by neural networks. We optimize a fully-connected neural network to provide an implicit representation of the SDF value at any point in a 3D+time domain, conditioned on a learned latent code that is disentangled from the rotation of the cell shape. We demonstrate the effectiveness of this approach on cells that exhibit rapid deformations (Platynereis dumerilii), cells that grow and divide (C. elegans), and cells that have growing and branching filopodial protrusions (A549 human lung carcinoma cells). A quantitative evaluation using shape features, Hausdorff distance, and Dice similarity coefficients of real and synthetic cell shapes shows that our model can generate topologically plausible complex cell shapes in 3D+time with high similarity to real living cell shapes. Finally, we show how microscopy images of living cells that correspond to our generated cell shapes can be synthesized using an image-to-image model.
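As a rough illustration of the paper's central object, the PyTorch sketch below defines an MLP that maps a 3D+time coordinate and a latent code to an SDF value. Layer sizes and activations are arbitrary choices, and the SO(3)-disentanglement machinery is omitted.

    # Minimal PyTorch sketch: an MLP mapping a space-time point plus a learned
    # latent code to a signed distance value. Sizes are illustrative only.
    import torch
    import torch.nn as nn

    class ImplicitSDF(nn.Module):
        def __init__(self, latent_dim: int = 64, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(4 + latent_dim, hidden), nn.Softplus(),
                nn.Linear(hidden, hidden), nn.Softplus(),
                nn.Linear(hidden, 1),  # signed distance at (x, y, z, t)
            )

        def forward(self, xyzt: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
            # xyzt: (N, 4) space-time points; z: (N, latent_dim) per-cell code.
            return self.net(torch.cat([xyzt, z], dim=-1)).squeeze(-1)

    model = ImplicitSDF()
    pts = torch.rand(1024, 4)        # random query points in [0, 1]^4
    code = torch.zeros(1024, 64)     # one shared latent code
    sdf = model(pts, code)           # the cell surface is the zero level set
    print(sdf.shape)                 # torch.Size([1024])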
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
Authors: Yiming Gao, Yan-Pei Cao, Ying Shan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Online reconstruction and rendering of large-scale indoor scenes is a long-standing challenge. SLAM-based methods can reconstruct 3D scene geometry progressively in real time but cannot render photorealistic results. While NeRF-based methods produce promising novel view synthesis results, their long offline optimization time and lack of geometric constraints pose challenges to efficiently handling online input. Inspired by the complementary advantages of classical 3D reconstruction and NeRF, we investigate marrying explicit geometric representation with NeRF rendering to achieve efficient online reconstruction and high-quality rendering. We introduce SurfelNeRF, a variant of neural radiance fields which employs a flexible and scalable neural surfel representation to store geometric attributes and appearance features extracted from input images. We further extend the conventional surfel-based fusion scheme to progressively integrate incoming input frames into the reconstructed global neural scene representation. In addition, we propose a highly efficient differentiable rasterization scheme for rendering neural surfel radiance fields, which helps SurfelNeRF achieve a $10\times$ speedup in both training and inference time. Experimental results show that our method achieves the state-of-the-art 23.82 PSNR and 29.58 PSNR on ScanNet in the feedforward inference and per-scene optimization settings, respectively.
Neural Architecture Search for Visual Anomaly Segmentation
Authors: Tommie Kerssies
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents AutoPatch, the first application of neural architecture search to the complex task of segmenting visual anomalies. Measurement of anomaly segmentation quality is challenging due to imbalanced anomaly pixels, varying region areas, and various types of anomalies. First, the weighted average precision (wAP) metric is proposed as an alternative to AUROC and AUPRO, which does not need to be limited to a specific maximum FPR. Second, a novel neural architecture search method is proposed, which enables efficient segmentation of visual anomalies without any training. By leveraging a pre-trained supernet, a black-box optimization algorithm can directly minimize FLOPS and maximize wAP on a small validation set of anomalous examples. Finally, compelling results on the widely studied MVTec [3] dataset are presented, demonstrating that AutoPatch outperforms the current state-of-the-art method PatchCore [12] with more than 18x fewer FLOPS, using only one example per anomaly type. These results highlight the potential of automated machine learning to optimize throughput in industrial quality control. The code for AutoPatch is available at: https://github.com/tommiekerssies/AutoPatch
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Authors: Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, Mário Amorim Lopes
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over $10$ years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved $F_1$ scores of $88.6$, $95.0$, and $55.8$ per cent in the mention extraction of procedures, drugs, and diseases, respectively.
An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations
Authors: Xiaoying Dai, Miao Hu, Jack Xin, Aihui Zhou
Abstract
In this paper, we propose an augmented subspace based adaptive proper orthogonal decomposition (POD) method for solving time-dependent partial differential equations. By augmenting the POD subspace with some auxiliary modes, we obtain an augmented subspace. We use the difference between the approximation obtained in this augmented subspace and that obtained in the original POD subspace to construct an error indicator, by which we obtain a general framework for the augmented subspace based adaptive POD method. We then provide two strategies for obtaining specific augmented subspaces: the random-vector-based augmented subspace and the coarse-grid-approximation-based augmented subspace. We apply our new method to two typical 3D advection-diffusion equations, with the advection being the Kolmogorov flow and the ABC flow. Numerical results show that our method is more efficient than existing adaptive POD methods, especially for advection-dominated models.
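A minimal numpy sketch of the random-vector variant, under simplifying assumptions (a generic snapshot matrix and arbitrary mode counts): build the POD basis by SVD, append random vectors, and use the gap between the two projections as the error indicator.

    # Sketch of the augmented-subspace idea: POD basis from snapshots via SVD,
    # random-vector augmentation, and a projection-gap error indicator.
    import numpy as np

    rng = np.random.default_rng(2)
    n, m, r, k = 200, 40, 6, 3          # state dim, snapshots, POD modes, extra modes
    S = rng.standard_normal((n, m))     # snapshot matrix (columns = solutions in time)

    U, _, _ = np.linalg.svd(S, full_matrices=False)
    V_pod = U[:, :r]                                   # POD subspace

    # Random-vector augmentation (one of the paper's two strategies); the
    # coarse-grid alternative would append coarse approximations instead.
    aug = rng.standard_normal((n, k))
    V_aug, _ = np.linalg.qr(np.hstack([V_pod, aug]))   # orthonormal augmented basis

    u = S @ rng.standard_normal(m)                     # a new solution to approximate
    u_pod = V_pod @ (V_pod.T @ u)
    u_aug = V_aug @ (V_aug.T @ u)
    indicator = np.linalg.norm(u_aug - u_pod)          # drives adaptive basis updates
    print("error indicator:", indicator)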
GUILGET: GUI Layout GEneration with Transformer
Authors: Andrey Sobolevsky, Guillaume-Alexandre Bilodeau, Jinghui Cheng, Jin L.C. Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Sketching out a Graphical User Interface (GUI) layout is part of the pipeline of designing a GUI and a crucial task for the success of a software application. Arranging all components inside a GUI layout manually is time-consuming. In order to assist designers, we developed a method named GUILGET to automatically generate GUI layouts from positional constraints represented as GUI arrangement graphs (GUI-AGs). The goal is to support the initial step of GUI design by producing realistic and diverse GUI layouts. Existing image layout generation techniques often cannot incorporate GUI design constraints; thus, GUILGET adapts existing techniques to generate GUI layouts that obey constraints specific to GUI designs. GUILGET is based on transformers in order to capture the semantics of the relationships between elements in a GUI-AG. Moreover, the model learns constraints through the minimization of losses responsible for placing each component inside its parent layout, for not letting components overlap if they share the same parent, and for component alignment. Our experiments, conducted on the CLAY dataset, reveal that our model has the best understanding of relationships in GUI-AGs and achieves the best performance on most evaluation metrics. Our work thus contributes to improved GUI layout generation by proposing a novel method that effectively accounts for the constraints on GUI elements and paves the way for a more efficient GUI design pipeline.
DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables
Authors: Darshan C. Ganji, Saad Ashfaq, Ehsan Saboori, Sudhakar Sah, Saptarshi Mitra, MohammadHossein AskariHemmat, Alexander Hoffman, Ahmed Hassanien, Mathieu Léonardon
Abstract
A lot of recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy that is comparable to full-precision floating-point baselines even with sub-byte quantization. However, it is extremely challenging to deploy these ultra low-bit quantized models on mainstream CPU devices because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports no less than 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
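The core lookup-table trick can be shown in a few lines of numpy: with 2-bit codes there are only 16 possible weight-activation products, so a GEMM reduces to gathers and sums. The codebooks below are hypothetical, and real DeepGEMM kernels operate via SIMD shuffle instructions rather than numpy indexing.

    # Illustrative numpy sketch of LUT-based ultra-low-precision GEMM: with
    # 2-bit weights and activations, multiply-accumulate becomes table indexing.
    import numpy as np

    w_levels = np.array([-1.5, -0.5, 0.5, 1.5])   # hypothetical 2-bit weight codebook
    a_levels = np.array([0.0, 1.0, 2.0, 3.0])     # hypothetical 2-bit activation codebook
    lut = np.outer(w_levels, a_levels)            # precomputed 4x4 product table

    rng = np.random.default_rng(3)
    W_idx = rng.integers(0, 4, size=(32, 64))     # weight codes for one layer
    A_idx = rng.integers(0, 4, size=(64, 8))      # activation codes for a batch

    # GEMM via table lookups: gather the product for every (weight, activation)
    # code pair, then reduce over the inner dimension; no multiplies needed.
    out = lut[W_idx[:, :, None], A_idx[None, :, :]].sum(axis=1)

    # Reference result from dequantized matrices agrees with the LUT path.
    ref = w_levels[W_idx] @ a_levels[A_idx]
    print(np.allclose(out, ref))                  # True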
Revisiting k-NN for Pre-trained Language Models
Authors: Lei Li, Jing Chen, Bozhong Tian, Ningyu Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (k-NN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit k-NN classifiers for augmenting the PLMs-based classifiers. From the methodological level, we propose to adopt k-NN with textual representations of PLMs in two steps: (1) Utilize k-NN as prior knowledge to calibrate the training process. (2) Linearly interpolate the probability distribution predicted by k-NN with that of the PLMs' classifier. At the heart of our approach is the implementation of k-NN-calibrated training, which treats predicted results as indicators for easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning and prompt-tuning paradigms and on zero-shot, few-shot and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.
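Step (2), the probability interpolation, is simple enough to sketch directly. The embeddings, labels, and interpolation weight below are synthetic placeholders rather than the paper's configuration.

    # Minimal sketch of k-NN/PLM interpolation: blend the k-NN label
    # distribution over PLM embeddings with the PLM classifier's softmax.
    import numpy as np

    def knn_distribution(query, bank_emb, bank_labels, n_classes, k=8):
        # Cosine-similarity k-NN over the PLM representation space.
        sims = bank_emb @ query / (np.linalg.norm(bank_emb, axis=1) * np.linalg.norm(query))
        top = np.argsort(-sims)[:k]
        p = np.bincount(bank_labels[top], minlength=n_classes).astype(float)
        return p / p.sum()

    rng = np.random.default_rng(4)
    n_classes, dim = 3, 32
    bank_emb = rng.standard_normal((100, dim))          # cached training embeddings
    bank_labels = rng.integers(0, n_classes, size=100)  # their gold labels
    query_emb = rng.standard_normal(dim)                # test example's PLM embedding
    p_plm = np.array([0.2, 0.7, 0.1])                   # PLM classifier softmax

    lam = 0.3                                           # interpolation weight
    p_knn = knn_distribution(query_emb, bank_emb, bank_labels, n_classes)
    p_final = lam * p_knn + (1 - lam) * p_plm
    print("prediction:", int(np.argmax(p_final)), p_final)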
Always Strengthen Your Strengths: A Drift-Aware Incremental Learning Framework for CTR Prediction
Abstract
Click-through rate (CTR) prediction is of great importance in recommendation systems and online advertising platforms. When served in industrial scenarios, the user-generated data observed by the CTR model typically arrives as a stream. Streaming data has the characteristic that the underlying distribution drifts over time and may recur. This can lead to catastrophic forgetting if the model simply adapts to the new data distribution all the time. It is also inefficient to relearn distributions that have occurred before. Due to memory constraints and the diversity of data distributions in large-scale industrial applications, conventional strategies against catastrophic forgetting, such as replay, parameter isolation, and knowledge distillation, are difficult to deploy. In this work, we design a novel drift-aware incremental learning framework based on ensemble learning to address catastrophic forgetting in CTR prediction. With explicit error-based drift detection on streaming data, the framework further strengthens well-adapted ensembles and freezes ensembles that do not match the input distribution, avoiding catastrophic interference. Both offline experiments and an online A/B test show that our method outperforms all baselines considered.
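A schematic sketch of the drift-aware mechanism follows, with an EMA error statistic and thresholds that are our own simplifications rather than the paper's exact detector: the active ensemble member is monitored, and a detected drift freezes it and spawns a fresh member.

    # Schematic sketch: monitor the active member's streaming error, freeze it
    # on detected drift, and spawn a new member for the new distribution.
    import numpy as np

    class DriftAwareEnsemble:
        def __init__(self, drift_threshold: float = 0.15):
            self.members = [{"error": 0.0, "count": 0, "frozen": False}]
            self.threshold = drift_threshold

        def update(self, batch_error: float) -> None:
            active = self.members[-1]
            active["count"] += 1
            # Exponential moving average of the active member's error.
            active["error"] += 0.1 * (batch_error - active["error"])
            if active["error"] > self.threshold and active["count"] > 10:
                # Error-based drift detection: freeze the mismatched member
                # and start a fresh one, avoiding catastrophic interference.
                active["frozen"] = True
                self.members.append({"error": 0.0, "count": 0, "frozen": False})

    ens = DriftAwareEnsemble()
    rng = np.random.default_rng(5)
    for t in range(200):
        # Error is high right after the drift at t=100, until a fresh member adapts.
        matches = (t < 100) or (len(ens.members) > 1)
        err = 0.05 if matches else 0.30
        ens.update(err + 0.01 * rng.standard_normal())
    print("ensemble members spawned:", len(ens.members))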
Abstract
Data is a central component of machine learning and causal inference tasks. The availability of large amounts of data from sources such as open data repositories, data lakes and data marketplaces creates an opportunity to augment data and boost those tasks' performance. However, augmentation techniques rely on a user manually discovering and shortlisting useful candidate augmentations. Existing solutions do not leverage the synergy between discovery and augmentation, thus under-exploiting the data. In this paper, we introduce METAM, a novel goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process. To select candidates efficiently, METAM leverages properties of: i) the data, ii) the utility function, and iii) the solution set size. We show METAM's theoretical guarantees and demonstrate them empirically on a broad set of tasks. All in all, we demonstrate the promise of goal-oriented data discovery for modern data science applications.
DRIFT: A Federated Recommender System with Implicit Feedback on the Items
Authors: Theo Nommay
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Nowadays, more and more items are available online, which makes it hard for users to find items they like. Recommender systems aim to find the items that best suit each user, based on the user's historical interactions. Depending on the context, these interactions may be more or less sensitive, so collecting them raises an important concern about users' privacy. Federated systems have shown that it is possible to make accurate and efficient recommendations without storing users' personal information. However, these systems use instantaneous feedback from the user. In this report, we propose DRIFT, a federated architecture for recommender systems using implicit feedback. Our learning model is based on SAROS, a recent algorithm for recommendation with implicit feedback. We aim to make recommendations as precise as SAROS without compromising users' privacy, and we support this claim with experiments as well as a theoretical analysis of convergence. We also show that the computation time has linear complexity with respect to the number of interactions made. Finally, we show that our algorithm is secure: participants in our federated system cannot infer the interactions made by a user, except for DOs that hold the item involved in the interaction.
Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations
Abstract
Recommender systems are seen as an effective tool to address information overload, but it is widely known that the presence of various biases makes direct training on large-scale observational data result in sub-optimal prediction performance. In contrast, unbiased ratings obtained from randomized controlled trials or A/B tests are considered the gold standard, but they are costly and small in scale in reality. To exploit both types of data, recent works have proposed using unbiased ratings to correct the parameters of the propensity or imputation models trained on the biased dataset. However, the existing methods fail to obtain accurate predictions in the presence of unobserved confounding or model misspecification. In this paper, we propose a theoretically guaranteed, model-agnostic balancing approach that can be applied to any existing debiasing method with the aim of combating unobserved confounding and model misspecification. The proposed approach makes full use of unbiased data by alternately correcting model parameters learned with biased data and adaptively learning balance coefficients of biased samples for further debiasing. Extensive real-world experiments, along with the deployment of our proposal on four representative debiasing methods, demonstrate its effectiveness.
MATURE-HEALTH: HEALTH Recommender System for MAndatory FeaTURE choices
Authors: Ritu Shandilya, Sugam Sharma, Johnny Wong
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Balancing electrolytes is essential for the appropriate functioning of organs in the human body, as an electrolyte imbalance can indicate the development of underlying pathophysiology. Efficient monitoring of electrolyte imbalance not only increases the chances of early disease detection, but also prevents further deterioration of health through a strictly followed, nutrient-controlled diet for balancing the electrolytes after disease detection. In this research, a recommender system, MATURE Health, is proposed and implemented; it predicts imbalances of mandatory electrolytes and other substances present in the blood and recommends food items with balanced nutrients to avoid the occurrence of such imbalances. The proposed model takes the user's most recent laboratory results and daily food intake into account to predict electrolyte imbalances. MATURE Health relies on the MATURE Food algorithm to recommend food items, as the latter recommends only those food items that satisfy all mandatory nutrient requirements while also considering the user's past food preferences. To validate the proposed method, sodium, potassium, and BUN levels were predicted with the Random Forest prediction algorithm for dialysis patients, using their laboratory report history and daily food intake, and the proposed model demonstrates 99.53, 96.94, and 95.35 percent accuracy for sodium, potassium, and BUN, respectively. MATURE Health is a novel health recommender system that implements machine learning models to predict imbalances of mandatory electrolytes and other substances in the blood and recommends food items that contain the required amounts of nutrients to prevent, or at least reduce, the risk of electrolyte imbalance.
LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks
Authors: Di Hong, Jiangrong Shen, Yu Qi, Yueming Wang
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
Spiking Neural Networks (SNNs) are biologically realistic and practically promising for low-power computation because of their event-driven mechanism. Usually, the training of SNNs suffers accuracy loss on various tasks, yielding inferior performance compared with ANNs. Conversion schemes have been proposed to obtain competitive accuracy by mapping trained ANNs' parameters to SNNs with the same structures. However, these converted SNNs require an enormous number of time steps, forfeiting the energy-efficiency benefit. Utilizing both the accuracy advantage of ANNs and the computing efficiency of SNNs, we propose a novel SNN training framework, namely layer-wise ANN-to-SNN knowledge distillation (LaSNN). In order to achieve competitive accuracy and reduced inference latency, LaSNN transfers learning from a well-trained ANN to a small SNN by distilling knowledge rather than converting the ANN's parameters. The information gap between the heterogeneous ANN and SNN is bridged by introducing an attention scheme, and the knowledge in the ANN is effectively compressed and then efficiently transferred via our layer-wise distillation paradigm. We conduct detailed experiments demonstrating the effectiveness, efficacy, and scalability of LaSNN on three benchmark data sets (CIFAR-10, CIFAR-100, and Tiny ImageNet). We achieve competitive top-1 accuracy compared to ANNs and 20x faster inference than converted SNNs with similar performance. More importantly, LaSNN is flexible and extensible: it can be effortlessly applied to SNNs with different architectures, depths, and input encoding methods, contributing to their further development.
Fast Neural Scene Flow
Authors: Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Scene flow is an important problem as it provides low-level motion cues for many downstream tasks. State-of-the-art learning methods are usually fast and can achieve impressive performance on in-domain data, but usually fail to generalize to out-of-distribution (OOD) data or to handle dense point clouds. In this paper, we focus on a runtime optimization-based neural scene flow pipeline. In (a) one can see its application in the densification of lidar. However, in (c) one sees that the major drawback is the extensive computation time. We identify that the common speedup strategy in network architectures for coordinate networks has little effect on scene flow acceleration [see green (b)], unlike image reconstruction [see pink (b)]. With the dominant computational burden stemming instead from the Chamfer loss function, we propose to use a distance transform-based loss function to accelerate [see purple (b)], which achieves up to a 30x speedup and on-par estimation performance compared to NSFP [see (c)]. When tested on 8k points, it is as efficient [see (c)] as leading learning methods, achieving real-time performance.
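The distance-transform idea is easy to prototype: rasterize the target points once, take a Euclidean distance transform, and evaluate the loss by indexing instead of nearest-neighbor search. The sketch below is a 2D simplification with an arbitrary grid resolution, not the paper's 3D implementation.

    # Sketch: replace the nearest-neighbor search inside a Chamfer-style loss
    # with lookups into a precomputed Euclidean distance transform (EDT).
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    rng = np.random.default_rng(6)
    target = rng.uniform(0, 1, size=(2000, 2))     # target points (2D for brevity)
    pred = target + 0.02 * rng.standard_normal(target.shape)  # warped prediction

    # Rasterize the target once; each EDT cell stores the distance to the
    # nearest occupied cell, so per-point loss evaluation becomes O(1) indexing.
    res = 256
    grid = np.ones((res, res), dtype=bool)
    idx = np.clip((target * res).astype(int), 0, res - 1)
    grid[idx[:, 0], idx[:, 1]] = False             # occupied cells marked False
    dt = distance_transform_edt(grid) / res        # distances in normalized units

    pidx = np.clip((pred * res).astype(int), 0, res - 1)
    dt_loss = dt[pidx[:, 0], pidx[:, 1]].mean()    # stand-in for the Chamfer term
    print("distance-transform loss:", dt_loss)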
Keyword: faster
Agent-Based Modeling and its Tradeoffs: An Introduction & Examples
Authors: G. Wade McDonald, Nathaniel D. Osgood
Subjects: Multiagent Systems (cs.MA); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Agent-based modeling is a computational dynamic modeling technique that may be less familiar to some readers. Agent-based modeling seeks to understand the behaviour of complex systems by situating agents in an environment and studying the emergent outcomes of agent-agent and agent-environment interactions. In comparison with compartmental models, agent-based models offer simpler, more scalable and flexible representation of heterogeneity, the ability to capture dynamic and static network and spatial context, and the ability to consider history of individuals within the model. In contrast, compartmental models offer faster development time with less programming required, lower computational requirements that do not scale with population, and the option for concise mathematical formulation with ordinary, delay or stochastic differential equations supporting derivation of properties of the system behaviour. In this chapter, basic characteristics of agent-based models are introduced, advantages and disadvantages of agent-based models, as compared with compartmental models, are discussed, and two example agent-based infectious disease models are reviewed.
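To ground the comparison with compartmental models, a toy agent-based SIR model fits in a few lines: agents sit on a random contact network and infection spreads through local interactions. All parameters are arbitrary demonstration values.

    # Tiny agent-based SIR model: epidemic curves emerge from local contacts
    # on a random network rather than from aggregate differential equations.
    import numpy as np

    rng = np.random.default_rng(7)
    n, p_edge, p_infect, p_recover, steps = 300, 0.02, 0.08, 0.05, 100
    adj = rng.random((n, n)) < p_edge
    adj = np.triu(adj, 1); adj = adj | adj.T            # undirected contact network

    state = np.zeros(n, dtype=int)                      # 0=S, 1=I, 2=R
    state[rng.choice(n, size=3, replace=False)] = 1     # seed infections

    for _ in range(steps):
        infected = state == 1
        # Each susceptible agent is exposed once per infected neighbor.
        pressure = adj[:, infected].sum(axis=1)
        new_inf = (state == 0) & (rng.random(n) < 1 - (1 - p_infect) ** pressure)
        new_rec = infected & (rng.random(n) < p_recover)
        state[new_inf] = 1
        state[new_rec] = 2

    print("final S/I/R counts:", np.bincount(state, minlength=3))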
Hybrid Materialization in a Disk-Based Column-Store
Authors: Evgeniy Klyuchikov, Elena Mikhailova, George Chernishev
Abstract
In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query plans and therefore impacts the overall system performance. In this paper we continue investigating materialization strategies for a distributed disk-based column-store. We start by demonstrating cases where existing approaches impose fundamental limitations on the resulting system performance. Then, in order to address them, we propose a new hybrid materialization model. The main feature of hybrid materialization is the ability to manipulate both positions and values at the same time. This way, the query engine can flexibly combine the advantages of all the existing strategies and support a new class of query plans. Moreover, hybrid materialization allows the query engine to customize the materialization policy of individual attributes. We describe our vision of how hybrid materialization can be implemented in a columnar system, using PosDB, a distributed, disk-based column-store, as an example. We present the necessary data structures, the internals of a hybrid operator, and the algebra of such operators. Based on this implementation, we evaluate the performance of late, ultra-late, and hybrid materialization strategies in several scenarios based on TPC-H queries. Our experiments demonstrate that hybrid materialization is almost two times faster than its counterparts, while providing a more flexible query model.
Stochastic Subgraph Neighborhood Pooling for Subgraph Classification
Authors: Shweta Ann Jacob, Paul Louis, Amirali Salehi-Abari
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Subgraph classification is an emerging field in graph representation learning where the task is to classify a group of nodes (i.e., a subgraph) within a graph. Subgraph classification has applications such as predicting the cellular function of a group of proteins or identifying rare diseases given a collection of phenotypes. Graph neural networks (GNNs) are the de facto solution for node, link, and graph-level tasks but fail to perform well on subgraph classification tasks. Even GNNs tailored for graph classification are not directly transferable to subgraph classification as they ignore the external topology of the subgraph, thus failing to capture how the subgraph is located within the larger graph. The current state-of-the-art models for subgraph classification address this shortcoming through either labeling tricks or multiple message-passing channels, both of which impose a computation burden and are not scalable to large graphs. To address the scalability issue while maintaining generalization, we propose Stochastic Subgraph Neighborhood Pooling (SSNP), which jointly aggregates the subgraph and its neighborhood (i.e., external topology) information without any computationally expensive operations such as labeling tricks. To improve scalability and generalization further, we also propose a simple data augmentation pre-processing step for SSNP that creates multiple sparse views of the subgraph neighborhood. We show that our model is more expressive than GNNs without labeling tricks. Our extensive experiments demonstrate that our models outperform current state-of-the-art methods (with a margin of up to 2%) while being up to 3X faster in training.
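A rough sketch of the pooling idea as we read it, with random stand-ins for GNN node embeddings and our own choice of mean pooling and uniform neighborhood sampling: embed a subgraph by pooling its nodes together with a sampled slice of its external neighborhood.

    # Rough sketch of stochastic subgraph neighborhood pooling: jointly pool
    # the subgraph and a random sample of its neighborhood, no labeling tricks.
    import numpy as np

    def ssnp_embed(node_emb, subgraph, adj, sample_size, rng):
        # Neighborhood = nodes adjacent to the subgraph but outside it.
        neigh = set()
        for v in subgraph:
            neigh.update(np.flatnonzero(adj[v]))
        neigh = np.array(sorted(neigh - set(subgraph)))
        if len(neigh) > sample_size:                     # stochastic sampling step
            neigh = rng.choice(neigh, size=sample_size, replace=False)
        pooled = np.concatenate([subgraph, neigh]).astype(int)
        return node_emb[pooled].mean(axis=0)             # joint subgraph+context pooling

    rng = np.random.default_rng(8)
    n, dim = 50, 16
    adj = rng.random((n, n)) < 0.1
    adj = np.triu(adj, 1); adj = adj | adj.T
    node_emb = rng.standard_normal((n, dim))             # stand-in GNN outputs
    emb = ssnp_embed(node_emb, np.array([0, 1, 2]), adj, sample_size=5, rng=rng)
    print(emb.shape)  # (16,)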
LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks
Authors: Di Hong, Jiangrong Shen, Yu Qi, Yueming Wang
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
Spiking Neural Networks (SNNs) are biologically realistic and practically promising for low-power computation because of their event-driven mechanism. Usually, the training of SNNs suffers accuracy loss on various tasks, yielding inferior performance compared with ANNs. Conversion schemes have been proposed to obtain competitive accuracy by mapping trained ANNs' parameters to SNNs with the same structures. However, these converted SNNs require an enormous number of time steps, forfeiting the energy-efficiency benefit. Utilizing both the accuracy advantage of ANNs and the computing efficiency of SNNs, we propose a novel SNN training framework, namely layer-wise ANN-to-SNN knowledge distillation (LaSNN). In order to achieve competitive accuracy and reduced inference latency, LaSNN transfers learning from a well-trained ANN to a small SNN by distilling knowledge rather than converting the ANN's parameters. The information gap between the heterogeneous ANN and SNN is bridged by introducing an attention scheme, and the knowledge in the ANN is effectively compressed and then efficiently transferred via our layer-wise distillation paradigm. We conduct detailed experiments demonstrating the effectiveness, efficacy, and scalability of LaSNN on three benchmark data sets (CIFAR-10, CIFAR-100, and Tiny ImageNet). We achieve competitive top-1 accuracy compared to ANNs and 20x faster inference than converted SNNs with similar performance. More importantly, LaSNN is flexible and extensible: it can be effortlessly applied to SNNs with different architectures, depths, and input encoding methods, contributing to their further development.
Keyword: mobile
Coordinated Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Swarms in Autonomous Mobile Access Applications
Authors: Chanyoung Park, Haemin Lee, Won Joon Yun, Soyi Jung, Joongheon Kim
Abstract
This paper proposes a novel centralized training and distributed execution (CTDE)-based multi-agent deep reinforcement learning (MADRL) method for the control of multiple unmanned aerial vehicles (UAVs) in autonomous mobile access applications. For this purpose, a single neural network is utilized in centralized training for cooperation among multiple agents while maximizing the total quality of service (QoS) in mobile access applications.
Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions
Abstract
Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this paper, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide a theoretical analysis that proves the continuity of the proposed CBF. Empirically, we show that the proposed CBF is continuously differentiable, and the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka Emika Research 3 (FR3) robot manipulator, both in simulation and experiment.
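To show the general shape of such a safety filter, the sketch below solves a CBF quadratic program for a single integrator with cvxpy; a hand-written circular-obstacle barrier stands in for the paper's differentiable-optimization-based scaling factor.

    # Minimal CBF quadratic program for a single-integrator robot: stay close
    # to the nominal input subject to the barrier condition grad_h.u >= -a*h.
    import numpy as np
    import cvxpy as cp

    obstacle, radius, alpha = np.array([1.0, 0.0]), 0.5, 2.0

    def barrier(x):
        # h(x) >= 0 outside the obstacle; grad_h points away from it.
        d = x - obstacle
        return d @ d - radius**2, 2 * d

    x = np.array([0.0, 0.05])                 # current position
    u_nom = np.array([1.0, 0.0])              # nominal controller drives at obstacle

    h, grad_h = barrier(x)
    u = cp.Variable(2)
    # Safety filter for dynamics x_dot = u.
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                      [grad_h @ u >= -alpha * h])
    prob.solve()
    print("safe input:", u.value)             # veers away from the obstacle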
Graceful User Following for Mobile Balance Assistive Robot in Daily Activities Assistance
Authors: Yifan Wang, Meng Yuan, Lei Li, Karen Sui Geok Chua, Seng Kwee Wee, Wei Tech Ang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Numerous diseases and aging can cause degeneration of people's balance ability, resulting in limited mobility and even a high risk of falls. Robotic technologies can provide more intensive rehabilitation exercises or be used as assistive devices to compensate for balance ability. However, with the new healthcare paradigm shifting from hospital care to home care, there is a gap in robotic systems that can provide care at home. This paper introduces Mobile Robotic Balance Assistant (MRBA), a compact and cost-effective balance assistive robot that can provide both rehabilitation training and activities of daily living (ADLs) assistance at home. A three degrees of freedom (3-DoF) robotic arm was designed to mimic the therapist's arm function in providing balance assistance to the user. To minimize interference with the user's natural pelvis movements and gait patterns, the robot must have a Human-Robot Interface (HRI) that can detect user intention accurately and follow the user's movement smoothly and in a timely manner. Thus, a graceful user-following control rule was proposed. The overall control architecture consists of two parts: an observer for human input estimation and an LQR-based controller with disturbance rejection. The proposed controller is validated in high-fidelity simulation with actual human trajectories, and the results successfully show the effectiveness of the method in different walking modes.
AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm
Abstract
We consider a scheduling problem in a Mobile Edge Caching (MEC) network, where a base station (BS) uploads messages from multiple source nodes (SNs) and transmits them to mobile users (MUs) via downlinks, aiming to jointly optimize the average service Age of Information (AoI) and service delay over MUs. This problem is formulated as a difficult sequential decision making problem with discrete-valued and linearly-constrained design variables. To solve this problem, we first approximate its achievable region by characterizing its superset and subset. The superset is derived based on the rate stability theorem, while the subset is obtained using a novel stochastic policy. We also validate that this subset is substantially identical to the achievable region when the number of schedule resources is large. Additionally, we propose a sufficient condition to check the existence of the solution to the problem. Then, we propose the mixed-order drift-plus-penalty algorithm that uses a dynamic programming (DP) method to optimize the summation over a linear and quadratic Lyapunov drift and a penalty term, to handle the product term over different queue backlogs in the objective function. Finally, by associating the proposed algorithm with the stochastic policy, we demonstrate that it achieves an $O(1/V)$ versus $O(V)$ tradeoff for the average AoI and average delay.
Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services
Abstract
Aiming at achieving artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution for caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters that are computation- and memory-intensive for edge servers during loading and execution. In this article, we investigate edge PFM serving problems for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric to evaluate the freshness and relevance between examples in demonstrations and executing tasks, namely the Age of Context (AoC). Finally, we propose a least-context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.
Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges
Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band FD has advanced from being demonstrated in research labs to being implemented in standards and products, presenting new opportunities to utilize its foundational concepts. Some of the most significant opportunities include using FD to enable wireless networks to sense the physical environment, integrate sensing and communication applications, develop integrated access and backhaul solutions, and work with smart signal propagation environments powered by reconfigurable intelligent surfaces. However, these new opportunities also come with new challenges for large-scale commercial deployment of FD technology, such as managing self-interference, combating cross-link interference in multi-cell networks, and coexistence of dynamic time division duplex, subband FD and FD networks.
Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments
Authors: Mario A.V. Saucedo, Akash Patel, Rucha Sawlekar, Akshit Saradagi, Christoforos Kanellakis, Ali-Akbar Agha-Mohammadi, George Nikolakopoulos
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments for fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones and in the presence of blinding light sources. In the proposed approach, information from the event camera and LiDAR are fused to localize a human or an object-of-interest in a robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed novel fusion uses intensity filtering and K-means clustering on the LiDAR point cloud and frequency filtering and connectivity clustering on the events induced in an event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection and the NMPC-based controller allows for reactive tracking of a human or object of interest, even in complete darkness.
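The LiDAR half of the pipeline (intensity filtering followed by K-means) can be sketched with synthetic data as follows; the intensity threshold and cluster count are illustrative choices.

    # Sketch: keep only high-intensity LiDAR returns (reflective markers) and
    # cluster them with K-means to obtain candidate marker centroids.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(9)
    # Columns: x, y, z, intensity. Two reflective markers in diffuse background.
    background = np.hstack([rng.uniform(-10, 10, (500, 3)), rng.uniform(0, 0.3, (500, 1))])
    marker1 = np.hstack([rng.normal([2, 1, 0], 0.05, (30, 3)), rng.uniform(0.8, 1.0, (30, 1))])
    marker2 = np.hstack([rng.normal([-3, 4, 1], 0.05, (30, 3)), rng.uniform(0.8, 1.0, (30, 1))])
    cloud = np.vstack([background, marker1, marker2])

    bright = cloud[cloud[:, 3] > 0.7, :3]          # intensity filtering step
    kmeans = KMeans(n_clusters=2, n_init=10).fit(bright)
    print("marker centroids:\n", kmeans.cluster_centers_)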
Continuous-Time Range-Only Pose Estimation
Authors: Abhishek Goudar, Timothy D. Barfoot, Angela P. Schoellig
Abstract
Range-only (RO) localization involves determining the position of a mobile robot by measuring the distance to specific anchors. RO localization is challenging since the measurements are low-dimensional and a single range sensor does not provide enough information to estimate the full pose of the robot. As such, range sensors are typically coupled with other sensing modalities, such as wheel encoders or inertial measurement units (IMUs), to estimate the full pose. In this work, we propose a continuous-time Gaussian process (GP)-based trajectory estimation method to estimate the full pose of a robot using only range measurements from multiple range sensors. Results from simulation and real experiments show that our proposed method, using off-the-shelf range sensors, is able to achieve comparable performance to, and in some cases outperform, alternative state-of-the-art sensor-fusion methods that use additional sensing modalities.
Designing the mobile robot Kevin for a life science laboratory
Authors: Sarah Kleine-Wechelmann, Kim Bastiaanse, Matthias Freundel, Christian Becker-Asano
Abstract
Laboratories are being increasingly automated. In small laboratories, individual processes can be fully automated, but this is usually not economically viable. Nevertheless, individual process steps can be performed by flexible, mobile robots to relieve the laboratory staff. To address the requirements of a life science laboratory, the mobile, dexterous robot Kevin was designed by the Fraunhofer IPA research institute in Stuttgart, Germany. Kevin is a mobile service robot which is able to fulfill non-value-adding activities such as the transportation of labware. This paper gives an overview of Kevin's functionalities and development process, and presents a preliminary study on how its lights and sounds improve user interaction.
Keyword: pruning
CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention
Authors: Zhiqiang Nie, Jiankun Zhao, Qicheng Li, Yong Qin
Abstract
Predicting the State-of-Health (SoH) of lithium-ion batteries is a fundamental task of battery management systems on electric vehicles. It aims at estimating future SoH based on historical aging data. Most existing deep learning methods rely on filter-based feature extractors (e.g., CNN or Kalman filters) and recurrent time sequence models. Though efficient, they generally ignore cyclic features and the domain gap between training and testing batteries. To address this problem, we present CyFormer, a transformer-based cyclic time sequence model for SoH prediction. Instead of the conventional CNN-RNN structure, we adopt an encoder-decoder architecture. In the encoder, row-wise and column-wise attention blocks effectively capture intra-cycle and inter-cycle connections and extract cyclic features. In the decoder, the SoH queries cross-attend to these features to form the final predictions. We further utilize a transfer learning strategy to narrow the domain gap between the training and testing set. To be specific, we use fine-tuning to shift the model to a target working condition. Finally, we made our model more efficient by pruning. The experiment shows that our method attains an MAE of 0.75\% with only 10\% data for fine-tuning on a testing battery, surpassing prior methods by a large margin. Effective and robust, our method provides a potential solution for all cyclic time sequence prediction tasks.
Keyword: voxel
Generative modeling of living cells with SO(3)-equivariant implicit neural representations
Authors: David Wiesner, Julian Suk, Sven Dummer, Tereza Nečasová, Vladimír Ulman, David Svoboda, Jelmer M. Wolterink
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
Abstract
Data-driven cell tracking and segmentation methods in biomedical imaging require diverse and information-rich training data. In cases where the number of training samples is limited, synthetic computer-generated data sets can be used to improve these methods. This requires the synthesis of cell shapes as well as corresponding microscopy images using generative models. To synthesize realistic living cell shapes, the shape representation used by the generative model should be able to accurately represent fine details and changes in topology, which are common in cells. These requirements are not met by 3D voxel masks, which are restricted in resolution, and polygon meshes, which do not easily model processes like cell growth and mitosis. In this work, we propose to represent living cell shapes as level sets of signed distance functions (SDFs) which are estimated by neural networks. We optimize a fully-connected neural network to provide an implicit representation of the SDF value at any point in a 3D+time domain, conditioned on a learned latent code that is disentangled from the rotation of the cell shape. We demonstrate the effectiveness of this approach on cells that exhibit rapid deformations (Platynereis dumerilii), cells that grow and divide (C. elegans), and cells that have growing and branching filopodial protrusions (A549 human lung carcinoma cells). A quantitative evaluation using shape features, Hausdorff distance, and Dice similarity coefficients of real and synthetic cell shapes shows that our model can generate topologically plausible complex cell shapes in 3D+time with high similarity to real living cell shapes. Finally, we show how microscopy images of living cells that correspond to our generated cell shapes can be synthesized using an image-to-image model.
Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering
Authors: Zisheng Chen, Hongbin Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Semantic segmentation of point clouds usually requires exhausting efforts of human annotation, hence it attracts wide attention to the challenging topic of learning from unlabeled or weaker forms of annotations. In this paper, we make the first attempt at fully unsupervised semantic segmentation of point clouds, which aims to delineate semantically meaningful objects without any form of annotation. Previous unsupervised pipelines for 2D images fail on this point cloud task due to: 1) clustering ambiguity caused by the limited amount of data and imbalanced class distribution; 2) irregularity ambiguity caused by the irregular sparsity of point clouds. Therefore, we propose a novel framework, PointDC, comprised of two steps that handle these problems respectively: Cross-Modal Distillation (CMD) and Super-Voxel Clustering (SVC). In the first stage, CMD back-projects multi-view visual features to the 3D space and aggregates them into a unified point feature to distill the training of the point representation. In the second stage, SVC aggregates the point features into super-voxels, which are then fed to an iterative clustering process to excavate semantic classes. PointDC yields a significant improvement over the prior state-of-the-art unsupervised methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic segmentation benchmarks.
Keyword: lidar
PALF: Pre-Annotation and Camera-LiDAR Late Fusion for the Easy Annotation of Point Clouds
Abstract
3D object detection has become indispensable in the field of autonomous driving. To date, gratifying breakthroughs have been recorded in 3D object detection research, attributed to deep learning. However, deep learning algorithms are data-driven and require large amounts of annotated point cloud data for training and evaluation. Unlike 2D image labels, annotating point cloud data is difficult due to the limitations of sparsity, irregularity, and low resolution, which require more manual work, and the annotation efficiency is much lower than for 2D images. Therefore, we propose an annotation algorithm for point cloud data: a pre-annotation and camera-LiDAR late fusion algorithm that makes annotation easy and accurate. The contributions of this study are as follows. We propose (1) a pre-annotation algorithm that employs 3D object detection and auto-fitting for the easy annotation of point clouds, (2) a camera-LiDAR late fusion algorithm using 2D and 3D results for easy error checking, which helps annotators quickly identify missing objects, and (3) a point cloud annotation evaluation pipeline to evaluate our experiments. The experimental results show that the proposed algorithm improves annotation speed by 6.5 times and annotation quality, in terms of 3D Intersection over Union and precision, by 8.2 points and 5.6 points, respectively; additionally, the miss rate is reduced by 31.9 points.
(LC)$^2$: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition
Authors: Alex Junho Lee, Seungwon Song, Hyungtae Lim, Woojoo Lee, Hyun Myung
Abstract
Localization has been a challenging task for autonomous navigation. A loop detection algorithm must overcome environmental changes for the place recognition and re-localization of robots. Therefore, deep learning has been extensively studied for the consistent transformation of measurements into localization descriptors. Street view images are easily accessible; however, images are vulnerable to appearance changes. LiDAR can robustly provide precise structural information; however, constructing a point cloud database is expensive, and point clouds exist only in limited places. Different from previous works that train networks to produce a shared embedding directly between the 2D image and the 3D point cloud, we transform both data into 2.5D depth images for matching. In this work, we propose a novel cross-matching method, called (LC)$^2$, for achieving LiDAR localization without a prior point cloud map. To this end, LiDAR measurements are expressed in the form of range images before matching, to reduce the modality discrepancy. Subsequently, the network is trained to extract localization descriptors from disparity and range images. Next, the best matches are employed as a loop factor in a pose graph. Using public datasets that include multiple sessions in significantly different lighting conditions, we demonstrate that LiDAR-based navigation systems can be optimized from image databases and vice versa.
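The range image conversion that bridges the two modalities can be sketched as a spherical projection; the field-of-view and resolution values below are illustrative, not the sensor parameters used in the paper.

    # Sketch: project a LiDAR point cloud into a 2D range image via spherical
    # coordinates so it can be matched against image-derived depth maps.
    import numpy as np

    def to_range_image(points, h=32, w=512, fov_up=15.0, fov_down=-15.0):
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(r, 1e-9))  # elevation
        fu, fd = np.radians(fov_up), np.radians(fov_down)
        u = ((1 - (pitch - fd) / (fu - fd)) * (h - 1)).astype(int)
        v = ((0.5 * (yaw / np.pi + 1)) * (w - 1)).astype(int)
        img = np.zeros((h, w))
        valid = (u >= 0) & (u < h)                  # drop points outside the FOV
        img[u[valid], v[valid]] = r[valid]          # last-write wins; ties ignored
        return img

    pts = np.random.default_rng(10).uniform(-20, 20, (5000, 3))
    print(to_range_image(pts).shape)                # (32, 512)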
Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments
Authors: Mario A.V. Saucedo, Akash Patel, Rucha Sawlekar, Akshit Saradagi, Christoforos Kanellakis, Ali-Akbar Agha-Mohammadi, George Nikolakopoulos
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments for fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones and in the presence of blinding light sources. In the proposed approach, information from the event camera and LiDAR are fused to localize a human or an object-of-interest in a robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed novel fusion uses intensity filtering and K-means clustering on the LiDAR point cloud and frequency filtering and connectivity clustering on the events induced in an event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection and the NMPC-based controller allows for reactive tracking of a human or object of interest, even in complete darkness.
Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Motion Compensation
Authors: Hanyu Cai, Ni Ou, Junzheng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
This paper presents a novel visual-LiDAR odometry and mapping method with low-drift characteristics. The proposed method is based on two popular approaches, ORB-SLAM and A-LOAM, with monocular scale correction and visual-assisted LiDAR motion compensation modifications. The scale corrector calculates the proportion between the depth of image keypoints recovered by triangulation and that provided by LiDAR, using an outlier rejection process for accuracy improvement. Concerning LiDAR motion compensation, the visual odometry approach gives the initial guesses of LiDAR motions for better performance. This methodology is not only applicable to high-resolution LiDAR but can also adapt to low-resolution LiDAR. To evaluate the proposed SLAM system's robustness and accuracy, we conducted experiments on the KITTI Odometry and S3E datasets. Experimental results illustrate that our method significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore, regarding the accuracy of visual odometry with scale correction, our method performs similarly to the stereo-mode ORB-SLAM2.
Fast Neural Scene Flow
Authors: Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Scene flow is an important problem as it provides low-level motion cues for many downstream tasks. State-of-the-art learning methods are usually fast and can achieve impressive performance on in-domain data, but usually fail to generalize to out-of-distribution (OOD) data or to handle dense point clouds. In this paper, we focus on a runtime optimization-based neural scene flow pipeline. In (a) one can see its application in the densification of lidar. However, in (c) one sees that the major drawback is the extensive computation time. We identify that the common speedup strategy in network architectures for coordinate networks has little effect on scene flow acceleration [see green (b)], unlike image reconstruction [see pink (b)]. With the dominant computational burden stemming instead from the Chamfer loss function, we propose to use a distance transform-based loss function to accelerate [see purple (b)], which achieves up to a 30x speedup and on-par estimation performance compared to NSFP [see (c)]. When tested on 8k points, it is as efficient [see (c)] as leading learning methods, achieving real-time performance.
Keyword: diffusion
Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Authors: Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
With the recent surge in popularity of AR/VR applications, realistic and accurate control of 3D full-body avatars has become a highly demanded feature. A particular challenge is that only a sparse tracking signal is available from standalone HMDs (Head Mounted Devices), often limited to tracking the user's head and wrists. While this signal is resourceful for reconstructing the upper body motion, the lower body is not tracked and must be synthesized from the limited information provided by the upper body joints. In this paper, we present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals. Our model is based on a simple multi-layer perceptron (MLP) architecture and a novel conditioning scheme for motion data. It can predict accurate and smooth full-body motion, particularly the challenging lower body movement. Unlike common diffusion architectures, our compact architecture can run in real-time, making it suitable for online body-tracking applications. We train and evaluate our model on AMASS motion capture dataset, and demonstrate that our approach outperforms state-of-the-art methods in generated motion accuracy and smoothness. We further justify our design choices through extensive experiments and ablation studies.
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models
Authors: Yuwei Yin, Jean Kaddour, Xiang Zhang, Yixin Nie, Zhenguang Liu, Lingpeng Kong, Qi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Data augmentation has been established as an efficacious approach to supplement useful information for low-resource datasets. Traditional augmentation techniques such as noise injection and image transformations have been widely used. In addition, generative data augmentation (GDA) has been shown to produce more diverse and flexible data. While generative adversarial networks (GANs) have been frequently used for GDA, they lack diversity and controllability compared to text-to-image diffusion models. In this paper, we propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image (T2I) generative models for data augmentation. By conditioning the T2I model on detailed descriptions produced by T2T models, we are able to generate photo-realistic labeled images in a flexible and controllable manner. Experiments on in-domain classification, cross-domain classification, and image captioning tasks show consistent improvements over other data augmentation baselines. Analytical studies in varied settings, including few-shot, long-tail, and adversarial, further reinforce the effectiveness of TTIDA in enhancing performance and increasing robustness.
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
Abstract
Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates them. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise source proximity degrees as supervised signals to generate coarse-grained source predictions. This efficiently initializes the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time in extensive experiments.
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer
Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing person image generative models can do either image generation or pose transfer but not both. We propose a unified diffusion model, UPGPT to provide a universal solution to perform all the person image tasks - generative, pose transfer, and editing. With fine-grained multimodality and disentanglement capabilities, our approach offers fine-grained control over the generation and the editing process of images using a combination of pose, text, and image, all without needing a semantic segmentation mask which can be challenging to obtain or edit. We also pioneer the parameterized body SMPL model in pose-guided person image generation to demonstrate new capability - simultaneous pose and camera view interpolation while maintaining a person's appearance. Results on the benchmark DeepFashion dataset show that UPGPT is the new state-of-the-art while simultaneously pioneering new capabilities of edit and pose transfer in human image generation.
An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations
Authors: Xiaoying Dai, Miao Hu, Jack Xin, Aihui Zhou
Abstract
In this paper, we propose an augmented subspace based adaptive proper orthogonal decomposition (POD) method for solving time dependent partial differential equations. By augmenting the POD subspace with some auxiliary modes, we obtain an augmented subspace. We use the difference between the approximation obtained in this augmented subspace and that obtained in the original POD subspace to construct an error indicator, from which we obtain a general framework for the augmented subspace based adaptive POD method. We then provide two strategies for obtaining specific augmented subspaces: the random vector based augmented subspace and the coarse-grid approximations based augmented subspace. We apply our new method to two typical 3D advection-diffusion equations with the advection being the Kolmogorov flow and the ABC flow. Numerical results show that our method is more efficient than existing adaptive POD methods, especially for advection dominated models.
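The error indicator admits a compact linear-algebra illustration (our own toy in Python with random data; the paper's augmentation strategies are more principled): compare projections of a solution onto the POD subspace and onto the augmented subspace.

import numpy as np

rng = np.random.default_rng(0)
snapshots = rng.standard_normal((200, 30))       # columns: solution snapshots
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
pod = U[:, :5]                                   # truncated POD basis

rand_modes = rng.standard_normal((200, 3))       # random-vector augmentation
augmented, _ = np.linalg.qr(np.hstack([pod, rand_modes]))

u_new = snapshots @ rng.standard_normal(30)      # a new solution to assess
proj_pod = pod @ (pod.T @ u_new)
proj_aug = augmented @ (augmented.T @ u_new)

# Large indicator values suggest the current POD basis should be updated.
print(np.linalg.norm(proj_aug - proj_pod) / np.linalg.norm(u_new))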
Look ATME: The Discriminator Mean Entropy Needs Attention
Abstract
Generative adversarial networks (GANs) are successfully used for image synthesis but are known to face instability during training. In contrast, probabilistic diffusion models (DMs) are stable and generate high-quality images, at the cost of an expensive sampling procedure. In this paper, we introduce a simple method to allow GANs to stably converge to their theoretical optimum, while bringing in the denoising machinery from DMs. These models are combined into a simpler model (ATME) that only requires a forward pass during inference, making predictions cheaper and more accurate than DMs and popular GANs. ATME breaks an information asymmetry existing in most GAN models in which the discriminator has spatial knowledge of where the generator is failing. To restore the information symmetry, the generator is endowed with knowledge of the entropic state of the discriminator, which is leveraged to allow the adversarial game to converge towards equilibrium. We demonstrate the power of our method in several image-to-image translation tasks, showing superior performance than state-of-the-art methods at a lesser cost. Code is available at https://github.com/DLR-MI/atme
Keyword: dynamic
Agent-Based Modeling and its Tradeoffs: An Introduction & Examples
Authors: G. Wade McDonald, Nathaniel D. Osgood
Subjects: Multiagent Systems (cs.MA); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Agent-based modeling is a computational dynamic modeling technique that may be less familiar to some readers. Agent-based modeling seeks to understand the behaviour of complex systems by situating agents in an environment and studying the emergent outcomes of agent-agent and agent-environment interactions. In comparison with compartmental models, agent-based models offer simpler, more scalable and flexible representation of heterogeneity, the ability to capture dynamic and static network and spatial context, and the ability to consider history of individuals within the model. In contrast, compartmental models offer faster development time with less programming required, lower computational requirements that do not scale with population, and the option for concise mathematical formulation with ordinary, delay or stochastic differential equations supporting derivation of properties of the system behaviour. In this chapter, basic characteristics of agent-based models are introduced, advantages and disadvantages of agent-based models, as compared with compartmental models, are discussed, and two example agent-based infectious disease models are reviewed.
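For readers new to the technique, the following toy agent-based infection model (ours, not from the chapter) shows the defining ingredients: agents with individual state and history, an environment of random mixing, and emergent aggregate outcomes.

import random

random.seed(1)
N, STEPS, P_INFECT, T_RECOVER = 500, 60, 0.04, 14
agents = [{"state": "S", "days": 0} for _ in range(N)]
agents[0]["state"] = "I"                         # seed one infection

for _ in range(STEPS):
    for a in agents:
        if a["state"] != "I":
            continue
        contact = random.choice(agents)          # agent-agent interaction
        if contact["state"] == "S" and random.random() < P_INFECT:
            contact["state"] = "I"
        a["days"] += 1                           # individual history
        if a["days"] >= T_RECOVER:
            a["state"] = "R"

print({s: sum(a["state"] == s for a in agents) for s in "SIR"})

A compartmental SIR model would replace the inner loop with three ordinary differential equations, trading per-individual heterogeneity and network or spatial context for conciseness and speed.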
A comparison between Recurrent Neural Networks and classical machine learning approaches In Laser induced breakdown spectroscopy
Abstract
Recurrent Neural Networks are classes of Artificial Neural Networks in which connections between nodes form a directed or undirected graph, enabling temporal dynamic analysis. In this research, the laser induced breakdown spectroscopy (LIBS) technique is used for quantitative analysis of aluminum alloys with different Recurrent Neural Network (RNN) architectures. The fundamental harmonic (1064 nm) of a nanosecond Nd:YAG laser pulse is employed to generate the LIBS plasma for predicting the constituent concentrations of the aluminum standard samples. Here, Recurrent Neural Networks based on different units, such as Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple Recurrent Neural Network (Simple RNN), as well as Recurrent Convolutional Networks comprising Conv-SimpleRNN, Conv-LSTM, and Conv-GRU, are utilized for concentration prediction. A comparison is then performed against predictions by the classical machine learning methods: support vector regressor (SVR), Multi Layer Perceptron (MLP), Decision Tree, Gradient Boosting Regression (GBR), Random Forest Regression (RFR), Linear Regression, and the k-Nearest Neighbor (KNN) algorithm. Results showed that the machine learning tools based on Convolutional Recurrent Networks achieved the best prediction accuracy for most of the elements among the multivariate methods.
Robust Control Barrier Functions with Uncertainty Estimation
Authors: Ersin Daş, Skylar X. Wei, Joel W. Burdick
Abstract
This paper proposes a safety controller for control-affine nonlinear systems with unmodelled dynamics and disturbances to improve closed-loop robustness. Uncertainty estimation-based control barrier functions (CBFs) are utilized to ensure robust safety in the presence of model uncertainties, which may depend on control input and states. We present a new uncertainty/disturbance estimator with theoretical upper bounds on estimation error and estimated outputs, which are used to ensure robust safety by formulating a convex optimization problem using a high-order CBF. The possibly unsafe nominal feedback controller is augmented with the proposed estimator in two frameworks (1) an uncertainty compensator and (2) a robustifying reformulation of CBF constraint with respect to the estimator outputs. The former scheme ensures safety with performance improvement by adaptively rejecting the matched uncertainty. The second method uses uncertainty estimation to robustify higher-order CBFs for safety-critical control. The proposed methods are demonstrated in simulations of an uncertain adaptive cruise control problem and a multirotor obstacle avoidance situation.
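The safety-filter pattern underlying such controllers can be written as a small quadratic program; below is a single-integrator toy of ours (the fixed disturbance bound stands in for the paper's learned estimate) using cvxpy.

import cvxpy as cp

def cbf_filter(x, u_nom, d_bound, alpha=1.0, x_max=5.0):
    # Dynamics x' = u + d with |d| <= d_bound; safe set h(x) = x_max - x >= 0.
    # Robustified CBF condition: h' >= -alpha*h under the worst-case disturbance.
    u = cp.Variable()
    h = x_max - x
    prob = cp.Problem(cp.Minimize(cp.square(u - u_nom)),
                      [-u - d_bound + alpha * h >= 0])
    prob.solve()
    return u.value

# Near the safety boundary the nominal command is clipped to preserve safety.
print(cbf_filter(x=4.5, u_nom=2.0, d_bound=0.2))   # -> 0.3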
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding
Authors: Arnav Vaibhav Malawade, Shih-Yuan Yu, Junyao Wang, Mohammad Abdullah Al Faruque
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Human drivers naturally reason about interactions between road users to understand and safely navigate through traffic. Thus, developing autonomous vehicles necessitates the ability to mimic such knowledge and model interactions between road users to understand and navigate unpredictable, dynamic environments. However, since real-world scenarios often differ from training datasets, effectively modeling the behavior of various road users in an environment remains a significant research challenge. This reality necessitates models that generalize to a broad range of domains and explicitly model interactions between road users and the environment to improve scenario understanding. Graph learning methods address this problem by modeling interactions using graph representations of scenarios. However, existing methods cannot effectively transfer knowledge gained from the training domain to real-world scenarios. This constraint is caused by the domain-specific rules used for graph extraction that can vary in effectiveness across domains, limiting generalization ability. To address these limitations, we propose RoadScene2Graph (RS2G): a data-driven graph extraction and modeling approach that learns to extract the best graph representation of a road scene for solving autonomous scene understanding tasks. We show that RS2G enables better performance at subjective risk assessment than rule-based graph extraction methods and deep-learning-based models. RS2G also improves generalization and Sim2Real transfer learning, which denotes the ability to transfer knowledge gained from simulation datasets to unseen real-world scenarios. We also present ablation studies showing how RS2G produces a more useful graph representation for downstream classifiers. Finally, we show how RS2G can identify the relative importance of rule-based graph edges and enables intelligent graph sparsity tuning.
Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud
Authors: Aniket Murhekar, David Arbour, Tung Mai, Anup Rao
Abstract
Several cloud-based applications, such as cloud gaming, rent servers to execute jobs which arrive in an online fashion. Each job has a resource demand and must be dispatched to a cloud server which has enough resources to execute the job, which departs after its completion. Under the `pay-as-you-go' billing model, the server rental cost is proportional to the total time that servers are actively running jobs. The problem of efficiently allocating a sequence of online jobs to servers without exceeding the resource capacity of any server while minimizing total server usage time can be modelled as a variant of the dynamic bin packing problem (DBP), called MinUsageTime DBP. In this work, we initiate the study of the problem with multi-dimensional resource demands (e.g. CPU/GPU usage, memory requirement, bandwidth usage, etc.), called MinUsageTime Dynamic Vector Bin Packing (DVBP). We study the competitive ratio (CR) of Any Fit packing algorithms for this problem. We show almost-tight bounds on the CR of three specific Any Fit packing algorithms, namely First Fit, Next Fit, and Move To Front. We prove that the CR of Move To Front is at most $(2\mu+1)d +1$, where $\mu$ is the ratio of the max/min item durations. For $d=1$, this significantly improves the previously known upper bound of $6\mu+7$ (Kamali & Lopez-Ortiz, 2015). We then prove the CR of First Fit and Next Fit are bounded by $(\mu+2)d+1$ and $2\mu d+1$, respectively. Next, we prove a lower bound of $(\mu+1)d$ on the CR of any Any Fit packing algorithm, an improved lower bound of $2\mu d$ for Next Fit, and a lower bound of $2\mu$ for Move To Front in the 1-D case. All our bounds improve or match the best-known bounds for the 1-D case. Finally, we experimentally study the average-case performance of these algorithms on randomly generated synthetic data, and observe that Move To Front outperforms other Any Fit packing algorithms.
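The Any Fit algorithms under study are short to state; the sketch below (ours, one-dimensional and ignoring job departures for brevity, so it shows placement logic rather than usage time) contrasts First Fit, Next Fit, and Move To Front.

def place(jobs, strategy, cap=1.0):
    servers, cur = [], 0                         # cur: Next Fit's open server
    for size in jobs:
        placed = None
        if strategy == "next_fit":
            if servers and sum(servers[cur]) + size <= cap:
                placed = cur
        else:                                    # first_fit / move_to_front scan in order
            for i, srv in enumerate(servers):
                if sum(srv) + size <= cap:
                    placed = i
                    break
        if placed is None:
            servers.append([size])               # rent a new server
            placed = cur = len(servers) - 1
        else:
            servers[placed].append(size)
        if strategy == "move_to_front":
            servers.insert(0, servers.pop(placed))
            cur = 0
    return len(servers)

jobs = [0.6, 0.5, 0.4, 0.3, 0.2, 0.5]
for s in ("first_fit", "next_fit", "move_to_front"):
    print(s, place(jobs, s))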
Mechanical Intelligence Simplifies Control in Terrestrial Limbless Locomotion
Authors: Tianyu Wang, Christopher Pierce, Velin Kojouharov, Baxi Chong, Kelimar Diaz, Hang Lu, Daniel I. Goldman
Abstract
Limbless locomotors, from microscopic worms to macroscopic snakes, traverse complex, heterogeneous natural environments typically using undulatory body wave propagation. Theoretical and robophysical models typically emphasize body kinematics and active neural/electronic control. However, we contend that because such approaches often neglect the role of passive, mechanically controlled processes (i.e., those involving mechanical intelligence), they fail to reproduce the performance of even the simplest organisms. To discover principles of how mechanical intelligence aids limbless locomotion in heterogeneous terradynamic regimes, here we conduct a comparative study of locomotion in a model of heterogeneous terrain (lattices of rigid posts). We use a model biological system, the highly studied nematode worm C. elegans, and a novel robophysical device whose bilateral actuator morphology models that of limbless organisms across scales. The robot's kinematics quantitatively reproduce the performance of the nematodes with purely open-loop control; mechanical intelligence simplifies control of obstacle navigation and exploitation by reducing the need for active sensing and feedback. An active behavior observed in C. elegans, undulatory wave reversal upon head collisions, robustifies locomotion via exploitation of the systems' mechanical intelligence. Our study provides insights into how neurally simple limbless organisms like nematodes can leverage mechanical intelligence via appropriately tuned bilateral actuation to locomote in complex environments. These principles likely apply to neurally more sophisticated organisms and also provide a new design and control paradigm for limbless robots for applications like search and rescue and planetary exploration.
RPDP: An Efficient Data Placement based on Residual Performance for P2P Storage Systems
Abstract
Storage systems using Peer-to-Peer (P2P) architecture are an alternative to traditional client-server systems. They offer better scalability and fault tolerance while eliminating the single point of failure. The nature of P2P storage systems, which consist of heterogeneous nodes, however introduces data placement challenges that create implementation trade-offs (e.g., between performance and scalability). The existing Kademlia-based DHT data placement method stores data at the closest node, where distance is measured by the bit-wise XOR operation between the data identifier and a given node identifier. This approach is highly scalable because it requires no global knowledge for placing data or retrieving it. It does not, however, consider the heterogeneous performance of the nodes, which can result in imbalanced resource usage affecting the overall latency of the system. Other works implement criteria-based selection that addresses node heterogeneity, but often cause subsequent data retrieval to require global knowledge of where the data is stored. This paper introduces Residual Performance-based Data Placement (RPDP), a novel data placement method based on the dynamic temporal residual performance of data nodes. RPDP places data at the most appropriate nodes based on their throughput and latency, aiming to achieve lower overall latency by balancing the data distribution with respect to the individual performance of nodes. RPDP relies on a Kademlia-based DHT with a modified data structure that allows data to be subsequently retrieved without global knowledge. The experimental results indicate that RPDP reduces the overall latency of the baseline Kademlia-based P2P storage system by 4.87% and also reduces the variance of latency among the nodes, with minimal impact on data retrieval complexity.
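The baseline placement rule is a one-liner over XOR distances; the sketch below (ours) shows it next to a naive residual-performance re-ranking among the k closest nodes, to convey the flavour of RPDP (the real method balances throughput and latency dynamically).

import hashlib

def h160(x):
    return int(hashlib.sha1(x.encode()).hexdigest(), 16)

nodes = {f"node{i}": {"id": h160(f"node{i}"), "latency_ms": 10 * (i + 1)}
         for i in range(8)}

def closest_xor(key):
    # Kademlia baseline: store at the node minimizing XOR distance to the key.
    kid = h160(key)
    return min(nodes, key=lambda n: nodes[n]["id"] ^ kid)

def residual_aware(key, k=3):
    # Restrict choice to the k XOR-closest nodes, then prefer the best performer;
    # retrieval can still locate the data by searching the same k-neighbourhood.
    kid = h160(key)
    k_closest = sorted(nodes, key=lambda n: nodes[n]["id"] ^ kid)[:k]
    return min(k_closest, key=lambda n: nodes[n]["latency_ms"])

print(closest_xor("block-42"), residual_aware("block-42"))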
Super-Logarithmic Lower Bounds for Dynamic Graph Problems
Authors: Kasper Green Larsen, Huacheng Yu
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Abstract
In this work, we prove a $\tilde{\Omega}(\lg^{3/2} n )$ unconditional lower bound on the maximum of the query time and update time for dynamic data structures supporting reachability queries in $n$-node directed acyclic graphs under edge insertions. This is the first super-logarithmic lower bound for any natural graph problem. In proving the lower bound, we also make novel contributions to the state-of-the-art data structure lower bound techniques that we hope may lead to further progress in proving lower bounds.
Cooperative Multi-Agent Reinforcement Learning for Inventory Management
Abstract
With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain comes with several challenges: minimizing the computational requirements of the environment, specifying agent configurations that are representative of dynamics at real-world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders to the vertex upstream. The warehouse agent, aside from placing orders from the supplier, has the special property of also being able to constrain replenishment to stores downstream, which results in it learning an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies such as a base-stock policy and other RL-based specifications for one product, and lay out a future direction of work for multiple products.
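The base-stock baseline mentioned above is simple enough to state inline (a sketch of ours):

def base_stock_order(inventory_position, S):
    # Classic base-stock policy: order up to the target level S.
    # inventory position = on-hand + in-transit - backorders
    return max(0, S - inventory_position)

print(base_stock_order(inventory_position=35, S=50))   # order 15 units

The RL agents in the system above must beat this kind of rule while also coordinating through the warehouse agent's learned allocation sub-policy.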
Neuromorphic Control using Input-Weighted Threshold Adaptation
Authors: Stein Stroobants, Christophe De Wagter, Guido C.H.E. de Croon
Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)
Abstract
Neuromorphic processing promises high energy efficiency and rapid response rates, making it an ideal candidate for achieving autonomous flight of resource-constrained robots. It will be especially beneficial for complex neural networks as are involved in high-level visual perception. However, fully neuromorphic solutions will also need to tackle low-level control tasks. Remarkably, it is currently still challenging to replicate even basic low-level controllers such as proportional-integral-derivative (PID) controllers. Specifically, it is difficult to incorporate the integral and derivative parts. To address this problem, we propose a neuromorphic controller that incorporates proportional, integral, and derivative pathways during learning. Our approach includes a novel input threshold adaptation mechanism for the integral pathway. This Input-Weighted Threshold Adaptation (IWTA) introduces an additional weight per synaptic connection, which is used to adapt the threshold of the post-synaptic neuron. We tackle the derivative term by employing neurons with different time constants. We first analyze the performance and limits of the proposed mechanisms and then put our controller to the test by implementing it on a microcontroller connected to the open-source tiny Crazyflie quadrotor, replacing the innermost rate controller. We demonstrate the stability of our bio-inspired algorithm with flights in the presence of disturbances. The current work represents a substantial step towards controlling highly dynamic systems with neuromorphic algorithms, thus advancing neuromorphic processing and robotics. In addition, integration is an important part of any temporal task, so the proposed Input-Weighted Threshold Adaptation (IWTA) mechanism may have implications well beyond control tasks.
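A toy leaky integrate-and-fire neuron with extra per-synapse threshold weights conveys the IWTA idea (our simplified reading; names and constants are ours, not the paper's):

import numpy as np

def iwta_neuron(inputs, w, w_thr, v_decay=0.9, thr0=1.0):
    # inputs: (T, n_syn) binary spike trains; w: synaptic weights;
    # w_thr: per-synapse weights that adapt the firing threshold (IWTA).
    v, spikes = 0.0, []
    for x in inputs:
        v = v_decay * v + w @ x                  # membrane integration
        thr = thr0 + w_thr @ x                   # input-weighted threshold
        if v >= thr:
            spikes.append(1)
            v = 0.0                              # reset after spike
        else:
            spikes.append(0)
    return spikes

rng = np.random.default_rng(0)
x = (rng.random((20, 4)) < 0.3).astype(float)
print(iwta_neuron(x, w=np.full(4, 0.6), w_thr=np.full(4, 0.2)))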
AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm
Abstract
We consider a scheduling problem in a Mobile Edge Caching (MEC) network, where a base station (BS) uploads messages from multiple source nodes (SNs) and transmits them to mobile users (MUs) via downlinks, aiming to jointly optimize the average service Age of Information (AoI) and service delay over MUs. This problem is formulated as a difficult sequential decision making problem with discrete-valued and linearly-constrained design variables. To solve this problem, we first approximate its achievable region by characterizing its superset and subset. The superset is derived based on the rate stability theorem, while the subset is obtained using a novel stochastic policy. We also validate that this subset is substantially identical to the achievable region when the number of schedule resources is large. Additionally, we propose a sufficient condition to check the existence of the solution to the problem. Then, we propose the mixed-order drift-plus-penalty algorithm that uses a dynamic programming (DP) method to optimize the summation over a linear and quadratic Lyapunov drift and a penalty term, to handle the product term over different queue backlogs in the objective function. Finally, by associating the proposed algorithm with the stochastic policy, we demonstrate that it achieves an $O(1/V)$ versus $O(V)$ tradeoff for the average AoI and average delay.
Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges
Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band FD has advanced from being demonstrated in research labs to being implemented in standards and products, presenting new opportunities to utilize its foundational concepts. Some of the most significant opportunities include using FD to enable wireless networks to sense the physical environment, integrate sensing and communication applications, develop integrated access and backhaul solutions, and work with smart signal propagation environments powered by reconfigurable intelligent surfaces. However, these new opportunities also come with new challenges for large-scale commercial deployment of FD technology, such as managing self-interference, combating cross-link interference in multi-cell networks, and coexistence of dynamic time division duplex, subband FD and FD networks.
Large-scale Dynamic Network Representation via Tensor Ring Decomposition
Abstract
Large-scale Dynamic Networks (LDNs) are becoming increasingly important in the Internet age. The dynamic nature of these networks captures the evolution of the network structure and how edge weights change over time, posing unique challenges for data analysis and modeling. A Latent Factorization of Tensors (LFT) model facilitates efficient representation learning for an LDN. However, existing LFT models are almost all based on Canonical Polyadic Factorization (CPF). This work therefore proposes a model based on Tensor Ring (TR) decomposition for efficient representation learning for an LDN. Specifically, we incorporate the principle of single latent factor-dependent, non-negative, and multiplicative update (SLF-NMU) into the TR decomposition model, and analyze the particular bias form of TR decomposition. Experimental studies on two real LDNs demonstrate that the proposed method achieves higher accuracy than existing models.
Neuromorphic computing for attitude estimation onboard quadrotors
Authors: Stein Stroobants, Julien Dupeyroux, Guido C.H.E. de Croon
Abstract
Compelling evidence has been given for the high energy efficiency and update rates of neuromorphic processors, with performance beyond what standard Von Neumann architectures can achieve. Such promising features could be advantageous in critical embedded systems, especially in robotics. To date, the constraints inherent in robots (e.g., size and weight, battery autonomy, available sensors, computing resources, processing time, etc.), and particularly in aerial vehicles, severely hamper the performance of fully-autonomous on-board control, including sensor processing and state estimation. In this work, we propose a spiking neural network (SNN) capable of estimating the pitch and roll angles of a quadrotor in highly dynamic movements from 6-degree of freedom Inertial Measurement Unit (IMU) data. With only 150 neurons and a limited training dataset obtained using a quadrotor in a real world setup, the network shows competitive results as compared to state-of-the-art, non-neuromorphic attitude estimators. The proposed architecture was successfully tested on the Loihi neuromorphic processor on-board a quadrotor to estimate the attitude when flying. Our results show the robustness of neuromorphic attitude estimation and pave the way towards energy-efficient, fully autonomous control of quadrotors with dedicated neuromorphic computing systems.
Towards the Transferable Audio Adversarial Attack via Ensemble Methods
Authors: Feng Guo, Zheng Sun, Yuxuan Chen, Lei Ju
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
In recent years, deep learning (DL) models have achieved significant progress in many domains, such as autonomous driving, facial recognition, and speech recognition. However, the vulnerability of deep learning models to adversarial attacks has raised serious concerns in the community because of their insufficient robustness and generalization. Also, transferable attacks have become a prominent method for black-box attacks. In this work, we explore the potential factors that impact the transferability of adversarial examples (AEs) in DL-based speech recognition. We also discuss the vulnerability of different DL systems and the irregular nature of decision boundaries. Our results show a remarkable difference in the transferability of AEs between speech and images, with data relevance being low for images but the opposite for speech recognition. Motivated by dropout-based ensemble approaches, we propose random gradient ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact of ensembles on the transferability of AEs. The results show that the AEs created by both approaches are valid for transfer to the black-box API.
Motion-state Alignment for Video Semantic Segmentation
Authors: Jinming Su, Ruihong Yin, Shuaibin Zhang, Junfeng Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, video semantic segmentation has made great progress with advanced deep neural networks. However, there still exist two main challenges, i.e., information inconsistency and computation cost. To deal with the two difficulties, we propose a novel motion-state alignment framework for video semantic segmentation to keep both motion and state consistency. In the framework, we first construct a motion alignment branch armed with an efficient decoupled transformer to capture dynamic semantics, guaranteeing region-level temporal consistency. Then, a state alignment branch composed of a stage transformer is designed to enrich feature spaces for the current frame to extract static semantics and achieve pixel-level state consistency. Next, by a semantic assignment mechanism, the region descriptor of each semantic category is gained from dynamic semantics and linked with pixel descriptors from static semantics. Benefiting from the alignment of these two kinds of effective information, the proposed method picks up dynamic and static semantics in a targeted way, so that video semantic regions are consistently segmented to obtain precise locations with low computational complexity. Extensive experiments on the Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods and validate the effectiveness of the motion-state alignment framework.
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The current transformation towards smart manufacturing has led to a growing demand for human-robot collaboration (HRC) in the manufacturing process. Perceiving and understanding the human co-worker's behaviour introduces challenges for collaborative robots to efficiently and effectively perform tasks in unstructured and dynamic environments. Integrating recent data-driven machine vision capabilities into HRC systems is a logical next step in addressing these challenges. However, in these cases, off-the-shelf components struggle due to generalisation limitations. Real-world evaluation is required in order to fully appreciate the maturity and robustness of these approaches. Furthermore, understanding the pure-vision aspects is a crucial first step before combining multiple modalities in order to understand the limitations. In this paper, we propose GoferBot, a novel vision-based semantic HRC system for a real-world assembly task. It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience. GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
Abstract
Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates them. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise source proximity degrees as supervised signals to generate coarse-grained source predictions. This efficiently initializes the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time in extensive experiments.
PEGA: Personality-Guided Preference Aggregator for Ephemeral Group Recommendation
Authors: Guangze Ye, Wen Wu, Liye Shi, Wenxin Hu, Xin Chen, Liang He
Abstract
Recently, making recommendations for ephemeral groups, which contain dynamic users and few historic interactions, has received increasing attention. The main challenge of ephemeral group recommendation is how to aggregate individual preferences to represent the group's overall preference. Score aggregation and preference aggregation are two commonly-used methods that adopt hand-crafted predefined strategies and data-driven strategies, respectively. However, they neglect the importance of individual inherent factors such as personality in the group. In addition, they fail to work well with a small number of interaction records. To address these issues, we propose a Personality-Guided Preference Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt a hyper-rectangle to define the concept of Group Personality. We then use a personality attention mechanism to aggregate group preferences. The role of personality in our approach is twofold: (1) to estimate individual users' importance in a group and provide explainability; (2) to alleviate the data sparsity issue that occurs in ephemeral groups. The experimental results demonstrate that our model significantly outperforms state-of-the-art methods w.r.t. the scores of both Recall and NDCG on the Amazon and Yelp datasets.
Secured and Cooperative Publish/Subscribe Scheme in Autonomous Vehicular Networks
Authors: Yuntao Wang, Zhou Su, Qichao Xu, Tom H. Luan, Rongxing Lu
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
In order to save computing power yet enhance safety, there is a strong intention for autonomous vehicles (AVs) in the future to drive collaboratively by sharing sensory data and computing results among neighbors. However, the intense collaborative computing and data transmissions among unknown others will inevitably introduce severe security concerns. Aiming at addressing security concerns in future AVs, in this paper, we develop SPAD, a secured framework to forbid free-riders and promote trustworthy data dissemination in collaborative autonomous driving. Specifically, we first introduce a publish/subscribe framework for inter-vehicle data transmissions. To defend against free-riding attacks, we formulate the interactions between publisher AVs and subscriber AVs as a vehicular publish/subscribe game, and incentivize AVs to deliver high-quality data by analyzing the Stackelberg equilibrium of the game. We also design a reputation evaluation mechanism in the game to identify malicious AVs in disseminating fake information. Furthermore, for lack of sufficient knowledge on parameters of the network model and user cost model in dynamic game scenarios, a two-tier reinforcement learning based algorithm with hotbooting is developed to obtain the optimal strategies of subscriber AVs and publisher AVs with free-rider prevention. Extensive simulations are conducted, and the results validate that our SPAD can effectively prevent free-riders and enhance the dependability of disseminated contents, compared with conventional schemes.
Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
Abstract
Detecting arbitrarily oriented tiny objects poses intense challenges to existing detectors, especially for label assignment. Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometry shape and limited feature of oriented tiny objects still induce severe mismatch and imbalance issues. Specifically, the position prior, positive sample feature, and instance are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with the coarse-to-fine assigner, dubbed DCFL. For one thing, we model the prior, label assignment, and object representation all in a dynamic manner to alleviate the mismatch issue. For another, we leverage the coarse prior matching and finer posterior constraint to dynamically assign labels, providing appropriate and relatively balanced supervision for diverse instances. Extensive experiments on six datasets show substantial improvements to the baseline. Notably, we obtain the state-of-the-art performance for one-stage detectors on the DOTA-v1.5, DOTA-v2.0, and DIOR-R datasets under single-scale training and testing. Codes are available at https://github.com/Chasel-Tsui/mmrotate-dcfl.
NPS: A Framework for Accurate Program Sampling Using Graph Neural Network
Abstract
With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de-facto approach for decades, its limited expressiveness with Basic Block Vector (BBV) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63%, reducing the average error by 38%. Additionally, NPS demonstrates strong robustness with increased accuracy, reducing the expensive accuracy tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
Safe reinforcement learning with self-improving hard constraints for multi-energy management systems
Authors: Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé, Maarten Messagie
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It requires only the environment-specific constraint functions themselves a priori, and not a complete model (i.e., plant, disturbance and noise models, and prediction models for states not included in the plant model, e.g., demand, weather, and price forecasts). Project-specific upfront and ongoing engineering efforts are therefore reduced, better representations of the underlying system dynamics can still be learned, and modeling bias is kept to a minimum (no model-based objective function). However, even the constraint functions alone are not always trivial to accurately provide in advance (e.g., an energy balance constraint requires the detailed determination of all energy inputs and outputs), leading to potentially unsafe behavior. In this paper, we present two novel advancements: (I) combining the OptLayer and SafeFallback methods, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency; (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more data becomes available so that better policies can be learned. Both advancements keep the constraint formulation decoupled from the RL formulation, so that new (presumably better) RL algorithms can act as drop-in replacements. We have shown that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer), and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer), all relative to a vanilla RL benchmark. While introducing surrogate functions into the optimization problem requires special attention, we conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.
Distributed Search Planning in 3-D Environments With a Dynamically Varying Number of Agents
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Abstract
In this work, a novel distributed search-planning framework is proposed, where a dynamically varying team of autonomous agents cooperate in order to search multiple objects of interest in three-dimension (3-D). It is assumed that the agents can enter and exit the mission space at any point in time, and as a result the number of agents that actively participate in the mission varies over time. The proposed distributed search-planning framework takes into account the agent dynamical and sensing model, and the dynamically varying number of agents, and utilizes model predictive control (MPC) to generate cooperative search trajectories over a finite rolling planning horizon. This enables the agents to adapt their decisions on-line while considering the plans of their peers, maximizing their search planning performance, and reducing the duplication of work.
Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
Abstract
Multi-frame depth estimation generally achieves high accuracy relying on the multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in the dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of the fusion of the two types of cues. In this paper, we propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing the heuristically crafted masks. As unveiled in our analyses, the multi-view cues capture more accurate geometric information in static areas, and the monocular cues capture more useful contexts in dynamic areas. To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas and let monocular cues enhance the representation of multi-view cost volume, we propose a cross-cue fusion (CCF) module, which includes the cross-cue attention (CCA) to encode the spatially non-local relative intra-relations from each source to enhance the representation of the other. Experiments on real-world datasets prove the significant effectiveness and generalization ability of the proposed method.
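The cross-cue attention can be pictured as two standard attention passes in which each cue queries the other's features; below is a schematic PyTorch sketch of ours (shapes and residual fusion are assumptions, not the released implementation):

import torch
import torch.nn as nn

class CrossCueAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.mono_to_multi = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.multi_to_mono = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, mono, multi):
        # Each cue attends to the other; outputs enhance the original features.
        m_enh, _ = self.multi_to_mono(query=mono, key=multi, value=multi)
        v_enh, _ = self.mono_to_multi(query=multi, key=mono, value=mono)
        return mono + m_enh, multi + v_enh       # residual cross-cue enhancement

mono = torch.randn(2, 100, 64)                   # monocular cue tokens
multi = torch.randn(2, 100, 64)                  # multi-view cost-volume tokens
out_mono, out_multi = CrossCueAttention()(mono, multi)
print(out_mono.shape, out_multi.shape)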
PaTeCon: A Pattern-Based Temporal Constraint Mining Method for Conflict Detection on Knowledge Graphs
Abstract
Temporal facts, the facts for characterizing events that hold in specific time periods, are attracting rising attention in the knowledge graph (KG) research communities. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs and detecting potential temporal conflicts. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which are labor-intensive and may have granularity issues. We start from the common pattern of temporal facts and constraints and propose a pattern-based temporal constraint mining method, PaTeCon. PaTeCon uses automatically determined graph patterns and their relevant statistical information over the given KG, instead of human experts, to generate time constraints. Specifically, PaTeCon dynamically attaches class restrictions to candidate constraints according to their measuring scores. We evaluate PaTeCon on two large-scale datasets based on Wikidata and Freebase, respectively. The experimental results show that pattern-based automatic constraint mining is powerful in generating valuable temporal constraints.
Neural Lumped Parameter Differential Equations with Application in Friction-Stir Processing
Authors: James Koch, WoongJo Choi, Ethan King, David Garcia, Hrishikesh Das, Tianhao Wang, Ken Ross, Keerti Kappagantula
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Lumped parameter methods aim to simplify the evolution of spatially-extended or continuous physical systems to that of a "lumped" element representative of the physical scales of the modeled system. For systems where the definition of a lumped element or its associated physics may be unknown, modeling tasks may be restricted to full-fidelity simulations of the physics of a system. In this work, we consider data-driven modeling tasks with limited point-wise measurements of otherwise continuous systems. We build upon the notion of the Universal Differential Equation (UDE) to construct data-driven models for reducing dynamics to that of a lumped parameter and inferring its properties. The flexibility of UDEs allow for composing various known physical priors suitable for application-specific modeling tasks, including lumped parameter methods. The motivating example for this work is the plunge and dwell stages for friction-stir welding; specifically, (i) mapping power input into the tool to a point-measurement of temperature and (ii) using this learned mapping for process control.
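The UDE recipe, keep the lumped physics you trust and learn the rest, can be sketched compactly (a toy of ours for the power-to-temperature mapping; coefficients and network size are assumptions, not the paper's model):

import torch
import torch.nn as nn

class LumpedUDE(nn.Module):
    # dT/dt = a*P - b*(T - T_env) + NN(T, P): known balance + learned residual.
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.5))
        self.b = nn.Parameter(torch.tensor(0.1))
        self.residual = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

    def forward(self, T0, power, dt=0.1, T_env=20.0):
        T, traj = T0, []
        for P in power:                          # explicit Euler rollout
            z = torch.stack([T, P]).unsqueeze(0)
            dT = self.a * P - self.b * (T - T_env) + self.residual(z).squeeze()
            T = T + dt * dT
            traj.append(T)
        return torch.stack(traj)

model = LumpedUDE()
traj = model(torch.tensor(20.0), torch.full((50,), 2.0))
print(traj[-1])                                  # fit traj to point temperature data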
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits
Authors: Liu Leqi, Giulio Zhou, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract
Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandits (MABs) framework. However, MAB-based approaches rely heavily on assumptions about human preferences. These preference assumptions are seldom tested using human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MABs setting. Each arm represents a comic category, and users provide feedback after each recommendation. We check the validity of core MAB assumptions, namely that human preferences (reward distributions) are fixed over time, and find that they do not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. While answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MABs algorithms with human users. The code for our experimental framework and the collected data can be found at https://github.com/HumainLab/human-bandit-evaluation.
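The stationarity assumption being tested is easy to see in code: a standard epsilon-greedy bandit (our minimal sketch) keeps a running mean per arm, which lags behind once preferences drift.

import random

random.seed(0)
K, T, EPS = 5, 2000, 0.1
counts, values = [0] * K, [0.0] * K              # running mean reward per arm

def true_reward(arm, t):
    # Deliberately non-stationary: the preferred arm rotates every 500 steps.
    return random.gauss(0.5 + 0.3 * (arm == (t // 500) % K), 0.1)

for t in range(T):
    if random.random() < EPS:
        arm = random.randrange(K)                        # explore
    else:
        arm = max(range(K), key=lambda a: values[a])     # exploit
    r = true_reward(arm, t)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]       # stationary-mean update

print([round(v, 2) for v in values])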
Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor
Authors: Zhenshan Bing, Aleksandr Mavrichev, Sicong Shen, Xiangtong Yao, Kejia Chen, Kai Huang, Alois Knoll
Abstract
Deep reinforcement learning (RL) has been endowed with high expectations in tackling challenging manipulation tasks in an autonomous and self-directed fashion. Despite the significant strides made in the development of reinforcement learning, the practical deployment of this paradigm is hindered by at least two barriers, namely, the engineering of a reward function and ensuring the safety guarantees of learning-based controllers. In this paper, we address these challenging limitations by proposing a framework that merges a reinforcement learning planner that is trained using sparse rewards with a model predictive controller (MPC) actor, thereby offering a safe policy. On the one hand, the RL planner learns from sparse rewards by selecting intermediate goals that are easy to achieve in the short term and promising to lead to target goals in the long term. On the other hand, the MPC actor takes the suggested intermediate goals from the RL planner as the input and predicts how the robot's action will enable it to reach that goal while avoiding any obstacles over a short period of time. We evaluated our method on four challenging manipulation tasks with dynamic obstacles, and the results demonstrate that, by leveraging the complementary strengths of these two components, the agent can solve manipulation tasks in complex, dynamic environments safely with a $100\%$ success rate. Videos are available at https://videoviewsite.wixsite.com/mpc-hgg.
Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
Abstract
Stochastic gradient Langevin dynamics (SGLD) are a useful methodology for sampling from probability distributions. This paper provides a finite sample analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD) designed to achieve inverse reinforcement learning. By "passive", we mean that the noisy gradients available to the PSGLD algorithm (inverse learning process) are evaluated at randomly chosen points by an external stochastic gradient algorithm (forward learner). The PSGLD algorithm thus acts as a randomized sampler which recovers the cost function being optimized by this external process. Previous work has analyzed the asymptotic performance of this passive algorithm using stochastic approximation techniques; in this work we analyze the non-asymptotic performance. Specifically, we provide finite-time bounds on the 2-Wasserstein distance between the passive algorithm and its stationary measure, from which the reconstructed cost function is obtained.
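The passive setup can be conveyed with a one-dimensional toy sketch of ours (the kernel, step sizes, and the forward learner's exploration noise are our choices, not the paper's): gradients arrive evaluated at the forward learner's iterates, and the passive sampler weights them by a kernel before taking a Langevin step.

import numpy as np

rng = np.random.default_rng(0)
grad = lambda x: 2.0 * (x - 3.0)                 # gradient of J(x) = (x - 3)^2

theta_f, theta_p = 10.0, 0.0                     # forward iterate, passive state
lr_f, lr_p, beta, bw = 0.05, 0.05, 20.0, 1.0
samples = []
for _ in range(5000):
    g = grad(theta_f)                            # only queried at forward points
    theta_f += -lr_f * g + rng.normal(0.0, 0.5)  # forward learner keeps exploring
    k = np.exp(-0.5 * ((theta_f - theta_p) / bw) ** 2)    # kernel weight
    noise = rng.normal(0.0, np.sqrt(2.0 * lr_p / beta))
    theta_p += -lr_p * k * g + noise             # passive Langevin update
    samples.append(theta_p)

print(np.mean(samples[1000:]))                   # should wander toward the minimizer x = 3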
Keyword: efficient
Model-Driven Quantum Federated Learning (QFL)
CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention
Schottky Barrier MOSFET Enabled Ultra-Low Power Real-Time Neuron for Neuromorphic Computing
Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness
LIMIT: Learning Interfaces to Maximize Information Transfer
GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Traversing combinatorial 0/1-polytopes via optimization
Diagnosing applications' I/O behavior through system call observability
Energy-Efficient Lane Changes Planning and Control for Connected Autonomous Vehicles on Urban Roads
Graph Sparsification by Approximate Matrix Multiplication
Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions
Revisiting Block-Diagonal SDP Relaxations for the Clique Number of the Paley Graphs
Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud
An Ethereum-compatible blockchain that explicates and ensures design-level safety properties for smart contracts
Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU
Continuous Versatile Jumping Using Learned Action Residuals
A Voice Disease Detection Method Based on MFCCs and Shallow CNN
InversOS: Efficient Control-Flow Protection for AArch64 Applications with Privilege Inversion
Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
A Survey on Biomedical Text Summarization with Pre-trained Language Model
Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services
Connectivity in the presence of an opponent
Large-scale Dynamic Network Representation via Tensor Ring Decomposition
Neuromorphic computing for attitude estimation onboard quadrotors
Implicit representation priors meet Riemannian geometry for Bayesian robotic grasping
Revisiting the Role of Similarity and Dissimilarity in Best Counter Argument Retrieval
DILI: A Distribution-Driven Learned Index
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Motion-state Alignment for Video Semantic Segmentation
Contact Tracing over Uncertain Indoor Positioning Data (Extended Version)
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
Revisiting Fast Fourier multiplication algorithms on quotient rings
Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition
Romanization-based Large-scale Adaptation of Multilingual Language Models
Differentiable Genetic Programming for High-dimensional Symbolic Regression
Coefficient Synthesis for Threshold Automata
Quantum Annealing for Single Image Super-Resolution
Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks
Multitenant Containers as a Service (CaaS) for Clouds and Edge Clouds
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
Generative modeling of living cells with SO(3)-equivariant implicit neural representations
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
Neural Architecture Search for Visual Anomaly Segmentation
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations
GUILGET: GUI Layout GEneration with Transformer
DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables
Revisiting k-NN for Pre-trained Language Models
Always Strengthen Your Strengths: A Drift-Aware Incremental Learning Framework for CTR Prediction
METAM: Goal-Oriented Data Discovery
DRIFT: A Federated Recommender System with Implicit Feedback on the Items
Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations
MATURE-HEALTH: HEALTH Recommender System for MAndatory FeaTURE choices
LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks
Fast Neural Scene Flow
Keyword: faster
Agent-Based Modeling and its Tradeoffs: An Introduction & Examples
Hybrid Materialization in a Disk-Based Column-Store
Stochastic Subgraph Neighborhood Pooling for Subgraph Classification
LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks
Keyword: mobile
Coordinated Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Swarms in Autonomous Mobile Access Applications
Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions
Graceful User Following for Mobile Balance Assistive Robot in Daily Activities Assistance
AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm
Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services
Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges
Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments
Continuous-Time Range-Only Pose Estimation
Designing the mobile robot Kevin for a life science laboratory
Keyword: pruning
CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention
Keyword: voxel
Generative modeling of living cells with SO(3)-equivariant implicit neural representations
Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering
Keyword: lidar
PALF: Pre-Annotation and Camera-LiDAR Late Fusion for the Easy Annotation of Point Clouds
(LC)$^2$: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition
Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments
Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Motion Compensation
Fast Neural Scene Flow
Keyword: diffusion
Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer
An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations
Look ATME: The Discriminator Mean Entropy Needs Attention
Keyword: dynamic
Agent-Based Modeling and its Tradeoffs: An Introduction & Examples
A comparison between Recurrent Neural Networks and classical machine learning approaches in Laser induced breakdown spectroscopy
Robust Control Barrier Functions with Uncertainty Estimation
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding
Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud
Mechanical Intelligence Simplifies Control in Terrestrial Limbless Locomotion
RPDP: An Efficient Data Placement based on Residual Performance for P2P Storage Systems
Super-Logarithmic Lower Bounds for Dynamic Graph Problems
Cooperative Multi-Agent Reinforcement Learning for Inventory Management
Neuromorphic Control using Input-Weighted Threshold Adaptation
AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm
Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges
Large-scale Dynamic Network Representation via Tensor Ring Decomposition
Neuromorphic computing for attitude estimation onboard quadrotors
Towards the Transferable Audio Adversarial Attack via Ensemble Methods
Motion-state Alignment for Video Semantic Segmentation
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems
PEGA: Personality-Guided Preference Aggregator for Ephemeral Group Recommendation
Secured and Cooperative Publish/Subscribe Scheme in Autonomous Vehicular Networks
Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
NPS: A Framework for Accurate Program Sampling Using Graph Neural Network
Safe reinforcement learning with self-improving hard constraints for multi-energy management systems
Distributed Search Planning in 3-D Environments With a Dynamically Varying Number of Agents
Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
PaTeCon: A Pattern-Based Temporal Constraint Mining Method for Conflict Detection on Knowledge Graphs
Neural Lumped Parameter Differential Equations with Application in Friction-Stir Processing
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits
Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor
Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics