Abstract
Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch/layer normalization to converge faster and generalize. Label Smoothing (LS) is another simple, versatile, and efficient regularization method that can be applied to various supervised classification tasks. Conventional LS, however, assumes that each non-target class is equally likely regardless of the training instance. In this work, we present a general framework for training with label regularization that includes conventional LS but can also model instance-specific variants. Based on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution of the inner loop as the optimal label smoothing, without the need to store the parameters or the output of a trained model. Finally, we conduct extensive experiments and demonstrate that LABO consistently yields improvements over conventional label regularization across various fields, including seven machine translation and three image classification tasks.
Abstract
Existing digital human models approximate the human skeletal system using rigid bodies connected by rotational joints. While this simplification is considered acceptable for legs and arms, it significantly lacks the fidelity to model rich torso movements in common activities such as dancing, yoga, and various sports. Biomechanics research provides more detailed modeling for parts of the torso, but these models often operate in isolation and are not fast and robust enough to support computationally heavy applications and large-scale data generation for full-body digital humans. This paper proposes a new torso model that aims to achieve high fidelity both in perception and in functionality, while remaining computationally feasible for simulation and optimal control tasks. We build a detailed human torso model consisting of various anatomical components, including facets, ligaments, and intervertebral discs, by coupling efficient finite-element and rigid-body simulations. Given an existing motion capture sequence without dense markers placed on the torso, our new model is able to recover the underlying torso bone movements. Our method is so robust that it can automatically "retrofit" the entire Mixamo motion database of highly diverse human motions without user intervention. We also show that our model is computationally efficient for solving trajectory optimization of highly dynamic full-body movements, without relying on any reference motion. The physiological validity of the model is verified against established literature.
Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation
Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jialu Liu, Michael Bendersky, Marc Najork, Chao Zhang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Knowledge distillation is a popular technique to transfer knowledge from a large teacher model to a small student model. Typically, the student learns to imitate the teacher by minimizing the KL divergence between its output distribution and the teacher's output distribution. In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's output distribution and the ground truth label distribution; forcing the student to blindly imitate the unreliable teacher output distribution thus leads to inferior performance. To address this, we propose a novel knowledge distillation objective, PTLoss, obtained by first representing the vanilla KL-based distillation loss function via a Maclaurin series and then perturbing the leading-order terms in this series. This perturbed loss implicitly transforms the original teacher into a proxy teacher with a distribution closer to the ground truth distribution. We establish the theoretical connection between this "distribution closeness" and the student model's generalizability, which enables us to select PTLoss's perturbation coefficients in a principled way. Extensive experiments on five datasets demonstrate that PTLoss can significantly improve distillation effectiveness for teachers of various scales.
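For concreteness, the vanilla KL-based distillation objective that PTLoss perturbs can be sketched in a few lines of PyTorch (our illustration; the perturbation coefficients applied to the series terms are the paper's contribution and are not reproduced here):

    import torch.nn.functional as F

    def vanilla_kd_loss(student_logits, teacher_logits, T=2.0):
        # KL(teacher || student) on temperature-softened distributions --
        # the objective PTLoss rewrites as a Maclaurin series and perturbs.
        s_log_p = F.log_softmax(student_logits / T, dim=-1)
        t_p = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(s_log_p, t_p, reduction="batchmean") * (T * T)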
Abstract
Recently, beyond-diagonal reconfigurable intelligent surfaces (BD-RIS) have been proposed to generalize conventional RIS. A BD-RIS has a scattering matrix that is not restricted to being diagonal and thus brings a performance improvement over conventional RIS. While different BD-RIS architectures have been proposed, it remains an open problem to develop a systematic approach for designing BD-RIS architectures that achieve the optimal trade-off between performance and circuit complexity. In this work, we propose novel modeling, architecture design, and optimization for BD-RIS based on graph theory. This graph-theoretic modeling allows us to develop two new efficient BD-RIS architectures, denoted tree-connected and forest-connected RIS. Tree-connected RIS, whose corresponding graph is a tree, is proven to be the least complex BD-RIS architecture able to achieve the performance upper bound in multiple-input single-output (MISO) systems. In addition, forest-connected RIS strikes a balance between performance and complexity, further decreasing complexity relative to tree-connected RIS. To optimize tree-connected RIS, we derive a closed-form global optimal solution, while forest-connected RIS is optimized through a low-complexity iterative algorithm. Numerical results confirm that tree-connected (resp. forest-connected) RIS achieves the same performance as fully-connected (resp. group-connected) RIS while reducing complexity by up to 16.4 times.
A Case for CXL-Centric Server Processors
Authors: Albert Cho, Anish Saxena, Moinuddin Qureshi, Alexandros Daglis
Abstract
The memory system is a major performance determinant for server processors. Ever-growing core counts and datasets demand higher bandwidth and capacity as well as lower latency from the memory system. To keep up with these demands, DDR--the dominant processor interface to memory over the past two decades--has offered higher bandwidth with every generation. However, because each parallel DDR interface requires a large number of on-chip pins, the processor's memory bandwidth is ultimately constrained by its pin count, a scarce resource. With limited bandwidth, multiple memory requests typically contend for each memory channel, resulting in significant queuing delays that often overshadow DRAM's service time and degrade performance. We present CoaXiaL, a server design that overcomes memory bandwidth limitations by replacing \textit{all} of the processor's DDR interfaces with the more pin-efficient CXL interface. The widespread adoption and industrial momentum of CXL make such a transition possible, offering $4\times$ higher bandwidth per pin compared to DDR at a modest latency overhead. We demonstrate that, for a broad range of workloads, CXL's latency premium is more than offset by its higher bandwidth. Because CoaXiaL distributes memory requests across more channels, it drastically reduces queuing delays and thereby both the average and the variance of memory access latency. Our evaluation with a variety of workloads shows that CoaXiaL improves the performance of manycore throughput-oriented servers by $1.52\times$ on average and by up to $3\times$.
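The queuing argument can be illustrated with a textbook M/M/1 model (our back-of-the-envelope illustration, not the paper's evaluation methodology): spreading the same load over more channels lowers per-channel utilization, and the queueing delay falls super-linearly with it.

    def mm1_queue_delay(service_time_ns, utilization):
        # Mean M/M/1 queueing delay: W_q = rho / (1 - rho) * S.
        return utilization / (1.0 - utilization) * service_time_ns

    # Illustrative numbers only: at 80% channel utilization, a 50 ns service
    # time incurs 200 ns of queueing; spreading the load over 4x the channels
    # (20% utilization) cuts that to 12.5 ns, dwarfing a modest latency premium.
    print(mm1_queue_delay(50, 0.8), mm1_queue_delay(50, 0.2))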
A Unifying Framework of Attention-based Neural Load Forecasting
Authors: Jing Xiong, Yu Zhang
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Accurate load forecasting is critical for reliable and efficient planning and operation of electric power grids. In this paper, we propose a unifying deep learning framework for load forecasting that includes time-varying feature weighting, hierarchical temporal attention, and feature-reinforced error correction. Our framework adopts a modular design with good generalization capability. First, the feature-weighting mechanism assigns temporal weights to the input features. Second, a recurrent encoder-decoder structure with hierarchical attention is developed as the load predictor; the hierarchical attention enables similar-day selection, re-evaluating the importance of historical information at each time step. Third, we develop an error correction module that exploits the errors and the hidden information of learned features to further improve the model's forecasting performance. Experimental results demonstrate that our proposed framework outperforms existing methods on two public datasets and across performance metrics, with the feature-weighting mechanism and the error correction module being critical to achieving superior performance. Our framework provides an effective solution to the electric load forecasting problem and can be further adapted to many other forecasting tasks.
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
Authors: Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, only a scalar attribute of the sequence. In such scenarios, where, for example, knowing the quality of a system's output in order to predict poor performance matters more than the output itself, can autoregressive decoding be bypassed? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly inference-efficient.
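A minimal sketch of the proxy idea (the class name, pooling choice, and head are our assumptions, not the paper's exact architecture): the attribute is regressed directly from the encoder states, so the decoder is never run.

    import torch
    import torch.nn as nn

    class NAPHead(nn.Module):
        # Hypothetical proxy head: mean-pools encoder states and regresses a
        # scalar sequence-level attribute, skipping autoregressive decoding.
        def __init__(self, d_model):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                     nn.Linear(d_model, 1))

        def forward(self, enc_states, mask):
            # enc_states: (batch, time, d_model); mask: (batch, time) floats.
            pooled = (enc_states * mask.unsqueeze(-1)).sum(1)
            pooled = pooled / mask.sum(1, keepdim=True)
            return self.mlp(pooled).squeeze(-1)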
Sorting Finite Automata via Partition Refinement
Authors: Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, Nicola Prezza
Abstract
Wheeler nondeterministic finite automata (WNFAs) were introduced as a generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete. Recently, however, Alanko et al. showed how to side-step this complexity by switching to preorders: letting $Q$ be the set of states, $E$ the set of transitions, $|Q|=n$, and $|E|=m$, they provided an $O(mn^2)$-time algorithm computing a totally-ordered partition of the WNFA's states such that (1) equivalent states recognize the same regular language, and (2) the order of non-equivalent states is consistent with any Wheeler order, when one exists. The output is then a preorder of the states that is as useful for pattern matching as standard Wheeler orders. Further research generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: any NFA admits a partial preorder of its states reflecting the co-lex order of their accepted strings; the smaller the width of such a preorder, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in $O(m^2+n^{5/2})$ time, while on DFAs the same can be done in $O(\min(n^2\log n,mn))$ time. In this paper, we provide much more efficient solutions to the problem above. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem to work with ordered partitions. Specifically, we provide an $O(m\log n)$-time algorithm computing a co-lex total preorder when the input is a WNFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. We also present implementations of our algorithms and show that they are very efficient in practice.
Localisation of Mammographic masses by Greedy Backtracking of Activations in the Stacked Auto-Encoders
Authors: Shamna Pootheri, Govindan V K
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Mammographic image analysis requires accurate localisation of salient mammographic masses. In mammographic computer-aided diagnosis, the mass or Region of Interest (ROI) is often marked by physicians, and features are extracted from the marked ROI. In this paper, we present a novel mammographic mass localisation framework based on the maximal class activations of stacked auto-encoders. We hypothesize that the image regions activating abnormal classes in mammographic images are the breast masses which cause the anomaly. The experiment is conducted on 200 randomly selected mammographic images (100 normal and 100 abnormal) from the IRMA mammographic dataset. Abnormal mass regions marked by an expert radiologist are used as the ground truth. The proposed method outperforms existing Deep Convolutional Neural Network (DCNN) based techniques in terms of salient region detection accuracy. The proposed greedy backtracking method is more efficient and does not require a vast number of labelled training images, as DCNN-based methods do. Such an automatic localisation method will assist physicians in making accurate decisions on biopsy recommendations and treatment evaluations.
Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation
Authors: Tao Chen, Yazhou Yao, Jinhui Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Weakly supervised semantic segmentation (WSSS) models relying on class activation maps (CAMs) have achieved desirable performance compared to non-CAM-based counterparts. However, to make the WSSS task feasible, pseudo labels must be generated by expanding the seeds from CAMs, which is complex and time-consuming, thus hindering the design of efficient end-to-end (single-stage) WSSS approaches. To tackle this dilemma, we resort to off-the-shelf and readily accessible saliency maps to directly obtain pseudo labels given the image-level class labels. Nevertheless, the salient regions may contain noisy labels and cannot seamlessly fit the target objects, and saliency maps can only be approximated as pseudo labels for simple images containing single-class objects. As such, a segmentation model trained on these simple images cannot generalize well to complex images containing multi-class objects. To this end, we propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy-label and multi-class generalization issues. Specifically, we propose online noise filtering and progressive noise detection modules to tackle image-level and pixel-level noise, respectively. Moreover, a bidirectional alignment mechanism is proposed to reduce the data distribution gap in both input and output space via simple-to-complex image synthesis and complex-to-simple adversarial learning. MDBA reaches mIoU of 69.5\% and 70.2\% on the validation and test sets of the PASCAL VOC 2012 dataset. The source code and models are available at \url{https://github.com/NUST-Machine-Intelligence-Laboratory/MDBA}.
A Generalized Covering Algorithm for Chained Codes
Abstract
The covering radius is a fundamental property of linear codes that characterizes the trade-off between storage and access in linear data-query protocols. The generalized covering radius was recently defined by Elimelech and Schwartz for applications in the joint recovery of linear data queries. In this work, we extend a known bound on the ordinary covering radius to the generalized one for all codes satisfying the chain condition -- a known condition satisfied by most known families of codes. Given a generator matrix of a special form, we also provide an algorithm which finds codewords that cover the input vectors within the distance specified by the bound. For Reed-Muller codes, we provide an efficient construction of such generator matrices, thereby providing a faster alternative to a previous generalized covering algorithm for Reed-Muller codes.
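For reference, the ordinary covering radius that the extended bound concerns is the standard quantity
$$\rho(C) \;=\; \max_{x \in \mathbb{F}_q^n}\; \min_{c \in C}\; d_H(x, c),$$
i.e., the smallest radius $r$ such that Hamming balls of radius $r$ centered at the codewords cover the whole space; the generalized covering radius of Elimelech and Schwartz extends this notion to covering several input vectors jointly (see their paper for the precise definition).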
E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong
Abstract
Text image machine translation (TIMT) aims to translate texts embedded in images from a source language to a target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues: cascade models can benefit from large-scale optical character recognition (OCR) and MT datasets, but the two-stage architecture is redundant, while end-to-end models are efficient but suffer from a shortage of training data. To this end, we propose an end-to-end TIMT model that makes full use of the knowledge in existing OCR and MT datasets to pursue a framework that is both effective and efficient. More specifically, we build a novel modal adapter that effectively bridges the OCR encoder and the MT decoder. An end-to-end TIMT loss and a cross-modal contrastive loss are used jointly to align the feature distributions of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, ablation studies verify the generality of our method: the proposed modal adapter is effective in bridging various OCR and MT models.
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Authors: Lingjiao Chen, Matei Zaharia, James Zou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
Abstract
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
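Among the three strategies, the LLM cascade is the one FrugalGPT instantiates. A schematic sketch of the idea (the model list, scorer, and thresholds are placeholders, not FrugalGPT's learned components):

    def llm_cascade(query, models, scorer, thresholds):
        # Try models from cheapest to most expensive; return the first answer
        # the (learned) scorer deems reliable enough. Assumes len(models) >= 1
        # and len(thresholds) == len(models); setting the last threshold to 0
        # guarantees the strongest model's answer is always accepted.
        for model, tau in zip(models, thresholds):
            answer = model(query)
            if scorer(query, answer) >= tau:
                return answer
        return answer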
DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs
Authors: Myat Thu Linn Aung, Daniel Gerlinghoff, Chuping Qu, Liwei Yang, Tian Huang, Rick Siow Mong Goh, Tao Luo, Weng-Fai Wong
Abstract
Brain-inspired spiking neural networks (SNNs) replace the multiply-accumulate operations of traditional neural networks with integrate-and-fire neurons, aiming for greater energy efficiency. Specialized hardware implementations of those neurons clearly have power and performance advantages over general-purpose devices, but exhibit poor scalability when it comes to accelerating large neural networks. DeepFire2 introduces a hardware architecture which can map large network layers efficiently across multiple super logic regions in a multi-die FPGA. This gives more control over resource allocation and parallelism, benefiting both throughput and energy consumption. Avoiding lookup tables for the AND operations of an SNN prevents the layer size from being limited by logic resources, and a deep pipeline enables clock speeds of up to 600 MHz. We double the throughput and power efficiency compared to our previous version of DeepFire, which equates to an almost 10-fold improvement over other previous implementations. Importantly, we are able to deploy a large ImageNet model while maintaining a throughput of over 1500 frames per second.
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Abstract
Diffusion models, which have become popular text-to-image generation models, can produce high-quality and content-rich images guided by textual prompts. However, existing models show limited semantic understanding and commonsense reasoning when the input prompts are concise narratives, resulting in low-quality image generation. To improve the capacity for narrative prompts, we propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models. To reach this goal, we first collect and annotate a new dataset, SURD, which consists of more than 57,000 semantically corrected multi-modal samples. Each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. Then, we align the semantic representation of narrative prompts to the complex prompts and transfer knowledge of large language models (LLMs) to our SUR-adapter via knowledge distillation so that it can acquire powerful semantic understanding and reasoning capabilities to build a high-quality textual semantic representation for text-to-image generation. We conduct experiments integrating multiple LLMs and popular pre-trained diffusion models to show the effectiveness of our approach in enabling diffusion models to understand and reason over concise natural language without image quality degradation. Our approach makes text-to-image diffusion models easier to use, with a better user experience, demonstrating its potential to further advance user-friendly text-to-image generation by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts.
A Fair and Resilient Decentralized Clock Network for Transaction Ordering
Authors: Andrei Constantinescu, Diana Ghinea, Lioba Heimbach, Zilin Wang, Roger Wattenhofer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
Abstract
Traditional blockchain design gives miners or validators full control over transaction ordering, i.e.,~they can freely choose which transactions to include or exclude, as well as their order. While not an issue initially, the emergence of decentralized finance has introduced new transaction order dependencies, allowing parties in control of the ordering to profit by front-running others' transactions. In this work, we present the Decentralized Clock Network, a new approach for achieving fair transaction ordering. Users submit their transactions to the network's clocks, which run an agreement protocol that provides each transaction with a timestamp of receipt; this timestamp is then used to define the transactions' order. By separating agreement from ordering, our protocol is efficient and has a simpler design than other available solutions. Moreover, our protocol brings to the blockchain world the paradigm of asynchronous fallback, where the algorithm operates with stronger fairness guarantees during periods of synchrony, switching to an asynchronous mode only during times of increased network delay.
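As a toy illustration of receipt-timestamp ordering (ours, not the actual agreement protocol), transactions can be ordered by a robust statistic of the clocks' reported receipt times, so that a few outlier or faulty clocks cannot skew the order:

    import statistics

    def order_transactions(reports):
        # reports: dict mapping each transaction to the list of receipt
        # timestamps reported by the individual clocks.
        return sorted(reports, key=lambda tx: statistics.median(reports[tx]))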
A High-performance, Energy-efficient Modular DMA Engine Architecture
Authors: Thomas Benz, Michael Rogenmoser, Paul Scheffler, Samuel Riedel, Alessandro Ottaviano, Andreas Kurth, Torsten Hoefler, Luca Benini
Abstract
Data transfers are essential in today's computing systems, as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control-plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: in high-performance systems, we achieve speedups of up to 15.8x with only 1% additional area compared to a base system without a DMAE. In ultra-low-energy edge AI systems, we achieve a 10% area reduction while improving ML inference performance by 23% over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide iDMA's instantiation in various systems.
Learning Personalized Page Content Ranking Using Customer Representation
Abstract
On e-commerce stores (Amazon, eBay, etc.), there is rich recommendation content to help shoppers shop more efficiently. However, given the enormous number of products, it is crucial to select the most relevant content to reduce the burden of information overload. We introduced a content ranking service, powered by a linear causal bandit algorithm, to rank and select content for each shopper in each context. That algorithm mainly leverages aggregated customer behavior features and ignores individual shoppers' past activities. Here we study the problem of inferring shoppers' interests from historical activities. We propose a deep-learning-based bandit algorithm that incorporates historical shopping behavior, customers' latent shopping goals, and the correlation between customers and content categories. This model produces more personalized content rankings, measured by a 12.08% nDCG lift. In an online A/B test, the model improved our business metric by 0.02% in annualized commercial impact, validating its effectiveness.
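For readers unfamiliar with the baseline family, a generic linear contextual bandit (textbook LinUCB, shown only to make "content ranking as a bandit" concrete; this is neither the production service nor the proposed deep model) scores content as follows:

    import numpy as np

    class LinUCB:
        # Maintains a ridge-regression estimate of reward per context vector
        # and adds an optimism bonus for under-explored directions.
        def __init__(self, dim, alpha=1.0):
            self.A = np.eye(dim)        # Gram matrix of seen contexts
            self.b = np.zeros(dim)      # reward-weighted context sum
            self.alpha = alpha          # exploration strength

        def score(self, x):
            theta = np.linalg.solve(self.A, self.b)
            bonus = self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))
            return x @ theta + bonus    # rank content by this score

        def update(self, x, reward):
            self.A += np.outer(x, x)
            self.b += reward * x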
ENCOVIZ: An open-source, secure and multi-role energy consumption visualisation platform
Abstract
The need for a more energy-efficient future is now more evident than ever and has led to the continuous growth of sectors with great potential for energy savings, such as smart buildings, energy consumption meters, etc. The large volume of energy-related data produced is a huge advantage but, at the same time, creates a new problem: the need to structure, organize, and efficiently present this meaningful information. In this context, we present the ENCOVIZ platform, a multi-role, extensible, secure energy consumption visualisation platform with built-in analytics. ENCOVIZ has been built in accordance with best visualisation practices, on top of open-source technologies, and includes (i) multi-role functionalities, (ii) automated ingestion of energy consumption data, and (iii) proper visualisations and information to support effective decision making for both energy providers and consumers.
Structured Sentiment Analysis as Transition-based Dependency Parsing
Authors: Daniel Fernández-González
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Structured sentiment analysis (SSA) aims to automatically extract people's opinions from a text in natural language and adequately represent that information in a graph structure. One of the most accurate methods for performing SSA was recently proposed and consists of approaching it as a dependency parsing task. Although transition-based algorithms are known to excel at dependency parsing in terms of accuracy and efficiency, all previous attempts to tackle SSA following that approach were based on graph-based models. In this article, we present the first transition-based method to address SSA as dependency parsing. Specifically, we design a transition system that processes the input text in a left-to-right pass, incrementally generating the graph structure containing all identified opinions. To effectively implement our final transition-based model, we resort to a Pointer Network architecture as a backbone. From an extensive evaluation, we demonstrate that our model offers the best performance to date among prior dependency-based methods in practically all cases, and surpasses recent task-specific techniques on the most challenging datasets. We additionally include an in-depth analysis and empirically show that the overall time complexity of our approach is quadratic in the sentence length, making it more efficient than top-performing graph-based parsers.
Two new algorithms for error support recovery of low rank parity check codes
Abstract
Due to their weak algebraic structure, low rank parity check (LRPC) codes have been employed in several post-quantum cryptographic schemes. In this paper, we propose new, improved decoding algorithms for $(n, k)$ LRPC codes of dual rank weight $d$. The proposed algorithms can efficiently decode LRPC codes whose parameters satisfy $n - k = rd - c$, where $r$ is the dimension of the error support and $c \le d - 2$. They outperform the original decoding algorithm of LRPC codes when $d > 2$ and allow for decoding LRPC codes with a higher code rate and smaller values of $m$.
GPT-NAS: Neural Architecture Search with the Generative Pre-Trained Model
Abstract
Neural Architecture Search (NAS) has emerged as one of the most effective methods for automatically designing optimal neural network architectures. Although neural architectures have achieved human-level performance in several tasks, few of them were obtained via NAS. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures with a Generative Pre-Trained (GPT) model. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus can learn the fundamental laws of building neural architectures. Therefore, GPT-NAS leverages the GPT model to propose reasonable architecture components given the basic ones, which largely reduces the search space by introducing prior knowledge into the search process. Extensive experimental results show that our GPT-NAS method significantly outperforms seven manually designed neural architectures and thirteen architectures provided by competing NAS methods. In addition, our ablation study indicates that the proposed algorithm improves the performance of fine-tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.
VEDLIoT -- Next generation accelerated AIoT systems and applications
Authors: Kevin Mika, René Griessl, Nils Kucza, Florian Porrmann, Martin Kaiser, Lennart Tigges, Jens Hagemeyer, Pedro Trancoso, Muhammad Waqar Azhar, Fareed Qararyah, Stavroula Zouzoula, Jämes Ménétrey, Marcelo Pasin, Pascal Felber, Carina Marcus, Oliver Brunnegard, Olof Eriksson, Hans Salomonsson, Daniel Ödman, Andreas Ask, Antonio Casimiro, Alysson Bessani, Tiago Carvalho, Karol Gugala, Piotr Zierhoffer, Grzegorz Latosinski, Marco Tassemeier, Mario Porrmann, Hans-Martin Heyn, Eric Knauss, Yufei Mao, Franz Meierhöfer
Abstract
The VEDLIoT project aims to develop energy-efficient Deep Learning methodologies for distributed Artificial Intelligence of Things (AIoT) applications. We propose a holistic approach that focuses on optimizing algorithms while addressing the safety and security challenges inherent to AIoT systems. The foundation of this approach is a modular and scalable cognitive IoT hardware platform, which leverages microserver technology to enable users to configure the hardware to meet the requirements of a diverse array of applications. Heterogeneous computing is used to boost performance and energy efficiency, and the full spectrum of hardware accelerators is integrated, providing specialized ASICs as well as FPGAs for reconfigurable computing. The project's contributions span trusted computing, remote attestation, and secure execution environments, with the ultimate goal of facilitating the design and deployment of robust and efficient AIoT systems. The overall architecture is validated on use cases ranging from Smart Home to Automotive and Industrial IoT appliances. Ten additional use cases are integrated via an open call, broadening the range of application areas.
Fast Many-to-Many Routing for Ridesharing with Multiple Pickup and Dropoff Locations
Abstract
We introduce KaRRi, an improved algorithm for scheduling a fleet of shared vehicles as used by services like UberXShare and Lyft Shared. We speed up the basic online algorithm that considers all possible insertions of a new customer into a set of existing routes, generalize the objective function, and efficiently support a large number of possible pickup and dropoff locations. This lays an algorithmic foundation for ridesharing systems with higher vehicle occupancy, enabling greatly reduced cost and ecological impact at comparable service quality. We find that our algorithm computes assignments between vehicles and riders several times faster than a previous state-of-the-art approach. Further, we observe that allowing meeting points for vehicles and riders can reduce the operating cost of vehicle fleets by up to $15\%$ while also reducing passenger wait and trip times.
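The basic online algorithm referenced above enumerates every feasible insertion position for the new rider; a naive single-vehicle version (our sketch, with a caller-supplied route cost function) makes clear why accelerating this step matters:

    def best_insertion(route, pickup, dropoff, cost):
        # Enumerate all O(len(route)^2) pickup/dropoff insertion positions in
        # one vehicle's route and keep the cheapest; KaRRi speeds up and
        # generalizes this basic step across the whole fleet.
        best = (float("inf"), None)
        for i in range(len(route) + 1):
            for j in range(i, len(route) + 1):
                r = route[:i] + [pickup] + route[i:j] + [dropoff] + route[j:]
                best = min(best, (cost(r), (i, j)))
        return best  # (cheapest cost, (pickup index, dropoff index))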
High-throughput Cotton Phenotyping Big Data Pipeline Lambda Architecture Computer Vision Deep Neural Networks
Authors: Amanda Issac (1), Alireza Ebrahimi (2), Javad Mohammadpour Velni (2), Glen Rains (3) ((1) School of Electrical and Computer Engineering, University of Georgia, (2) Department of Mechanical Engineering, Clemson University, (3) Department of Entomology, University of Georgia)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
In this study, we propose a big data pipeline for cotton bloom detection using a Lambda architecture, which enables real-time and batch processing of data. Our proposed approach leverages Azure resources such as Data Factory, Event Grids, REST APIs, and Databricks. This work is the first to develop and demonstrate the implementation of such a pipeline for plant phenotyping through Azure's cloud computing services. The proposed pipeline consists of data preprocessing, object detection using a YOLOv5 neural network model trained through Azure AutoML, and visualization of object detection bounding boxes on output images. The trained model achieves a mean Average Precision (mAP) score of 0.96, demonstrating its high performance for cotton bloom classification. We evaluate our Lambda architecture pipeline on 9000 images, yielding an optimized runtime of 34 minutes. The results illustrate the scalability of the proposed pipeline as a solution for deep learning object detection, with the potential for further expansion through additional Azure processing cores. This work advances the field by providing a new method for cotton bloom detection on a large dataset and demonstrates the potential of utilizing cloud computing resources, specifically Azure, for efficient and accurate big data processing in precision agriculture.
Abstract
The study of partial differential equations (PDEs) through the framework of deep learning emerged a few years ago, leading to impressive approximations of simple dynamics. Graph neural networks (GNNs) have proven very useful in these tasks by allowing the treatment of the unstructured data often encountered in numerical PDE solving. However, solving harder PDEs such as the Navier-Stokes equations remains a challenging task, and most of the work on the latter concentrates either on simulating the flow around simple geometries or on qualitative results that look physically plausible for design purposes. In this study, we leverage work on deep learning for PDEs and GNNs by adapting a known architecture to the task of approximating the solution of the two-dimensional steady-state incompressible Navier-Stokes equations over different airfoil geometries. Beyond its performance over the volume, we also test our model on its ability to approximate surface quantities such as the wall shear stress or the isostatic pressure, leading to the inference of global coefficients such as the lift and the drag of the airfoil, in order to allow design exploration. This work is part of a larger project that aims to approximate three-dimensional steady-state solutions over industrial geometries.
Energy-Efficient Mining for Blockchain-Enabled IoT Applications: An Optimal Multiple-Stopping Time Approach
Authors: Anurag Gupta, Vikram Krishnamurthy
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP); Systems and Control (eess.SY)
Abstract
What are the optimal times for an Internet of Things (IoT) device to act as a blockchain miner? The aim is to minimize the energy consumed by low-power IoT devices that log their data into a secure (tamper-proof) distributed ledger. We formulate energy-efficient blockchain mining for IoT devices as a multiple-stopping-time partially observed Markov decision process (POMDP) that maximizes the probability of adding a block to the blockchain; we also present a model to optimize the number of stops (mining instants). In general, POMDPs are computationally intractable to solve, but we show mathematically, using submodularity, that the optimal mining policy has a useful structure: 1) it is monotone in the belief space, and 2) it exhibits a threshold structure, which divides the belief space into two connected sets. Exploiting these structural results, we formulate a computationally efficient linear mining policy for the blockchain-enabled IoT device and present a policy gradient technique to optimize its parameters. Finally, we use synthetic and real Bitcoin datasets to study the performance of our proposed mining policy, and demonstrate the energy efficiency achieved by the optimal linear mining policy in contrast to other heuristic strategies.
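The threshold structure makes such a policy cheap to represent and evaluate; a hypothetical linear threshold policy on the belief vector (parameter names ours, not the paper's notation) could look like:

    import numpy as np

    def linear_mining_policy(belief, weights, threshold):
        # Monotone threshold policy on the belief space: mine exactly when a
        # linear score of the belief crosses the threshold, mirroring the
        # structural result proved in the paper.
        return "mine" if float(np.dot(belief, weights)) >= threshold else "wait"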
Investigating the effect of sub-word segmentation on the performance of transformer language models
Authors: Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
We explore how morphemes affect the performance of language models. We trained GPT-2 and BERT models for both Finnish and Russian using StateMorph, a morpheme-based segmentation algorithm. For comparison, we also trained models with BPE and Morfessor segmentation. Our preliminary results show that StateMorph helps a model converge more efficiently and achieve a better validation score.
ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation
Abstract
In a typical path planning pipeline for a ground robot, we build a map (e.g., an occupancy grid) of the environment as the robot moves around. While navigating indoors, a ground robot's knowledge of the environment may be limited by occlusions, so the map will contain many as-yet-unknown regions that a conservative planner may need to avoid. If, instead, a robot can correctly predict what its surroundings and occluded regions look like, it can navigate more efficiently. In this work, we focus on predicting occupancy within the reachable distance of the robot to enable faster navigation, and present a self-supervised proximity occupancy map prediction method named ProxMaP. We show that ProxMaP generalizes well across realistic and real domains, and improves robot navigation efficiency in simulation by \textbf{$12.40\%$} over the traditional navigation method. We share our findings on our project webpage (see https://raaslab.org/projects/ProxMaP).
Integrating Holistic and Local Information to Estimate Emotional Reaction Intensity
Authors: Yini Fang, Liang Wu, Frederic Jumelle, Bertram Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Video-based Emotional Reaction Intensity (ERI) estimation measures the intensity of subjects' reactions to stimuli along several emotional dimensions from videos of the subjects as they view the stimuli. We propose a multi-modal architecture for video-based ERI that combines video and audio information. Video input is encoded spatially first, frame by frame, combining features encoding holistic aspects of the subjects' facial expressions with features encoding spatially localized aspects of their expressions. Input is then combined across time: from frame to frame by gated recurrent units (GRUs), then globally by a transformer. We handle variable video length with a regression token that accumulates information from all frames into a fixed-dimensional vector independent of video length. Audio information is handled similarly: spectral information extracted within each frame is integrated across time by a cascade of GRUs and a transformer with a regression token. The video and audio regression tokens' outputs are merged by concatenation, then fed into a final fully connected layer producing intensity estimates. Our architecture achieved excellent performance on the Hume-Reaction dataset in the ERI Estimation Challenge of the Fifth Competition on Affective Behavior Analysis in-the-Wild (ABAW5). The Pearson correlation coefficients between estimated and subject self-reported scores, averaged across all emotions, were 0.455 on the validation dataset and 0.4547 on the test dataset, well above the baselines. The transformer's self-attention mechanism enables our architecture to focus on the most critical video frames regardless of length. Ablation experiments establish the advantages of combining holistic and local features and of multi-modal integration. Code is available at https://github.com/HKUST-NISL/ABAW5.
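The regression-token mechanism can be sketched as follows (a simplified PyTorch rendering under assumed dimensions, not the authors' released code):

    import torch
    import torch.nn as nn

    class RegressionTokenPooler(nn.Module):
        # A learned token is prepended to the frame features; after transformer
        # self-attention, its output is a fixed-size summary of the whole clip,
        # independent of video length.
        def __init__(self, d_model, nhead=4):
            super().__init__()
            self.token = nn.Parameter(torch.zeros(1, 1, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=1)

        def forward(self, frames):  # frames: (batch, time, d_model)
            tok = self.token.expand(frames.size(0), -1, -1)
            out = self.encoder(torch.cat([tok, frames], dim=1))
            return out[:, 0]  # the regression token's output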
Efficient pattern-based anomaly detection in a network of multivariate devices
Authors: Len Feremans, Boris Cule, Bart Goethals
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
Many organisations manage service quality by monitoring a large set of devices and servers, where each entity is associated with telemetry or physical sensor data series. Recently, various methods have been proposed to detect behavioural anomalies; however, existing approaches focus on multivariate time series and ignore communication between entities. Moreover, we aim to support end-users not only in locating the entities and sensors causing an anomaly at a certain period, but also in explaining this decision. We propose a scalable two-step approach to detect anomalies. First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process. Next, we report anomalies based on an embedding of sequential patterns. Pattern mining is efficient and supports interpretation, i.e., patterns represent frequently occurring behaviour in time series. We extend pattern mining to filter sequential patterns based on frequency, temporal constraints, and minimum description length. We collect and release two public datasets for international broadcasting and X from an Internet company. \textit{BAD} achieves an overall F1-score of 0.78 on 9 benchmark datasets, significantly outperforming the best baseline by 3\%. Additionally, \textit{BAD} is an order of magnitude faster than state-of-the-art anomaly detection methods.
Buoyancy enabled autonomous underwater construction with cement blocks
Authors: Samuel Lensgraf, Devin Balkcom, Alberto Quattrini Li
Abstract
We present the first free-floating autonomous underwater construction system capable of using active ballasting to transport cement building blocks efficiently. It is the first free-floating autonomous construction robot to use a paired set of resources: compressed air for buoyancy and a battery for thrusters. In construction trials, our system built structures of up to 12 components weighing up to 100 kg (75 kg in water). Our system achieves this performance by combining a novel one-degree-of-freedom manipulator, a novel two-component cement block construction system that corrects placement errors, and a simple active ballasting system combined with compliant placement and grasp behaviors. The passive error-correcting components of the system minimize the required complexity in sensing and control. We also explore the problem of buoyancy allocation for building structures at scale by defining a convex program which allocates buoyancy to minimize the predicted energy cost of transporting blocks.
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
Abstract
Sparse linear algebra is crucial in many application domains, but challenging to handle efficiently in both software and hardware, with one- and two-sided operand sparsity handled with distinct approaches. In this work, we enhance an existing memory-streaming RISC-V ISA extension to accelerate both one- and two-sided operand sparsity on widespread sparse tensor formats like compressed sparse row (CSR) and compressed sparse fiber (CSF) by accelerating the underlying operations of streaming indirection, intersection, and union. Our extensions enable single-core speedups over an optimized RISC-V baseline of up to 7.0x, 7.7x, and 9.8x on sparse-dense multiply, sparse-sparse multiply, and sparse-sparse addition, respectively, and peak FPU utilizations of up to 80% on sparse-dense problems. On an eight-core cluster, sparse-dense and sparse-sparse matrix-vector multiply using real-world matrices are up to 5.0x and 5.9x faster and up to 2.9x and 3.0x more energy efficient. We explore further applications for our extensions, such as stencil codes and graph pattern matching. Compared to recent CPU, GPU, and accelerator approaches, our extensions enable higher flexibility on data representation, degree of sparsity, and dataflow at a minimal hardware footprint, adding only 1.8% in area to a compute cluster. A cluster with our extensions running CSR matrix-vector multiplication achieves 69x and 2.8x higher peak floating-point utilizations than recent CPU and GPU software, respectively.
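To see what the extension streams, recall the scalar CSR sparse-dense matrix-vector product; every access to x[col_idx[j]] below is the indirection that the semantic registers turn into a hardware stream (plain Python for exposition, not the RISC-V kernel):

    import numpy as np

    def csr_spmv(values, col_idx, row_ptr, x):
        # y = A @ x for A in compressed sparse row (CSR) form: values holds the
        # nonzeros, col_idx their column indices, row_ptr the row boundaries.
        y = np.zeros(len(row_ptr) - 1)
        for i in range(len(y)):
            for j in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += values[j] * x[col_idx[j]]  # streamed gather (indirection)
        return y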
Distributional Multi-Objective Decision Making
Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers
Abstract
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
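For contrast with the distributional criterion introduced here, standard Pareto dominance on expected-return vectors is simply:

    def pareto_dominates(a, b):
        # a, b: vectors of expected returns, one entry per objective. The
        # paper's novel criterion compares full return distributions instead.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))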
Investigating the Software Engineering Roadmap for Smart City Infrastructure Development: Goals and Challenges
Abstract
In today's world, many cities are embracing cutting-edge technology and transforming into "smart cities". These emerging innovations are revolutionizing the standard of living for people, and as a result, smart city infrastructure development has become a major focus for city planners and policymakers worldwide. The goal is to create more livable, sustainable, and efficient urban environments, and software engineering plays a crucial role in achieving this. In this article, we will delve into what makes a city "smart" and what it means for the future. We will explore the software engineering roadmap for smart city infrastructure development, highlighting the goals and challenges that come with this innovative approach to urban planning. Our aim is to provide valuable insights into the importance of software engineering in achieving successful smart city infrastructure development. As cities continue to grow and evolve, it is essential to adopt new technologies that can help us build smarter, more sustainable communities. Smart city initiatives are paving the way for a brighter future, and software engineering is at the forefront of this movement. By understanding the software engineering roadmap for smart city infrastructure development, we can work towards creating more livable, efficient, and sustainable urban environments for generations to come.
Predictive Control of Linear Discrete-Time Markovian Jump Systems by Learning Recurrent Patterns
Authors: SooJean Han, Soon-Jo Chung, John C. Doyle
Abstract
Incorporating pattern-learning for prediction (PLP) in many discrete-time or discrete-event systems allows for computation-efficient controller design by memorizing patterns and scheduling control policies based on their future occurrences. In this paper, we demonstrate the effect of PLP by designing a controller architecture for a class of linear Markovian jump systems (MJS), where the aforementioned ``patterns'' correspond to finite-length sequences of modes. In our analysis of recurrent patterns, we use martingale theory to derive closed-form solutions for quantities pertaining to the occurrence of patterns: 1) the expected minimum occurrence time of any pattern from some predefined collection, and 2) the probability of a pattern being the first to occur among the collection. Our method is applicable to real-world dynamics because we relax two common assumptions of the prior pattern-occurrence literature: first, the distribution of the mode process need not be known, and second, the true realization of the mode process need not be observable. As a demonstration, we consider fault-tolerant control of a dynamic topology-switching network and empirically compare PLP to two controllers without PLP: a baseline based on the novel System Level Synthesis (SLS) approach and a topology-robust extension of that baseline. We show that PLP rejects disturbances as effectively as the topology-robust controller at reduced computation time and control effort. We discuss several important tradeoffs, such as the size of the pattern collection and the system scale versus the accuracy of the mode predictions, which show how different PLP implementations affect stabilization and runtime performance.
Abstract
A code of length $n$ is said to be (combinatorially) $(\rho,L)$-list decodable if the Hamming ball of radius $\rho n$ around any vector in the ambient space contains at most $L$ codewords. We study a recently introduced class of higher order MDS codes, which are closely related (via duality) to codes achieving a generalized Singleton bound for list decodability. For some $\ell\geq 1$, higher order MDS codes of length $n$, dimension $k$, and order $\ell$ are denoted $(n,k)$-MDS($\ell$) codes. We present a number of results on the structure of these codes, identifying the `extend-ability' of their parameters in various scenarios. Specifically, for some parameter regimes, we identify conditions under which $(n_1,k_1)$-MDS($\ell_1$) codes can be obtained from $(n_2,k_2)$-MDS($\ell_2$) codes via various techniques. We believe that these results will aid in efficient constructions of higher order MDS codes. We also obtain a new field size upper bound for the existence of such codes, which arguably improves over the best known existing bound in some parameter regimes.
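Written as a formula, the opening definition says that $C \subseteq \Sigma^n$ is (combinatorially) $(\rho, L)$-list decodable if and only if
$$\max_{y \in \Sigma^n} \big|\{\, c \in C : d_H(y, c) \le \rho n \,\}\big| \;\le\; L,$$
where $d_H$ denotes Hamming distance.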
Structured condition numbers for generalized saddle point systems
Abstract
In recent times, significant effort has been expended on the development of stationary iterative techniques for the numerical solution of generalized saddle point (GSP) systems. The condition number (CN) is widely employed in perturbation analysis to determine the relative sensitivity of a numerical solution. To assess the robustness of numerical solutions, in this paper we address three types of condition numbers (CNs) for GSP systems: structured normwise, mixed, and componentwise, under the assumption that structure-preserving perturbations are applied to the blocks of the coefficient matrix of the system. Explicit formulae for the structured CNs are derived in three cases. First, when the (1,1)- and (2,2)-blocks exhibit linear structures (the general case) and the transpose of the (1,2)-block is not equal to the (2,1)-block of the coefficient matrix. Second, by employing the expressions obtained in the first case, compact formulae for structured CNs are investigated when the (1,1)- and (2,2)-blocks adhere to symmetric structures. Third, when the transpose of the (1,2)-block equals the (2,1)-block. We also compare the obtained formulae for structured CNs with their unstructured counterparts. In addition, the obtained results are used to recover previous CN formulae for the weighted least squares (WLS) problem and the standard least squares (SLS) problem. Finally, numerical experiments demonstrate that the proposed structured CNs outperform their unstructured counterparts, validating the effectiveness of the proposed CNs.
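For orientation, recall the classical unstructured normwise CN of a nonsingular linear system $Ax = b$ with respect to perturbations of $A$,
$$\kappa(A) \;=\; \|A\|\,\|A^{-1}\|;$$
the structured CNs studied in this paper refine such quantities by restricting the admissible perturbations to those preserving the block structure of the GSP coefficient matrix, which can only decrease the measured sensitivity.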
Keyword: faster
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Authors: Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch/layer normalization to converge faster and generalize. Label Smoothing (LS) is another simple, versatile, and efficient regularization method that can be applied to various supervised classification tasks. Conventional LS, however, assumes that each non-target class is equally likely regardless of the training instance. In this work, we present a general framework for training with label regularization that includes conventional LS but can also model instance-specific variants. Based on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution of the inner loop as the optimal label smoothing, without the need to store the parameters or the output of a trained model. Finally, we conduct extensive experiments and demonstrate that LABO consistently yields improvements over conventional label regularization across various fields, including seven machine translation and three image classification tasks.
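For reference, the conventional, instance-independent LS targets that LABO generalizes can be constructed as follows (a minimal PyTorch sketch, our illustration):

    import torch

    def label_smoothing_targets(labels, num_classes, eps=0.1):
        # Conventional LS: the target class gets probability 1 - eps and every
        # non-target class the same eps / (K - 1) mass -- the uniform assumption
        # LABO relaxes by learning the smoothing distribution per instance.
        off = eps / (num_classes - 1)
        t = torch.full((labels.size(0), num_classes), off)
        t.scatter_(1, labels.unsqueeze(1), 1.0 - eps)
        return t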
CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Authors: Brian Wheatman, Randal Burns, Aydın Buluç, Helen Xu
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Abstract
This paper introduces the batch-parallel Compressed Packed Memory Array (CPMA), a compressed dynamic ordered batch-parallel set data structure based on the Packed Memory Array (PMA). Traditionally, batch-parallel sets are built on pointer-based data structures such as trees because pointer-based structures enable fast parallel unions via pointer manipulation. When compared to cache-optimized trees, PMAs were slower to update but faster to scan. The batch-parallel CPMA overcomes this tradeoff between updates and scans by optimizing for cache-friendliness. On average, the CPMA is faster than compressed PaC-trees, a state-of-the-art batch-parallel set library based on cache-optimized trees, by 1.2x on range queries and 3x on batch updates. We further evaluate the CPMA compared to compressed PaC-trees on a real-world application of dynamic graph processing. The CPMA is on average 1.2x faster on a suite of graph algorithms and 2x faster on batch inserts for graphs when compared with compressed PaC-trees.
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
Authors: Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, only a scalar attribute of the sequence. In such scenarios, where, for example, knowing the quality of a system's output in order to predict poor performance matters more than the output itself, can autoregressive decoding be bypassed? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). For OOD detection in MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to be able to predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly inference-efficient.
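As a rough illustration of the proxy idea, a NAP-style head only needs the encoder states: pool them and regress a scalar. The PyTorch sketch below uses hypothetical layer sizes; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class NAPHead(nn.Module):
    """Minimal non-autoregressive proxy: pool encoder states, regress a scalar."""
    def __init__(self, d_model=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, enc_states, mask):
        # enc_states: (batch, seq, d_model); mask: (batch, seq), 1 = valid token
        mask = mask.unsqueeze(-1).float()
        pooled = (enc_states * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.mlp(pooled).squeeze(-1)  # e.g. a predicted BERTScore or WER

enc, mask = torch.randn(2, 10, 512), torch.ones(2, 10)
print(NAPHead()(enc, mask).shape)  # torch.Size([2])
```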
Sorting Finite Automata via Partition Refinement
Authors: Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, Nicola Prezza
Abstract
Wheeler nondeterministic finite automata (WNFAs) were introduced as a generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete. Recently, however, Alanko et al. showed how to side-step this complexity by switching to preorders: letting $Q$ be the set of states, $E$ the set of transitions, $|Q|=n$, and $|E|=m$, they provided a $O(mn^2)$-time algorithm computing a totally-ordered partition of the WNFA's states such that (1) equivalent states recognize the same regular language, and (2) the order of non-equivalent states is consistent with any Wheeler order, when one exists. Then, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders. Further research generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: any NFA admits a partial preorder of its states reflecting the co-lex order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in $O(m^2+n^{5/2})$ time, while on DFAs the same can be done in $O(\min(n^2\log n,mn))$ time. In this paper, we provide much more efficient solutions to the problem above. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem to work with ordered partitions. Specifically, we provide a $O(m\log n)$-time algorithm computing a co-lex total preorder when the input is a WNFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. Also, we present implementations of our algorithms and show that they are very efficient in practice.
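For intuition, classic (unordered) partition refinement repeatedly splits blocks by transition signatures until a fixed point; the paper's contribution is extending such refinement to ordered partitions. A minimal Moore-style sketch on a DFA, with a hypothetical `delta` transition table:

```python
def refine_partition(states, alphabet, delta, accepting):
    """Moore-style partition refinement: split blocks until stable.

    delta: dict mapping (state, symbol) -> state, total on states x alphabet.
    Illustrates plain unordered refinement only; the paper's algorithms
    additionally maintain a total order on the blocks.
    """
    # Start from the accepting / non-accepting split.
    block_of = {q: (q in accepting) for q in states}
    while True:
        # Signature = own block plus the blocks reached on each symbol.
        sig = {q: (block_of[q],) + tuple(block_of[delta[(q, a)]] for a in alphabet)
               for q in states}
        new_ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_block_of = {q: new_ids[sig[q]] for q in states}
        if len(set(new_block_of.values())) == len(set(block_of.values())):
            return new_block_of  # fixed point: equivalent states share a block
        block_of = new_block_of
```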
A Generalized Covering Algorithm for Chained Codes
Abstract
The covering radius is a fundamental property of linear codes that characterizes the trade-off between storage and access in linear data-query protocols. The generalized covering radius was recently defined by Elimelech and Schwartz for applications in the joint recovery of linear data queries. In this work we extend a known bound on the ordinary covering radius to the generalized one for all codes satisfying the chain condition, a known condition satisfied by most known families of codes. Given a generator matrix of a special form, we also provide an algorithm that finds codewords covering the input vectors within the distance specified by the bound. For the case of Reed-Muller codes, we provide an efficient construction of such generator matrices, thereby providing a faster alternative to a previous generalized covering algorithm for Reed-Muller codes.
Latent Interactive A2C for Improved RL in Open Many-Agent Systems
Abstract
Many multiagent reinforcement learning (MARL) methods engage in centralized training. However, these methods involve obtaining various types of information from the other agents, which may not be feasible in competitive or adversarial settings. A recent method, the interactive advantage actor critic (IA2C), engages in decentralized training coupled with decentralized execution, aiming to predict the other agents' actions from possibly noisy observations. In this paper, we present the latent IA2C, which utilizes an encoder-decoder architecture to learn a latent representation of the hidden state and other agents' actions. Our experiments in two domains, each populated by many agents, reveal that the latent IA2C significantly improves sample efficiency by reducing variance and converging faster. Additionally, we introduce open versions of these domains, where the agent population may change over time, and evaluate on these instances as well.
E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong
Abstract
Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. Cascade models can benefit from large-scale optical character recognition (OCR) and MT datasets, but the two-stage architecture is redundant. End-to-end models are efficient but suffer from a deficiency of training data. To this end, we propose an end-to-end TIMT model that makes full use of the knowledge in existing OCR and MT datasets to pursue a framework that is both effective and efficient. More specifically, we build a novel modal adapter that effectively bridges the OCR encoder and the MT decoder. An end-to-end TIMT loss and a cross-modal contrastive loss are used jointly to align the feature distributions of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, ablation studies verify the generalization of our method, whose modal adapter is effective in bridging various OCR and MT models.
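As a rough sketch of what a modal adapter can look like, the PyTorch module below projects OCR encoder features into the MT decoder's embedding space and refines them with self-attention. Dimensions and the attention choice are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ModalAdapter(nn.Module):
    """Sketch of an adapter bridging an OCR encoder to an MT decoder."""
    def __init__(self, d_ocr=256, d_mt=512, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(d_ocr, d_mt)        # map OCR features to MT width
        self.attn = nn.MultiheadAttention(d_mt, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_mt)

    def forward(self, ocr_feats):
        h = self.proj(ocr_feats)                  # (batch, seq, d_mt)
        out, _ = self.attn(h, h, h)               # self-attention refinement
        return self.norm(h + out)                 # ready to feed the MT decoder

print(ModalAdapter()(torch.randn(2, 30, 256)).shape)  # torch.Size([2, 30, 512])
```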
Attack Named Entity Recognition by Entity Boundary Interference
Abstract
Named Entity Recognition (NER) is a cornerstone NLP task, yet its robustness has received little attention. This paper rethinks the principles of NER attacks derived from sentence classification, as they can easily violate label consistency between original and adversarial NER examples. This is due to the fine-grained nature of NER: even minor word changes in a sentence can cause entities to appear or mutate, producing invalid adversarial examples. To this end, we propose a novel one-word-modification NER attack based on a key insight: NER models rely heavily on the boundary positions of entities when making decisions. We thus strategically insert a new boundary into the sentence to trigger Entity Boundary Interference, so that the victim model makes a wrong prediction either on this boundary word or on other words in the sentence. We call this attack the Virtual Boundary Attack (ViBA), which proves remarkably effective when attacking both English and Chinese models, with a 70%-90% attack success rate on state-of-the-art language models (e.g., RoBERTa, DeBERTa), while also being significantly faster than previous methods.
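A one-word insertion attack of this flavor can be sketched in a few lines: try candidate words at every position and keep any insertion that flips the victim's predictions on the original tokens. The `victim_predict` interface below is hypothetical, and ViBA's boundary-word selection is more targeted than this brute-force loop.

```python
def one_word_insertion_attack(tokens, victim_predict, candidate_words):
    """Sketch of a one-word insertion attack in the spirit of ViBA.

    victim_predict(tokens) -> list of entity tags, one per token (hypothetical
    interface). Returns the first perturbed sentence whose insertion changes
    the victim's predictions on the original words, or None.
    """
    baseline = victim_predict(tokens)
    for pos in range(len(tokens) + 1):
        for w in candidate_words:
            perturbed = tokens[:pos] + [w] + tokens[pos:]
            tags = victim_predict(perturbed)
            # Compare tags of the original tokens only (skip the inserted one).
            survivors = tags[:pos] + tags[pos + 1:]
            if survivors != baseline:
                return perturbed  # adversarial example found
    return None
```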
HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction
Abstract
Accurate early congestion prediction can prevent unpleasant surprises at the routing stage, playing a crucial character in assisting designers to iterate faster in VLSI design cycles. In this paper, we introduce a novel strategy to fully incorporate topological and geometrical features of circuits by making several key designs in our network architecture. To be more specific, we construct two individual graphs (geometry-graph, topology-graph) with distinct edge construction schemes according to their unique properties. We then propose a dual-branch network with different encoder layers in each pathway and aggregate representations with a sophisticated fusion strategy. Our network, named HybridNet, not only provides a simple yet effective way to capture the geometric interactions of cells, but also preserves the original topological relationships in the netlist. Experimental results on the ISPD2015 benchmarks show that we achieve an improvement of 10.9% compared to previous methods.
Fast Many-to-Many Routing for Ridesharing with Multiple Pickup and Dropoff Locations
Abstract
We introduce KaRRi, an improved algorithm for scheduling a fleet of shared vehicles as it is used by services like UberXShare and Lyft Shared. We speed up the basic online algorithm that looks for all possible insertions of a new customer into a set of existing routes, we generalize the objective function, and efficiently support a large number of possible pick-up and drop-off locations. This lays an algorithmic foundation for ridesharing systems with higher vehicle occupancy -- enabling greatly reduced cost and ecological impact at comparable service quality. We find that our algorithm computes assignments between vehicles and riders several times faster than a previous state-of-the-art approach. Further, we observe that allowing meeting points for vehicles and riders can reduce the operating cost of vehicle fleets by up to $15\%$ while also reducing passenger wait and trip times.
Robust Implicit Regularization via Weight Normalization
Abstract
Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method for a certain interpolating solution among the many. An established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low-rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to generalize well in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization. In this paper, we aim to close this gap by incorporating and analyzing gradient descent with weight normalization, where the weight vector is reparametrized in terms of polar coordinates and gradient descent is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz's Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that, in contrast to plain gradient descent, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that both the convergence speed and the robustness of the implicit bias improve dramatically when weight normalization is used in overparameterized diagonal linear network models.
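The reparametrization itself is compact. Below is a minimal PyTorch sketch of gradient descent on the weight-normalized ("polar") parameters for a plain linear predictor; the paper's analysis concerns the diagonal linear model, and all sizes and step sizes here are illustrative.

```python
import torch

# Weight normalization: w = g * v / ||v||, with gradient descent on (g, v).
torch.manual_seed(0)
X = torch.randn(10, 40)                      # underdetermined: many interpolants
w_star = torch.zeros(40); w_star[:3] = 1.0   # sparse ground truth
y = X @ w_star

v = torch.randn(40, requires_grad=True)      # direction ("polar angle")
g = torch.tensor(1.0, requires_grad=True)    # magnitude ("polar radius")
opt = torch.optim.SGD([v, g], lr=0.05)
for _ in range(5000):
    w = g * v / v.norm()                     # the weight-normalized parameter
    loss = ((X @ w - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())                           # interpolating solution reached
```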
ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation
Abstract
In a typical path planning pipeline for a ground robot, we build a map (e.g., an occupancy grid) of the environment as the robot moves around. While navigating indoors, a ground robot's knowledge about the environment may be limited due to occlusions. Therefore, the map will have many as-yet-unknown regions that may need to be avoided by a conservative planner. Instead, if a robot is able to correctly predict what its surroundings and occluded regions look like, the robot may be more efficient in navigation. In this work, we focus on predicting occupancy within the reachable distance of the robot to enable faster navigation and present a self-supervised proximity occupancy map prediction method, named ProxMaP. We show that ProxMaP generalizes well across realistic and real domains, and improves the robot navigation efficiency in simulation by 12.40% against the traditional navigation method. We share our findings on our project webpage (see https://raaslab.org/projects/ProxMaP ).
Efficient pattern-based anomaly detection in a network of multivariate devices
Authors: Len Feremans, Boris Cule, Bart Goethals
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
Many organisations manage service quality and monitor a large set of devices and servers, where each entity is associated with telemetry or physical sensor data series. Recently, various methods have been proposed to detect behavioural anomalies; however, existing approaches focus on multivariate time series and ignore communication between entities. Moreover, we aim to support end-users not only in locating the entities and sensors causing an anomaly in a certain period, but also in explaining this decision. We propose a scalable two-step approach to detect anomalies. First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process. Next, we report anomalies based on an embedding of sequential patterns. Pattern mining is efficient and supports interpretation, i.e., patterns represent frequently occurring behaviour in time series. We extend pattern mining to filter sequential patterns based on frequency, temporal constraints, and minimum description length. We collect and release two public datasets for international broadcasting and X from an Internet company. BAD achieves an overall F1-Score of 0.78 on 9 benchmark datasets, significantly outperforming the best baseline by 3%. Additionally, BAD is an order of magnitude faster than state-of-the-art anomaly detection methods.
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
Abstract
Sparse linear algebra is crucial in many application domains, but challenging to handle efficiently in both software and hardware, with one- and two-sided operand sparsity handled with distinct approaches. In this work, we enhance an existing memory-streaming RISC-V ISA extension to accelerate both one- and two-sided operand sparsity on widespread sparse tensor formats like compressed sparse row (CSR) and compressed sparse fiber (CSF) by accelerating the underlying operations of streaming indirection, intersection, and union. Our extensions enable single-core speedups over an optimized RISC-V baseline of up to 7.0x, 7.7x, and 9.8x on sparse-dense multiply, sparse-sparse multiply, and sparse-sparse addition, respectively, and peak FPU utilizations of up to 80% on sparse-dense problems. On an eight-core cluster, sparse-dense and sparse-sparse matrix-vector multiply using real-world matrices are up to 5.0x and 5.9x faster and up to 2.9x and 3.0x more energy efficient. We explore further applications for our extensions, such as stencil codes and graph pattern matching. Compared to recent CPU, GPU, and accelerator approaches, our extensions enable higher flexibility on data representation, degree of sparsity, and dataflow at a minimal hardware footprint, adding only 1.8% in area to a compute cluster. A cluster with our extensions running CSR matrix-vector multiplication achieves 69x and 2.8x higher peak floating-point utilizations than recent CPU and GPU software, respectively.
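The scalar analogue of the accelerated primitives is easy to state: streaming intersection (and, symmetrically, union) of sorted index lists, such as the column indices of two CSR rows. A minimal Python sketch:

```python
def stream_intersect(a, b):
    """Two-pointer intersection of two sorted index streams (e.g. CSR rows).

    This is the scalar analogue of the sparse-sparse 'intersection' primitive
    the ISA extension accelerates in hardware; union is symmetric.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# e.g. nonzero column indices of two CSR rows:
print(stream_intersect([0, 2, 5, 9], [2, 3, 5, 8, 9]))  # [2, 5, 9]
```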
Keyword: mobile
Crop identification using deep learning on LUCAS crop cover photos
Authors: Momchil Yordanov, Raphael d'Andrimont, Laura Martinez-Sanchez, Guido Lemoine, Dominique Fasbender, Marijn van der Velde
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Crop classification via deep learning on ground imagery can deliver timely and accurate crop-specific information to various stakeholders. Dedicated ground-based image acquisition exercises can help to collect data in data-scarce regions, improve control over the timing of collection, or cover study areas that are too small to monitor via satellite. Automatic labelling is essential when collecting large volumes of data. One such data collection is the EU's Land Use Cover Area frame Survey (LUCAS), and in particular, the recently published LUCAS Cover photos database. The aim of this paper is to select and publish a subset of LUCAS Cover photos for 12 mature major crops across the EU; to deploy, benchmark, and identify the best configuration of MobileNet for the classification task; to showcase the possibility of using entropy-based metrics for post-processing of results; and finally to show the applications and limitations of the model in a practical and policy-relevant context. In particular, the usefulness of automatically identifying crops on geo-tagged photos is illustrated in the context of the EU's Common Agricultural Policy. The work has produced a dataset of 169,460 images of mature crops for the 12 classes, out of which 15,876 were manually selected as representing a clean sample without any foreign objects or unfavorable conditions. The best-performing model achieved a Macro F1 (M-F1) of 0.75 on an imbalanced test dataset of 8,642 photos. Using metrics from information theory, namely the Equivalence Reference Probability, resulted in an increase of 6%. The most unfavorable conditions for taking such images, across all crop classes, were found to be too early or too late in the season. The proposed methodology shows the possibility of using minimal auxiliary data, outside the images themselves, to achieve an M-F1 of 0.817 for labelling between 12 major European crops.
Semi-Supervised Federated Learning for Keyword Spotting
Authors: Enmao Diao, Eric W. Tramel, Jie Ding, Tao Zhang
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract
Keyword Spotting (KWS) is a critical aspect of audio-based applications on mobile devices and virtual assistants. Recent developments in Federated Learning (FL) have significantly expanded the ability to train machine learning models by utilizing the computational and private data resources of numerous distributed devices. However, existing FL methods typically require that devices possess accurate ground-truth labels, which can be both expensive and impractical when dealing with local audio data. In this study, we first demonstrate the effectiveness of Semi-Supervised Learning (SSL) and FL for KWS. We then extend our investigation to Semi-Supervised Federated Learning (SSFL) for KWS, where devices possess completely unlabeled data, while the server has access to a small amount of labeled data. We perform numerical analyses using state-of-the-art SSL, FL, and SSFL techniques to demonstrate that the performance of KWS models can be significantly improved by leveraging the abundant unlabeled heterogeneous data available on devices.
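One common way to realize the client-side SSL step in such a pipeline is confidence-thresholded pseudo-labeling (FixMatch-style); the paper's exact recipe may differ. A PyTorch sketch with a hypothetical `model` mapping audio features to class logits:

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled_batch, threshold=0.95):
    """Confidence-thresholded pseudo-labeling on a client's unlabeled batch."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= threshold           # trust only confident predictions
    if keep.sum() == 0:
        return torch.tensor(0.0)           # nothing confident: no update
    logits = model(unlabeled_batch[keep])
    return F.cross_entropy(logits, pseudo[keep])
```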
Child Palm-ID: Contactless Palmprint Recognition for Children
Authors: Akash Godbole, Steven A. Grosz, Anil K. Jain
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Effective distribution of nutritional and healthcare aid for children, particularly infants and toddlers, in some of the least developed and most impoverished countries of the world, is a major problem due to the lack of reliable identification documents. Biometric authentication technology has been investigated to address child recognition in the absence of reliable ID documents. We present a mobile-based contactless palmprint recognition system, called Child Palm-ID, which meets the requirements of usability, hygiene, cost, and accuracy for child recognition. Using a contactless child palmprint database, Child-PalmDB1, consisting of 19,158 images from 1,020 unique palms (in the age range of 6 mos. to 48 mos.), we report a TAR=94.11% @ FAR=0.1%. The proposed Child Palm-ID system is also able to recognize adults, achieving a TAR=99.4% on the CASIA contactless palmprint database and a TAR=100% on the COEP contactless adult palmprint database, both @ FAR=0.1%. These accuracies are competitive with the SOTA provided by COTS systems. Despite these high accuracies, we show that the TAR for time-separated child-palmprints is only 78.1% @ FAR=0.1%.
Voicify Your UI: Towards Android App Control with Voice Commands
Abstract
Nowadays, voice assistants help users complete tasks on the smartphone with voice commands, replacing traditional touchscreen interactions when such interactions are inhibited. However, the usability of those tools remains moderate due to the problems in understanding rich language variations in human commands, along with efficiency and comprehensibility issues. Therefore, we introduce Voicify, an Android virtual assistant that allows users to interact with on-screen elements in mobile apps through voice commands. Using a novel deep learning command parser, Voicify interprets human verbal input and performs matching with UI elements. In addition, the tool can directly open a specific feature from installed applications by fetching application code information to explore the set of in-app components. Our command parser achieved 90% accuracy on the human command dataset. Furthermore, the direct feature invocation module achieves better feature coverage in comparison to Google Assistant. The user study demonstrates the usefulness of Voicify in real-world scenarios.
Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
Abstract
Automatic group emotion recognition plays an important role in understanding complex human-human interaction. This paper introduces Emolysis, a standalone open-source toolkit for real-time multimodal group emotion recognition and visualization. Given any input video, Emolysis processes nearly real-time synchronized multimodal input and maps it to group-level emotion, valence, and arousal. Additionally, the toolkit supports major mobile and desktop platforms (Android, iOS, Windows). The Emolysis platform also comes with an intuitive graphical user interface that allows users to select different modalities and target persons for more fine-grained emotion analysis. Emolysis is freely available for academic research, and encourages application developers to extend it to application-specific environments on top of the existing system. We believe that the extension mechanism is quite straightforward. Our code and models are available at https://github.com/ControlNet/emolysis.
Survey of Federated Learning Models for Spatial-Temporal Mobility Applications
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Abstract
Federated learning involves training statistical models over edge devices, such as mobile phones, while keeping the training data local. Federated Learning (FL) can serve as an ideal candidate for training spatial-temporal models that rely on heterogeneous and potentially massive numbers of participants while preserving the privacy of highly sensitive location data. However, there are unique challenges involved in transitioning existing spatial-temporal models to decentralized learning. In this survey paper, we review the existing literature that has proposed FL-based models for human mobility prediction, traffic prediction, community detection, location-based recommendation systems, and other spatial-temporal tasks. We describe the metrics and datasets these works have used and compare these approaches against their centralized-setting counterparts as a baseline. Finally, we discuss the challenges of applying spatial-temporal models in a decentralized setting and, by highlighting the gaps in the literature, provide a road map and opportunities for the research community.
Abstract
The main theme of this paper is to implement a mobility model in the Cooja simulator and to investigate the impact of mobility on the performance of the Routing Protocol for Low-Power and Lossy Networks (RPL) in the IoT environment. In the real world, mobility occurs frequently. Therefore, in this paper, a frequently used mobility model, Random Waypoint (RWP), is used for analysis; RWP can be readily applied to many existing applications. By default, the Cooja simulator does not support mobility models, so BonnMotion is introduced into Cooja as a plugin. As IoT deals with resource-constrained environments, a comparison is made between the static environment and the mobile environment in terms of power consumption. As expected, the results indicate that mobility affects RPL in terms of power consumption.
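For concreteness, an RWP trace is easy to generate: pick a uniform destination, travel toward it at a uniform random speed, pause, repeat. A minimal Python sketch with illustrative defaults:

```python
import random

def random_waypoint(n_steps, area=(100.0, 100.0), speed=(0.5, 1.5), pause=2):
    """Minimal Random Waypoint (RWP) trace generator (illustrative defaults)."""
    x, y = random.uniform(0, area[0]), random.uniform(0, area[1])
    tx, ty, v, wait = x, y, 0.0, 0
    trace = []
    for _ in range(n_steps):
        if wait > 0:
            wait -= 1                               # pausing at a waypoint
        elif abs(tx - x) + abs(ty - y) < 1e-6:      # arrived: pick the next one
            tx = random.uniform(0, area[0])
            ty = random.uniform(0, area[1])
            v = random.uniform(*speed)
            wait = pause
        else:                                       # move toward the waypoint
            dx, dy = tx - x, ty - y
            dist = (dx * dx + dy * dy) ** 0.5
            step = min(v, dist)
            x, y = x + step * dx / dist, y + step * dy / dist
        trace.append((x, y))
    return trace

print(random_waypoint(5))
```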
Understanding why SLAM algorithms fail in modern indoor environments
Abstract
Simultaneous localization and mapping (SLAM) algorithms are essential for the autonomous navigation of mobile robots. With the increasing demand for autonomous systems, it is crucial to evaluate and compare the performance of these algorithms in real-world environments. In this paper, we provide an evaluation strategy and real-world datasets to test and evaluate SLAM algorithms in complex and challenging indoor environments. Further, we analysed state-of-the-art (SOTA) SLAM algorithms based on various metrics such as absolute trajectory error, scale drift, and map accuracy and consistency. Our results demonstrate that SOTA SLAM algorithms often fail in challenging environments, with dynamic objects, transparent and reflecting surfaces. We also found that successful loop closures had a significant impact on the algorithm's performance. These findings highlight the need for further research to improve the robustness of the algorithms in real-world scenarios.
Resilient Temporal Logic Planning in the Presence of Robot Failures
Authors: Samarth Kalluraya, George J. Pappas, Yiannis Kantaros
Abstract
Several task and motion planning algorithms have been proposed recently to design paths for mobile robot teams with collaborative high-level missions specified using formal languages, such as Linear Temporal Logic (LTL). However, the designed paths often lack reactivity to failures of robot capabilities (e.g., sensing, mobility, or manipulation) that can occur due to unanticipated events (e.g., human intervention or system malfunctioning) which in turn may compromise mission performance. To address this novel challenge, in this paper, we propose a new resilient mission planning algorithm for teams of heterogeneous robots with collaborative LTL missions. The robots are heterogeneous with respect to their capabilities while the mission requires applications of these skills at certain areas in the environment in a temporal/logical order. The proposed method designs paths that can adapt to unexpected failures of robot capabilities. This is accomplished by re-allocating sub-tasks to the robots based on their currently functioning skills while minimally disrupting the existing team motion plans. We provide experiments and theoretical guarantees demonstrating the efficiency and resiliency of the proposed algorithm.
Implementation of a Channel Model for Non-Terrestrial Networks in ns-3
Authors: Mattia Sandri, Matteo Pagin, Marco Giordani, Michele Zorzi
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
While the 5th generation (5G) of mobile networks has landed in the commercial arena, the research community is exploring new functionalities for 6th generation (6G) networks, for example non-terrestrial networks (NTNs) via space/air nodes such as Unmanned Aerial Vehicles (UAVs), High Altitude Platforms (HAPs), or satellites. Specifically, satellite-based communication offers new opportunities for future wireless applications, such as providing connectivity to remote or otherwise unconnected areas, complementing terrestrial networks to reduce connection downtime, as well as increasing traffic efficiency in hot-spot areas. In this context, an accurate characterization of the NTN channel is the first step towards proper protocol design. Along these lines, this paper provides an ns-3 implementation of the 3rd Generation Partnership Project (3GPP) channel and antenna models for NTN described in Technical Report 38.811. In particular, we extend the ns-3 code base with new modules that implement the attenuation of the signal in air/space due to atmospheric gases and scintillation, and new mobility and fading models to account for the Geocentric Cartesian coordinate system of satellites. Finally, we validate the accuracy of our ns-3 module via simulations against 3GPP calibration results.
TidyBot: Personalized Robot Assistance with Large Language Models
Authors: Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
Keyword: pruning
Distributional Multi-Objective Decision Making
Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers
Abstract
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
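One natural instance of a distributional dominance criterion is first-order stochastic dominance between (univariate) return samples; the paper's criterion relates full return distributions directly and may differ. A small NumPy sketch of dominance checking and pruning:

```python
import numpy as np

def fsd_dominates(ret_a, ret_b):
    """First-order stochastic dominance between empirical return samples.

    A dominates B iff F_A(x) <= F_B(x) for all x (A is at least as likely to
    exceed any return threshold), with strict inequality somewhere.
    """
    grid = np.union1d(ret_a, ret_b)
    cdf_a = np.searchsorted(np.sort(ret_a), grid, side="right") / len(ret_a)
    cdf_b = np.searchsorted(np.sort(ret_b), grid, side="right") / len(ret_b)
    return bool(np.all(cdf_a <= cdf_b) and np.any(cdf_a < cdf_b))

def undominated(return_samples):
    """Prune policies whose return distribution is dominated by another's."""
    return [r for i, r in enumerate(return_samples)
            if not any(fsd_dominates(q, r)
                       for j, q in enumerate(return_samples) if i != j)]

print(len(undominated([np.array([1., 2., 3.]), np.array([2., 3., 4.])])))  # 1
```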
Keyword: voxel
There is no result
Keyword: lidar
DC3DCD: unsupervised learning for multiclass 3D point cloud change detection
Authors: Iris de Gélis (1 and 2), Sébastien Lefèvre (2), Thomas Corpetti (3) ((1) Magellium, (2) Institut de Recherche en Informatique et Systèmes Aléatoires IRISA - UMR 6074 - Université Bretagne Sud, (3) Littoral - Environnement - Télédétection - Géomatique LETG - UMR 6554 - Université Rennes 2)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In a constantly evolving world, change detection is of prime importance for keeping maps up to date. To better sense areas with complex geometry (urban areas in particular), considering 3D data appears to be an interesting alternative to classical 2D images. In this context, 3D point clouds (PCs), obtained by LiDAR or photogrammetry, are very interesting. While recent studies have shown the considerable benefit of using deep learning-based methods to detect and characterize changes in raw 3D PCs, these studies rely on large annotated training data to obtain accurate results, and the collection of these annotations is tricky and time-consuming. The availability of unsupervised or weakly supervised approaches is therefore of prime interest. In this paper, we propose an unsupervised method, called DeepCluster 3D Change Detection (DC3DCD), to detect and categorize multiclass changes at the point level. We classify our approach in the unsupervised family because we extract, in a completely unsupervised way, a number of clusters associated with potential changes; at the end of the process, the user only has to assign a label to each of these clusters to derive the final change map. Our method builds upon the DeepCluster approach, originally designed for image classification, to handle complex raw 3D PCs and perform a change segmentation task. An assessment of the method on both simulated and real public datasets is provided. The proposed method outperforms a traditional fully supervised machine learning algorithm and is competitive with fully supervised deep learning networks applied to rasterizations of 3D PCs, with a mean IoU over change classes of 57.06% and 66.69% for the simulated and real datasets, respectively.
Keyword: diffusion
Atmospheric Turbulence Correction via Variational Deep Diffusion
Authors: Xijun Wang, Santiago López-Tapia, Aggelos K. Katsaggelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Atmospheric Turbulence (AT) correction is a challenging restoration task, as it consists of two distortions: geometric distortion and spatially variant blur. Diffusion models have shown impressive accomplishments in photo-realistic image synthesis and beyond. In this paper, we propose a novel deep conditional diffusion model under a variational inference framework to solve the AT correction problem. We use this framework to improve performance by learning latent prior information from the input and degradation processes, and use the learned information to further condition the diffusion model. Experiments are conducted on a comprehensive synthetic AT dataset. We show that the proposed framework achieves good quantitative and qualitative results.
Modeling Viral Information Spreading via Directed Acyclic Graph Diffusion
Abstract
Viral information like rumors or fake news is spread over a communication network like a virus infection, in a unidirectional manner: entity $i$ conveys information to a neighbor $j$, resulting in two equally informed (infected) parties. Existing graph diffusion works focus only on bidirectional diffusion on an undirected graph. Instead, we propose a new directed acyclic graph (DAG) diffusion model to estimate the probability $x_i(t)$ of node $i$'s infection at time $t$ given a source node $s$, where $x_i(\infty) = 1$. Specifically, given an undirected positive graph modeling node-to-node communication, we first compute its graph embedding: a latent coordinate for each node in an assumed low-dimensional manifold space from extreme eigenvectors via LOBPCG. Next, we construct a DAG based on Euclidean distances between latent coordinates. Spectrally, we prove that the asymmetric DAG Laplacian matrix contains real non-negative eigenvalues, and that the DAG diffusion converges to the all-infection vector $\mathbf{x}(\infty) = \mathbf{1}$ as $t \rightarrow \infty$. Simulation experiments show that our proposed DAG diffusion accurately models viral information spreading over a variety of graph structures at different time instants.
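To make the dynamics concrete, a simple explicit-Euler diffusion on a weighted DAG with the source clamped at 1 already exhibits the convergence to the all-infection vector; this is an illustrative discretization, not the paper's exact update:

```python
import numpy as np

def dag_diffusion(W, source, t_max=200, eta=0.2):
    """Simulate x_i(t) on a DAG with edge weights W[j, i] > 0 for edge j -> i."""
    n = W.shape[0]
    x = np.zeros(n)
    x[source] = 1.0
    for _ in range(t_max):
        # inflow_i = sum_j W[j, i] * (x_j - x_i)
        inflow = W.T @ x - W.T.sum(axis=1) * x
        x = np.clip(x + eta * inflow, 0.0, 1.0)
        x[source] = 1.0                 # the source remains fully infected
    return x

# Chain 0 -> 1 -> 2: every node's infection probability tends to 1 as t grows.
W = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], float)
print(dag_diffusion(W, source=0))
```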
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Abstract
Diffusion models, which have emerged as popular text-to-image generation models, can produce high-quality and content-rich images guided by textual prompts. However, existing models have limited semantic understanding and commonsense reasoning when the input prompts are concise narratives, resulting in low-quality image generation. To improve the capacities for narrative prompts, we propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models. To reach this goal, we first collect and annotate a new dataset, SURD, which consists of more than 57,000 semantically corrected multi-modal samples. Each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. Then, we align the semantic representation of narrative prompts to the complex prompts and transfer knowledge of large language models (LLMs) to our SUR-adapter via knowledge distillation so that it can acquire the powerful semantic understanding and reasoning capabilities needed to build a high-quality textual semantic representation for text-to-image generation. We conduct experiments by integrating multiple LLMs and popular pre-trained diffusion models to show the effectiveness of our approach in enabling diffusion models to understand and reason about concise natural language without image quality degradation. Our approach can make text-to-image diffusion models easier to use with a better user experience, which demonstrates our approach's potential for further advancing the development of user-friendly text-to-image generation models by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts.
Implicit-explicit Runge-Kutta for radiation hydrodynamics I: gray diffusion
Authors: Ben S. Southworth, Ryosuke Park, Svetlana Tokareva, Marc Charest
Abstract
Radiation hydrodynamics is a challenging multiscale and multiphysics set of equations. To capture the relevant physics of interest, one typically must time step on the hydrodynamics timescale, making explicit integration the obvious choice. On the other hand, the coupled radiation equations have a scaling such that implicit integration is effectively necessary in non-relativistic regimes. A first-order Lie-Trotter-like operator split is the most common time integration scheme used in practice, alternating between an explicit hydrodynamics step and an implicit radiation solve and energy deposition step. However, such a scheme is limited to first-order accuracy, and the nonlinear coupling between the radiation and hydrodynamics equations makes a more general additive partitioning of the equations non-trivial. Here, we develop a new formulation and partitioning of radiation hydrodynamics with gray diffusion that allows us to apply (linearly) implicit-explicit Runge-Kutta time integration schemes. We prove conservation of total energy in the new framework, and demonstrate 2nd-order convergence in time on multiple radiative shock problems, achieving errors 3--5 orders of magnitude smaller than the first-order Lie-Trotter operator split at the hydrodynamic CFL, even when Lie-Trotter applies a 3rd-order TVD Runge-Kutta scheme to the hydrodynamics equations.
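To make the partitioning concrete: writing the semi-discrete system as $u' = F_E(u) + F_I(u)$, with $F_E$ the nonstiff hydrodynamic terms (treated explicitly) and $F_I$ the stiff radiation terms (treated implicitly), the simplest member of the IMEX-RK family is the first-order forward-backward Euler step (the paper develops higher-order, linearly implicit members of this family):

```latex
% First-order IMEX (forward-backward) Euler for u' = F_E(u) + F_I(u):
% explicit in the hydrodynamic part, implicit in the stiff radiation part.
u^{n+1} = u^{n} + \Delta t \, F_E\!\left(u^{n}\right) + \Delta t \, F_I\!\left(u^{n+1}\right)
```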
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
Authors: Nisha Huang, Yuxin Zhang, Weiming Dong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Large-scale text-to-video diffusion models have demonstrated an exceptional ability to synthesize diverse videos. However, due to the lack of extensive text-to-video datasets and the computational resources required for training, directly applying these models to video stylization remains difficult. Also, given that the noise addition process on the input content is random and destructive, fulfilling the style transfer task's content preservation criteria is challenging. This paper proposes a zero-shot video stylization method named Style-A-Video, which utilizes a generative pre-trained transformer with an image latent diffusion model to achieve concise text-controlled video stylization. We improve the guidance condition in the denoising process, establishing a balance between artistic expression and structure preservation. Furthermore, to decrease inter-frame flicker and avoid the formation of additional artifacts, we employ a sampling optimization and a temporal consistency module. Extensive experiments show that we can attain superior content preservation and stylistic performance while incurring less computational cost than previous solutions. Code will be available at https://github.com/haha-lisa/Style-A-Video.
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Authors: Tao Hong
Subjects: Computation and Language (cs.CL); Pattern Formation and Solitons (nlin.PS)
Abstract
Large Language Models (LLMs), such as the Generative Pretrained Transformer (GPT), have achieved tremendous success in various language tasks, but their emergent abilities have also raised many questions, concerns, and challenges that need to be addressed. To gain a better understanding of the models' inner mechanisms, we analyze the hidden state and channel wave dynamics in a small GPT, focusing on the coherence of wave patterns in terms of cross-channel correlation and individual auto-correlation. Our findings suggest that wave dynamics offer consistent and repeatable intrinsic oscillation modes, along with context-aware plasticity and expressiveness in language generation. By analyzing wave patterns, coherence, and clustering, we provide a systematic way to identify and interpret the functionality of the hidden state channels, paving the way to understand and control higher-level language pattern formation. In addition, we investigate the Poisson statistics of spelling errors in text sequence generation across various levels of model training and observe a phase-transition-like process. As coherence builds up, there is a competition between the generation of correct and misspelled words. However, once the model is adequately trained and significant coherence has emerged, the coherent process becomes strong enough to effectively suppress spelling errors, preventing the cascade amplification of defects. The distribution of correct spellings transitions from Poissonian to Sub-Poissonian, while the distribution of misspellings shows the opposite trend. By leveraging concepts and techniques from quantum physics, we gain novel insights into the dynamics of the small GPT. This approach can be extended to larger language models that exhibit more complex coherent language patterns, opening up opportunities to interpret their emergent capabilities and develop more specialized models.
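The auto-correlation diagnostic in question is straightforward to compute from a recorded hidden-state channel; a small NumPy sketch (the lag range is an arbitrary illustrative choice):

```python
import numpy as np

def autocorr(channel, max_lag=32):
    """Auto-correlation of one hidden-state channel across sequence positions.

    `channel` is a 1-D array of a single channel's activations over the
    generated sequence; slowly decaying values indicate coherent oscillation.
    """
    x = np.asarray(channel, float)
    x = x - x.mean()
    denom = float((x * x).sum()) or 1.0
    return np.array([(x[:len(x) - k] * x[k:]).sum() / denom
                     for k in range(1, max_lag + 1)])

print(autocorr(np.sin(np.linspace(0, 20, 200)), max_lag=5))
```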
Autumn: A Scalable Read Optimized LSM-tree based Key-Value Stores with Fast Point and Range Read Speed
Authors: Fuheng Zhao, Leron Reznikov, Divyakant Agrawal, Amr El Abbadi
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)
Abstract
Log-Structured Merge Tree (LSM-tree) based key-value stores are widely used in many storage systems to support a variety of operations, such as updates, point reads, and range reads. Traditionally, an LSM-tree's merge policy organizes data into multiple levels of exponentially increasing capacity to support high-speed writes. However, we contend that traditional merge policies are not optimized for reads. In this work, we present Autumn, a scalable and read-optimized LSM-tree based key-value store with minimal point and range read cost. The key idea for improving read performance is to dynamically adjust the capacity ratio between two adjacent levels as more data is stored. As a result, smaller levels gradually increase their capacities and merge more often. In particular, the point and range read cost improves from the previous best known $O(\log N)$ complexity to $O(\sqrt{\log N})$ in Autumn by applying the novel Garnering merge policy. While the Garnering merge policy optimizes for both point reads and range reads, it maintains high performance for updates. Moreover, to further improve update costs, Autumn uses a small, bounded amount of DRAM to pin the first level of the LSM-tree. We implemented Autumn on top of LevelDB and experimentally showcase the performance gains on real-world workloads.
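The capacity-ratio idea can be illustrated with a few lines of arithmetic: a classic LSM-tree keeps a fixed ratio between adjacent levels, while a read-optimized variant can let the ratio shrink as levels deepen. The decay formula below is illustrative, not Autumn's exact Garnering policy:

```python
def level_capacities(n_levels, base, growth_exponent=1.0, first=16):
    """Contrast fixed vs. shrinking capacity ratios across LSM-tree levels.

    growth_exponent=1.0 reproduces the classic fixed ratio `base`;
    values < 1 make the ratio decay level by level (illustrative only).
    """
    caps, ratio = [first], float(base)
    for _ in range(1, n_levels):
        caps.append(int(caps[-1] * ratio))
        ratio = max(1.1, ratio ** growth_exponent)  # optionally decay the ratio
    return caps

print(level_capacities(5, base=10))                       # classic exponential
print(level_capacities(5, base=10, growth_exponent=0.5))  # ratios shrink
```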
Knowing Who Knows What: Designing Socially Assistive Robots with Transactive Memory System
Abstract
Transactive Memory System (TMS) is a group theory that describes how communication can enable the combination of individual minds into a group. While this theory has been extensively studied in human-human groups, it has not yet been formally applied to socially assistive robot design. We demonstrate how the three-phase TMS group communication process, which involves encoding, storage, and retrieval, can be leveraged to improve decision making in socially assistive robots with multiple stakeholders. By clearly defining how the robot gains information, stores and updates its memory, and retrieves information from its memory, we believe that socially assistive robots can make better decisions and provide more transparency behind their actions in the group context. Bringing communication theory to robot design provides a clear framework to help robots integrate better into human-human group dynamics and thus improve their acceptance and use.
BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning
Authors: Yunchao Yang, Yipeng Zhou, Miao Hu, Di Wu, Quan Z. Sheng
Abstract
Federated learning (FL) is a prospective distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, making the optimal reward budget allocation complicated. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization, named BARA (Budget Allocation for Reverse Auction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records, so that the reward budget allocated to each communication round is dynamically optimized to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.
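A single Bayesian-optimization step of this flavor can be sketched with a Gaussian-process surrogate and a UCB acquisition; these are illustrative choices (using scikit-learn), not necessarily BARA's exact design:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_round_budget(history, candidates, kappa=1.0):
    """One BO step for a per-round reward budget, in the spirit of BARA.

    history: list of (budget, observed model-utility gain) pairs from past
    rounds; candidates: budgets to score with a GP surrogate + UCB.
    """
    X = np.array([[b] for b, _ in history])
    y = np.array([u for _, u in history])
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, std = gp.predict(np.array([[c] for c in candidates]), return_std=True)
    return candidates[int(np.argmax(mu + kappa * std))]  # explore + exploit

hist = [(10.0, 0.61), (20.0, 0.70), (40.0, 0.74), (80.0, 0.75)]
print(next_round_budget(hist, candidates=[10.0, 20.0, 40.0, 80.0, 160.0]))
```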
DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing
Abstract
Knowledge distillation uses a high-performance teacher network to guide the student network. However, the performance gap between the teacher and student networks can hinder the student's training. This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction, which reduces the gap by adjusting the student instead of the teacher. First, we theoretically analyze how changing the student's output entropy (short for output information entropy) affects the distillation loss, and show that correcting the output entropy can reduce the gap. We then construct a knowledge distillation algorithm based on dynamic entropy correction, which corrects the output entropy in real time with an entropy controller updated dynamically by the distillation loss. The proposed algorithm is validated on CIFAR100 and ImageNet. Comparison with various state-of-the-art distillation algorithms shows impressive results, especially for the teacher-student pair resnet32x4-resnet8x4 on CIFAR100: the proposed algorithm improves classification accuracy by 2.64 points over the traditional distillation algorithm and by 0.87 points over the state-of-the-art algorithm CRD, demonstrating its effectiveness and efficiency.
Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue
Authors: Jian Wang, Dongding Lin, Wenjie Li
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Goal-directed dialogue systems aim to proactively reach a pre-determined target through multi-turn conversations. The key to achieving this task lies in planning dialogue paths that smoothly and coherently direct conversations towards the target. However, this is a challenging and under-explored task. In this work, we propose a coherent dialogue planning approach that uses a stochastic process to model the temporal dynamics of dialogue paths. We define a latent space that captures the coherence of goal-directed behavior using a Brownian bridge process, which allows us to incorporate user feedback flexibly in dialogue planning. Based on the derived latent trajectories, we generate dialogue paths explicitly using pre-trained language models. We finally employ these paths as natural language prompts to guide dialogue generation. Our experiments show that our approach generates more coherent utterances and achieves the goal with a higher success rate.
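For intuition, a Brownian bridge pinned at a start latent $z_0$ and a goal latent $z_T$ has closed-form marginals, which makes sampling goal-directed latent trajectories cheap. A NumPy sketch that samples each intermediate marginal independently (a sequential conditional sampler would additionally capture temporal correlations; whether the paper uses marginals or conditionals is not stated here):

```python
import numpy as np

def brownian_bridge(z0, zT, T=10, sigma=1.0, seed=0):
    """Sample a latent trajectory pinned at z0 (t=0) and zT (t=T).

    At time t the bridge marginal has mean (1 - t/T) z0 + (t/T) zT and
    variance sigma^2 * t * (T - t) / T, so every intermediate point
    'aims' at the goal, as in goal-directed dialogue planning.
    """
    rng = np.random.default_rng(seed)
    z0, zT = np.asarray(z0, float), np.asarray(zT, float)
    traj = [z0]
    for t in range(1, T):
        mean = (1 - t / T) * z0 + (t / T) * zT
        var = sigma**2 * t * (T - t) / T
        traj.append(mean + np.sqrt(var) * rng.standard_normal(len(z0)))
    traj.append(zT)
    return np.stack(traj)

print(brownian_bridge([0.0, 0.0], [1.0, 1.0]).shape)  # (11, 2)
```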
Construction of Control Barrier Functions Using Predictions with Finite Horizon
Authors: Adrian Wiltz, Xiao Tan, Dimos V. Dimarogonas
Abstract
In this paper, we show that under mild controllability assumptions a Control Barrier Function (CBF) can be constructed based on predictions with a finite horizon. The proposed construction methodology yields a CBF that renders a prespecified subset of the state space invariant. In particular, we leverage intuitive understanding of the system dynamics to construct a subset of an unknown control-invariant set, and then apply finite-horizon predictions to compute a CBF. Moreover, we provide a thorough analysis of the properties of the constructed CBF, characterize the impact of the prediction horizon, and comment on the practical implementation. In the end, we relate our construction approach to Model Predictive Control (MPC). We demonstrate the application of our method on a relevant example.
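For reference, the standard CBF condition for a control-affine system $\dot{x} = f(x) + g(x)u$ (a textbook formulation, not specific to this paper's construction) reads:

```latex
% h, with safe set C = {x : h(x) >= 0}, is a CBF if, for some class-K
% function alpha and all x,
\sup_{u \in U}\big[\, L_f h(x) + L_g h(x)\,u \,\big] \;\ge\; -\alpha\big(h(x)\big)
% Any Lipschitz controller satisfying this inequality pointwise renders
% the safe set C forward invariant.
```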
Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching
Abstract
3D dynamic point cloud (DPC) compression relies on mining its temporal context, which faces significant challenges due to DPCs' sparsity and non-uniform structure. Existing methods are limited in capturing sufficient temporal dependencies. Therefore, this paper proposes a learning-based DPC compression framework with a hierarchical block-matching-based inter-prediction module to compensate and compress the DPC geometry in latent space. Specifically, we propose a hierarchical motion estimation and motion compensation (Hie-ME/MC) framework for flexible inter-prediction, which dynamically selects the granularity of optical flow to encapsulate the motion information accurately. To improve the motion estimation efficiency of the proposed inter-prediction module, we further design a KNN-attention block matching (KABM) network that determines the impact of potential corresponding points based on geometry and feature correlation. Finally, we compress the residual and the multi-scale optical flow with a fully-factorized deep entropy model. Experimental results on the MPEG-specified Owlii Dynamic Human Dynamic Point Cloud (Owlii) dataset show that our framework outperforms previous state-of-the-art methods and the MPEG standard V-PCC v18 in inter-frame low-delay mode.
Error estimate of the u-series method for molecular dynamics simulations
Abstract
This paper provides an error estimate for the u-series decomposition of the Coulomb interaction in molecular dynamics simulations. We show that the number of truncated Gaussians $M$ in the u-series and the base of interpolation nodes $b$ in the bilateral serial approximation are two key parameters for the algorithm accuracy, and that the errors converge as $\mathcal{O}(b^{-M})$ for the energy and $\mathcal{O}(b^{-3M})$ for the force. Error bounds due to numerical quadrature and cutoff in both the electrostatic energy and forces are obtained. Closed-form formulae are also provided, which are useful in the parameter setup for simulations under a given accuracy. The results are verified by analyzing the errors of two practical systems.
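A small sketch of how such closed-form bounds support parameter setup: dropping the constants, the stated rates imply the smallest M for a target tolerance, with the base b and the tolerances as user inputs (illustrative only; the paper's formulae include the constants).

```python
import math

def min_num_gaussians(b, eps_energy=None, eps_force=None):
    """Smallest M consistent with the stated convergence rates
    (constants dropped): energy error ~ b^{-M}, force error ~ b^{-3M}."""
    M = 0
    if eps_energy is not None:
        M = max(M, math.ceil(math.log(1.0 / eps_energy, b)))
    if eps_force is not None:
        M = max(M, math.ceil(math.log(1.0 / eps_force, b) / 3.0))
    return M

print(min_num_gaussians(b=2.0, eps_energy=1e-6))  # 20, since 2^-20 ~ 1e-6
```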
Physics-informed Neural Networks to Model and Control Robots: a Theoretical and Experimental Investigation
Authors: Jingyue Liu, Pablo Borja, Cosimo Della Santina
Abstract
Physics-inspired neural networks have proven to be an effective modeling method, giving more physically plausible results with less data dependency. However, their application in robotics is limited due to the non-conservative nature of robot dynamics and the difficulty of friction modeling. Moreover, these physics-inspired neural networks do not account for complex input matrices, such as those found in underactuated soft robots. This paper addresses these problems by extending Lagrangian and Hamiltonian neural networks to include dissipation and a simplified input matrix. Additionally, the loss function is processed using the Runge-Kutta algorithm, circumventing the inaccuracies and environmental susceptibility inherent in direct acceleration measurements. First, the effectiveness of the proposed method is validated via simulations of soft and rigid robots. Then, the proposed approach is validated experimentally on a tendon-driven soft robot and a Panda robot. The simulation and experimental results show that the modified neural networks can model different robots, and that the learned models enable decent anticipatory control.
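A minimal sketch of the Runge-Kutta-in-the-loss idea: integrate the (learned) dynamics one step with RK4 and penalize the mismatch with the next measured state, so no acceleration signal is ever needed. The pendulum stand-in for the learned model and all constants are illustrative assumptions.

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """Classic fourth-order Runge-Kutta step for x' = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def rollout_loss(f_theta, states, inputs, dt):
    """Compare integrated predictions against the next *measured* state,
    so the loss avoids direct acceleration measurements entirely."""
    loss = 0.0
    for t in range(len(states) - 1):
        pred = rk4_step(f_theta, states[t], inputs[t], dt)
        loss += np.sum((pred - states[t + 1]) ** 2)
    return loss / (len(states) - 1)

# Toy check: a damped pendulum standing in for the learned dynamics.
f = lambda x, u: np.array([x[1], -9.8 * np.sin(x[0]) - 0.1 * x[1] + u])
xs = [np.array([0.3, 0.0])]
for _ in range(50):
    xs.append(rk4_step(f, xs[-1], 0.0, 0.01))
print(rollout_loss(f, np.array(xs), np.zeros(50), 0.01))  # ~0: exact model, same integrator
```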
Sublogarithmic Approximation for Tollbooth Pricing on a Cactus
Authors: Andrzej Turko, Jarosław Byrka
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
We study an envy-free pricing problem, in which each buyer wishes to buy a shortest path connecting her individual pair of vertices in a network owned by a single vendor. The vendor sets the prices of individual edges with the aim of maximizing the total revenue generated by all buyers. Each customer buys a path as long as its cost does not exceed her individual budget, in which case the revenue generated by her equals the sum of the prices of the edges along this path. We consider the unlimited supply setting, where each edge can be sold to arbitrarily many customers. The problem is to find a price assignment which maximizes the vendor's revenue. The special case in which the network is a tree is known under the name of the tollbooth problem; Gamzu and Segev proposed an $\mathcal{O} \left( \frac{\log m}{\log \log m} \right)$-approximation algorithm for revenue maximization in that setting. Note that paths in a tree network are unique, and hence the tollbooth problem falls under the category of single-minded bidders, i.e., each buyer is interested in a single fixed set of goods. In this work we step out of the single-minded setting and consider more general networks that may contain cycles. We obtain an algorithm for pricing cactus-shaped networks, namely networks in which each edge can belong to at most one simple cycle. Our result is a polynomial-time $\mathcal{O} \left( \frac{\log m}{\log \log m}\right)$-approximation algorithm for revenue maximization in tollbooth pricing on a cactus graph. It builds upon the framework of Gamzu and Segev, but requires substantially extending its main ideas: the recursive decomposition of the graph, the dynamic programming for rooted instances, and the rounding of prices.
Implicit-explicit Runge-Kutta for radiation hydrodynamics I: gray diffusion
Authors: Ben S. Southworth, Ryosuke Park, Svetlana Tokareva, Marc Charest
Abstract
Radiation hydrodynamics is a challenging multiscale and multiphysics set of equations. To capture the relevant physics of interest, one typically must time step on the hydrodynamics timescale, making explicit integration the obvious choice. On the other hand, the coupled radiation equations have a scaling such that implicit integration is effectively necessary in non-relativistic regimes. A first-order Lie-Trotter-like operator split is the most common time integration scheme used in practice, alternating between an explicit hydrodynamics step and an implicit radiation solve and energy deposition step. However, such a scheme is limited to first-order accuracy, and nonlinear coupling between the radiation and hydrodynamics equations makes a more general additive partitioning of the equations non-trivial. Here, we develop a new formulation and partitioning of radiation hydrodynamics with gray diffusion that allows us to apply (linearly) implicit-explicit Runge-Kutta time integration schemes. We prove conservation of total energy in the new framework, and demonstrate second-order convergence in time on multiple radiative shock problems, achieving errors 3--5 orders of magnitude smaller than the first-order Lie-Trotter operator split at the hydrodynamic CFL, even when Lie-Trotter applies a 3rd-order TVD Runge-Kutta scheme to the hydrodynamics equations.
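For intuition on implicit-explicit additive partitioning, here is a minimal first-order IMEX Euler sketch on a scalar split (stiff linear part treated implicitly, nonstiff part explicitly); the paper's schemes are higher-order IMEX Runge-Kutta methods, and this toy split is purely an illustrative assumption.

```python
import numpy as np

def imex_euler(y0, dt, steps, f_exp, lam):
    """First-order IMEX Euler for y' = f_exp(y) + lam * y:
        y_{n+1} = y_n + dt * f_exp(y_n) + dt * lam * y_{n+1},
    i.e. explicit in the nonstiff term, implicit in the stiff linear term.
    The implicit solve is a scalar division here; a linear solve in general.
    """
    y = float(y0)
    for _ in range(steps):
        y = (y + dt * f_exp(y)) / (1.0 - dt * lam)
    return y

# Stiff relaxation (lam = -1e4) plus a mild nonstiff forcing:
# stable even though dt is ~100x the stiff timescale 1/|lam|.
print(imex_euler(y0=1.0, dt=0.01, steps=100, f_exp=lambda y: np.cos(y), lam=-1e4))
```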
Self-Evolving Integrated VHetNets for 6G: A Multi-Tier HFL Approach
Abstract
Self-evolving networks (SENs) are emerging technologies that dynamically and autonomously adapt and optimize their performance and behaviour based on changing conditions and evolving requirements. With the advent of fifth-generation (5G) wireless technologies and the resurgence of machine learning, SENs are expected to become a critical component of future wireless networks. In particular, integrated vertical heterogeneous network (VHetNet) architectures, which enable dynamic, three-dimensional (3D), and agile topologies, are likely to form a key foundation for SENs. However, the distributed multi-level computational and communication structure and the fully dynamic nature of self-evolving integrated VHetNets (SEI-VHetNets) necessitate the deployment of an enhanced distributed learning and computing mechanism to enable full integration and coordination. To address this need, we propose a novel learning technique, multi-tier hierarchical federated learning (MT-HFL), based on hierarchical federated learning (HFL) that enables full integration and coordination across vertical tiers. Through MT-HFL, SEI-VHetNets can learn and adapt to dynamic network conditions, optimize resource allocation, and enhance user experience in a real-time, scalable, and accurate manner while preserving user privacy. This paper presents the key characteristics and challenges of SEI-VHetNets and discusses how MT-HFL addresses them. We also discuss potential use cases and present a case study demonstrating the advantages of MT-HFL over conventional terrestrial HFL approaches.
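A minimal sketch of what multi-tier hierarchical aggregation might look like with plain FedAvg at each tier (tier names, model shapes, and sample counts are illustrative; MT-HFL itself involves more than repeated weighted averaging):

```python
import numpy as np

def fedavg(models, sizes):
    """Sample-size-weighted average of model parameter vectors."""
    return np.average(models, axis=0, weights=np.asarray(sizes, dtype=float))

def multi_tier_round(tiers):
    """One aggregation round over vertical tiers (e.g. ground -> aerial -> top).

    `tiers` is a list of mid-tier aggregators, each holding a list of
    (model, num_samples) pairs from the tier below; the top tier then
    averages the aggregators' models weighted by their total sample counts.
    """
    mid_models, mid_sizes = [], []
    for clients in tiers:
        models, sizes = zip(*clients)
        mid_models.append(fedavg(np.stack(models), sizes))
        mid_sizes.append(sum(sizes))
    return fedavg(np.stack(mid_models), mid_sizes)

# Two edge aggregators, each with two clients holding 4-parameter models.
edge_a = [(np.ones(4), 100), (np.zeros(4), 300)]
edge_b = [(np.full(4, 2.0), 200), (np.full(4, 4.0), 200)]
print(multi_tier_round([edge_a, edge_b]))  # equals the global size-weighted mean (1.625)
```

Because each tier uses the same size-weighted rule, the hierarchical result matches flat FedAvg over all clients while keeping raw data, and even raw client models, local to each tier.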
Graph Neural Networks for Airfoil Design
Abstract
The study of partial differential equations (PDEs) through the framework of deep learning emerged a few years ago, leading to impressive approximations of simple dynamics. Graph neural networks (GNNs) have turned out to be very useful in these tasks, as they allow the treatment of the unstructured data often encountered in the numerical resolution of PDEs. However, solving harder PDEs such as the Navier-Stokes equations remains a challenging task, and most work on them concentrates either on simulating the flow around simple geometries or on qualitative results that look physical for design purposes. In this study, we leverage prior work on deep learning for PDEs and GNNs by proposing an adaptation of a known architecture to the task of approximating the solution of the two-dimensional steady-state incompressible Navier-Stokes equations over different airfoil geometries. In addition, we test our model not only on its performance over the volume but also on its ability to approximate surface quantities such as the wall shear stress or the isostatic pressure, which lead to global coefficients such as the lift and the drag of the airfoil and thereby allow design exploration. This work is part of a longer-term project that aims to approximate three-dimensional steady-state solutions over industrial geometries.
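For readers unfamiliar with GNNs on meshes, here is a minimal mean-aggregation message-passing layer in NumPy (an illustrative sketch, not the paper's architecture; the weights and the toy mesh are arbitrary):

```python
import numpy as np

def message_passing_layer(node_feats, edges, W_self, W_nbr):
    """One mean-aggregation message-passing layer on an unstructured mesh.

    node_feats: (N, F) per-node features (e.g. coordinates, inlet conditions)
    edges: list of (i, j) mesh edges, treated as directed messages j -> i
    """
    N, _ = node_feats.shape
    agg = np.zeros_like(node_feats)
    deg = np.zeros(N)
    for i, j in edges:
        agg[i] += node_feats[j]
        deg[i] += 1
    agg /= np.maximum(deg, 1)[:, None]            # mean over neighbours
    return np.tanh(node_feats @ W_self + agg @ W_nbr)

# Tiny 3-node mesh; in practice the edges come from the CFD mesh around the airfoil.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
h = message_passing_layer(x, [(0, 1), (1, 0), (1, 2), (2, 1)],
                          rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
print(h.shape)  # (3, 4)
```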
Self-Supervised Anomaly Detection of Rogue Soil Moisture Sensors
Authors: Boje Deforce, Bart Baesens, Jan Diels, Estefanía Serral Asensio
Abstract
IoT data is a central element in the successful digital transformation of agriculture. However, IoT data comes with its own set of challenges, e.g., the risk of data contamination due to rogue sensors. A sensor is considered rogue when it provides incorrect measurements over time. To ensure correct analytical results, an essential preprocessing step when working with IoT data is the detection of such rogue sensors. Existing methods assume that well-behaving sensors are known or that a large majority of the sensors are well-behaving. However, real-world data is often completely unlabeled and voluminous, calling for self-supervised methods that can detect rogue sensors without prior information. We present a self-supervised anomalous sensor detector based on a neural network with a contrastive loss, followed by DBSCAN. A core contribution of our paper is the use of Dynamic Time Warping in the negative sampling for the triplet loss. This novelty makes the use of triplet networks feasible for anomalous sensor detection. Our method shows promising results on a challenging dataset of soil moisture sensors deployed in multiple pear orchards.
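A minimal sketch of the DTW-guided negative sampling idea: compute DTW distances between the anchor series and candidate series, and take a distant candidate as the negative for the triplet loss. The farthest-candidate heuristic below is an assumption for illustration, not necessarily the paper's exact sampling rule.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW for 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def sample_negative(anchor, candidates):
    """Pick as negative the candidate series farthest (by DTW) from the anchor,
    so each triplet (anchor, positive, negative) gets an informative negative
    even without any labels."""
    dists = [dtw_distance(anchor, c) for c in candidates]
    return candidates[int(np.argmax(dists))]

anchor = np.sin(np.linspace(0, 6, 50))
candidates = [anchor + 0.05 * np.random.randn(50), np.random.randn(50)]
neg = sample_negative(anchor, candidates)   # selects the dissimilar series
```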
Efficient pattern-based anomaly detection in a network of multivariate devices
Authors: Len Feremans, Boris Cule, Bart Goethals
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
Many organisations manage service quality and monitor a large set of devices and servers, where each entity is associated with telemetry or physical sensor data series. Recently, various methods have been proposed to detect behavioural anomalies; however, existing approaches focus on multivariate time series and ignore communication between entities. Moreover, we aim to support end-users not only in locating the entities and sensors causing an anomaly in a certain period, but also in explaining this decision. We propose a scalable two-step approach to detect anomalies. First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process. Next, we report anomalies based on an embedding of sequential patterns. Pattern mining is efficient and supports interpretation, i.e., patterns represent frequently occurring behaviour in time series. We extend pattern mining to filter sequential patterns based on frequency, temporal constraints, and minimum description length. We collect and release two public datasets for international broadcasting and X from an Internet company. Our method, BAD, achieves an overall F1-score of 0.78 on 9 benchmark datasets, significantly outperforming the best baseline by 3%. Additionally, BAD is an order of magnitude faster than state-of-the-art anomaly detection methods.
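To illustrate pattern mining with a temporal constraint, here is a small sketch counting non-overlapping occurrences of a sequential pattern with a maximum gap between matched symbols (the gap-constrained counting rule is an illustrative choice, not necessarily the paper's exact definition):

```python
def count_occurrences(pattern, series, max_gap):
    """Count non-overlapping occurrences of `pattern` as a subsequence of the
    discretised `series`, allowing at most `max_gap` positions between
    consecutively matched symbols (a simple temporal constraint)."""
    count, i = 0, 0
    while i < len(series):
        j, last = 0, None
        for k in range(i, len(series)):
            if j == len(pattern):
                break
            if series[k] == pattern[j] and (last is None or k - last <= max_gap):
                last, j = k, j + 1
        if j == len(pattern):
            count, i = count + 1, last + 1   # consume the match, continue after it
        else:
            i += 1                           # greedy match failed; retry one later
    return count

print(count_occurrences(['a', 'b'], ['a', 'x', 'b', 'a', 'b', 'c'], max_gap=2))  # 2
```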
Group Activity Recognition via Dynamic Composition and Interaction
Abstract
Previous group activity recognition approaches were limited to reasoning over human relations or finding important subgroups, and tended to ignore indispensable group composition and human-object interactions. This omission yields only a partial interpretation of the scene and increases the interference of irrelevant actions in the results. Therefore, we propose DynamicFormer, with a Dynamic composition Module (DcM) and a Dynamic interaction Module (DiM), to model the relations and locations of persons and to discriminate the contributions of participants, respectively. Our core idea is inspired by our findings on group composition and human-object interaction: group composition tells us the locations of people and their relations inside the group, while interaction reflects the relation between humans and objects outside the group. We utilize spatial and temporal encoders in DcM to model dynamic composition, and build DiM to explore interaction with a novel GCN that contains a transformer to consider the temporal neighbors of humans and objects. A Multi-level Dynamic Integration is also employed to integrate features from different levels. We conduct extensive experiments on two public datasets and show that our method achieves state-of-the-art performance.
Predictive Control of Linear Discrete-Time Markovian Jump Systems by Learning Recurrent Patterns
Authors: SooJean Han, Soon-Jo Chung, John C. Doyle
Abstract
Incorporating pattern-learning for prediction (PLP) in many discrete-time or discrete-event systems allows for computation-efficient controller design by memorizing patterns to schedule control policies based on their future occurrences. In this paper, we demonstrate the effect of PLP by designing a controller architecture for a class of linear Markovian jump systems (MJS) where the aforementioned "patterns" correspond to finite-length sequences of modes. In our analysis of recurrent patterns, we use martingale theory to derive closed-form solutions to quantities pertaining to the occurrence of patterns: 1) the expected minimum occurrence time of any pattern from some predefined collection, and 2) the probability of a pattern being the first to occur among the collection. Our method is applicable to real-world dynamics because we relax two assumptions common in the prior pattern-occurrence literature: first, the distribution of the mode process is unknown, and second, the true realization of the mode process is not observable. As a demonstration, we consider fault-tolerant control of a dynamic topology-switching network, and empirically compare PLP to two controllers without PLP: a baseline based on the novel System Level Synthesis (SLS) approach and a topology-robust extension of the SLS baseline. We show that PLP rejects disturbances as effectively as the topology-robust controller at reduced computation time and control effort. We discuss several important tradeoffs, such as the size of the pattern collection, and the system scale versus the accuracy of the mode predictions, which show how different PLP implementations affect stabilization and runtime performance.
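The pattern-occurrence quantities have well-known closed forms in the simplest setting. As a sanity-check sketch for an i.i.d. mode process (the paper handles the more general unknown, partially observed case), the classic gambling-martingale argument gives the expected first-occurrence time of a pattern:

```python
import numpy as np

def expected_occurrence_time(pattern, probs):
    """Expected first-occurrence time of `pattern` in an i.i.d. symbol sequence,
    via the gambling-martingale argument: E[T] is the sum, over every k such
    that the length-k prefix of the pattern equals its length-k suffix, of
    1 / P(that prefix)."""
    total, n = 0.0, len(pattern)
    for k in range(1, n + 1):
        if pattern[:k] == pattern[n - k:]:
            p = np.prod([probs[s] for s in pattern[:k]])
            total += 1.0 / p
    return total

# Fair coin: 'HTH' needs 10 flips on average, 'HTT' only 8,
# because 'HTH' overlaps with itself ('H' is both prefix and suffix).
probs = {'H': 0.5, 'T': 0.5}
print(expected_occurrence_time('HTH', probs))  # 10.0
print(expected_occurrence_time('HTT', probs))  # 8.0
```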
Distributed economic predictive control of integrated energy systems for enhanced synergy and grid response: A decomposition and cooperation strategy
Authors: Long Wu, Xunyuan Yin, Lei Pan, Jinfeng Liu (University of Alberta)
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
Abstract
The close integration of an increasing number of operating units into an integrated energy system (IES) results in complex interconnections between these units. The strong dynamic interactions create barriers to designing a successful distributed coordinated controller that achieves synergy between all the units and unlocks the potential for grid response. To address these challenges, we introduce a directed graph representation of IESs using an augmented Jacobian matrix to depict their underlying dynamic topology. Using this representation, a generic subsystem decomposition method is proposed to partition the entire IES vertically based on the dynamic time scale and horizontally based on the closeness of interconnections between the operating units. Exploiting the decomposed subsystems, we develop a cooperative distributed economic model predictive control (DEMPC) scheme with multiple global objectives that regulates the generated power at the grid's request and satisfies the customers' cooling demands and the system's economic requirements. In the DEMPC, multiple local decision-making agents cooperate sequentially and iteratively to leverage the potential across all the units for system-wide dynamic synergy. Furthermore, we discuss how subsystem decomposition impacts the design of distributed cooperation schemes for IESs and provide a control-oriented basic guideline on the optimal decomposition of complex energy systems. Extensive simulations demonstrate that control strategies with different levels of decomposition and collaboration lead to marked differences in the overall performance of the IES. The standard control scheme based on the proposed subsystem configuration outperforms the empirical decomposition-based control benchmark by about 20%, and the DEMPC architecture further improves the overall performance of the IES by about 55% compared to the benchmark.
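As a toy illustration of sequential, iterative cooperation between decision-making agents, the sketch below has two agents block-wise minimizing one shared quadratic cost until they reach the joint optimum; the cost and partition are arbitrary assumptions, and real DEMPC agents solve constrained dynamic optimization problems instead.

```python
import numpy as np

def sequential_cooperation(Q, c, n1, iters=20):
    """Two agents sequentially minimize one shared quadratic cost
    J(u) = 0.5 u^T Q u + c^T u over their own blocks of u, each treating
    the other's latest decision as fixed (block coordinate descent)."""
    u = np.zeros(len(c))
    idx1, idx2 = np.arange(n1), np.arange(n1, len(c))
    for _ in range(iters):
        for idx in (idx1, idx2):
            other = np.setdiff1d(np.arange(len(c)), idx)
            # Optimal block given the other block: solve the stationarity condition
            # Q_ii u_i + Q_io u_o + c_i = 0 for u_i.
            rhs = -(c[idx] + Q[np.ix_(idx, other)] @ u[other])
            u[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], rhs)
    return u

Q = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
c = np.array([1.0, -2.0, 0.5])
u = sequential_cooperation(Q, c, n1=2)
print(np.allclose(u, np.linalg.solve(Q, -c), atol=1e-6))  # True: reaches the joint optimum
```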
Keyword: efficient
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Anatomically Detailed Simulation of Human Torso
Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation
Beyond Diagonal Reconfigurable Intelligent Surfaces Utilizing Graph Theory: Modeling, Architecture Design, and Optimization
A Case for CXL-Centric Server Processors
A Unifying Framework of Attention-based Neural Load Forecasting
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
Sorting Finite Automata via Partition Refinement
Localisation of Mammographic masses by Greedy Backtracking of Activations in the Stacked Auto-Encoders
Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation
A Generalized Covering Algorithm for Chained Codes
E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
A Fair and Resilient Decentralized Clock Network for Transaction Ordering
A High-performance, Energy-efficient Modular DMA Engine Architecture
Learning Personalized Page Content Ranking Using Customer Representation
ENCOVIZ: An open-source, secure and multi-role energy consumption visualisation platform
Structured Sentiment Analysis as Transition-based Dependency Parsing
Two new algorithms for error support recovery of low rank parity check codes
GPT-NAS: Neural Architecture Search with the Generative Pre-Trained Model
VEDLIoT -- Next generation accelerated AIoT systems and applications
Fast Many-to-Many Routing for Ridesharing with Multiple Pickup and Dropoff Locations
High-throughput Cotton Phenotyping Big Data Pipeline Lambda Architecture Computer Vision Deep Neural Networks
Graph Neural Networks for Airfoil Design
Energy-Efficient Mining for Blockchain-Enabled IoT Applications. An Optimal Multiple-Stopping Time Approach
Investigating the effect of sub-word segmentation on the performance of transformer language models
ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation
Integrating Holistic and Local Information to Estimate Emotional Reaction Intensity
Efficient pattern-based anomaly detection in a network of multivariate devices
Buoyancy enabled autonomous underwater construction with cement blocks
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
Distributional Multi-Objective Decision Making
Investigating the Software Engineering Roadmap for Smart City Infrastructure Development: Goals and Challenges
Predictive Control of Linear Discrete-Time Markovian Jump Systems by Learning Recurrent Patterns
On the Structure of Higher Order MDS Codes
Structured condition numbers for generalized saddle point systems
Keyword: faster
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
Sorting Finite Automata via Partition Refinement
A Generalized Covering Algorithm for Chained Codes
Latent Interactive A2C for Improved RL in Open Many-Agent Systems
E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
Attack Named Entity Recognition by Entity Boundary Interference
HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction
Fast Many-to-Many Routing for Ridesharing with Multiple Pickup and Dropoff Locations
Robust Implicit Regularization via Weight Normalization
ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation
Efficient pattern-based anomaly detection in a network of multivariate devices
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
Keyword: mobile
Crop identification using deep learning on LUCAS crop cover photos
Semi-Supervised Federated Learning for Keyword Spotting
Child Palm-ID: Contactless Palmprint Recognition for Children
Voicify Your UI: Towards Android App Control with Voice Commands
Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
Survey of Federated Learning Models for Spatial-Temporal Mobility Applications
Impact of Mobility on Power Consumption in RPL
Understanding why SLAM algorithms fail in modern indoor environments
Resilient Temporal Logic Planning in the Presence of Robot Failures
Implementation of a Channel Model for Non-Terrestrial Networks in ns-3
TidyBot: Personalized Robot Assistance with Large Language Models
Keyword: pruning
Distributional Multi-Objective Decision Making
Keyword: voxel
There is no result
Keyword: lidar
DC3DCD: unsupervised learning for multiclass 3D point cloud change detection
Keyword: diffusion
Atmospheric Turbulence Correction via Variational Deep Diffusion
Modeling Viral Information Spreading via Directed Acyclic Graph Diffusion
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Implicit-explicit Runge-Kutta for radiation hydrodynamics I: gray diffusion
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
Keyword: dynamic
Anatomically Detailed Simulation of Human Torso
CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Autumn: A Scalable Read Optimized LSM-tree based Key-Value Stores with Fast Point and Range Read Speed
Knowing Who Knows What: Designing Socially Assistive Robots with Transactive Memory System
BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning
DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing
Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue
Construction of Control Barrier Functions Using Predictions with Finite Horizon
Understanding why SLAM algorithms fail in modern indoor environments
Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching
Error estimate of the u-series method for molecular dynamics simulations
Physics-informed Neural Networks to Model and Control Robots: a Theoretical and Experimental Investigation
Sublogarithmic Approximation for Tollbooth Pricing on a Cactus
Implicit-explicit Runge-Kutta for radiation hydrodynamics I: gray diffusion
Self-Evolving Integrated VHetNets for 6G: A Multi-Tier HFL Approach
Graph Neural Networks for Airfoil Design
Self-Supervised Anomaly Detection of Rogue Soil Moisture Sensors
Efficient pattern-based anomaly detection in a network of multivariate devices
Group Activity Recognition via Dynamic Composition and Interaction
Predictive Control of Linear Discrete-Time Markovian Jump Systems by Learning Recurrent Patterns
Distributed economic predictive control of integrated energy systems for enhanced synergy and grid response: A decomposition and cooperation strategy