Abstract
This paper focuses on multiple-access protocol design in a wireless network assisted by multiple reconfigurable intelligent surfaces (RISs). By extending the existing approaches in single-user or single-RIS cases, we present two benchmark schemes for this multi-user multi-RIS scenario. After inspecting their shortcomings, we propose a simple but efficient method coined opportunistic multi-user reflection (OMUR). The key idea is to opportunistically select the best user as the anchor for optimizing the RISs, and to transmit all users' signals simultaneously in a non-orthogonal manner. A simplified version of OMUR exploiting random phase shifts is also proposed to avoid the complexity of RIS channel estimation.
Level Up: Private Non-Interactive Decision Tree Evaluation using Levelled Homomorphic Encryption
Abstract
As machine learning as a service continues gaining popularity, concerns about privacy and intellectual property arise. Users often hesitate to disclose their private information to obtain a service, while service providers aim to protect their proprietary models. Decision trees, a widely used machine learning model, are favoured for their simplicity, interpretability, and ease of training. In this context, Private Decision Tree Evaluation (PDTE) enables a server holding a private decision tree to provide predictions based on a client's private attributes. The protocol is such that the server learns nothing about the client's private attributes. Similarly, the client learns nothing about the server's model besides the prediction and some hyperparameters. In this paper, we propose two novel non-interactive PDTE protocols, XXCMP-PDTE and RCC-PDTE, based on two new non-interactive comparison protocols, XXCMP and RCC. Our evaluation of these comparison operators demonstrates that our proposed constructions can efficiently evaluate high-precision numbers. Specifically, RCC can compare 32-bit numbers in under 10 milliseconds. We assess our proposed PDTE protocols on decision trees trained over UCI datasets and compare our results with existing work in the field. Moreover, we evaluate synthetic decision trees to showcase scalability, revealing that RCC-PDTE can evaluate a decision tree with over 1000 nodes and 16 bits of precision in under 2 seconds. In contrast, the current state-of-the-art requires over 10 seconds to evaluate such a tree with only 11 bits of precision.
Exploring the Benefits of Differentially Private Pre-training and Parameter-Efficient Fine-tuning for Table Transformers
Authors: Xilong Wang, Chia-Mu Yu, Pin-Yu Chen
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
For machine learning with tabular data, Table Transformer (TabTransformer) is a state-of-the-art neural network model, while Differential Privacy (DP) is an essential component to ensure data privacy. In this paper, we explore the benefits of combining these two aspects together in the scenario of transfer learning -- differentially private pre-training and fine-tuning of TabTransformers with a variety of parameter-efficient fine-tuning (PEFT) methods, including Adapter, LoRA, and Prompt Tuning. Our extensive experiments on the ACSIncome dataset show that these PEFT methods outperform traditional approaches in terms of the accuracy of the downstream task and the number of trainable parameters, thus achieving an improved trade-off among parameter efficiency, privacy, and accuracy. Our code is available at github.com/IBM/DP-TabTransformer.
Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning
Authors: Hao Sun
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
The recent advances in the development of Large Language Models (LLMs) like ChatGPT have achieved remarkable performance by leveraging human expertise. Yet, fully eliciting LLMs' potential for complex tasks requires navigating the vast search space of natural language prompts. While prompt engineering has shown promise, the requisite human-crafted prompts in trial-and-error attempts and the associated costs pose significant challenges. Crucially, the efficiency of prompt optimization hinges on the costly procedure of prompt evaluation. This work introduces Prompt-OIRL, an approach rooted in offline inverse reinforcement learning that seeks to bridge the gap between effective prompt evaluation and affordability. Our method draws on offline datasets from expert evaluations, employing Inverse-RL to derive a reward model for offline, query-dependent prompt evaluations. The advantages of Prompt-OIRL are manifold: it predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently navigates the prompt space. We validate our method across four LLMs and three arithmetic datasets, highlighting its potential as a robust and effective tool for offline prompt evaluation and optimization. Our code as well as the offline datasets are released, and we highlight that Prompt-OIRL can be reproduced within a few hours on a single laptop using only a CPU.
An improved protocol for ExactlyN with more than 3 players
Authors: Lianna Hambardzumyan, Toniann Pitassi, Suhail Sherif, Morgan Shirley, Adi Shraibman
Abstract
The ExactlyN problem in the number-on-forehead (NOF) communication setting asks $k$ players, each of whom can see every input but their own, if the $k$ input numbers add up to $N$. Introduced by Chandra, Furst and Lipton in 1983, ExactlyN is important for its role in understanding the strength of randomness in communication complexity with many players. It is also tightly connected to the field of combinatorics: its $k$-party NOF communication complexity is related to the size of the largest corner-free subset in $[N]^{k-1}$. In 2021, Linial and Shraibman gave more efficient protocols for ExactlyN for 3 players. As an immediate consequence, this also gave a new construction of larger corner-free subsets in $[N]^2$. Later that year Green gave a further refinement to their argument. These results represent the first improvements to the highest-order term for $k=3$ since the famous work of Behrend in 1946. In this paper we give a corresponding improvement to the highest-order term for all $k>3$, the first since Rankin in 1961. That is, we give a more efficient protocol for ExactlyN as well as larger corner-free sets in higher dimensions. Nearly all previous results in this line of research approached the problem from the combinatorics perspective, implicitly resulting in non-constructive protocols for ExactlyN. Approaching the problem from the communication complexity point of view and constructing explicit protocols for ExactlyN was key to the improvements in the $k=3$ setting. As a further contribution we provide explicit protocols for ExactlyN for any number of players, which serve as a basis for our improvement.
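To make the NOF setting concrete, the following minimal sketch shows what each player can see and the textbook baseline protocol in which one player announces the value its own input would need to take. This is only the trivial protocol, costing roughly log N bits of communication; it is not the improved protocol of the paper, and the function names are hypothetical.

```python
from typing import List


def visible_inputs(inputs: List[int], player: int) -> List[int]:
    """Inputs a player can see in the NOF model: every input except its own."""
    return [x for i, x in enumerate(inputs) if i != player]


def trivial_exactly_n(inputs: List[int], n: int) -> bool:
    """Baseline protocol for k >= 2 players (an illustration, not the paper's).

    Player 0 cannot see inputs[0], so it announces the value that inputs[0]
    would need to have for the total to equal n. Player 1 can see inputs[0]
    and replies with a single bit saying whether the announcement matches.
    """
    # Message from player 0, computed only from what player 0 can see.
    needed = n - sum(visible_inputs(inputs, 0))
    # Player 1 reads inputs[0] off player 0's forehead and compares.
    seen_by_player_1 = visible_inputs(inputs, 1)[0]
    return needed == seen_by_player_1
```

For example, `trivial_exactly_n([3, 4, 5], 12)` accepts, while the same inputs with target 11 are rejected.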
METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies
Authors: Takahiro Hirofuchi, Takaaki Fukai, Akram Ben Ahmed, Ryousei Takano, Kento Sato
Abstract
Due to the scaling problem of the DRAM technology, non-volatile memory devices, which are based on a different principle of operation than DRAM, are now being intensively developed to expand the main memory of computers. Disaggregated memory is also drawing attention as an emerging technology to scale up the main memory. Although system software studies need to discuss management mechanisms for the new main memory designs incorporating such emerging memory systems, there are no feasible memory emulation mechanisms that efficiently work for large-scale, privileged programs such as operating systems and hypervisors. In this paper, we propose an FPGA-based main memory emulator for system software studies on new main memory systems. It can emulate the main memory incorporating multiple memory regions with different performance characteristics. For the address region of each memory device, it emulates the latencies, bandwidths and bit-flip error rates of read/write operations, respectively. The emulator is implemented in the hardware module of an off-the-shelf FPGA System-on-Chip board. Any privileged/unprivileged software programs running on its powerful 64-bit CPU cores can access emulated main memory devices at a practical speed through exactly the same interface as normal DRAM main memory. We confirmed that the emulator transparently worked for CPU cores and successfully changed the performance of a memory region according to given emulation parameters; for example, the latencies measured by CPU cores were exactly proportional to the latencies inserted by the emulator, involving the minimum overhead of approximately 240 ns. As a preliminary use case, we confirmed that the emulator allows us to change the bandwidth limit and the inserted latency individually for unmodified software programs, making discussions on latency sensitivity much easier.
Promises of Deep Kernel Learning for Control Synthesis
Authors: Robert Reed, Luca Laurenti, Morteza Lahijanian
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
Deep Kernel Learning (DKL) combines the representational power of neural networks with the uncertainty quantification of Gaussian Processes. Hence, it is potentially a promising tool to learn and control complex dynamical systems. In this work, we develop a scalable abstraction-based framework that enables the use of DKL for control synthesis of stochastic dynamical systems against complex specifications. Specifically, we consider temporal logic specifications and create an end-to-end framework that uses DKL to learn an unknown system from data and formally abstracts the DKL model into an Interval Markov Decision Process (IMDP) to perform control synthesis with correctness guarantees. Furthermore, we identify a deep architecture that enables accurate learning and efficient abstraction computation. The effectiveness of our approach is illustrated on various benchmarks, including a 5-D nonlinear stochastic system, showing how control synthesis with DKL can substantially outperform state-of-the-art competitive methods.
Efficient Finite Initialization for Tensorized Neural Networks
Authors: Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta
Abstract
We present a novel method for initializing layers of tensorized neural networks in a way that avoids the explosion of the parameters of the matrix it emulates. The method is intended for layers with a high number of nodes in which there is a connection to the input or output of all or most of the nodes. The core of this method is the use of the Frobenius norm of this layer in an iterative partial form, so that it has to be finite and within a certain range. This norm is efficient to compute, fully or partially for most cases of interest. We apply the method to different layers and check its performance. We create a Python function to run it on an arbitrary layer, available in a Jupyter Notebook in the i3BQuantum repository: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb
Do Generative Large Language Models need billions of parameters?
Authors: Sia Gholami, Marwan Omar
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
This paper presents novel systems and methodologies for the development of efficient large language models (LLMs). It explores the trade-offs between model size, performance, and computational resources, with the aim of maximizing the efficiency of these AI systems. The research explores novel methods that allow different parts of the model to share parameters, reducing the total number of unique parameters required. This approach ensures that the model remains compact without sacrificing its ability to learn and represent complex language structures. This study provides valuable insights and tools for creating more efficient and effective LLMs, contributing to a more sustainable and accessible future for AI language modeling.
A Reinforcement Learning Approach for Robotic Unloading from Visual Observations
Authors: Vittorio Giammarino, Alberto Giammarino, Matthew Pearce
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
In this work, we focus on a robotic unloading problem from visual observations, where robots are required to autonomously unload stacks of parcels using RGB-D images as their primary input source. While supervised and imitation learning have accomplished good results in these types of tasks, they heavily rely on labeled data, which are challenging to obtain in realistic scenarios. Our study aims to develop a sample efficient controller framework that can learn unloading tasks without the need for labeled data during the learning process. To tackle this challenge, we propose a hierarchical controller structure that combines a high-level decision-making module with classical motion control. The high-level module is trained using Deep Reinforcement Learning (DRL), wherein we incorporate a safety bias mechanism and design a reward function tailored to this task. Our experiments demonstrate that both these elements play a crucial role in achieving improved learning performance. Furthermore, to ensure reproducibility and establish a benchmark for future research, we provide free access to our code and simulation.
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Authors: Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency. It is known that unstructured sparsity results in lower accuracy degradation with respect to structured sparsity but the former needs extensive inference engine changes to get latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a speed improvement of $1.25 \times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that solely employ structured pruning techniques.
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
Authors: Atticus Beachy (1), Harok Bae (1), Jose Camberos (2), Ramana Grandhi (2) ((1) Wright State University, Dayton, OH, USA (2) Air Force Institute of Technology, Wright-Patterson AFB, OH, USA)
Abstract
Emulator embedded neural networks, which are a type of physics informed neural network, leverage multi-fidelity data sources for efficient design exploration of aerospace engineering systems. Multiple realizations of the neural network models are trained with different random initializations. The ensemble of model realizations is used to assess epistemic modeling uncertainty caused by a lack of training samples. This uncertainty estimation is crucial information for successful goal-oriented adaptive learning in an aerospace system design exploration. However, the costs of training the ensemble models often become prohibitive and pose a computational challenge, especially when the models are not trained in parallel during adaptive learning. In this work, a new type of emulator embedded neural network is presented using the rapid neural network paradigm. Unlike the conventional neural network training that optimizes the weights and biases of all the network layers by using gradient-based backpropagation, rapid neural network training adjusts only the last layer connection weights by applying a linear regression technique. It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy. The proposed method is demonstrated on multiple analytical examples, as well as an aerospace flight parameter study of a generic hypersonic vehicle.
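The idea of training only the last layer by linear regression can be sketched as follows. This is a generic illustration, not the authors' emulator embedded architecture: the hidden layer is a set of fixed random tanh units, the output weights come from a ridge-regularized least-squares solve, and the network size, ridge value, and target function are all assumptions chosen for the example.

```python
import math
import random
from typing import List


def hidden_features(x: float, weights: List[float], biases: List[float]) -> List[float]:
    """Fixed (untrained) hidden layer: tanh units plus a constant bias feature."""
    return [math.tanh(w * x + b) for w, b in zip(weights, biases)] + [1.0]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_last_layer(xs, ys, weights, biases, ridge=1e-6) -> List[float]:
    """Least-squares fit of the output weights only (ridge normal equations)."""
    feats = [hidden_features(x, weights, biases) for x in xs]
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(d)]
    return solve_linear_system(gram, rhs)


def predict(x, weights, biases, out_weights) -> float:
    feats = hidden_features(x, weights, biases)
    return sum(w * h for w, h in zip(out_weights, feats))


# One ensemble member: a random hidden layer, then a single linear solve.
rng = random.Random(0)
hidden_w = [rng.uniform(-3, 3) for _ in range(12)]
hidden_b = [rng.uniform(-3, 3) for _ in range(12)]
train_x = [i / 20 for i in range(-20, 21)]
train_y = [math.sin(2 * x) for x in train_x]
out_w = fit_last_layer(train_x, train_y, hidden_w, hidden_b)
train_mse = sum((predict(x, hidden_w, hidden_b, out_w) - y) ** 2
                for x, y in zip(train_x, train_y)) / len(train_x)
```

Repeating the last few lines with different random seeds yields an ensemble whose spread can serve as a rough uncertainty indicator; because each member requires only one linear solve, the cost per member stays small.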
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Authors: Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Randall O'Reilly, John Lafferty, Jonathan D. Cohen
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This effort has often been framed in terms of a dichotomy between empiricist and nativist approaches, most recently embodied by debates concerning deep neural networks and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
MCQUIC: Multicast and unicast in a single transport protocol
Authors: Louis Navarre, Olivier Pereira, Olivier Bonaventure
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Multicast enables efficient one-to-many communications. Several applications benefit from its scalability properties, e.g., live-streaming and large-scale software updates. Historically, multicast applications have used specialized transport protocols. The flexibility of the recently standardized QUIC protocol opens the possibility of providing both unicast and multicast services to applications with a single transport protocol. We present MCQUIC, an extended version of the QUIC protocol that supports multicast communications. We show how QUIC features and built-in security can be leveraged for multicast transport. We present the design of MCQUIC and implement it in Cloudflare quiche. We assess its performance through benchmarks and in emulated networks under realistic scenarios. We also demonstrate MCQUIC in a campus network. By coupling QUIC with our multicast extension, applications can rely on multicast for efficiency with the possibility to fall back on unicast in case of incompatible network conditions.
Collaborative Dynamic 3D Scene Graphs for Automated Driving
Authors: Elias Greve, Martin Büchner, Niclas Vödisch, Wolfram Burgard, Abhinav Valada
Abstract
Maps have played an indispensable role in enabling safe and automated driving. Although there have been many advances on different fronts ranging from SLAM to semantics, building an actionable hierarchical semantic representation of urban dynamic scenes from multiple agents is still a challenging problem. In this work, we present collaborative urban scene graphs (CURB-SG) that enable higher-order reasoning and efficient querying for many functions of automated driving. CURB-SG leverages panoptic LiDAR data from multiple agents to build large-scale maps using an effective graph-based collaborative SLAM approach that detects inter-agent loop closures. To semantically decompose the obtained 3D map, we build a lane graph from the paths of ego agents and their panoptic observations of other vehicles. Based on the connectivity of the lane graph, we segregate the environment into intersecting and non-intersecting road areas. Subsequently, we construct a multi-layered scene graph that includes lane information, the position of static landmarks and their assignment to certain map sections, other vehicles observed by the ego agents, and the pose graph from SLAM including 3D panoptic point clouds. We extensively evaluate CURB-SG in urban scenarios using a photorealistic simulator and release our code at this http URL
ConR: Contrastive Regularizer for Deep Imbalanced Regression
Authors: Mahsa Keramati, Lili Meng, R. David Evans
Abstract
Imbalanced distributions are ubiquitous in real-world data. They create constraints on Deep Neural Networks to represent the minority labels and avoid bias towards majority labels. The extensive body of imbalanced approaches addresses categorical label spaces but fails to effectively extend to regression problems where the label space is continuous. Conversely, local and global correlations among continuous labels provide valuable insights towards effectively modelling relationships in feature space. In this work, we propose ConR, a contrastive regularizer that models global and local label similarities in feature space and prevents the features of minority samples from being collapsed into their majority neighbours. Using the similarities of the predictions as an indicator of feature similarities, ConR discerns the disagreements between the label space and feature space and imposes a penalty on these disagreements. ConR minds the continuous nature of label space with two main strategies in a contrastive manner: incorrect proximities are penalized proportionate to the label similarities and the correct ones are encouraged to model local similarities. ConR consolidates essential considerations into a generic, easy-to-integrate, and efficient method that effectively addresses deep imbalanced regression. Moreover, ConR is orthogonal to existing approaches and smoothly extends to uni- and multi-dimensional label spaces. Our comprehensive experiments show that ConR significantly boosts the performance of all the state-of-the-art methods on three large-scale deep imbalanced regression benchmarks. Our code is publicly available at https://github.com/BorealisAI/ConR.
Generalizable Neural Fields as Partially Observed Neural Processes
Abstract
Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to represent, having to optimize a separate neural field for each signal is inefficient, and cannot capitalize on shared information or structures among signals. Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
A fixed-parameter tractable algorithm for combinatorial filter reduction
Abstract
What is the minimal information that a robot must retain to achieve its task? To design economical robots, the literature dealing with reduction of combinatorial filters approaches this problem algorithmically. As lossless state compression is NP-hard, prior work has examined, along with minimization algorithms, a variety of special cases in which specific properties enable efficient solution. Complementing those findings, this paper refines the present understanding from the perspective of parameterized complexity. We give a fixed-parameter tractable algorithm for the general reduction problem by exploiting a transformation into minimal clique covering. The transformation introduces new constraints that arise from sequential dependencies encoded within the input filter -- some of these constraints can be repaired, others are treated through enumeration. Through this approach, we identify parameters affecting filter reduction that are based upon inter-constraint couplings (expressed as a notion of their height and width), which add to the structural parameters present in the unconstrained problem of minimal clique covering.
Scalable Scheduling for Industrial Time-Sensitive Networking: A Hyper-flow Graph Based Scheme
Abstract
Industrial Time-Sensitive Networking (TSN) provides deterministic mechanisms for real-time and reliable flow transmission. Increasing attention has been paid to efficient scheduling for time-sensitive flows with stringent requirements such as ultra-low latency and jitter. In TSN, the fine-grained traffic shaping protocol, cyclic queuing and forwarding (CQF), eliminates uncertain delay and frame loss by cyclic traffic forwarding and queuing. However, it inevitably causes high scheduling complexity. Moreover, complexity is quite sensitive to flow attributes and network scale. The problem stems in part from the lack of an attribute mining mechanism in existing frame-based scheduling. For time-critical industrial networks with large-scale complex flows, a so-called hyper-flow graph based scheduling scheme is proposed to improve the scheduling scalability in terms of schedulability, scheduling efficiency and latency & jitter. The hyper-flow graph is built by aggregating similar flow sets as hyper-flow nodes and designing a hierarchical scheduling framework. The flow attribute-sensitive scheduling information is embedded into the condensed maximal cliques, which are then reverse-mapped precisely to congestion flow portions for re-scheduling. Its parallel scheduling reduces network scale induced complexity. Further, this scheme is designed in its entirety as a comprehensive scheduling algorithm GH^2. It improves the three criteria of scalability along a Pareto front. Extensive simulation studies demonstrate its superiority. Notably, the scheduling stability of GH^2 is verified, with a runtime of less than 100 ms for 1000 flows and nearly 1/430 of that of the SOTA FITS method for 2000 flows.
Abstract
Recent work in vision-and-language demonstrates that large-scale pretraining can learn generalizable models that are efficiently transferable to downstream tasks. While this may improve dataset-scale aggregate metrics, analyzing performance around hand-crafted subgroups targeting specific bias dimensions reveals systemic undesirable behaviors. However, this subgroup analysis is frequently stalled by annotation efforts, which require extensive time and resources to collect the necessary data. Prior art attempts to automatically discover subgroups to circumvent these constraints but typically leverages model behavior on existing task-specific annotations and rapidly degrades on more complex inputs beyond "tabular" data, none of which study vision-and-language models. This paper presents VLSlice, an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior, denoted as vision-and-language slices, from unlabeled image sets. We show that VLSlice enables users to quickly generate diverse high-coherency slices in a user study (n=22) and release the tool publicly.
Dynamic Spectrum Mixer for Visual Recognition
Authors: Zhiqiang Hu, Tao Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, the existing MLP-based methods directly aggregate tokens with static weights, leaving the adaptability to different images untouched. Moreover, recent research demonstrates that MLP-Transformer is great at creating long-range dependencies but ineffective at catching high frequencies that primarily transmit local information, which prevents it from being applied to downstream dense prediction tasks, such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic spectrum weight generation layer is proposed as the spectrum bands selector, which could emphasize the informative frequency bands while diminishing others. To this end, the technique can efficiently learn detailed features from visual input that contains both high- and low-frequency information. Extensive experiments show that DSM is a powerful and adaptable backbone for a range of visual recognition tasks. In particular, DSM outperforms previous transformer-based and MLP-based models on image classification, object detection, and semantic segmentation tasks, achieving, for example, 83.8\% top-1 accuracy on ImageNet and 49.9\% mIoU on ADE20K.
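Mixing tokens in the frequency domain with per-band weights can be illustrated with a minimal sketch. It treats each token as a single scalar, takes the band weights as a fixed input (in DSM they are generated dynamically from the content), and uses a naive matrix DCT that costs O(n^2); a fast DCT would be needed for the log-linear complexity mentioned in the abstract. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List


def dct_matrix(n: int) -> List[List[float]]:
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def mix_tokens(tokens: List[float], band_weights: List[float]) -> List[float]:
    """Transform tokens to the DCT domain, reweight each band, transform back."""
    n = len(tokens)
    c = dct_matrix(n)
    # Forward transform: every coefficient depends on every token.
    spectrum = [sum(c[k][i] * tokens[i] for i in range(n)) for k in range(n)]
    # Emphasize or suppress individual frequency bands.
    weighted = [w * s for w, s in zip(band_weights, spectrum)]
    # The matrix is orthonormal, so its transpose is the inverse transform.
    return [sum(c[k][i] * weighted[k] for k in range(n)) for i in range(n)]
```

With all band weights equal to one the tokens are returned unchanged, and keeping only the lowest band replaces every token by the mean, which shows how a single weight vector controls the balance between global (low-frequency) and local (high-frequency) information.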
Scaled Prompt-Tuning for Few-Shot Natural Language Generation
Abstract
Increasingly large language models (LLMs) demonstrate stronger language understanding and generation capabilities, while the memory demand and computation cost of fine-tuning LLMs on downstream tasks are non-negligible. Besides, fine-tuning generally requires a certain amount of data from individual tasks, and the cost of data collection is another issue to consider in real-world applications. In this work, we focus on Parameter-Efficient Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG), which freeze most parameters in LLMs and tune a small subset of parameters in few-shot cases so that memory footprint, training cost, and labeling cost are reduced while maintaining or even improving the performance. We propose a Scaled Prompt-Tuning (SPT) method which surpasses conventional PT with better performance and generalization ability but without an obvious increase in training cost. Further study on intermediate SPT suggests the superior transferability of SPT in few-shot scenarios, providing a recipe for data-deficient and computation-limited circumstances. Moreover, a comprehensive comparison of existing PEFT methods reveals that certain approaches that exhibited decent performance with modest training cost in prior studies, such as Prefix-Tuning, could struggle in few-shot NLG tasks, especially on challenging datasets.
Hierarchical Time-Optimal Planning for Multi-Vehicle Racing
Authors: Georg Jank, Matthias Rowold, Boris Lohmann
Abstract
This paper presents a hierarchical planning algorithm for racing with multiple opponents. The two-stage approach consists of a high-level behavioral planning step and a low-level optimization step. By combining discrete and continuous planning methods, our algorithm encourages global time optimality without being limited by coarse discretization. In the behavioral planning step, the fastest behavior is determined with a low-resolution spatio-temporal visibility graph. Based on the selected behavior, we calculate maneuver envelopes that are subsequently applied as constraints in a time-optimal control problem. The performance of our method is comparable to a parallel approach that selects the fastest trajectory from multiple optimizations with different behavior classes. However, our algorithm can be executed on a single core. This significantly reduces computational requirements, especially when multiple opponents are involved. Therefore, the proposed method is an efficient and practical solution for real-time multi-vehicle racing scenarios.
Reliability-Latency-Rate Tradeoff in Low-Latency Communications with Finite-Blocklength Coding
Authors: Lintao Li, Wei Chen, Petar Popovski, Khaled B. Letaief
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Low-latency communication plays an increasingly important role in delay-sensitive applications by ensuring the real-time exchange of information. However, due to the constraints on the maximum instantaneous power, bounded latency is hard to guarantee. In this paper, we investigate the reliability-latency-rate tradeoff in low-latency communications with finite-blocklength coding (FBC). More specifically, we are interested in the fundamental tradeoff between error probability, delay-violation probability (DVP), and service rate. Based on the effective capacity (EC) and normal approximation, we present several gain-conservation inequalities to bound the reliability-latency-rate tradeoffs. In particular, we investigate low-latency transmissions over an additive white Gaussian noise (AWGN) channel, over a Rayleigh fading channel with frequency or spatial diversity, and over a Nakagami-$m$ fading channel. To analytically evaluate quality-of-service-constrained low-latency communications with FBC, an EC-approximation method is further conceived to derive a closed-form expression of the quality-of-service-constrained throughput. For delay-sensitive transmissions in which the latency threshold is greater than the channel coherence time, we find an asymptotic form of the tradeoff between the error probability and DVP over the AWGN and Rayleigh fading channels. Our results may provide some insights into the efficient scheduling of low-latency wireless communications in which statistical latency and reliability metrics are adopted.
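As a rough illustration of the normal approximation referred to in the abstract above, the sketch below evaluates the widely used finite-blocklength rate approximation R ≈ C - sqrt(V/n) Q^{-1}(ε) for a complex AWGN channel, with C = log2(1 + γ) and V = (1 - (1 + γ)^{-2}) (log2 e)^2. The function name, the SNR value, and the blocklengths are illustrative assumptions, and the paper's own expressions (for example, for fading channels or with higher-order terms) may differ.

```python
import math
from statistics import NormalDist


def normal_approx_rate(snr: float, blocklength: int, error_prob: float) -> float:
    """Approximate achievable rate in bits per channel use at finite blocklength.

    Uses the normal approximation for a complex AWGN channel with linear SNR
    `snr`; the O(log n / n) correction term is deliberately omitted.
    """
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2
    # Q^{-1}(eps) is the inverse of the Gaussian tail function.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    return capacity - math.sqrt(dispersion / blocklength) * q_inv


if __name__ == "__main__":
    snr = 10.0  # linear SNR (10 dB), an illustrative value
    for n in (100, 500, 2000):
        print(n, round(normal_approx_rate(snr, n, 1e-5), 3))
```

For a fixed error probability, the rate penalty relative to capacity shrinks as the blocklength grows, which is the basic tension between latency and rate that the abstract describes.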
OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL Reachability
Abstract
The development of correct and efficient software can be hindered by compilation errors, which must be fixed so that the code is syntactically correct and satisfies the constraints of the programming language. Neural network-based approaches have been used to tackle this problem, but they lack guarantees of output correctness and can require an unlimited number of modifications. Fixing compilation errors within a given number of modifications is a challenging task. We demonstrate that finding the minimum number of modifications to fix a compilation error is NP-hard. To address the compilation error fixing problem, we propose OrdinalFix, a complete algorithm based on shortest-path CFL (context-free language) reachability with attribute checking that is guaranteed to output a program with the minimum number of modifications required. Specifically, OrdinalFix searches possible fixes from the smallest to the largest number of modifications. By incorporating merged attribute checking to enhance efficiency, the time complexity of OrdinalFix is acceptable for application. We evaluate OrdinalFix on two datasets and demonstrate its ability to fix compilation errors within a reasonable time limit. OrdinalFix achieves a success rate of 83.5%, surpassing all existing approaches (71.7%).
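The idea of searching fixes from the smallest to the largest number of modifications can be illustrated with a toy stand-in. The sketch below is not OrdinalFix and does not use CFL reachability or attribute checking: it runs a brute-force breadth-first search over single-token insertions, deletions, and replacements on strings of parentheses, so the first balanced string it reaches uses the minimum number of edits. All names and the `max_edits` bound are hypothetical.

```python
from collections import deque

ALPHABET = "()"  # toy token set; inputs are assumed to contain only these


def is_balanced(tokens: str) -> bool:
    """Return True if the parentheses in `tokens` are balanced."""
    depth = 0
    for tok in tokens:
        depth += 1 if tok == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def neighbours(tokens: str):
    """Yield every string reachable by one insertion, deletion, or replacement."""
    for i in range(len(tokens) + 1):
        for tok in ALPHABET:
            yield tokens[:i] + tok + tokens[i:]
    for i in range(len(tokens)):
        yield tokens[:i] + tokens[i + 1:]
        for tok in ALPHABET:
            if tok != tokens[i]:
                yield tokens[:i] + tok + tokens[i + 1:]


def minimal_fix(tokens: str, max_edits: int = 4):
    """Breadth-first search, so candidates are visited in order of edit count.

    Returns (fixed_string, number_of_edits), or None if no fix exists within
    `max_edits` modifications.
    """
    queue = deque([(tokens, 0)])
    seen = {tokens}
    while queue:
        current, edits = queue.popleft()
        if is_balanced(current):
            return current, edits
        if edits == max_edits:
            continue
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edits + 1))
    return None
```

The exhaustive search grows exponentially with the edit bound, which is consistent with the abstract's point that a more structured algorithm is needed for real programs.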
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Authors: Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
In the Text-to-Speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been a challenge. This paper proposes Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). The following contributions are made by DCTTS: 1) The TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) The contrastive learning method based on discrete space is used to enhance the alignment connection between speech and text and improve sampling quality; and 3) It uses an efficient text encoder to simplify the model's parameters and increase computational efficiency. The experimental results demonstrate that the approach proposed in this paper has outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. The synthesized samples are available at https://github.com/lawtherWu/DCTTS.
Dynamic NeRFs for Soccer Scenes
Authors: Sacha Lewin, Maxime Vandegar, Thomas Hoyoux, Olivier Barnich, Gilles Louppe
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The long-standing problem of novel view synthesis has many applications, notably in sports broadcasting. Photorealistic novel view synthesis of soccer actions, in particular, is of enormous interest to the broadcast industry. Yet only a few industrial solutions have been proposed, and even fewer that achieve near-broadcast quality of the synthetic replays. Except for their setup of multiple static cameras around the playfield, the best proprietary systems disclose close to no information about their inner workings. Leveraging multiple static cameras for such a task indeed presents a challenge rarely tackled in the literature, for a lack of public datasets: the reconstruction of a large-scale, mostly static environment, with small, fast-moving elements. Recently, the emergence of neural radiance fields has induced stunning progress in many novel view synthesis applications, leveraging deep learning principles to produce photorealistic results in the most challenging settings. In this work, we investigate the feasibility of basing a solution to the task on dynamic NeRFs, i.e., neural models purposed to reconstruct general dynamic content. We compose synthetic soccer environments and conduct multiple experiments using them, identifying key components that help reconstruct soccer scenes with dynamic NeRFs. We show that, although this approach cannot fully meet the quality requirements for the target application, it suggests promising avenues toward a cost-efficient, automatic solution. We also make our dataset and code publicly available, with the goal of encouraging further efforts from the research community on the task of novel view synthesis for dynamic soccer scenes. For code, data, and video results, please see https://soccernerfs.isach.be.
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Authors: Qianyu Long, Christos Anagnostopoulos, Shameem Puthiya Parambath, Daning Bi
Abstract
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). However, DNNs are characterized by an extremely large number of parameters, thus yielding significant challenges in exchanging these parameters among distributed nodes and managing the memory. Although recent DNN compression methods (e.g., sparsification, pruning) tackle such challenges, they do not holistically consider an adaptively controlled reduction of parameter exchange while maintaining high accuracy levels. We therefore contribute a novel FL framework (coined FedDIP), which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange, which contributes to significant performance improvement, with (ii) incremental regularization that can achieve \textit{extreme} sparsity of models. We provide a convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods using benchmark data sets and DNN models. Our results showcase that FedDIP not only controls the model sparsity but also efficiently achieves similar or better performance compared to other model pruning methods adopting incremental regularization during distributed model training. The code is available at: https://github.com/EricLoong/feddip.
Bounds and Constructions for Generalized Batch Codes
Abstract
Private information retrieval (PIR) codes and batch codes are two important types of codes that are designed for coded distributed storage systems and private information retrieval protocols. These codes have been the focus of much attention in recent years, as they enable efficient and secure storage and retrieval of data in distributed systems. In this paper, we introduce a new class of codes called \emph{$(s,t)$-batch codes}. These codes are a type of storage code that can handle any multi-set of $t$ requests, comprised of $s$ distinct information symbols. Importantly, PIR codes and batch codes are special cases of $(s,t)$-batch codes. The main goal of this paper is to explore the relationship between the number of redundancy symbols and the $(s,t)$-batch code property. Specifically, we establish a lower bound on the number of redundancy symbols required and present several constructions of $(s,t)$-batch codes. Furthermore, we extend this property to the case where each request is a linear combination of information symbols, which we refer to as \emph{functional $(s,t)$-batch codes}. Specifically, we demonstrate that simplex codes are asymptotically optimal functional $(s,t)$-batch codes, in terms of the number of redundancy symbols required, under a certain parameter regime.
Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models
Abstract
Contextual Relation Extraction (CRE) is mainly used for constructing a knowledge graph with the help of an ontology. It supports various tasks such as semantic search, query answering, and textual entailment. Relation extraction identifies the entities in raw texts and the relations among them. An efficient and accurate CRE system is essential for creating domain knowledge in the biomedical industry. Existing Machine Learning and Natural Language Processing (NLP) techniques are not suitable for efficiently predicting complex relations from sentences that contain more than two relations and unspecified entities. In this work, deep learning techniques have been used to identify the appropriate semantic relation based on the context from multiple sentences. Even though various machine learning models have been used for relation extraction, they provide better results only for binary relations, i.e., relations that occur between exactly two entities in a sentence. Machine learning models are not suited for complex sentences that contain words with various meanings. To address these issues, hybrid deep learning models have been used to extract the relations from complex sentences effectively. This paper explores the analysis of various deep learning models that are used for relation extraction.
Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing
Authors: Chao Qin, Maxime S.J. Michet, Jingxiang Chen, Hugh H.-T. Liu
Abstract
Time-minimum trajectories through race tracks are determined by the drone's capability as well as the configuration of all gates (e.g., their shapes, sizes, and orientations). However, prior works neglect the impact of the gate configuration and formulate drone racing as a waypoint flight task, leading to conservative waypoint selection through each gate. We present a novel time-optimal planner that can account for gate constraints explicitly, enabling quadrotors to follow the most time-efficient waypoints at their single-rotor-thrust limits in tracks with hybrid gate types. Our approach provides comparable solution quality to the state-of-the-art but with a computation time orders of magnitude faster. Furthermore, the proposed framework allows users to customize gate constraints such as tunnels by concatenating existing gate classes, enabling high-fidelity race track modeling. Owing to the superior computation efficiency and flexibility, we can generate optimal racing trajectories for complex race tracks with tens or even hundreds of gates with distinct shapes. We validate our method in real-world flights and demonstrate that faster lap times can be produced by using gate constraints instead of waypoint constraints.
Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles
Authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Abstract
The widespread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered dataset, using data from multiple languages. Finally, the three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.
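A simple majority voting ensemble of the kind mentioned in the abstract above can be sketched in a few lines. The function name and the `SUBJ`/`OBJ` label strings are illustrative assumptions, not the team's actual code; with three models and two labels, ties cannot occur.

```python
from collections import Counter


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine per-model label lists by taking the most common label per sample.

    `predictions[m][i]` is the label that model m assigns to sample i.
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) groups the votes of all models for each sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    model_a = ["SUBJ", "OBJ", "OBJ"]
    model_b = ["SUBJ", "SUBJ", "OBJ"]
    model_c = ["OBJ", "SUBJ", "OBJ"]
    print(majority_vote([model_a, model_b, model_c]))
```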
A Wearable Ultra-Low-Power sEMG-Triggered Ultrasound System for Long-Term Muscle Activity Monitoring
Authors: Sebastian Frey, Victor Kartsch, Christoph Leitner, Andrea Cossettini, Sergei Vostrikov, Simone Benatti, Luca Benini
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
Wearable biosignal processing applications are driving significant progress toward miniaturized, energy-efficient Internet-of-Things solutions for both clinical and consumer applications. However, scaling toward high-density multi-channel front-ends is only feasible by performing data processing and machine learning (ML) near-sensor through energy-efficient edge processing. To tackle these challenges, we introduce BioGAP, a novel, compact, modular, and lightweight (6 g) medical-grade biosignal acquisition and processing platform powered by GAP9, a ten-core ultra-low-power SoC designed for efficient multi-precision (from FP to aggressively quantized integer) processing, as required for advanced ML and DSP. BioGAP's form factor is 16x21x14 mm$^3$ and comprises two stacked PCBs: a baseboard integrating the GAP9 SoC, a wireless Bluetooth Low Energy (BLE) capable SoC, a power management circuit, and an accelerometer; and a shield including an analog front-end (AFE) for ExG acquisition. Finally, the system also includes a flexibly placeable photoplethysmography (PPG) PCB with a size of 9x7x3 mm$^3$ and a rechargeable battery ($\phi$ 12x5 mm$^2$). We demonstrate BioGAP on a steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) application. We achieve 3.6 $\mu$J/sample in streaming and 2.2 $\mu$J/sample in onboard processing mode, thanks to an efficiency on the FFT computation task of 16.7 Mflops/s/mW with wireless bandwidth reduction of 97%, within a power budget of just 18.2 mW allowing for an operation time of 15 h.
Optimal information in Bayesian routing games
Authors: Leonardo Cianfanelli, Alexia Ambrogio, Giacomo Como
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
We study optimal information provision in transportation networks when users are strategic and the network state is uncertain. An omniscient planner observes the network state and discloses information to the users with the goal of minimizing the expected travel time at the user equilibrium. Public signal policies, including full-information disclosure, are known to be inefficient in achieving optimality. For this reason, we focus on private signals and restrict, without loss of generality, the analysis to signals that coincide with path recommendations that satisfy obedience constraints, namely users have no incentive to deviate from the received recommendation according to their posterior belief. We first formulate the general problem and analyze its properties for arbitrary network topologies and delay functions. Then, we consider the case of two parallel links with affine delay functions, and provide sufficient conditions under which optimality can be achieved by information design. Interestingly, we observe that the system benefits from uncertainty, namely it is easier for the planner to achieve optimality when the variance of the uncertain parameters is large. We then provide an example where optimality can be achieved even if the sufficient conditions for optimality are not met.
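To make the gap between user equilibrium and optimality concrete, the sketch below treats the two-parallel-link case with affine delays in its simplest deterministic form, with no uncertain state and no signalling, which is a simplification of the setting in the abstract above. Link 1 carries a fraction x of unit demand with delay a1*x + b1, and link 2 carries 1 - x with delay a2*(1 - x) + b2. All function and parameter names are hypothetical, and the closed forms assume nonnegative slopes with a1 + a2 > 0.

```python
def _clamp01(value: float) -> float:
    """Restrict a flow fraction to the feasible interval [0, 1]."""
    return min(1.0, max(0.0, value))


def link_delays(x: float, a1: float, b1: float, a2: float, b2: float):
    """Delays on link 1 and link 2 when a fraction x of users takes link 1."""
    return a1 * x + b1, a2 * (1.0 - x) + b2


def average_travel_time(x: float, a1: float, b1: float, a2: float, b2: float) -> float:
    """Flow-weighted average delay across both links."""
    d1, d2 = link_delays(x, a1, b1, a2, b2)
    return x * d1 + (1.0 - x) * d2


def user_equilibrium(a1: float, b1: float, a2: float, b2: float) -> float:
    """Fraction on link 1 at which no user gains by switching links."""
    # Interior solution equalizes the two delays; clamping handles the case
    # where one link is faster even when it carries all of the traffic.
    return _clamp01((a2 + b2 - b1) / (a1 + a2))


def system_optimum(a1: float, b1: float, a2: float, b2: float) -> float:
    """Fraction on link 1 that minimizes the average travel time."""
    # Stationary point of the convex quadratic average_travel_time(x).
    return _clamp01((2.0 * a2 + b2 - b1) / (2.0 * (a1 + a2)))


if __name__ == "__main__":
    params = (1.0, 0.0, 0.0, 1.0)  # Pigou-style example: delays x and 1
    x_eq = user_equilibrium(*params)
    x_opt = system_optimum(*params)
    print(x_eq, average_travel_time(x_eq, *params))
    print(x_opt, average_travel_time(x_opt, *params))
```

In the Pigou-style example, selfish routing sends every user to link 1, whereas splitting the traffic evenly lowers the average travel time; information design aims to close this kind of gap.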
Manufacturing Quality Control with Autoencoder-Based Defect Localization and Unsupervised Class Selection
Authors: Devang Mehta, Noah Klarmann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Manufacturing industries require efficient and voluminous production of high-quality finished goods. In the context of Industry 4.0, visual anomaly detection offers a promising way to control product quality automatically and with high precision, and automation based on computer vision can prevent bottlenecks at the product quality checkpoint. We considered recent advancements in machine learning to improve visual defect localization, but challenges persist in obtaining a balanced feature set and database of the wide variety of defects occurring in the production line. This paper proposes a defect localizing autoencoder with unsupervised class selection, performed by k-means clustering of the features extracted from a pre-trained VGG-16 network. The selected classes of defects are augmented with natural wild textures to simulate artificial defects. The study demonstrates the effectiveness of the defect localizing autoencoder with unsupervised class selection for improving defect detection in manufacturing industries. The proposed methodology shows promising results with precise and accurate localization of quality defects on melamine-faced boards for the furniture industry. Incorporating artificial defects into the training data shows significant potential for practical implementation in real-world quality control scenarios.
CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions
Authors: Haoqin Hong, Yue Zhou, Xiangyu Shu, Xiangfang Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extraction module based on Transformers and CNNs, which effectively leverages contextual information, achieves faster inference speed and provides stronger feature enhancement capabilities. Furthermore, we establish the correlation between object detection and image denoising tasks and propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization. Finally, to validate our approach, we create the CCTSDB-AUG dataset for traffic sign detection in extreme scenarios. Extensive experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions. Compared to end-to-end methods, CCSPNet-Joint achieves a 5.32% improvement in precision and an 18.09% improvement in mAP@.5.
Continual Learning with Dirichlet Generative-based Rehearsal
Authors: Min Zeng, Wei Xue, Qifeng Liu, Yike Guo
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Despite recent advancements, data-driven task-oriented dialogue systems (ToDs) struggle with incremental learning due to computational and time constraints. Continual Learning (CL) attempts to solve this by avoiding intensive pre-training, but it faces the problem of catastrophic forgetting (CF). While generative-based rehearsal CL methods have made significant strides, generating pseudo samples that accurately reflect the underlying task-specific distribution is still a challenge. In this paper, we present Dirichlet Continual Learning (DCL), a novel generative-based rehearsal strategy for CL. Unlike the traditionally used Gaussian latent variable in the Conditional Variational Autoencoder (CVAE), DCL leverages the flexibility and versatility of the Dirichlet distribution to model the latent prior variable. This enables it to efficiently capture sentence-level features of previous tasks and effectively guide the generation of pseudo samples. In addition, we introduce Jensen-Shannon Knowledge Distillation (JSKD), a robust logit-based knowledge distillation method that enhances knowledge transfer during pseudo sample generation. Our experiments confirm the efficacy of our approach in both intent detection and slot-filling tasks, outperforming state-of-the-art methods.
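The Jensen-Shannon divergence that gives JSKD its name is a symmetric, bounded alternative to the Kullback-Leibler divergence. The sketch below computes it between two softmax distributions obtained from logits. How the paper weights or applies this quantity during distillation is not stated in the abstract, so the function names and the way the logits are used here are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    teacher = softmax([2.0, 0.5, -1.0])  # hypothetical teacher logits
    student = softmax([1.5, 0.7, -0.5])  # hypothetical student logits
    print(js_divergence(teacher, student))
```

Unlike the KL divergence, this quantity is symmetric in its arguments and never exceeds ln 2, which is one reason it is often regarded as a more stable distillation target.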
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
Authors: Sanghyeon Kim, Hyunmo Yang, Younghyun Kim, Youngjoon Hong, Eunbyung Park
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and no additional inference latency. This paper investigates a more general form of adapter module based on the analysis that parallel and sequential adaptation branches learn novel and general features during fine-tuning, respectively. The proposed method, named Hydra due to its multi-head computational branches, combines parallel and sequential branches to integrate capabilities, which is more expressive than existing single-branch methods and enables the exploration of a broader range of optimal points in the fine-tuning process. In addition, the proposed adaptation method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features. It allows the learned features to have better generalization performance across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, encompassing comparisons and ablation studies, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. Our code is available at https://github.com/extremebird/Hydra.
Regular Representations of Uniform TC^0
Authors: Lauri Hella, Juha Kontinen, Kerkko Luosto
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
Abstract
The circuit complexity class DLOGTIME-uniform AC^0 is known to be a modest subclass of DLOGTIME-uniform TC^0. The weakness of AC^0 is caused by the fact that AC^0 is not closed under restricting AC^0-computable queries into simple subsequences of the input. Analogously, in descriptive complexity, the logics corresponding to DLOGTIME-uniform AC^0 do not have the relativization property and hence they are not regular. This weakness of DLOGTIME-uniform AC^0 has been elaborated in the line of research on the Crane Beach Conjecture. The conjecture (which was refuted by Barrington, Immerman, Lautemann, Schweikardt and Thérien) was that if a language L has a neutral letter, then L can be defined in first-order logic with the collection of all numerical built-in relations, if and only if L can already be defined in FO with order. In the first part of this article we consider logics in the range of AC^0 and TC^0. First we formulate a combinatorial criterion for a cardinality quantifier C_S implying that all languages in DLOGTIME-uniform TC^0 can be defined in FO(C_S). For instance, this criterion is satisfied by C_S if S is the range of some polynomial with positive integer coefficients of degree at least two. In the second part of the paper we first adapt the key properties of abstract logics to accommodate built-in relations. Then we define the regular interior R-int(L) and regular closure R-cl(L), of a logic L, and show that the Crane Beach Conjecture can be interpreted as a statement concerning the regular interior of first-order logic with built-in relations B. We show that if B={+}, or B contains only unary relations besides the order, then R-int(FO_B) collapses to FO with order. In contrast, our results imply that if B contains the order and the range of a polynomial of degree at least two, then R-cl(FO_B) includes all languages in DLOGTIME-uniform TC^0.
Real-Time Motion Planning for In-Hand Manipulation with a Multi-Fingered Hand
Abstract
Dexterous manipulation of objects once held in hand remains a challenge. Such skills are, however, necessary for robotics to move beyond gripper-based manipulation and use all the dexterity offered by anthropomorphic robotic hands. One major challenge when manipulating an object within the hand is that fingers must move around the object while avoiding collision with other fingers or the object. Such collision-free paths must be computed in real-time, as the smallest deviation from the original plan can easily lead to collisions. We present a real-time approach to computing collision-free paths in a high-dimensional space. To guide the exploration, we learn an explicit representation of the free space, retrievable in real-time. We further combine this representation with closed-loop control via dynamical systems and sampling-based motion planning and show that the combination increases performance compared to alternatives, offering efficient search of feasible paths and real-time obstacle avoidance in a multi-fingered robotic hand.
Harvesting Brownian Motion: Zero Energy Computational Sampling
Authors: David Doty, Niels Kornerup, Austin Luchsinger, Leo Orshansky, David Soloveichik, Damien Woods
Subjects: Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET)
Abstract
The key factor currently limiting the advancement of computational power of electronic computation is no longer the manufacturing density and speed of components, but rather their high energy consumption. While it has been widely argued that reversible computation can escape the fundamental Landauer limit of $k_B T\ln(2)$ Joules per irreversible computational step, there is disagreement around whether indefinitely reusable computation can be achieved without energy dissipation. Here we focus on the simpler context of sampling problems, which take no input and thus avoid the need to model the energy costs of the observer perturbing the machine to change its input. Given an algorithm $A$ for generating samples from a distribution, we desire a device that can perpetually generate samples from that distribution driven entirely by Brownian motion. We show that such a device can efficiently execute algorithm $A$ in the sense that we must wait only $O(\text{time}(A)^2)$ between samples. We consider two output models: Las Vegas, which samples from the exact probability distribution every $4$ tries in expectation, and Monte Carlo, in which every try succeeds but the distribution is only approximated. We base our model on continuous-time random walks over the state space graph of a general computational machine, with a space-bounded Turing machine as one instantiation. The problem of sampling a computationally complex probability distribution with no energy dissipation informs our understanding of the energy requirements of computation, and may lead to more energy-efficient randomized algorithms.
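For a sense of scale, the Landauer limit $k_B T\ln(2)$ quoted in the abstract above can be evaluated directly. The sketch below uses the exact SI value of the Boltzmann constant; the function name and the choice of 300 K as room temperature are illustrative assumptions.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of k_B in joules per kelvin


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy dissipated per irreversible bit operation, k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2.0)


if __name__ == "__main__":
    # Roughly 2.87e-21 J at an assumed room temperature of 300 K.
    print(landauer_limit_joules(300.0))
```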
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
Abstract
In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant number of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audio, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
Towards Reliable Dermatology Evaluation Benchmarks
Authors: Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Matthew Groh, Roxana Daneshjou, Labelling Consortium, Alexander A. Navarini, Marc Pouly
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
Auto-Regressive Next-Token Predictors are Universal Learners
Authors: Eran Malach
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Large language models display remarkable capabilities in logical and mathematical reasoning, allowing them to solve complex tasks. Interestingly, these abilities emerge in networks trained on the simple task of next-token prediction. In this work, we present a theoretical framework for studying auto-regressive next-token predictors. We demonstrate that even simple models such as linear next-token predictors, trained on Chain-of-Thought (CoT) data, can approximate any function efficiently computed by a Turing machine. We introduce a new complexity measure -- length complexity -- which measures the number of intermediate tokens in a CoT sequence required to approximate some target function, and analyze the interplay between length complexity and other notions of complexity. Finally, we show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks. Our results demonstrate that the power of language models can be attributed, to a great extent, to the auto-regressive next-token training scheme, and not necessarily to a particular choice of architecture.
Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms
Authors: Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Andy D. Pimentel, Andreas Salzburger, Attila Krasznahorkay
Subjects: Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Abstract
The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates with kernel running time on real hardware, and that our evolutionary strategy allows us to find candidates with favorable simulated cache properties in four out of the eight real-world applications under consideration in a small number of generations. Finally, we demonstrate that the array layouts found using our evolutionary method perform well not only in simulated environments but that they can effect significant performance gains -- up to a factor of ten in extreme cases -- on real hardware.
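To make the layouts mentioned in this abstract concrete, the following is a minimal Python sketch of a two-dimensional Morton index (bit interleaving), plus one plausible way to describe a wider family of layouts by choosing which coordinate supplies each output bit. The function names and the pattern-string encoding are assumptions made for illustration; they are not taken from the paper and need not match its chromosomal representation.

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x occupies the even bit positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


def interleave_by_pattern(x: int, y: int, pattern: str) -> int:
    """Build a linear index from a bit pattern (hypothetical encoding).

    `pattern` lists, from the least significant output bit upward, which
    coordinate ('x' or 'y') supplies the next unused bit.
    """
    coords = {"x": x, "y": y}
    next_bit = {"x": 0, "y": 0}
    index = 0
    for out_pos, dim in enumerate(pattern):
        bit = (coords[dim] >> next_bit[dim]) & 1
        index |= bit << out_pos
        next_bit[dim] += 1
    return index


# For a 4x4 array, "xyxy" reproduces the Morton layout, while "xxyy"
# reproduces a layout in which x varies fastest (index = x + 4 * y).
assert interleave_by_pattern(2, 3, "xyxy") == morton_index(2, 3, bits=2)
assert interleave_by_pattern(2, 3, "xxyy") == 2 + 4 * 3
```

Under this encoding, every ordering of the pattern characters gives a different layout, which hints at why the design space grows quickly with the number of index bits.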
Abstract
This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just \textit{one}, but rather \textit{all} possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state of the art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.
Asynchronous Collective Tree Exploration by Tree-Mining
Authors: Romain Cosson
Subjects: Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA)
Abstract
We investigate the problem of collaborative tree exploration with complete communication introduced by [FGKP06], in which a group of $k$ agents is assigned to collectively go through all edges of an unknown tree in an efficient manner and then return to the origin. The agents have unrestricted communication and computation capabilities. The algorithm's runtime is typically compared to the cost of offline traversal, which is at least $\max\{2n/k,2D\}$ where $n$ is the number of nodes and $D$ is the tree depth. Since its introduction, two types of guarantee have emerged on the topic: the first is of the form $r(k)(n/k+D)$, where $r(k)$ is called the competitive ratio, and the other is of the form $2n/k+f(k,D)$, where $f(k,D)$ is called the competitive overhead. In this paper, we present the first algorithm with linear-in-$D$ competitive overhead, thereby reconciling both approaches. Specifically, our bound is in $2n/k + O(k^{\log_2 k} D)$ and thus leads to a competitive ratio in $O(k/\exp(0.8\sqrt{\ln k}))$. This is the first improvement over the $O(k/\ln k)$-competitive algorithm known since the introduction of the problem in 2004. Our algorithm is obtained for an asynchronous generalization of collective tree exploration (ACTE). It is an instance of a general class of locally-greedy exploration algorithms that we define. We show that the additive overhead analysis of locally-greedy algorithms can be seen through the lens of a 2-player game that we call the tree-mining game and that could be of independent interest.
Perfect Roman Domination and Unique Response Roman Domination
Abstract
The idea of enumeration algorithms with polynomial delay is to polynomially bound the running time between any two subsequent solutions output by the enumeration algorithm. While it has been open for more than four decades whether all minimal dominating sets of a graph can be enumerated in output-polynomial time, it has recently been proven that pointwise-minimal Roman dominating functions can be enumerated even with polynomial delay. The idea of the enumeration algorithm was to use polynomial-time solvable extension problems. We use this as a motivation to prove that two further variants of Roman dominating functions studied in the literature, named perfect and unique response, can also be enumerated with polynomial delay. This is interesting since Extension Perfect Roman Domination is W[1]-complete if parameterized by the weight of the given function and even W[2]-complete if parameterized by the number of vertices assigned 0 in the pre-solution, as we prove. Otherwise, efficient solvability of extension problems and enumerability with polynomial delay tend to go hand-in-hand. We achieve our enumeration result by constructing a bijection to Roman dominating functions, where the corresponding extension problem is polynomial-time solvable. Furthermore, we show that Unique Response Roman Domination is solvable in polynomial time on split graphs, while Perfect Roman Domination is NP-complete on this graph class, which proves that both variations, albeit coming with a very similar definition, do differ in some complexity aspects. This way, we also solve an open problem from the literature.
Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes
Abstract
This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots in order to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for multi-robot IPP. The proposed approach is demonstrated to be fast and accurate on real-world data.
CLiFF-LHMP: Using Spatial Dynamics Patterns for Long-Term Human Motion Prediction
Authors: Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Luigi Palmieri, Kai O. Arras, Achim J. Lilienthal, Martin Magnusson
Abstract
Human motion prediction is important for mobile service robots and intelligent vehicles to operate safely and smoothly around people. The more accurate predictions are, particularly over extended periods of time, the better a system can, e.g., assess collision risks and plan ahead. In this paper, we propose to exploit maps of dynamics (MoDs, a class of general representations of place-dependent spatial motion patterns, learned from prior observations) for long-term human motion prediction (LHMP). We present a new MoD-informed human motion prediction approach, named CLiFF-LHMP, which is data efficient, explainable, and insensitive to errors from an upstream tracking system. Our approach uses CLiFF-map, a specific MoD trained with human motion data recorded in the same environment. We bias a constant velocity prediction with samples from the CLiFF-map to generate multi-modal trajectory predictions. In two public datasets we show that this algorithm outperforms the state of the art for predictions over very extended periods of time, achieving 45% more accurate prediction performance at 50s compared to the baseline.
Optimized Implementation of Neuromorphic HATS Algorithm on FPGA
Abstract
In this paper, we present the first-ever optimized hardware implementation of the Histogram of Averaged Time Surfaces (HATS) algorithm, a state-of-the-art neuromorphic approach to event-based object classification, on an FPGA for asynchronous time-based image sensors (ATIS). Our implementation achieves a latency of 3.3 ms for the N-CARS dataset samples and is capable of processing 2.94 Mevts/s. The speed-up is achieved by exploiting parallelism in the design, and multiple Processing Elements can be added. The Zynq-7000 SoC from Xilinx is used as the development platform. The tradeoff between Average Absolute Error and Resource Utilization for the fixed-precision implementation is analyzed and presented. The proposed FPGA implementation is $\sim$32x more power efficient than the software implementation.
Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection
Authors: Derek Gloudemans, Xinxuan Lu, Shepard Xia, Daniel B. Work
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Monocular 3D object detection is a challenging task because depth information is difficult to obtain from 2D images. A subset of viewpoint-agnostic monocular 3D detection methods also do not explicitly leverage scene homography or geometry during training, meaning that a model trained thusly can detect objects in images from arbitrary viewpoints. Such works predict the projections of the 3D bounding boxes on the image plane to estimate the location of the 3D boxes, but these projections are not rectangular so the calculation of IoU between these projected polygons is not straightforward. This work proposes an efficient, fully differentiable algorithm for the calculation of IoU between two convex polygons, which can be utilized to compute the IoU between two 3D bounding box footprints viewed from an arbitrary angle. We test the performance of the proposed polygon IoU loss (PIoU loss) on three state-of-the-art viewpoint-agnostic 3D detection models. Experiments demonstrate that the proposed PIoU loss converges faster than L1 loss and that in 3D detection models, a combination of PIoU loss and L1 loss gives better results than L1 loss alone (+1.64% AP70 for MonoCon on cars, +0.18% AP70 for RTM3D on cars, and +0.83%/+2.46% AP50/AP25 for MonoRCNN on cyclists).
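As a rough illustration of the quantity this abstract refers to, the sketch below computes the IoU of two convex polygons in plain Python. It uses Sutherland-Hodgman clipping and the shoelace formula, which are standard geometric techniques swapped in here for illustration; this is not the authors' differentiable algorithm, and all function names are hypothetical. Vertices are assumed to be given in counter-clockwise order.

```python
def polygon_area(points):
    """Shoelace formula; returns the unsigned area of a simple polygon."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping of `subject` against a convex, CCW `clip` polygon."""

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Intersection of the line through p, q with the line through a, b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_pts, output = output, []
        if not input_pts:
            break
        prev = input_pts[-1]
        for cur in input_pts:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
            prev = cur
    return output


def convex_polygon_iou(p, q):
    """IoU of two convex polygons given as lists of (x, y) in CCW order."""
    clipped = clip_convex(p, q)
    inter = polygon_area(clipped) if len(clipped) >= 3 else 0.0
    union = polygon_area(p) + polygon_area(q) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 squares overlapping in a 1x1 region: IoU = 1 / (4 + 4 - 1).
square_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
square_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
assert abs(convex_polygon_iou(square_a, square_b) - 1 / 7) < 1e-9
```

Using such a quantity as a training loss would additionally require the computation to be expressed in an automatic-differentiation framework, which this plain-Python sketch does not attempt.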
Abstract
We study inferring a tree-structured representation from a single image for object shading. Prior work typically uses the parametric or measured representation to model shading, which is neither interpretable nor easily editable. We propose using the shade tree representation, which combines basic shading nodes and compositing methods to factorize object surface shading. The shade tree representation enables novice users who are unfamiliar with the physical shading process to edit object shading in an efficient and intuitive manner. A main challenge in inferring the shade tree is that the inference problem involves both the discrete tree structure and the continuous parameters of the tree nodes. We propose a hybrid approach to address this issue. We introduce an auto-regressive inference model to generate a rough estimation of the tree structure and node parameters, and then we fine-tune the inferred shade tree through an optimization algorithm. We show experiments on synthetic images, captured reflectance, real images, and non-realistic vector drawings, allowing downstream applications such as material editing, vectorized shading, and relighting. Project website: https://chen-geng.com/inv-shade-trees
Abstract
Many existing transfer learning methods rely on leveraging information from source data that closely resembles the target data. However, this approach often overlooks valuable knowledge that may be present in different yet potentially related auxiliary samples. When dealing with a limited amount of target data and a diverse range of source models, our paper introduces a novel approach, Distributionally Robust Optimization for Transfer Learning (TransDRO), that breaks free from strict similarity constraints. TransDRO is designed to optimize the most adversarial loss within an uncertainty set, defined as a collection of target populations generated as a convex combination of source distributions that guarantee excellent prediction performances for the target data. TransDRO effectively bridges the realms of transfer learning and distributional robustness prediction models. We establish the identifiability of TransDRO and its interpretation as a weighted average of source models closest to the baseline model. We also show that TransDRO achieves a faster convergence rate than the model fitted with the target data. Our comprehensive numerical studies and analysis of multi-institutional electronic health records data using TransDRO further substantiate the robustness and accuracy of TransDRO, highlighting its potential as a powerful tool in transfer learning applications.
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors: Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth
Abstract
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Abstract
Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while target speaker voice activity detection (TS-VAD) systems tend to be overly complex. In this paper, we propose a simple attention-based encoder-decoder network for end-to-end neural diarization (AED-EEND). In our training process, we introduce a teacher-forcing strategy to address the speaker permutation problem, leading to faster model convergence. For evaluation, we propose an iterative decoding method that outputs diarization results for each speaker sequentially. Additionally, we propose an Enhancer module to enhance the frame-level speaker embeddings, enabling the model to handle scenarios with an unseen number of speakers. We also explore replacing the transformer encoder with a Conformer architecture, which better models local information. Furthermore, we discovered that commonly used simulation datasets for speaker diarization have a much higher overlap ratio compared to real data. We found that using simulated training data that is more consistent with real data can achieve an improvement in consistency. Extensive experimental validation demonstrates the effectiveness of our proposed methodologies. Our best system achieved a new state-of-the-art diarization error rate (DER) performance on all the CALLHOME (10.08%), DIHARD II (24.64%), and AMI (13.00%) evaluation benchmarks, when no oracle voice activity detection (VAD) is used. Beyond speaker diarization, our AED-EEND system also shows remarkable competitiveness as a speech type detection model.
Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing
Authors: Chao Qin, Maxime S.J. Michet, Jingxiang Chen, Hugh H.-T. Liu
Abstract
Time-minimum trajectories through race tracks are determined by the drone's capability as well as the configuration of all gates (e.g., their shapes, sizes, and orientations). However, prior works neglect the impact of the gate configuration and formulate drone racing as a waypoint flight task, leading to conservative waypoint selection through each gate. We present a novel time-optimal planner that can account for gate constraints explicitly, enabling quadrotors to follow the most time-efficient waypoints at their single-rotor-thrust limits in tracks with hybrid gate types. Our approach provides comparable solution quality to the state-of-the-art but with a computation time orders of magnitude faster. Furthermore, the proposed framework allows users to customize gate constraints such as tunnels by concatenating existing gate classes, enabling high-fidelity race track modeling. Owing to the superior computation efficiency and flexibility, we can generate optimal racing trajectories for complex race tracks with tens or even hundreds of gates with distinct shapes. We validate our method in real-world flights and demonstrate that faster lap times can be produced by using gate constraints instead of waypoint constraints.
Undetectable Selfish Mining
Authors: Maryam Bahrani, S. Matthew Weinberg
Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Abstract
Seminal work of Eyal and Sirer (2014) establishes that a strategic Bitcoin miner may strictly profit by deviating from the intended Bitcoin protocol, using a strategy now termed selfish mining. More specifically, any miner with $>1/3$ of the total hashrate can earn bitcoin at a faster rate by selfish mining than by following the intended protocol (depending on network conditions, a lower fraction of hashrate may also suffice). One convincing critique of selfish mining in practice is that the presence of a selfish miner is statistically detectable: the pattern of orphaned blocks created by the presence of a selfish miner cannot be explained by natural network delays. Therefore, if an attacker chooses to selfish mine, users can detect this, and this may (significantly) negatively impact the value of BTC. So while the attacker may get slightly more bitcoin by selfish mining, these bitcoin may be worth significantly less USD. We develop a selfish mining variant that is provably statistically undetectable: the pattern of orphaned blocks is statistically identical to a world with only honest miners but higher network delay. Specifically, we consider a stylized model where honest miners with network delay produce orphaned blocks at each height independently with probability $\beta'$. We propose a selfish mining strategy that instead produces orphaned blocks at each height independently with probability $\beta > \beta'$. We further show that our strategy is strictly profitable for attackers with $38.2\% \ll 50\%$ of the total hashrate (and this holds for all natural orphan rates $\beta'$).
CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions
Authors: Haoqin Hong, Yue Zhou, Xiangyu Shu, Xiangfang Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extraction module based on Transformers and CNNs, which effectively leverages contextual information, achieves faster inference speed and provides stronger feature enhancement capabilities. Furthermore, we establish the correlation between object detection and image denoising tasks and propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization. Finally, to validate our approach, we create the CCTSDB-AUG dataset for traffic sign detection in extreme scenarios. Extensive experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions. Compared to end-to-end methods, CCSPNet-Joint achieves a 5.32% improvement in precision and an 18.09% improvement in mAP@.5.
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Abstract
Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters, which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining accuracy close to that of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection
Authors: Derek Gloudemans, Xinxuan Lu, Shepard Xia, Daniel B. Work
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Monocular 3D object detection is a challenging task because depth information is difficult to obtain from 2D images. A subset of viewpoint-agnostic monocular 3D detection methods also do not explicitly leverage scene homography or geometry during training, meaning that a model trained thusly can detect objects in images from arbitrary viewpoints. Such works predict the projections of the 3D bounding boxes on the image plane to estimate the location of the 3D boxes, but these projections are not rectangular so the calculation of IoU between these projected polygons is not straightforward. This work proposes an efficient, fully differentiable algorithm for the calculation of IoU between two convex polygons, which can be utilized to compute the IoU between two 3D bounding box footprints viewed from an arbitrary angle. We test the performance of the proposed polygon IoU loss (PIoU loss) on three state-of-the-art viewpoint-agnostic 3D detection models. Experiments demonstrate that the proposed PIoU loss converges faster than L1 loss and that in 3D detection models, a combination of PIoU loss and L1 loss gives better results than L1 loss alone (+1.64% AP70 for MonoCon on cars, +0.18% AP70 for RTM3D on cars, and +0.83%/+2.46% AP50/AP25 for MonoRCNN on cyclists).
Keyword: mobile
An overview of VANET vehicular networks
Authors: Ali Hozouri, Abbas Mirzaei, Shiva RazaghZadeh, Davoud Yousefi
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Today, with the development of intercity and metropolitan roadways and with various cars moving in various directions, there is a greater need than ever for a network to coordinate commutes. Nowadays, people spend a lot of time in their vehicles. Smart automobiles have developed to make that time safer, more effective, more fun, pollution-free, and affordable. However, maintaining the optimum use of resources and addressing rising needs continues to be a challenge given the popularity of vehicle users and the growing diversity of requests for various services. As a result, VANET will require modernized working practices in the future. Modern intelligent transportation management and driver assistance systems are created using cutting-edge communication technology. Vehicular Ad-hoc networks promise to increase transportation effectiveness, accident prevention, and pedestrian comfort by allowing automobiles and road infrastructure to communicate entertainment and traffic information. By constructing thorough frameworks, workflow patterns, and update procedures, including block-chain, artificial intelligence, and SDN (Software Defined Networking), this paper addresses VANET-related technologies, future advances, and related challenges. An overview of the VANET upgrade solution is given in this document in order to handle potential future problems.
High Fidelity Fast Simulation of Human in the Loop Human in the Plant (HIL-HIP) Systems
Abstract
Non-linearities in simulation arise from the time variance in wireless mobile networks when integrated with human in the loop, human in the plant (HIL-HIP) physical systems under dynamic contexts, leading to simulation slowdown. Time variance is handled by deriving a series of piecewise linear time-invariant simulations (PLIS) in intervals, which are then concatenated in the time domain. In this paper, we conduct a formal analysis of the impact of discretizing time-varying components in wireless network-controlled HIL-HIP systems on simulation accuracy and speedup, and evaluate trade-offs with reliable guarantees. We develop an accurate simulation framework for an artificial pancreas wireless network system that controls blood glucose in Type 1 Diabetes patients with time-varying properties such as physiological changes associated with psychological stress and meal patterns. The PLIS approach achieves accurate simulation with a speedup of more than 2.1 times over a non-linear system simulation for the given dataset.
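The idea of replacing a time-varying system by a sequence of time-invariant pieces can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the authors' artificial pancreas model: it simulates the scalar system dx/dt = -k(t) x by freezing k at the midpoint of each interval, advancing with the exact solution of the frozen (time-invariant) system, and concatenating the intervals. The function names and the choice k(t) = 1 + 0.5 t^2 are hypothetical.

```python
import math


def decay_rate(t: float) -> float:
    """Hypothetical time-varying coefficient k(t) = 1 + 0.5 * t^2."""
    return 1.0 + 0.5 * t * t


def simulate_piecewise_lti(x0: float, t_end: float, intervals: int) -> float:
    """Simulate dx/dt = -k(t) x with k frozen at each interval midpoint."""
    dt = t_end / intervals
    x = x0
    for i in range(intervals):
        k_frozen = decay_rate((i + 0.5) * dt)  # time-invariant within the interval
        x *= math.exp(-k_frozen * dt)          # exact solution of the frozen system
    return x


# Reference: x(T) = x0 * exp(-integral of k), and the integral of
# 1 + 0.5 * t^2 over [0, 2] equals 2 + 4/3 = 10/3.
exact = math.exp(-10.0 / 3.0)
coarse_error = abs(simulate_piecewise_lti(1.0, 2.0, 4) - exact)
fine_error = abs(simulate_piecewise_lti(1.0, 2.0, 64) - exact)
assert fine_error < coarse_error
```

The interval count plays the role of the discretization granularity: fewer intervals mean fewer updates per simulated second at the cost of a larger deviation from the time-varying reference.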
ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal
Abstract
Document shadow is a common issue that arises when capturing documents using mobile devices, and it significantly impacts readability. Current methods encounter various challenges, including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.
Short reasons for long vectors in HPC CPUs: a study based on RISC-V
Authors: Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, Filippo Mantovani
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
Abstract
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction. The optimal vector width and its impact on CPU throughput due to memory latency and bandwidth remain challenging research areas. This study examines the behavior of four computational kernels on a RISC-V core connected to a customizable vector unit, capable of operating on up to 256 double-precision elements per instruction. The four codes have been purposefully selected to represent non-dense workloads: SpMV, BFS, PageRank, FFT. The experimental setup allows us to measure their performance while varying the vector length, the memory latency, and bandwidth. Our results not only show that larger vector lengths allow for better tolerance of limitations in the memory subsystem but also offer hope to code developers beyond dense linear algebra.
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Abstract
Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters, which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining accuracy close to that of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems
Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios.
CLiFF-LHMP: Using Spatial Dynamics Patterns for Long-Term Human Motion Prediction
Authors: Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Luigi Palmieri, Kai O. Arras, Achim J. Lilienthal, Martin Magnusson
Abstract
Human motion prediction is important for mobile service robots and intelligent vehicles to operate safely and smoothly around people. The more accurate predictions are, particularly over extended periods of time, the better a system can, e.g., assess collision risks and plan ahead. In this paper, we propose to exploit maps of dynamics (MoDs, a class of general representations of place-dependent spatial motion patterns, learned from prior observations) for long-term human motion prediction (LHMP). We present a new MoD-informed human motion prediction approach, named CLiFF-LHMP, which is data efficient, explainable, and insensitive to errors from an upstream tracking system. Our approach uses CLiFF-map, a specific MoD trained with human motion data recorded in the same environment. We bias a constant velocity prediction with samples from the CLiFF-map to generate multi-modal trajectory predictions. In two public datasets we show that this algorithm outperforms the state of the art for predictions over very extended periods of time, achieving 45% more accurate prediction performance at 50s compared to the baseline.
Keyword: pruning
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Authors: Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency. It is known that unstructured sparsity results in lower accuracy degradation with respect to structured sparsity but the former needs extensive inference engine changes to get latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a speed improvement of $1.25 \times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that solely employ structured pruning techniques.
MCNS: Mining Causal Natural Structures Inside Time Series via A Novel Internal Causality Scheme
Authors: Yuanhao Liu, Dehui Du, Zihan Jiang, Anyan Huang, Yiyang Li
Abstract
Causal inference permits us to discover covert relationships among the variables in time series. However, in most existing works, the variables mentioned above are the dimensions. The causality between dimensions could be cursory, which hinders the comprehension of the internal relationship and the benefit of the causal graph to the neural networks (NNs). In this paper, we find that causality exists not only outside but also inside the time series because it reflects a succession of events in the real world. It inspires us to seek the relationship between internal subsequences. However, the challenges are the difficulty of discovering causality from subsequences and of utilizing the causal natural structures to improve NNs. To address these challenges, we propose a novel framework called Mining Causal Natural Structure (MCNS), which is automatic and domain-agnostic and helps to find the causal natural structures inside time series via the internal causality scheme. We evaluate the MCNS framework, and NNs impregnated with MCNS, on time series classification tasks. Experimental results illustrate that our impregnation, by refining attention, shape selection classification, and pruning datasets, gives the NN, and even the data itself, better accuracy and interpretability. Besides, MCNS provides an in-depth, solid summary of the time series and datasets.
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Authors: Qianyu Long, Christos Anagnostopoulos, Shameem Puthiya Parambath, Daning Bi
Abstract
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). However, DNNs are characterized by an extremely large number of parameters, thus, yielding significant challenges in exchanging these parameters among distributed nodes and managing the memory. Although recent DNN compression methods (e.g., sparsification, pruning) tackle such challenges, they do not holistically consider an adaptively controlled reduction of parameter exchange while maintaining high accuracy levels. We, therefore, contribute with a novel FL framework (coined FedDIP), which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange, which contributes to significant performance improvement, with (ii) incremental regularization that can achieve \textit{extreme} sparsity of models. We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods using benchmark data sets and DNN models. Our results showcase that FedDIP not only controls the model sparsity but efficiently achieves similar or better performance compared to other model pruning methods adopting incremental regularization during distributed model training. The code is available at: https://github.com/EricLoong/feddip.
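As a rough illustration of the "pruning with error feedback" idea mentioned in the abstract above, the following sketch applies generic top-k magnitude sparsification to an update vector and carries the dropped values forward as a residual. The function name, the top-k criterion, and the flat-list representation are all assumptions made for illustration; the abstract does not specify FedDIP's actual pruning rule, so this should not be read as the authors' method.

```python
def prune_with_error_feedback(update, residual, keep):
    """Generic top-k sparsification with error feedback (illustrative only).

    The residual from earlier rounds is added back before pruning, and
    whatever is pruned this round becomes the next residual.
    Returns (sparse_update, new_residual).
    """
    corrected = [u + r for u, r in zip(update, residual)]
    # Indices sorted by decreasing magnitude; the first `keep` survive.
    order = sorted(range(len(corrected)),
                   key=lambda i: abs(corrected[i]), reverse=True)
    kept = set(order[:keep])
    sparse = [c if i in kept else 0.0 for i, c in enumerate(corrected)]
    new_residual = [c - s for c, s in zip(corrected, sparse)]
    return sparse, new_residual


# Two rounds: the 0.5 dropped in round one resurfaces in round two.
sparse1, res1 = prune_with_error_feedback([0.5, -2.0, 0.25, 1.0], [0.0] * 4, keep=2)
sparse2, res2 = prune_with_error_feedback([0.75, 0.0, 0.0, 0.0], res1, keep=1)
```

Only the non-zero entries of each sparse update would need to be exchanged between nodes; the residual stays local, so small but persistent components are delayed rather than lost.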
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Abstract
Deep neural networks (DNNs) underpin many machine learning applications. Production-quality DNN models achieve high inference accuracy by training millions of DNN parameters, which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods cannot produce models of similar quality to their unpruned counterparts without significant time costs and overheads, or they are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining accuracy close to that of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
Keyword: diffusion
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors: Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth
Abstract
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Authors: Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been challenging. This paper proposes the Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). DCTTS makes the following contributions: 1) the TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) a contrastive learning method based on discrete space is used to enhance the alignment between speech and text and improve sampling quality; and 3) an efficient text encoder is used to reduce the model's parameters and increase computational efficiency. The experimental results demonstrate that the proposed approach has outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. The synthesized samples are available at https://github.com/lawtherWu/DCTTS.
Abstract
Large-scale text-to-image models including Stable Diffusion are capable of generating high-fidelity photorealistic portrait images. There is an active research area dedicated to personalizing these models, aiming to synthesize specific subjects or styles using provided sets of reference images. However, despite the plausible results from these personalization methods, they tend to produce images that often fall short of realism and are not yet on a commercially viable level. This is particularly noticeable in portrait image generation, where any unnatural artifact in human faces is easily discernible due to our inherent human bias. To address this, we introduce MagiCapture, a personalization method for integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references. For instance, given a handful of random selfies, our fine-tuned model can generate high-quality portrait images in specific styles, such as passport or profile photos. The main challenge with this task is the absence of ground truth for the composed concepts, leading to a reduction in the quality of the final output and an identity shift of the source subject. To address these issues, we present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting. Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs. MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to other non-human objects.
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Abstract
The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion capture standards. In addition, it is a challenging task due to the weak correlation between speech and gestures. To address these problems, we present UnifiedGesture, a novel diffusion model-based speech-driven gesture synthesis approach, trained on multiple gesture datasets with different skeletons. Specifically, we first present a retargeting network to learn latent homeomorphic graphs for different motion capture standards, unifying the representations of various gestures while extending the dataset. We then capture the correlation between speech and gestures based on a diffusion model architecture using cross-local attention and self-attention to generate better speech-matched and realistic gestures. To further align speech and gesture and increase diversity, we incorporate reinforcement learning on the discrete gesture units with a learned reward function. Extensive experiments show that UnifiedGesture outperforms recent approaches on speech-driven gesture generation in terms of CCA, FGD, and human-likeness. All code, pre-trained models, databases, and demos are available to the public at https://github.com/YoungSeng/UnifiedGesture.
Keyword: adaptive
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Optimization and Control (math.OC)
Abstract
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
Authors: Patrick Diehl, Gregor Daiss, Steven R. Brandt, Alireza Kheirkhahan, Hartmut Kaiser, Christopher Taylor, John Leidel
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
In recent years, computers based on the RISC-V architecture have raised broad interest in the high-performance computing (HPC) community. As the RISC-V community develops the core instruction set architecture (ISA) along with ISA extensions, the HPC community has been actively ensuring HPC applications and environments are supported. In this context, assessing the performance of asynchronous many-task runtime systems (AMT) is essential. In this paper, we describe our experience with porting of a full 3D adaptive mesh-refinement, multi-scale, multi-model, and multi-physics application, Octo-Tiger, that is based on the HPX AMT, and we explore its performance characteristics on different RISC-V systems. Considering the (limited) capabilities of the RISC-V test systems we used, Octo-Tiger already shows promising results and good scaling. We, however, expect that exceptional hardware support based on dedicated ISA extensions (such as single-cycle context switches, extended atomic operations, and direct support for HPX's global address space) would allow for even better performance results.
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
Authors: Atticus Beachy (1), Harok Bae (1), Jose Camberos (2), Ramana Grandhi (2) ((1) Wright State University, Dayton, OH, USA (2) Air Force Institute of Technology, Wright-Patterson AFB, OH, USA)
Abstract
Emulator embedded neural networks, which are a type of physics-informed neural network, leverage multi-fidelity data sources for efficient design exploration of aerospace engineering systems. Multiple realizations of the neural network models are trained with different random initializations. The ensemble of model realizations is used to assess epistemic modeling uncertainty caused by a lack of training samples. This uncertainty estimation is crucial information for successful goal-oriented adaptive learning in aerospace system design exploration. However, the costs of training the ensemble models often become prohibitive and pose a computational challenge, especially when the models are not trained in parallel during adaptive learning. In this work, a new type of emulator embedded neural network is presented using the rapid neural network paradigm. Unlike conventional neural network training, which optimizes the weights and biases of all the network layers by gradient-based backpropagation, rapid neural network training adjusts only the last-layer connection weights by applying a linear regression technique. It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy. The proposed method is demonstrated on multiple analytical examples, as well as an aerospace flight parameter study of a generic hypersonic vehicle.
Ridge detection for nonstationary multicomponent signals with time-varying wave-shape functions and its applications
Authors: Yan-Wei Su, Gi-Ren Liu, Yuan-Chung Sheu, Hau-Tieng Wu
Abstract
We introduce a novel ridge detection algorithm for time-frequency (TF) analysis, particularly tailored for intricate nonstationary time series encompassing multiple non-sinusoidal oscillatory components. The algorithm is rooted in the distinctive geometric patterns that emerge in the TF domain due to such non-sinusoidal oscillations. We term this method \textit{shape-adaptive mode decomposition-based multiple harmonic ridge detection} (\textsf{SAMD-MHRD}). A swift implementation is available when supplementary information is at hand. We demonstrate the practical utility of \textsf{SAMD-MHRD} through its application to a real-world challenge. We employ it to devise a cutting-edge walking activity detection algorithm, leveraging accelerometer signals from an inertial measurement unit across diverse body locations of a moving subject.
Transparent Object Tracking with Enhanced Fusion Module
Authors: Kalyan Garigapati, Erik Blasch, Jie Wei, Haibin Ling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general object trackers by fusing purpose-built features. However, with the existing fusion techniques, the addition of new features causes a change in the latent space, making it impossible to incorporate transparency awareness into trackers with fixed latent spaces. For example, many present-day transformer-based trackers are fully pre-trained and are sensitive to any latent space perturbations. In this paper, we present a new feature fusion technique that integrates transparency information into a fixed feature space, enabling its use in a broader range of trackers. Our proposed fusion module, composed of a transformer encoder and an MLP module, leverages key query-based transformations to embed the transparency information into the tracking pipeline. We also present a new two-step training strategy for our fusion module to effectively merge transparency features. We propose a new tracker architecture that uses our fusion techniques to achieve superior results for transparent object tracking. Our proposed method achieves competitive results with state-of-the-art trackers on TOTB, which is the largest transparent object tracking benchmark recently released. Our results and code implementation will be made publicly available at https://github.com/kalyan0510/TOTEM.
Abstract
Measuring similarity between time series is an important problem for time series classification. To handle nonlinear time distortions, Dynamic Time Warping (DTW) has been widely used. However, DTW is not learnable and suffers from a trade-off between robustness against time distortion and discriminative power. In this paper, we propose a neural network model for task-adaptive time warping. Specifically, we use an attention model, called the bipartite attention model, to develop an explicit time warping mechanism with greater distortion invariance. Unlike other learnable models using DTW for warping, our model predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task. We also propose to pre-train our model with DTW to improve the discriminative power. Extensive experiments demonstrate the superior effectiveness of our model over DTW and its state-of-the-art performance in online signature verification.
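For readers unfamiliar with the DTW baseline that the abstract above compares against, here is a minimal sketch of the classic dynamic-programming formulation for two 1-D sequences with absolute-difference local cost. It illustrates plain DTW only, not the proposed bipartite attention model; the function name is chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# A repeated sample is absorbed by the warping, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The recurrence is a fixed rule with no trainable parameters, which is the "not learnable" limitation the abstract refers to.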
Dynamic Spectrum Mixer for Visual Recognition
Authors: Zhiqiang Hu, Tao Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, the existing MLP-based methods directly aggregate tokens with static weights, leaving the adaptability to different images untouched. Moreover, recent research demonstrates that MLP-Transformer is great at creating long-range dependencies but ineffective at catching high frequencies that primarily transmit local information, which prevents it from being applied to downstream dense prediction tasks, such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic spectrum weight generation layer is proposed as the spectrum bands selector, which could emphasize the informative frequency bands while diminishing others. As a result, the technique can efficiently learn detailed features from visual input that contains both high- and low-frequency information. Extensive experiments show that DSM is a powerful and adaptable backbone for a range of visual recognition tasks. In particular, DSM outperforms previous transformer-based and MLP-based models on image classification, object detection, and semantic segmentation tasks, achieving, for example, 83.8\% top-1 accuracy on ImageNet and 49.9\% mIoU on ADE20K.
MTD: Multi-Timestep Detector for Delayed Streaming Perception
Abstract
Autonomous driving systems require real-time environmental perception to ensure user safety and experience. Streaming perception is the task of reporting the current state of the world, which is used to evaluate the delay and accuracy of autonomous driving systems. In real-world applications, factors such as hardware limitations and high temperatures inevitably cause delays in autonomous driving systems, resulting in an offset between the model output and the world state. To solve this problem, this paper proposes the Multi-Timestep Detector (MTD), an end-to-end detector which uses dynamic routing for multi-branch future prediction, giving the model the ability to resist delay fluctuations. A Delay Analysis Module (DAM) is proposed to optimize the existing delay sensing method, continuously monitoring the model inference stack and calculating the delay trend. Moreover, a novel Timestep Branch Module (TBM) is constructed, which includes a static flow and an adaptive flow to adaptively predict specific timesteps according to the delay trend. The proposed method has been evaluated on the Argoverse-HD dataset, and the experimental results show that it has achieved state-of-the-art performance across various delay settings.
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Authors: Qianyu Long, Christos Anagnostopoulos, Shameem Puthiya Parambath, Daning Bi
Abstract
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). However, DNNs are characterized by an extremely large number of parameters, thus, yielding significant challenges in exchanging these parameters among distributed nodes and managing the memory. Although recent DNN compression methods (e.g., sparsification, pruning) tackle such challenges, they do not holistically consider an adaptively controlled reduction of parameter exchange while maintaining high accuracy levels. We, therefore, contribute with a novel FL framework (coined FedDIP), which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange, which contributes to significant performance improvement, with (ii) incremental regularization that can achieve \textit{extreme} sparsity of models. We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods using benchmark data sets and DNN models. Our results showcase that FedDIP not only controls the model sparsity but efficiently achieves similar or better performance compared to other model pruning methods adopting incremental regularization during distributed model training. The code is available at: https://github.com/EricLoong/feddip.
Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning
Authors: Riti Paul, Sahil Vora, Baoxin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generalized zero-shot learning (GZSL) aims to classify samples from seen and unseen labels, assuming unseen labels are not accessible during training. Recent advancements in GZSL have been expedited by incorporating contrastive-learning-based (instance-based) embedding in generative networks and leveraging the semantic relationship between data points. However, existing embedding architectures suffer from two limitations: (1) limited discriminability of synthetic features' embedding without considering fine-grained cluster structures; (2) inflexible optimization due to restricted scaling mechanisms on existing contrastive embedding networks, leading to overlapped representations in the embedding space. To enhance the quality of representations in the embedding space, as mentioned in (1), we propose a margin-based prototypical contrastive learning embedding network that reaps the benefits of prototype-data (cluster quality enhancement) and implicit data-data (fine-grained representations) interaction while providing substantial cluster supervision to the embedding network and the generator. To tackle (2), we propose an instance adaptive contrastive loss that leads to generalized representations for unseen labels with increased inter-class margin. Through comprehensive experimental evaluation, we show that our method can outperform the current state-of-the-art on three benchmark datasets. Our approach also consistently achieves the best unseen performance in the GZSL setting.
Abstract
We present Multi-Layer Intensity Maps, a novel 3D object representation for robot perception and autonomous navigation. They consist of multiple stacked layers of 2D grid maps, each derived from reflected point cloud intensities corresponding to a certain height interval. The different layers of the intensity maps can be used to simultaneously estimate obstacles' height, solidity/density, and opacity. We demonstrate that they can help accurately differentiate obstacles that are safe to navigate through (e.g. beaded/string curtains, pliable tall grass) from ones that must be avoided (e.g. transparent surfaces such as glass walls, bushes, trees, etc.) in indoor and outdoor environments. Further, to handle narrow passages and navigate through non-solid obstacles in dense environments, we propose an approach to adaptively inflate or enlarge the obstacles detected on intensity maps based on their solidity and the robot's preferred velocity direction. We demonstrate these improved navigation capabilities in real-world narrow, dense environments using a real Turtlebot and Boston Dynamics Spot. We observe significant increases in success rates (up to 50%), a 9.55% decrease in trajectory length, and up to a 10.9% increase in the F-score compared to current navigation methods using other sensor modalities.
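The layered representation described in the abstract above can be pictured with a small sketch that bins point-cloud returns into stacked 2D grids by height interval. Everything specific here is an assumption for illustration: the function name, the grid origin at (0, 0), the half-open height intervals, and the choice of mean intensity as the per-cell statistic are not taken from the paper.

```python
def build_intensity_layers(points, cell, grid_size, z_edges):
    """Bin (x, y, z, intensity) points into stacked 2D intensity grids.

    layers[k][row][col] is the mean intensity of points whose height lies in
    [z_edges[k], z_edges[k + 1]) and whose (x, y) falls in that cell.
    Cells with no points are set to 0.0; out-of-range points are ignored.
    """
    n_layers = len(z_edges) - 1
    sums = [[[0.0] * grid_size for _ in range(grid_size)] for _ in range(n_layers)]
    counts = [[[0] * grid_size for _ in range(grid_size)] for _ in range(n_layers)]
    for x, y, z, intensity in points:
        col, row = int(x // cell), int(y // cell)
        if not (0 <= row < grid_size and 0 <= col < grid_size):
            continue
        for k in range(n_layers):
            if z_edges[k] <= z < z_edges[k + 1]:
                sums[k][row][col] += intensity
                counts[k][row][col] += 1
                break
    return [[[sums[k][r][c] / counts[k][r][c] if counts[k][r][c] else 0.0
              for c in range(grid_size)]
             for r in range(grid_size)]
            for k in range(n_layers)]


# Two height layers (0-1 m and 1-2 m) over a 2 x 2 grid of 1 m cells.
pts = [(0.1, 0.1, 0.2, 10.0), (0.2, 0.3, 0.4, 20.0), (1.5, 0.1, 1.2, 5.0)]
layers = build_intensity_layers(pts, cell=1.0, grid_size=2, z_edges=[0.0, 1.0, 2.0])
```

Comparing the same cell across layers is what would let a planner reason about height and solidity, for example a cell that returns strong intensity in every layer versus one that is sparse in most.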
Keyword: quantization
Differentiable JPEG: The Devil is in the Details
Authors: Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
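To make the non-differentiability issue concrete: the rounding in JPEG's quantization step has zero gradient almost everywhere. The sketch below shows one commonly used workaround, a polynomial surrogate for rounding, applied to a single coefficient. It is an assumption chosen for illustration and not necessarily the approximation proposed in this paper; the names `soft_round`, `soft_round_grad`, and `quantize_coefficient` are hypothetical.

```python
def soft_round(x):
    """Cubic surrogate for rounding: round(x) + (x - round(x))**3.

    It agrees with round(x) at integers but has a non-zero slope in between.
    """
    r = round(x)
    return r + (x - r) ** 3


def soft_round_grad(x):
    """Derivative of soft_round with respect to x, treating round(x) as constant."""
    r = round(x)
    return 3.0 * (x - r) ** 2


def quantize_coefficient(coeff, q_step):
    """Quantize and dequantize one coefficient with the surrogate.

    Returns (value, d value / d coeff). By the chain rule the factors
    q_step and 1 / q_step cancel, leaving only the surrogate's slope.
    """
    scaled = coeff / q_step
    value = q_step * soft_round(scaled)
    grad_wrt_coeff = soft_round_grad(scaled)
    return value, grad_wrt_coeff


if __name__ == "__main__":
    print(quantize_coefficient(9.0, 4.0))
```

With hard rounding the returned gradient would be zero for every non-integer input, which is what blocks gradient-based training or attacks through a standard JPEG codec.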
Communication-Efficient Laplace Mechanism for Differential Privacy via Random Quantization
Authors: Ali Moradi Shahmiri, Chih Wei Ling, Cheuk Ting Li
Abstract
We propose the first method that realizes the Laplace mechanism exactly (i.e., a Laplace noise is added to the data) that requires only a finite amount of communication (whereas the original Laplace mechanism requires the transmission of a real number) while guaranteeing privacy against the server and database. Our mechanism can serve as a drop-in replacement for local or centralized differential privacy applications where the Laplace mechanism is used. Our mechanism is constructed using a random quantization technique. Unlike the simple and prevalent Laplace-mechanism-then-quantize approach, the quantization in our mechanism does not result in any distortion or degradation of utility. Unlike existing dithered quantization and channel simulation schemes for simulating additive Laplacian noise, our mechanism guarantees privacy not only against the database and downstream, but also against the honest but curious server which attempts to decode the data using the dither signals.
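For context, the "Laplace-mechanism-then-quantize" baseline that this abstract contrasts against can be sketched as follows. This is the baseline only, not the authors' random-quantization mechanism; the function names and the parameters `sensitivity`, `epsilon`, and `step` are assumptions introduced for illustration.

```python
import random


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_then_quantize(value, sensitivity, epsilon, step, rng):
    """Baseline: add Laplace noise, then round to a grid of width `step`.

    Only the integer `index` would need to be transmitted, but the
    reconstructed value differs from the noisy value by up to step / 2,
    so the overall noise is no longer exactly Laplace.
    """
    noisy = value + sample_laplace(sensitivity / epsilon, rng)
    index = round(noisy / step)
    return noisy, index, index * step


if __name__ == "__main__":
    rng = random.Random(0)
    noisy, index, quantized = laplace_then_quantize(3.0, 1.0, 0.5, 0.25, rng)
    print(noisy, index, quantized, quantized - noisy)
```

The residual `quantized - noisy` is the distortion the abstract refers to; the proposed mechanism is described as avoiding it while still using finite communication.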
Keyword: efficient
Opportunistic Reflection in Reconfigurable Intelligent Surface-Assisted Wireless Networks
Level Up: Private Non-Interactive Decision Tree Evaluation using Levelled Homomorphic Encryption
Exploring the Benefits of Differentially Private Pre-training and Parameter-Efficient Fine-tuning for Table Transformers
Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning
An improved protocol for ExactlyN with more than 3 players
METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies
Promises of Deep Kernel Learning for Control Synthesis
Efficient Finite Initialization for Tensorized Neural Networks
Do Generative Large Language Models need billions of parameters?
A Reinforcement Learning Approach for Robotic Unloading from Visual Observations
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
MCQUIC: Multicast and unicast in a single transport protocol
Collaborative Dynamic 3D Scene Graphs for Automated Driving
ConR: Contrastive Regularizer for Deep Imbalanced Regression
Generalizable Neural Fields as Partially Observed Neural Processes
A fixed-parameter tractable algorithm for combinatorial filter reduction
Scalable Scheduling for Industrial Time-Sensitive Networking: A Hyper-flow Graph Based Scheme
VLSlice: Interactive Vision-and-Language Slice Discovery
Dynamic Spectrum Mixer for Visual Recognition
Scaled Prompt-Tuning for Few-Shot Natural Language Generation
Hierarchical Time-Optimal Planning for Multi-Vehicle Racing
Reliability-Latency-Rate Tradeoff in Low-Latency Communications with Finite-Blocklength Coding
OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL Reachability
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Dynamic NeRFs for Soccer Scenes
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Bounds and Constructions for Generalized Batch Codes
Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models
Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing
Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles
A Wearable Ultra-Low-Power sEMG-Triggered Ultrasound System for Long-Term Muscle Activity Monitoring
Optimal information in Bayesian routing games
Manufacturing Quality Control with Autoencoder-Based Defect Localization and Unsupervised Class Selection
CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions
Continual Learning with Dirichlet Generative-based Rehearsal
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
Regular Representations of Uniform TC^0
Real-Time Motion Planning for In-Hand Manipulation with a Multi-Fingered Hand
Harvesting Brownian Motion: Zero Energy Computational Sampling
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
Towards Reliable Dermatology Evaluation Benchmarks
Auto-Regressive Next-Token Predictors are Universal Learners
Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms
OYXOY: A Modern NLP Test Suite for Modern Greek
Asynchronous Collective Tree Exploration by Tree-Mining
Perfect Roman Domination and Unique Response Roman Domination
Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes
CLiFF-LHMP: Using Spatial Dynamics Patterns for Long-Term Human Motion Prediction
Optimized Implementation of Neuromorphic HATS Algorithm on FPGA
Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection
Tree-Structured Shading Decomposition
Keyword: faster
Distributionally Robust Transfer Learning
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing
Undetectable Selfish Mining
CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection
Keyword: mobile
An overview of VANET vehicular networks
High Fidelity Fast Simulation of Human in the Loop Human in the Plant (HIL-HIP) Systems
ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal
Short reasons for long vectors in HPC CPUs: a study based on RISC-V
DNNShifter: An Efficient DNN Pruning System for Edge Computing
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems
CLiFF-LHMP: Using Spatial Dynamics Patterns for Long-Term Human Motion Prediction
Keyword: pruning
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
MCNS: Mining Causal Natural Structures Inside Time Series via A Novel Internal Causality Scheme
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Keyword: diffusion
Reasoning with Latent Diffusion in Offline Reinforcement Learning
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
MagiCapture: High-Resolution Multi-Concept Portrait Customization
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Keyword: adaptive
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
Ridge detection for nonstationary multicomponent signals with time-varying wave-shape functions and its applications
Transparent Object Tracking with Enhanced Fusion Module
Deep Attentive Time Warping
Dynamic Spectrum Mixer for Visual Recognition
MTD: Multi-Timestep Detector for Delayed Streaming Perception
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning
Using Lidar Intensity for Robot Navigation
Keyword: quantization
Differentiable JPEG: The Devil is in the Details
Communication-Efficient Laplace Mechanism for Differential Privacy via Random Quantization