Abstract
We present GrooveMeter, a novel system that automatically detects vocal and motion reactions to music via earable sensing and supports music engagement-aware applications. To this end, we use smart earbuds, which are already widely used for music listening, as sensing devices, and devise reaction detection techniques that leverage the inertial measurement unit (IMU) and microphone on the earbuds. To explore reactions in daily music-listening situations, we collect the first dataset of its kind, MusicReactionSet, containing 926 minutes of IMU and audio data from 30 participants. With the dataset, we identify a set of unique challenges in detecting music-listening reactions accurately and robustly using audio and motion sensing, and we devise sophisticated processing pipelines that make reaction detection accurate and efficient. We present a comprehensive evaluation of detection performance and system cost. GrooveMeter achieves macro F1 scores of 0.89 for vocal reactions and 0.81 for motion reactions under leave-one-subject-out cross-validation and, more importantly, shows higher accuracy and robustness than alternative methods. We also show that our filtering approach reduces the energy overhead by 50% or more. Finally, we demonstrate potential use cases through a case study.
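As a concrete illustration of the evaluation protocol above, here is a minimal leave-one-subject-out macro-F1 loop in scikit-learn. The features, labels, subject IDs, and classifier are synthetic stand-ins, not GrooveMeter's actual pipeline or the MusicReactionSet data.

```python
# Hypothetical sketch of leave-one-subject-out evaluation with macro F1.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))            # stand-in IMU/audio feature windows
y = rng.integers(0, 3, size=300)          # stand-in {none, vocal, motion} labels
subjects = rng.integers(0, 30, size=300)  # 30 participants

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro"))
print(f"mean macro F1 across held-out subjects: {np.mean(scores):.2f}")
```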
Identifying Lebesgue-sampled Continuous-time Impulse Response Models: A Kernel-based Approach
Authors: Rodrigo A. González, Koen Tiels, Tom Oomen
Abstract
Control applications are increasingly sampled non-equidistantly in time, including in motion control, networked control, resource-aware control, and event-triggered control. Some of these applications use measurement devices that sample equidistantly in the amplitude domain. The aim of this paper is to develop a non-parametric estimator of the impulse response of continuous-time systems based on such a sampling strategy, known as Lebesgue sampling. To this end, kernel methods are developed to formulate an algorithm that adequately takes into account the output intersample behavior, which ultimately leads to more accurate models and more efficient output sampling compared to the standard approach. The efficacy of this method is demonstrated through a mass-spring-damper case study.
Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification
Authors: Michael Muehlebach
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice set-up, we consider situations where losses are constrained and derive algorithms that exploit the additional structure in optimal and computationally efficient ways. Our algorithm and analysis are instance-dependent; that is, suboptimal choices of the environment are exploited and reflected in our regret bounds. The constraints can handle general dependencies between losses (even across time) and are flexible enough to also account for a loss budget that the environment is not allowed to exceed. The performance of the resulting algorithms is highlighted in two numerical examples, which include a nonlinear and online system identification task.
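For readers unfamiliar with the prediction-with-expert-advice set-up the paper builds on, the sketch below shows the classical exponential-weights (Hedge) baseline. The paper's algorithm additionally exploits the loss constraints, which this sketch does not model; all quantities here are illustrative.

```python
# Minimal exponential-weights (Hedge) baseline over K options.
import numpy as np

def hedge(losses, eta=0.5):
    """losses: (T, K) array of per-round losses in [0, 1] for K options."""
    T, K = losses.shape
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                        # play a distribution over options
        total += p @ losses[t]                 # expected loss this round
        w *= np.exp(-eta * losses[t])          # multiplicative-weights update
    return total - losses.sum(axis=0).min()    # regret vs. best fixed option

rng = np.random.default_rng(1)
print("regret:", hedge(rng.uniform(size=(1000, 5))))
```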
Hardware-Aware Static Optimization of Hyperdimensional Computations
Authors: Pu Yi, Sara Achour
Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
Abstract
Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient, however, and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees that the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.
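For context, the sketch below shows the classic hyperdimensional encode-and-query pattern whose parameters (notably the dimension D) a framework like Heim would size statically. It is a generic HDC illustration, not Heim's analysis procedure.

```python
# Bipolar hypervectors: bind (elementwise multiply) composes key-value pairs,
# bundle (sign of the sum) superposes them, and a dot-product nearest-neighbor
# search retrieves the stored value.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000

def hv():  # random bipolar hypervector
    return rng.choice([-1, 1], size=D)

keys = {name: hv() for name in ["color", "shape"]}
vals = {name: hv() for name in ["red", "square"]}

record = np.sign(keys["color"] * vals["red"] + keys["shape"] * vals["square"])

# Unbind "color" and find the closest stored value hypervector.
query = record * keys["color"]
best = max(vals, key=lambda v: query @ vals[v])
print(best)  # -> "red" with high probability at D = 10,000
```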
Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting
Authors: Walid Al Misba, Harindra S. Mavikumbure, Md Mahadi Rajib, Daniel L. Marino, Victor Cobilean, Milos Manic, Jayasimha Atulasimha
Abstract
In this study, we demonstrate autonomous long-term prediction with a spintronic physical reservoir. Due to the short-term memory property of the magnetization dynamics, non-linearity arises in the reservoir states, which can be exploited for long-term prediction tasks using simple linear regression for online training. During the prediction stage, the output is fed back directly to the input of the reservoir for autonomous prediction. We employ the proposed reservoir to model chaotic time series, such as Mackey-Glass, and dynamic time-series data, such as household building energy loads. Since only the last layer of an RC needs to be trained with linear regression, it is well suited for learning in real time on edge devices. Here we show that a skyrmion-based magnetic tunnel junction can potentially be used as a prototypical RC, but any nanomagnetic tunnel junction with nonlinear magnetization behavior can implement such an RC. By comparing our spintronic physical RC approach with state-of-the-art energy load forecasting algorithms, such as LSTMs and RNNs, we conclude that the proposed framework achieves high prediction accuracy while requiring low memory and energy, both of which are at a premium in hardware-resource- and power-constrained edge applications. Furthermore, the proposed approach requires very small training datasets while being at least 16X more energy-efficient than the state-of-the-art sequence-to-sequence LSTM for accurate household load prediction.
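The sketch below reproduces the described train-then-free-run protocol with a conventional software echo state network standing in for the physical spintronic reservoir: fixed random dynamics supply the non-linearity and short-term memory, a linear readout is fit by ridge regression, and prediction runs autonomously by feeding the output back to the input. All sizes and constants are illustrative choices, not the paper's device parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mackey-Glass series (coarse Euler discretization, tau = 17)
n, tau, dt = 3000, 17, 1.0
x = np.full(n, 1.2)
for t in range(tau, n - 1):
    x[t + 1] = x[t] + dt * (0.2 * x[t - tau] / (1 + x[t - tau] ** 10) - 0.1 * x[t])

N = 300                                            # reservoir size
Win = rng.uniform(-0.5, 0.5, size=N)
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius < 1

def step(s, u):
    return np.tanh(W @ s + Win * u)

# Drive the reservoir and fit a linear readout predicting x[t+1].
states, s = [], np.zeros(N)
for t in range(2000):
    s = step(s, x[t])
    states.append(s)
S = np.array(states[200:])                          # discard warm-up
target = x[201:2001]
w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(N), S.T @ target)

# Autonomous prediction: feed outputs back as inputs.
u, preds = x[2000], []
for t in range(200):
    s = step(s, u)
    u = s @ w_out
    preds.append(u)
print("first autonomous predictions:", np.round(preds[:5], 3))
```

The same three-step structure (drive, linear fit, closed loop) carries over regardless of which physical substrate supplies the reservoir states.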
ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators
Abstract
Image processing algorithms are prime targets for hardware acceleration as they are commonly used in resource- and power-limited applications. Today's image processing accelerator designs make rigid assumptions about the algorithm structures and/or on-chip memory resources. As a result, they either have narrow applicability or result in inefficient designs. This paper presents a compiler framework that automatically generates memory- and power-efficient image processing accelerators. We allow programmers to describe generic image processing algorithms (in a domain specific language) and specify on-chip memory structures available. Our framework then formulates a constrained optimization problem that minimizes on-chip memory usage while maintaining theoretical maximum throughput. The key challenge we address is to analytically express the throughput bottleneck, on-chip memory contention, to enable a lightweight compilation. FPGA prototyping and ASIC synthesis show that, compared to existing approaches, accelerators generated by our framework reduce the on-chip memory usage and/or power consumption by double digits.
Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training
Authors: Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Probability (math.PR)
Abstract
Recent developments in applications of artificial neural networks with over $n=10^{14}$ parameters make it extremely important to study the large-$n$ behavior of such networks. Most works studying wide neural networks have focused on the infinite-width $n \to +\infty$ limit and have shown that, at initialization, such networks correspond to Gaussian processes. In this work we study their behavior for large but finite $n$. Our main contributions are the following: (1) The computation of the corrections to Gaussianity in terms of an asymptotic series in $n^{-\frac{1}{2}}$. The coefficients in this expansion are determined by the statistics of parameter initialization and by the activation function. (2) Controlling the evolution of the outputs of finite-width-$n$ networks during training by computing deviations from the limiting infinite-width case (in which the network evolves through a linear flow). This improves previous estimates and yields sharper decay rates for the (finite-width) NTK in terms of $n$, valid during the entire training procedure. As a corollary, we also prove that, with arbitrarily high probability, the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function. (3) Estimating how the deviations from Gaussianity evolve with training in terms of $n$. In particular, using a certain metric on the space of measures, we find that, along training, the resulting measure stays within $n^{-\frac{1}{2}}(\log n)^{1+}$ of the time-dependent Gaussian process corresponding to the infinite-width network (which is explicitly given by precomposing the initial Gaussian process with the linear flow corresponding to training in the infinite-width limit).
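A quick numerical illustration of contribution (1): outputs of random one-hidden-layer networks approach a Gaussian as the width $n$ grows, and the excess kurtosis is one simple proxy for the finite-width correction. The architecture, initialization, and sample counts below are arbitrary choices for the demonstration, not the paper's setting.

```python
# Excess kurtosis of f(x) = sum_i v_i * tanh(w_i x) / sqrt(n) shrinks with n
# (a Gaussian has excess kurtosis 0).
import numpy as np

rng = np.random.default_rng(0)
samples, x = 20_000, 1.0
for n in [4, 16, 64, 256]:
    w = rng.normal(size=(samples, n))
    v = rng.normal(size=(samples, n))
    f = (v * np.tanh(w * x)).sum(axis=1) / np.sqrt(n)
    kurt = np.mean((f - f.mean()) ** 4) / f.var() ** 2 - 3.0
    print(f"n={n:4d}  excess kurtosis ~ {kurt:+.3f}")
```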
An Online Adaptation Strategy for Direct Data-driven Control
Authors: Johannes Teutsch, Sebastian Ellmaier, Sebastian Kerz, Dirk Wollherr, Marion Leibold
Abstract
The fundamental lemma from behavioral systems theory yields a data-driven non-parametric system representation that has shown great potential for the data-efficient control of unknown linear and weakly nonlinear systems, even in the presence of measurement noise. In this work, we strive to extend the applicability of this paradigm to more strongly nonlinear systems by updating the system representation during control. Unlike existing approaches, our method does not impose additional excitation on the control inputs but instead runs in parallel with the controller, like an observer. Whenever a rank condition is deemed fulfilled, the system representation is updated using newly available data points. In a reference-tracking simulation of a two-link robotic arm, we showcase the performance of the proposed strategy in a predictive control framework.
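As background for the representation being updated, the sketch below builds the Hankel-matrix trajectory representation from the fundamental lemma and checks the kind of rank condition the method monitors online. The system, window length, and rank test are illustrative, not the paper's exact update rule.

```python
import numpy as np

def hankel(w, L):
    """Stack length-L windows of signal w as columns."""
    T = len(w)
    return np.column_stack([w[i:i + L] for i in range(T - L + 1)])

rng = np.random.default_rng(0)
T, L = 100, 5
u = rng.normal(size=T)                    # persistently exciting input
y = np.zeros(T)
for t in range(1, T):                     # a simple first-order system
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1]

Hu, Hy = hankel(u, L), hankel(y, L)
# A rank condition of the kind checked online before an update:
print("rank of input Hankel:", np.linalg.matrix_rank(Hu), "(full =", L, ")")
# Under persistency of excitation, the columns of [Hu; Hy] span all
# length-L trajectories of the system.
H = np.vstack([Hu, Hy])
print("trajectory-matrix rank:", np.linalg.matrix_rank(H))
```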
CAPOT: Creating Robust Dense Query Encoders using Post Training Contrastive Alignment
Authors: Daniel Campos, ChengXiang Zhai, Alessandro Magnani
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking. While effective and efficient, dual encoders are brittle to variations in query distributions and noisy queries. Data augmentation can make models more robust but introduces overhead to training set generation and requires retraining and index regeneration. We present Contrastive Alignment POst Training (CAPOT), a highly efficient finetuning method that improves model robustness without requiring index regeneration, training-set optimization, or model alteration. CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered roots. We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and TriviaQA passage retrieval, finding that CAPOT has a similar impact to data augmentation with none of its overhead.
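A schematic of the alignment idea, under stated assumptions: toy linear encoders and Gaussian noise stand in for real query encoders and typo-style query corruptions, and only the query side is trained, so the document encoder and its index stay untouched. This is a sketch of an InfoNCE-style alignment loss, not CAPOT's exact training recipe.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, batch = 32, 16
query_enc = torch.nn.Linear(dim, dim)     # the only trainable component
opt = torch.optim.Adam(query_enc.parameters(), lr=1e-3)

for step in range(200):
    clean = torch.randn(batch, dim)                  # stand-in query features
    noisy = clean + 0.3 * torch.randn(batch, dim)    # simulated query noise
    with torch.no_grad():
        anchor = query_enc(clean)         # embedding of the unaltered root
    z = query_enc(noisy)
    logits = F.normalize(z, dim=1) @ F.normalize(anchor, dim=1).T / 0.07
    loss = F.cross_entropy(logits, torch.arange(batch))  # align i-th pair
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final alignment loss: {loss.item():.3f}")
```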
TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors
Abstract
Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computational complexity, termed TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution to reduce computation, enhances the early-stage features in the backbone, and addresses the feature misalignment problem for accurate small object detection. On the COCO benchmark, our TinyDet-M achieves 30.3 AP and 13.5 AP$^s$ with only 991 MFLOPs, making it the first detector with an AP over 30 at less than 1 GFLOPs; in addition, TinyDet-S and TinyDet-L achieve promising performance under different computation limits.
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman) Shen, H. Vincent Poor
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Driven by advances in generative artificial intelligence (AI) techniques and algorithms, AI-generated content (AIGC) has seen widespread adoption, allowing for the generation of diverse and high-quality content. In particular, diffusion model-based AIGC techniques have been widely used to generate content in a variety of modalities. However, the real-world implementation of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy concerns. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless-network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.
Does Prompt-Tuning Language Model Ensure Privacy?
Authors: Shangyu Xie, Wei Dai, Esha Ghosh, Sambuddha Roy, Dan Schwartz, Kim Laine
Abstract
Prompt-tuning has received attention as an efficient tuning method in the language domain, i.e., tuning a prompt that is only a few tokens long while keeping the large language model frozen, yet achieving performance comparable to conventional fine-tuning. Considering the emerging privacy concerns around language models, we initiate the study of privacy leakage in the prompt-tuning setting. We first describe a real-world email service pipeline that provides customized output for various users via prompt-tuning. Then we propose a novel privacy attack framework to infer users' private information by exploiting the prompt module with user-specific signals. We conduct a comprehensive privacy evaluation on the target pipeline to demonstrate the potential leakage from prompt-tuning. The results also demonstrate the effectiveness of the proposed attack.
Can we learn better with hard samples?
Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn under-represented samples and complex patterns in the data, leading to a longer time to generalize. To address this problem, a variant of the traditional algorithm has been proposed that trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results show that the proposed method can significantly improve test accuracy and speed up convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ($\delta$) that decides how many mini-batches are considered for training. Experiments with various values of $\delta$ found that smaller $\delta$ values generally result in similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method for EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% for ResNet-18 on CIFAR-100.
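An illustrative PyTorch loop for the variant described above: rank candidate mini-batches by their current loss and take gradient steps only on the highest-loss fraction $\delta$. The model, data, and hyper-parameters are placeholders, not the paper's exact configuration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(20, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
delta = 0.5                                    # fraction of batches kept

for epoch in range(3):
    batches = [(torch.randn(32, 20), torch.randint(0, 10, (32,)))
               for _ in range(10)]
    with torch.no_grad():                      # rank batches by current loss
        losses = [loss_fn(model(x), y).item() for x, y in batches]
    keep = sorted(range(len(batches)), key=lambda i: -losses[i])
    for i in keep[: int(delta * len(batches))]:
        x, y = batches[i]
        loss = loss_fn(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
```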
Continuous Input Embedding Size Search For Recommender Systems
Abstract
Latent factor models are the most popular backbones for today's recommender systems owing to their prominent performance. Latent factor models represent users and items as real-valued embedding vectors for pairwise similarity computation, and all embeddings are traditionally restricted to a uniform size that is relatively large (e.g., 256-dimensional). With the exponentially expanding user base and item catalog in contemporary e-commerce, this design is admittedly becoming memory-inefficient. To facilitate lightweight recommendation, reinforcement learning (RL) has recently opened up opportunities for identifying varying embedding sizes for different users/items. However, challenged by search efficiency and learning an optimal RL policy, existing RL-based methods are restricted to highly discrete, predefined embedding size choices. This leads to a largely overlooked potential of introducing finer granularity into embedding sizes to obtain better recommendation effectiveness under a given memory budget. In this paper, we propose continuous input embedding size search (CIESS), a novel RL-based method that operates on a continuous search space with arbitrary embedding sizes to choose from. In CIESS, we further present an innovative random walk-based exploration strategy to allow the RL policy to efficiently explore more candidate embedding sizes and converge to a better decision. CIESS is also model-agnostic and hence generalizable to a variety of latent factor RSs, whilst experiments on two real-world datasets have shown state-of-the-art performance of CIESS under different memory budgets when paired with three popular recommendation models.
Generative Recommendation: Towards Next-generation Recommender Paradigm
Abstract
Recommender systems typically retrieve items from an item corpus for personalized recommendations. However, such a retrieval-based recommender paradigm faces two limitations: 1) the human-generated items in the corpus might fail to satisfy the users' diverse information needs, and 2) users usually adjust the recommendations via passive and inefficient feedback such as clicks. Nowadays, AI-Generated Content (AIGC) has revealed significant success across various domains, offering the potential to overcome these limitations: 1) generative AI can produce personalized items to meet users' specific information needs, and 2) the newly emerged ChatGPT significantly facilitates users to express information needs more precisely via natural language instructions. In this light, the boom of AIGC points the way towards the next-generation recommender paradigm with two new objectives: 1) generating personalized content through generative AI, and 2) integrating user instructions to guide content generation. To this end, we propose a novel Generative Recommender paradigm named GeneRec, which adopts an AI generator to personalize content generation and leverages user instructions to acquire users' information needs. Specifically, we pre-process users' instructions and traditional feedback (e.g., clicks) via an instructor to output the generation guidance. Given the guidance, we instantiate the AI generator through an AI editor and an AI creator to repurpose existing items and create new items, respectively. Eventually, GeneRec can perform content retrieval, repurposing, and creation to meet users' information needs. Besides, to ensure the trustworthiness of the generated items, we emphasize various fidelity checks such as authenticity and legality checks. Lastly, we study the feasibility of implementing the AI editor and AI creator on micro-video generation, showing promising results.
From Retrieval to Generation: Efficient and Effective Entity Set Expansion
Abstract
Entity Set Expansion (ESE) is a critical task aiming to expand entities of the target semantic class described by a small seed entity set. Most existing ESE methods are retrieval-based frameworks that need to extract the contextual features of entities and calculate the similarity between seed entities and candidate entities. To achieve these two purposes, they must iteratively traverse the corpus and the entity vocabulary provided in the datasets, resulting in poor efficiency and scalability. The experimental results indicate that the time consumed by retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we first propose a generative ESE framework, Generative Entity Set Expansion (GenExpan), which utilizes a generative pre-trained language model to accomplish the ESE task. Specifically, a prefix tree is employed to guarantee the validity of generated entities, and automatically generated class names guide the model toward target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between the generic knowledge of the language model and the goal of the ESE task. Experiments on publicly available datasets show that GenExpan is efficient and effective. For efficiency, the expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup over strong baselines. For expansion performance, our framework outperforms previous state-of-the-art ESE methods.
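A toy prefix tree (trie) of the kind used to keep generated entities valid: at each decoding step, only tokens that extend some known entity may be emitted. The entity list and tokenization below are illustrative, not GenExpan's vocabulary.

```python
# Build a trie over tokenized entities; allowed_next() returns the tokens a
# constrained decoder may emit after a given generated prefix.
entities = [["new", "york"], ["new", "delhi"], ["london"]]

trie = {}
for ent in entities:
    node = trie
    for tok in ent:
        node = node.setdefault(tok, {})
    node["<eos>"] = {}

def allowed_next(prefix):
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()           # prefix left the trie: nothing is valid
        node = node[tok]
    return set(node)

print(allowed_next([]))            # {'new', 'london'}
print(allowed_next(["new"]))       # {'york', 'delhi'}
```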
A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction
Authors: Xinshun Wang, Shen Zhao, Chen Chen, Mengyuan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The past few years have witnessed the dominance of Graph Convolutional Networks (GCNs) in human motion prediction, yet their performance is still far from satisfactory. Recently, MLP-Mixers have shown competitive results while being simpler and more efficient. To extract features, GCNs typically follow an aggregate-and-update paradigm, while Mixers rely on token-mixing and channel-mixing operations. The two research paths have been established independently in the community. In this paper, we develop a novel perspective by unifying Mixers and GCNs. We show that a mixer layer can be seen as a graph convolutional layer applied to a fully-connected graph with parameterized adjacency. Extending this theoretical finding to the practical side, we propose Meta-Mixing Network (M$^2$-Net). Assisted by a novel zero-aggregation operation, our network captures both structure-agnostic and structure-sensitive dependencies in a collaborative manner. Not only is it computationally efficient, but it also achieves state-of-the-art performance. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that M$^2$-Net consistently outperforms all other approaches. We hope our work brings the community one step further towards truly predictable human motion. Our code will be publicly available.
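The central observation admits a one-screen numerical check: a token-mixing step, which multiplies the token dimension of $X$ by a learned matrix, is exactly a graph convolution $AXW$ on a fully-connected graph whose parameterized adjacency $A$ is that matrix (shown here with $W$ set to the identity).

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, channels = 6, 4
X = rng.normal(size=(tokens, channels))
M = rng.normal(size=(tokens, tokens))      # token-mixing weights

token_mixing = M @ X                       # MLP-Mixer style token mixing
A, W = M, np.eye(channels)                 # parameterized adjacency, unit W
graph_conv = A @ X @ W                     # GCN-style aggregate-and-update

print(np.allclose(token_mixing, graph_conv))  # True
```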
Applicable Methodologies for the Mass Transfer Phenomenon in Tumble Dryers: A Review
Abstract
Tumble dryers offer a fast and convenient way of drying textiles independent of weather conditions and are therefore frequently used in ordinary households. However, artificial drying of textiles consumes considerable amounts of energy; approximately 8.2 percent of residential electricity consumption in northern European countries goes to drying textiles (Cranston et al., 2019). Several authors have investigated aspects of the clothes-drying cycle with experimental and numerical methods to understand and improve the process. The first turning-point study on the physics of evaporation in tumble dryers was presented by Lambert et al. (1991) in the early 90s. With the aid of the Chilton-Colburn analogy, they introduced the concept of an area-mass transfer coefficient to express the evaporation rate. Afterwards, several experimental and numerical studies were published based on this concept, and the model was further developed into 0-dimensional (Deans, 2001) and 1-dimensional (Wei et al., 2017) versions to gain accuracy. The evaporation rate is considered the main system parameter for dryers, from which other performance parameters, including drying time, effectiveness, moisture content, and efficiency, can be estimated. More recent literature has focused on utilizing dimensional analysis or image-processing techniques to correlate drying indices with system parameters. However, the validity of these regressed models is machine-specific and hence cannot yet be generalized. All previous models for estimating the evaporation rate in tumble dryers are discussed. The review of the related literature shows that all previous models for predicting the evaporation rate in clothes dryers have some limitations in terms of accuracy and applicability.
Abstract
Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. The lower-level primitive periodically performs data relabeling on a handful of expert demonstrations using our primitive-informed parsing approach. We provide expressions that bound the sub-optimality of our method and develop a practical algorithm for hierarchical reinforcement learning. Since our approach uses only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluation on complex maze navigation and robotic manipulation environments shows that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks.
ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
Authors: Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang
Abstract
Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming. Despite the impressive capabilities of large language models like ChatGPT in generating programs by interacting with users through natural language prompts, there are still limitations. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving data preparation programs, which requires a certain level of expertise in programming, the dataset used, and the ML task. Moreover, once a program has been generated, it is non-trivial to revisit a previous version or make changes to the program without starting the process over. In this paper, we present ChatPipe, a novel system designed to facilitate seamless interaction between users and ChatGPT. ChatPipe provides users with effective recommendations on the next data preparation operations and guides ChatGPT to generate programs for them. ChatPipe also enables users to easily roll back to previous versions of a program, which facilitates more efficient experimentation and testing. We have developed a web application for ChatPipe and prepared several real-world ML tasks from Kaggle. These tasks showcase the capabilities of ChatPipe and enable VLDB attendees to easily experiment with our novel features to rapidly orchestrate a high-quality data preparation program.
Towards Automated 3D Search Planning for Emergency Response Missions
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The ability to efficiently plan and execute automated and precise search missions using unmanned aerial vehicles (UAVs) during emergency response situations is imperative. Precise navigation between obstacles and time-efficient searching of 3D structures and buildings are essential for locating survivors and people in need in emergency response missions. In this work we address this challenging problem by proposing a unified search planning framework that automates the process of UAV-based search planning in 3D environments. Specifically, we propose a novel search planning framework which enables automated planning and execution of collision-free search trajectories in 3D by taking into account low-level mission constraints (e.g., the UAV dynamical and sensing model), mission objectives (e.g., the mission execution time and the UAV energy efficiency) and user-defined mission specifications (e.g., the 3D structures to be searched and minimum detection probability constraints). The capabilities and performance of the proposed approach are demonstrated through extensive simulated 3D search scenarios.
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Authors: Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
Abstract
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models trained on vast amounts of data holds immense promise for practical applications, enhancing industrial productivity and facilitating social development. As demands on computational capacity increase, and although numerous studies have explored efficient training, a comprehensive summary of acceleration techniques for training deep learning models is still lacking. In this survey, we present a detailed review of training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) data-centric: including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric: including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which focus on accelerating training by reducing the calculations on parameters; (3) optimization-centric: including the selection of learning rates, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which concern the training policy and improving generality for large-scale models; (4) budgeted training: including distinctive acceleration methods for resource-constrained situations; (5) system-centric: including efficient open-source distributed libraries/systems which provide adequate hardware support for implementing the above acceleration algorithms. This taxonomy offers a comprehensive view of the general mechanisms within each component and of their joint interactions.
ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation
Authors: Xiaoming Zhao, Xingming Wu, Weihai Chen, Peter C. Y. Chen, Qingsong Xu, Zhengguo Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image keypoints and descriptors play a crucial role in many visual measurement tasks. In recent years, deep neural networks have been widely used to improve the performance of keypoint and descriptor extraction. However, the conventional convolution operations do not provide the geometric invariance required for the descriptor. To address this issue, we propose the Sparse Deformable Descriptor Head (SDDH), which learns the deformable positions of supporting features for each keypoint and constructs deformable descriptors. Furthermore, SDDH extracts descriptors at sparse keypoints instead of a dense descriptor map, which enables efficient extraction of descriptors with strong expressiveness. In addition, we relax the neural reprojection error (NRE) loss from dense to sparse to train the extracted sparse descriptors. Experimental results show that the proposed network is both efficient and powerful in various visual measurement tasks, including image matching, 3D reconstruction, and visual relocalization.
Abstract
In this paper we consider the closest vector problem (CVP) for lattices $\Lambda \subseteq \mathbb{Z}^n$ given by a generator matrix $A\in \mathcal{M}_{n\times n}(\mathbb{Z})$. Let $b>0$ be the maximum of the absolute values of the entries of the matrix $A$. We prove that the CVP can be reduced in polynomial time to a quadratic unconstrained binary optimization (QUBO) problem in $O(n^2(\log(n)+\log(b)))$ binary variables, where the length of the coefficients in the corresponding quadratic form is $O(n(\log(n)+\log(b)))$.
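As a sketch of the reduction's general shape (the binary expansion and ranges below are illustrative, not the paper's exact construction), write a candidate lattice point as $Az$ with integer $z$, expand each coordinate in binary, and the squared distance to a target $t$ becomes a quadratic form over bits:

```latex
\min_{z \in \mathbb{Z}^n} \lVert A z - t \rVert^2,
\qquad
z_i = -2^{m} + \sum_{j=0}^{m} 2^{j} s_{ij},
\quad s_{ij} \in \{0,1\},
\qquad\Longrightarrow\qquad
\lVert A z - t \rVert^2 = s^{\top} Q\, s + c .
```

With $m+1$ bits per coordinate this uses $n(m+1)$ binary variables; the $O(n^2(\log(n)+\log(b)))$ count stated in the abstract corresponds to a per-coordinate range of $m = O(n(\log(n)+\log(b)))$ bits.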
FedDiSC: A Computation-efficient Federated Learning Framework for Power Systems Disturbance and Cyber Attack Discrimination
Authors: Muhammad Akbar Husnoo, Adnan Anwar, Haftu Tasew Reda, Nasser Hosseinzadeh, Shama Naz Islam, Abdun Naser Mahmood, Robin Doss
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
With growing concern about the security and privacy of smart grid systems, cyberattacks on critical power grid components, such as state estimation, have proven to be among the top-priority cyber-related issues and have received significant attention in recent years. However, cyberattack detection in smart grids now faces new challenges, including privacy preservation and decentralized power zones with strategic data owners. To address these technical bottlenecks, this paper proposes a novel federated-learning-based privacy-preserving and communication-efficient attack detection framework, known as FedDiSC, that enables Discrimination between power System disturbances and Cyberattacks. Specifically, we first propose a federated learning approach that enables Supervisory Control and Data Acquisition subsystems of decentralized power grid zones to collaboratively train an attack detection model without sharing sensitive power-related data. Second, we put forward a representation-learning-based deep auto-encoder network to accurately detect power system and cybersecurity anomalies. Lastly, to adapt our proposed framework to the timeliness of real-world cyberattack detection in smart grids, we leverage a gradient privacy-preserving quantization scheme known as DP-SIGNSGD to improve communication efficiency. Extensive simulations of the proposed framework on publicly available Industrial Control Systems datasets demonstrate that it achieves superior detection accuracy while preserving the privacy of sensitive power grid information. Furthermore, we find that the gradient quantization scheme improves communication efficiency by 40% compared to a traditional federated learning approach without gradient quantization, which suggests its suitability for real-world scenarios.
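A minimal sign-based quantization step of the flavor DP-SIGNSGD builds on: each client uploads only the sign of its gradient (one bit per coordinate) and the server aggregates by majority vote. The differential-privacy noise and the scheme's exact mechanics are omitted, and the numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients = 8, 5
client_grads = rng.normal(size=(n_clients, dim))

signs = np.sign(client_grads)              # 1-bit-per-coordinate uplink
update = np.sign(signs.sum(axis=0))        # server-side majority vote
print(update)
```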
SCART: Simulation of Cyber Attacks for Real-Time
Authors: Kfir Girstein, Eliron Rahimi, Prof. Avi Mendelson
Abstract
Real-time systems are often implemented as reactive systems that respond to stimuli and complete tasks in a known bounded time. The development process of such systems usually involves a cycle-accurate simulation environment, or even a digital twin, that can accurately simulate the system and the environment it operates in. In addition, many real-time systems require high reliability and strive to be immune to security attacks. Thus, the development environment must support reliability-related events, such as the failure of a sensor or the malfunction of a subsystem, as well as foreseen cyber security attacks. This paper presents the SCART framework, an innovative solution that aims to extend simulation environments of real-time systems with the capability to incorporate reliability-related events and advanced cyber security attacks, e.g., an attack on a single sensor as well as "complex security attacks" that aim to change the behavior of a group of sensors. We validate our system by applying the proposed environment to a drone's flight control system, including its navigation system, which uses machine learning algorithms. Such a system is very challenging to test, since it requires many experiments that can hardly be carried out on live systems. We show that using SCART is very efficient, can increase the model's accuracy, and can significantly reduce false-positive rates. Some of these experiments were also validated using a set of "real drones".
DATE: Domain Adaptive Product Seeker for E-commerce
Authors: Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Product Retrieval (PR) and Grounding (PG), which aim to seek image-level and object-level products, respectively, according to a textual query, have recently attracted great interest for a better shopping experience. Owing to the lack of relevant datasets, we collect two large-scale benchmark datasets from the Taobao Mall and Live domains, with about 474k and 101k image-query pairs for PR, and manually annotate the object bounding boxes in each image for PG. As annotating boxes is expensive and time-consuming, we attempt to transfer knowledge from the annotated domain to the unannotated one for PG, i.e., unsupervised Domain Adaptation (PG-DA). We propose a {\bf D}omain {\bf A}daptive Produc{\bf t} S{\bf e}eker ({\bf DATE}) framework, regarding PR and PG as product-seeking problems at different levels, to assist the query {\bf date} the product. Concretely, we first design a semantics-aggregated feature extractor for each modality to obtain concentrated and comprehensive features for the subsequent efficient retrieval and fine-grained grounding tasks. Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG. Besides, we devise a domain aligner for PG-DA to alleviate uni-modal marginal and multi-modal conditional distribution shift between the source and target domains, and design a pseudo-box generator to dynamically select reliable instances and generate bounding boxes for further knowledge transfer. Extensive experiments show that our DATE achieves satisfactory performance in fully-supervised PR, PG, and unsupervised PG-DA. Our desensitized datasets will be publicly available here\footnote{\url{https://github.com/Taobao-live/Product-Seeking}}.
Sound Dynamic Deadlock Prediction in Linear Time
Authors: Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç, Mahesh Viswanathan
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Abstract
Deadlocks are among the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions and reasoning about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for detecting concurrency bugs such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high precision, and efficiency. In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm, SyncPDOffline, detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm, SyncPDOnline, predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting. We implemented both algorithms and evaluated their ability to perform offline and online deadlock prediction on a large dataset of standard benchmarks.
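For readers new to deadlock patterns, the sketch below shows the classical lock-graph check that yields the kind of unsound potential-deadlock witnesses the abstract mentions: add an edge l1 -> l2 whenever a thread acquires l2 while holding l1, and flag cycles. The events form a toy two-thread example; the paper's sync-preserving algorithms refine this idea substantially.

```python
from collections import defaultdict

# Observed acquisition events: (thread, locks_already_held, lock_acquired)
events = [("T1", {"A"}, "B"), ("T2", {"B"}, "A")]

graph = defaultdict(set)
for _, held, acq in events:
    for l in held:
        graph[l].add(acq)

def has_cycle(g):
    state = {}                             # 0 = visiting, 1 = done
    def dfs(u):
        state[u] = 0
        for v in g[u]:
            if state.get(v) == 0 or (v not in state and dfs(v)):
                return True
        state[u] = 1
        return False
    return any(dfs(u) for u in list(g) if u not in state)

print("potential deadlock:", has_cycle(graph))  # True: A -> B -> A
```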
On the Importance of Contrastive Loss in Multimodal Learning
Authors: Yunwei Ren, Yuanzhi Li
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have achieved huge success in multimodal learning, where the model tries to minimize the distance between the representations of different views (e.g., an image and its caption) of the same data point while keeping the representations of different data points away from each other. However, from a theoretical perspective, it is unclear how contrastive learning can learn representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that positive pairs drive the model to align the representations at the cost of increasing the condition number, while negative pairs reduce the condition number, keeping the learned representations balanced.
Keyword: faster
Hardware-Aware Static Optimization of Hyperdimensional Computations
Authors: Pu Yi, Sara Achour
Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
Abstract
Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient, however, and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees that the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.
TopNet: Transformer-based Object Placement Network for Image Compositing
Authors: Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing. The quality of the composite image highly depends on the predicted location/scale. Existing works either generate candidate bounding boxes or apply sliding-window search using global representations from background and object images, which fail to model local information in background images. However, local clues in background images are important to determine the compatibility of placing the objects with certain locations/scales. In this paper, we propose to learn the correlation between object features and all local background features with a transformer module so that detailed information can be provided on all possible location/scale configurations. A sparse contrastive loss is further proposed to train our model with sparse supervision. Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass, which is over 10 times faster than the previous sliding-window method. It also supports interactive search when users provide a pre-defined location or scale. The proposed method can be trained with explicit annotation or in a self-supervised manner using an off-the-shelf inpainting model, and it outperforms state-of-the-art methods significantly. The user study shows that the trained model generalizes well to real-world images with diverse challenging scenes and object categories.
Scalable Causal Discovery with Score Matching
Authors: Francesco Montagna, Nicoletta Noceti, Lorenzo Rosasco, Kun Zhang, Francesco Locatello
Abstract
This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Abstract
Recent advances in personalized image generation allow a pre-trained text-to-image model to learn a new concept from a set of images. However, existing personalization approaches usually require heavy test-time finetuning for each concept, which is time-consuming and difficult to scale. We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables instant text-guided image personalization without any test-time finetuning. We achieve this with several major components. First, we learn the general concept of the input images by converting them to a textual token with a learnable image encoder. Second, to keep the fine details of the identity, we learn rich visual feature representation by introducing a few adapter layers to the pre-trained model. We train our components only on text-image pairs without using paired images of the same concept. Compared to test-time finetuning-based methods like DreamBooth and Textual-Inversion, our model can generate competitive results on unseen concepts concerning language-image alignment, image fidelity, and identity preservation while being 100 times faster.
Convex Minimization with Integer Minima in $\widetilde O(n^4)$ Time
Authors: Haotian Jiang, Yin Tat Lee, Zhao Song, Lichen Zhang
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
Abstract
Given a convex function $f$ on $\mathbb{R}^n$ with an integer minimizer, we show how to find an exact minimizer of $f$ using $O(n^2 \log n)$ calls to a separation oracle and $O(n^4 \log n)$ time. The previous best polynomial time algorithm for this problem given in [Jiang, SODA 2021, JACM 2022] achieves $\widetilde{O}(n^2)$ oracle complexity. However, the overall runtime of Jiang's algorithm is at least $\widetilde{\Omega}(n^8)$, due to expensive sub-routines such as the Lenstra-Lenstra-Lov\'asz (LLL) algorithm [Lenstra, Lenstra, Lov\'asz, Math. Ann. 1982] and random walk based cutting plane method [Bertsimas, Vempala, JACM 2004]. Our significant speedup is obtained by a nontrivial combination of a faster version of the LLL algorithm due to [Neumaier, Stehl\'e, ISSAC 2016] that gives similar guarantees, the volumetric center cutting plane method (CPM) by [Vaidya, FOCS 1989] and its fast implementation given in [Jiang, Lee, Song, Wong, STOC 2020]. For the special case of submodular function minimization (SFM), our result implies a strongly polynomial time algorithm for this problem using $O(n^3 \log n)$ calls to an evaluation oracle and $O(n^4 \log n)$ additional arithmetic operations. Both the oracle complexity and the number of arithmetic operations of our more general algorithm are better than the previous best-known runtime algorithms for this specific problem given in [Lee, Sidford, Wong, FOCS 2015] and [Dadush, V\'egh, Zambelli, SODA 2018, MOR 2021].
Can we learn better with hard samples?
Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn under-represented samples and complex patterns in the data, leading to a longer time to generalize. To address this problem, a variant of the traditional algorithm has been proposed that trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results show that the proposed method can significantly improve test accuracy and speed up convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ($\delta$) that decides how many mini-batches are considered for training. Experiments with various values of $\delta$ found that smaller $\delta$ values generally result in similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method for EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% for ResNet-18 on CIFAR-100.
Pallet Detection from Synthetic Data Using Game Engines
Authors: Jouveer Naidoo, Nicholas Bates, Trevor Gee, Mahla Nejati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This research sets out to assess the viability of using game engines to generate synthetic training data for machine learning in the context of pallet segmentation. Using synthetic data has been proven in prior research to be a viable means of training neural networks and saves hours of manual labour due to the reduced need for manual image annotation. Machine vision for pallet detection can benefit from synthetic data as the industry increases the development of autonomous warehousing technologies. As per our methodology, we developed a tool capable of automatically generating large amounts of annotated training data from 3D models at pixel-perfect accuracy and a much faster rate than manual approaches. Regarding image segmentation, a Mask R-CNN pipeline was used, which achieved an AP50 of 86% for individual pallets.
Keyword: mobile
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman) Shen, H. Vincent Poor
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Driven by advances in generative artificial intelligence (AI) techniques and algorithms, AI-generated content (AIGC) has seen widespread adoption, allowing for the generation of diverse and high-quality content. In particular, diffusion model-based AIGC techniques have been widely used to generate content in a variety of modalities. However, the real-world implementation of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy concerns. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless-network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.
Can we learn better with hard samples?
Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn under-represented samples and complex patterns in the data, leading to a longer time to generalize. To address this problem, a variant of the traditional algorithm has been proposed that trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results show that the proposed method can significantly improve test accuracy and speed up convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ($\delta$) that decides how many mini-batches are considered for training. Experiments with various values of $\delta$ found that smaller $\delta$ values generally result in similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method for EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% for ResNet-18 on CIFAR-100.
Cell-Edge Performance Booster in 6G: Cell-Free Massive MIMO vs. Reconfigurable Intelligent Surface
Authors: Wei Jiang, Hans D. Schotten
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
User experience in mobile communications is vulnerable to degraded quality at the cell edge, which, according to the principle of risk aversion in behavioral economics, cannot be compensated for by excellent service at the cell center. Constrained by weak signal strength and substantial inter-cell interference, the cell edge is always a major bottleneck of any mobile network. Owing to their potential to empower the next-generation mobile system, reconfigurable intelligent surface (RIS) and cell-free massive MIMO (CFmMIMO) have recently attracted considerable attention from academia and industry. In addition to a variety of technological advantages, both have high potential to boost cell-edge performance. To the authors' best knowledge, a performance comparison of RIS and CFmMIMO, especially at the cell edge, is still missing in the literature. To fill this gap, this paper establishes a fair comparison scenario and presents extensive numerical results to clarify their behaviors at the cell edge.
RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
Abstract
Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and object(s) by autonomously controlling the motion system of a tracker given observations. AOT has wide-ranging applications, such as in mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across different scenarios remains a challenge, especially in unstructured environments with cluttered obstacles and diverse layouts. We argue that constructing a state representation capable of modeling the geometry structure of the surroundings and the dynamics of the target is crucial for achieving this goal. To address this challenge, we present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. Additionally, we enhance the generalization of the policy network by training in an asymmetric dueling mechanism. We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments, particularly those with complex obstacles and layouts. We also demonstrate the successful transfer of RSPT to real-world settings. Project Website: https://sites.google.com/view/aot-rspt.
Keyword: pruning
Scalable Causal Discovery with Score Matching
Authors: Francesco Montagna, Nicoletta Noceti, Lorenzo Rosasco, Kun Zhang, Francesco Locatello
Abstract
This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
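For intuition, a toy sketch of the kind of quantity such score-based methods inspect (illustrative only; `score_fn` is assumed to approximate $\nabla \log p(\mathbf{X})$, and the paper's actual edge criteria are more refined than this threshold test):

```python
import torch

def score_jacobian(score_fn, x):
    # score_fn: maps a sample x of shape (d,) to grad log p(x), shape (d,)
    return torch.autograd.functional.jacobian(score_fn, x)  # (d, d)

def candidate_edges(score_fn, X, tol=1e-2):
    # In non-linear additive Gaussian noise models, entries of the score's
    # Jacobian that stay (near-)constant across samples hint at absent
    # edges; entries that vary flag candidate adjacencies.
    J = torch.stack([score_jacobian(score_fn, x) for x in X])
    return J.var(dim=0) > tol  # boolean (d, d) candidate adjacency mask
```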
Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting
Authors: Fangyin Wei, Thomas Funkhouser, Szymon Rusinkiewicz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Removing clutter from scenes is essential in many applications, ranging from privacy-sensitive content filtering to data augmentation. In this work, we present an automatic system that removes clutter from 3D scenes and inpaints the result with coherent geometry and texture. We propose techniques for its two key components: 3D segmentation from shared properties and 3D inpainting, both of which are important problems. The definition of 3D scene clutter (frequently-moving objects) is not well captured by commonly-studied object categories in computer vision. To tackle the lack of well-defined clutter annotations, we group noisy fine-grained labels, leverage virtual rendering, and impose an instance-level area-sensitive loss. Once clutter is removed, we inpaint geometry and texture in the resulting holes by merging inpainted RGB-D images. This requires novel voting and pruning strategies that guarantee multi-view consistency across individually inpainted images for mesh reconstruction. Experiments on the ScanNet and Matterport datasets show that our method outperforms baselines for clutter segmentation and 3D inpainting, both visually and quantitatively.
Keyword: voxel
On the Suitability of Representations for Quality Diversity Optimization of Shapes
Authors: Ludovico Scarton, Alexander Hagg
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
Keyword: lidar
There is no result
Keyword: diffusion
Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
Authors: Guanhua Zhang, Jiabao Ji, Yang Zhang, Mo Yu, Tommi Jaakkola, Shiyu Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Image inpainting refers to the task of generating a complete, natural image based on a partially revealed reference image. Recently, much research interest has focused on addressing this problem using fixed diffusion models. These approaches typically directly replace the revealed region of the intermediate or final generated images with that of the reference image or its variants. However, since the unrevealed regions are not directly modified to match the context, this results in incoherence between revealed and unrevealed regions. To address the incoherence problem, a small number of methods introduce a rigorous Bayesian framework, but they tend to introduce mismatches between the generated and the reference images due to approximation errors in computing the posterior distributions. In this paper, we propose COPAINT, which can coherently inpaint the whole image without introducing mismatches. COPAINT also uses the Bayesian framework to jointly modify both revealed and unrevealed regions, but approximates the posterior distribution in a way that allows the errors to gradually drop to zero throughout the denoising steps, thus strongly penalizing any mismatches with the reference image. Our experiments verify that COPAINT can outperform the existing diffusion-based methods under both objective and subjective metrics. The codes are available at https://github.com/UCSB-NLP-Chang/CoPaint/.
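For context, the replacement-style baseline the abstract contrasts against can be sketched as below (pseudocode loosely following the HuggingFace diffusers interface; this is the criticized baseline, not COPAINT itself):

```python
import torch

def inpaint_by_replacement(unet, scheduler, x_ref, mask):
    """mask == 1 on revealed pixels; only the unrevealed region is ever
    denoised freely, which is the source of the incoherence discussed."""
    x = torch.randn_like(x_ref)
    for t in scheduler.timesteps:
        eps = unet(x, t).sample                      # predicted noise
        x = scheduler.step(eps, t, x).prev_sample    # one reverse step
        noised_ref = scheduler.add_noise(x_ref, torch.randn_like(x_ref), t)
        x = mask * noised_ref + (1 - mask) * x       # overwrite revealed pixels
    return x
```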
Training-Free Layout Control with Cross-Attention Guidance
Authors: Minghao Chen, Iro Laina, Andrea Vedaldi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent diffusion-based generators can produce high-quality images based only on textual prompts. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that can achieve robust layout control without requiring training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information and steers the reconstruction in the desired direction given, e.g., a user-specified layout. In order to determine how to best guide attention, we study the role of different attention maps when generating images and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.
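A minimal sketch of the backward-guidance idea (assumed interface: `attn_for_token` is a hypothetical hook returning the cross-attention map for one text token; the actual method hooks the generator's attention layers):

```python
import torch

def layout_loss(attn_map, box_mask):
    # Penalize attention mass for the token that falls outside the user box.
    inside = (attn_map * box_mask).sum()
    return (1.0 - inside / attn_map.sum()) ** 2

def guide_latent(latent, box_mask, scale=30.0):
    latent = latent.detach().requires_grad_(True)
    loss = layout_loss(attn_for_token(latent), box_mask)  # hypothetical hook
    (grad,) = torch.autograd.grad(loss, latent)
    return latent - scale * grad  # nudge the latent toward the desired layout
```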
RoSteALS: Robust Steganography using Autoencoder Latent Space
Authors: Tu Bui, Shruti Agarwal, Ning Yu, John Collomosse
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Data hiding such as steganography and invisible watermarking has important applications in copyright protection, privacy-preserving communication and content provenance. Existing works often fall short in preserving image quality, lack robustness against perturbations, or are too complex to train. We propose RoSteALS, a practical steganography technique leveraging frozen pretrained autoencoders to free the payload embedding from learning the distribution of cover images. RoSteALS has a light-weight secret encoder of just 300k parameters, is easy to train, and achieves perfect secret recovery performance and comparable image quality on three benchmarks. Additionally, RoSteALS can be adapted for novel cover-less steganography applications in which the cover image can be sampled from noise or conditioned on text prompts via a denoising diffusion process. Our model and code are available at \url{https://github.com/TuBui/RoSteALS}.
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman) Shen, H. Vincent Poor
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Driven by advances in generative artificial intelligence (AI) techniques and algorithms, AI-generated content (AIGC) has seen widespread adoption, allowing for the generation of diverse and high-quality content. In particular, diffusion model-based AIGC techniques have been widely used to generate content in a variety of modalities. However, the real-world deployment of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.
Compressed Regression over Adaptive Networks
Authors: Marco Carpentiero, Vincenzo Matta, Ali H. Sayed
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
In this work we derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem. Agents employ the recently proposed ACTC (adapt-compress-then-combine) diffusion strategy, where the signals exchanged locally by neighboring agents are encoded with randomized differential compression operators. We provide a detailed characterization of the mean-square estimation error, which is shown to comprise a term related to the error that agents would achieve without communication constraints, plus a term arising from compression. The analysis reveals quantitative relationships between the compression loss and fundamental attributes of the distributed regression problem, in particular, the stochastic approximation error caused by the gradient noise and the network topology (through the Perron eigenvector). We show that knowledge of such relationships is critical to allocate optimally the communication resources across the agents, taking into account their individual attributes, such as the quality of their data or their degree of centrality in the network topology. We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents. Illustrative examples show that a significant performance improvement, as compared to a blind (i.e., uniform) resource allocation, can be achieved by optimizing the allocation by means of the provided mean-square-error formulas.
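One round of an adapt-compress-then-combine update might look as follows (a hedged sketch: the dithered quantizer and combination weights are stand-ins for the randomized differential operators and network topology analyzed in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(x, step=0.05):
    # Stand-in randomized uniform quantizer (dithering keeps it unbiased).
    return step * np.floor(x / step + rng.random(x.shape))

def actc_round(w, recon, grads, A, mu=0.05):
    # w[k]: agent k's estimate; recon[k]: last value neighbors reconstructed;
    # grads[k]: stochastic gradient oracle; A[k][l]: combination weights.
    n = len(w)
    psi = [w[k] - mu * grads[k](w[k]) for k in range(n)]        # adapt
    for k in range(n):                                           # compress
        recon[k] = recon[k] + dithered_quantize(psi[k] - recon[k])
    w_new = [sum(A[k][l] * recon[l] for l in range(n))           # combine
             for k in range(n)]
    return w_new, recon
```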
Keyword: dynamic
Adaptive Feature Fusion: Enhancing Generalization in Deep Learning Models
Authors: Neelesh Mungoli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In recent years, deep learning models have demonstrated remarkable success in various domains, such as computer vision, natural language processing, and speech recognition. However, the generalization capabilities of these models can be negatively impacted by the limitations of their feature fusion techniques. This paper introduces an innovative approach, Adaptive Feature Fusion (AFF), to enhance the generalization of deep learning models by dynamically adapting the fusion process of feature representations. The proposed AFF framework is designed to incorporate fusion layers into existing deep learning architectures, enabling seamless integration and improved performance. By leveraging a combination of data-driven and model-based fusion strategies, AFF is able to adaptively fuse features based on the underlying data characteristics and model requirements. This paper presents a detailed description of the AFF framework, including the design and implementation of fusion layers for various architectures. Extensive experiments are conducted on multiple benchmark datasets, with the results demonstrating the superiority of the AFF approach in comparison to traditional feature fusion techniques. The analysis showcases the effectiveness of AFF in enhancing generalization capabilities, leading to improved performance across different tasks and applications. Finally, the paper discusses various real-world use cases where AFF can be employed, providing insights into its practical applicability. The conclusion highlights the potential for future research directions, including the exploration of advanced fusion strategies and the extension of AFF to other machine learning paradigms.
Hardware-Aware Static Optimization of Hyperdimensional Computations
Authors: Pu Yi, Sara Achour
Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
Abstract
Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.
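As background, the binary hypervector operations that such HD data structures build on (generic HD computing, not Heim's optimizer):

```python
import numpy as np

D = 10_000  # a typical hypervector dimension, as the abstract notes

def random_hv(rng):
    return rng.integers(0, 2, D, dtype=np.uint8)  # i.i.d. random hypervector

def bind(a, b):
    return a ^ b  # XOR binding: associates two hypervectors; invertible

def bundle(hvs):
    # Majority-vote bundling: superposes a set while staying similar to members.
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.uint8)

def similarity(a, b):
    return 1.0 - np.mean(a != b)  # 1 minus normalized Hamming distance
```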
Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting
Authors: Walid Al Misba, Harindra S. Mavikumbure, Md Mahadi Rajib, Daniel L. Marino, Victor Cobilean, Milos Manic, Jayasimha Atulasimha
Abstract
In this study, we demonstrate autonomous long-term prediction with a spintronic physical reservoir. Due to the short-term memory property of the magnetization dynamics, non-linearity arises in the reservoir states, which can be exploited for long-term prediction tasks using simple linear regression for online training. During the prediction stage, the output is fed directly back to the input of the reservoir for autonomous prediction. We employ the proposed reservoir to model chaotic time series, such as Mackey-Glass, and dynamic time-series data, such as household building energy loads. Since only the last layer of a reservoir computer (RC) needs to be trained with linear regression, it is well suited for learning in real time on edge devices. Here we show that a skyrmion-based magnetic tunnel junction can potentially be used as a prototypical RC, but any magnetic tunnel junction with nonlinear magnetization behavior can implement such an RC. By comparing our spintronic physical RC approach with state-of-the-art energy load forecasting algorithms, such as LSTMs and RNNs, we conclude that the proposed framework achieves high prediction accuracy while requiring low memory and energy, both of which are at a premium in hardware-resource- and power-constrained edge applications. Further, the proposed approach requires very small training datasets while being at least 16x more energy efficient than the state-of-the-art sequence-to-sequence LSTM for accurate household load prediction.
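The generic reservoir-computing recipe underlying this setup, sketched with assumed names (`step` stands in for the device's magnetization dynamics):

```python
import numpy as np

def train_readout(states, targets, ridge=1e-6):
    # Ridge-regression readout: states (T, N), targets (T,) next-step values.
    S = states
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ targets)

def autonomous_forecast(step, x, u, w_out, horizon):
    # Closed loop: each prediction is fed back as the next input.
    preds = []
    for _ in range(horizon):
        x = step(x, u)       # reservoir state update (physical dynamics)
        u = x @ w_out        # linear readout produces the prediction
        preds.append(u)
    return np.array(preds)
```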
Robust Decision-Focused Learning for Reward Transfer
Authors: Abhishek Sharma, Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez
Abstract
Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm that can focus on learning the MDP dynamics that are most relevant for obtaining high rewards. While this approach increases the performance of agents by focusing the learning towards optimizing the reward directly, it does so by learning less accurate dynamics (from an MLE standpoint), and may thus be brittle to changes in the reward function. In this work, we develop the robust decision-focused (RDF) algorithm, which leverages the non-identifiability of DF solutions to learn models that maximize expected returns while simultaneously being robust to changes in the reward function. We demonstrate on a variety of toy examples and healthcare simulators that RDF significantly increases the robustness of DF to changes in the reward function, without decreasing the overall return the agent obtains.
Interpretable statistical representations of neural population dynamics and geometry
Authors: Adam Gosztolai, Robert L. Peach, Alexis Arnaudon, Mauricio Barahona, Pierre Vandergheynst
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
Abstract
The dynamics of neuron populations during diverse tasks often evolve on low-dimensional manifolds. However, it remains challenging to discern the contributions of geometry and dynamics for encoding relevant behavioural variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based on statistical distributions of local phase portrait features. Our method provides robust geometry-aware or geometry-agnostic representations for the unbiased comparison of dynamics based on measured trajectories. We demonstrate that our statistical representation can generalise across neural network instances to discriminate computational mechanisms, obtain interpretable embeddings of neural dynamics in a primate reaching task with geometric correspondence to hand kinematics, and develop a decoding algorithm with state-of-the-art accuracy. Our results highlight the importance of using the intrinsic manifold structure over temporal information to develop better decoding algorithms and assimilate data across experiments.
EZClone: Improving DNN Model Extraction Attack via Shape Distillation from GPU Execution Profiles
Abstract
Deep Neural Networks (DNNs) have become ubiquitous due to their performance on prediction and classification problems. However, they face a variety of threats as their usage spreads. Model extraction attacks, which steal DNNs, endanger intellectual property, data privacy, and security. Previous research has shown that system-level side-channels can be used to leak the architecture of a victim DNN, exacerbating these risks. We propose two DNN architecture extraction techniques catering to various threat models. The first technique uses a malicious, dynamically linked version of PyTorch to expose a victim DNN architecture through the PyTorch profiler. The second, called EZClone, exploits aggregate (rather than time-series) GPU profiles as a side-channel to predict DNN architecture, employing a simple approach and assuming little adversary capability as compared to previous work. We investigate the effectiveness of EZClone when minimizing the complexity of the attack, when applied to pruned models, and when applied across GPUs. We find that EZClone correctly predicts DNN architectures for the entire set of PyTorch vision architectures with 100% accuracy. No other work has shown this degree of architecture prediction accuracy with the same adversarial constraints or using aggregate side-channel information. Prior work has shown that, once a DNN has been successfully cloned, further attacks such as model evasion or model inversion can be accelerated significantly.
Runtime Variation in Big Data Analytics
Authors: Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The dynamic nature of resource allocation and runtime conditions in the cloud can result in high variability in a job's runtime across multiple iterations, leading to a poor user experience. Identifying the sources of such variation and being able to predict and adjust for them is crucial for cloud service providers to design reliable data processing pipelines, provision and allocate resources, adjust service pricing, meet SLOs, and debug performance hazards. In this paper, we analyze the runtime variation of millions of production SCOPE jobs on Cosmos, an exabyte-scale internal analytics platform at Microsoft. We propose an innovative two-step approach to predict job runtime distributions by characterizing typical distribution shapes combined with a classification model with an average accuracy of >96%, outperforming traditional regression models and better capturing long tails. We examine factors such as job plan characteristics and inputs, resource allocation, physical cluster heterogeneity and utilization, and scheduling policies. To the best of our knowledge, this is the first study on predicting categories of runtime distributions for enterprise analytics workloads at scale. Furthermore, we examine how our methods can be used to analyze what-if scenarios, focusing on the impact of resource allocation, scheduling, and physical cluster provisioning decisions on a job's runtime consistency and predictability.
Large-Scale Analysis of New Employee Network Dynamics
Authors: Yulin Yu, Longqi Yang, Siân Lindley, Mengting Wan
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Abstract
The COVID-19 pandemic has accelerated digital transformations across industries, but has also introduced new challenges into workplaces, including the difficulty of effectively socializing with colleagues when working remotely. This challenge is exacerbated for new employees, who need to develop workplace networks from the outset. In this paper, by analyzing a large-scale telemetry dataset of more than 10,000 Microsoft employees who joined the company in the first three months of 2022, we describe how new employees interact and telecommute with their colleagues during their "onboarding" period. Our results reveal that although new hires gradually expand their networks over time, there still exist significant gaps between their network statistics and those of tenured employees, even after the six-month onboarding phase. We also observe heterogeneity among new employees in how their networks change over time, where employees whose job tasks do not necessarily require extensive and diverse connections could be at a disadvantage in this onboarding process. By investigating how web-based people recommendations in an organizational knowledge base help new employees naturally expand their networks, we also demonstrate the potential of web-based applications for addressing the aforementioned socialization challenges. Altogether, our findings provide insights into new employee network dynamics in remote and hybrid work environments, which may help guide organizational leaders and web application developers in quantifying and improving the socialization experiences of new employees in digital workplaces.
Generative Agents: Interactive Simulacra of Human Behavior
Authors: Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
Abstract
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
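The dynamic memory retrieval described above can be sketched as a weighted score over recency, importance, and relevance (weights, decay constant, and field names are illustrative assumptions, not the paper's exact formulation):

```python
import math, time

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def retrieval_score(mem, query_emb, now, decay=0.995):
    recency = decay ** ((now - mem["last_access"]) / 3600)  # hourly decay
    return mem["importance"] + recency + cosine(mem["embedding"], query_emb)

def retrieve(memories, query_emb, k=5):
    # Surface the top-k memory records for the agent's current situation.
    now = time.time()
    return sorted(memories, key=lambda m: retrieval_score(m, query_emb, now),
                  reverse=True)[:k]
```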
Detecting Chinese Fake News on Twitter during the COVID-19 Pandemic
Authors: Yongjun Zhang, Sijia Liu, Yi Wang, Xinguang Fan
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Abstract
The outbreak of COVID-19 has led to a global surge of Sinophobia, partly because of the spread of misinformation, disinformation, and fake news about China. In this paper, we report on the creation of a novel classifier that detects whether Chinese-language social media posts from Twitter are related to fake news about China. The classifier achieves an F1 score of 0.64 and an accuracy of 93%. We provide the final model and a new training dataset with 18,425 tweets for researchers to study fake news in the Chinese language during the COVID-19 pandemic. We also introduce a new dataset generated by our classifier that tracks the dynamics of fake news in the Chinese language during the early pandemic.
UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
Abstract
The universal model is emerging as a promising trend for medical image segmentation, paving the way to build medical imaging large models (MILM). One popular strategy for building universal models is to encode each task as a one-hot vector and generate dynamic convolutional layers at the end of the decoder to extract the target of interest. Although successful, this ignores the correlations among tasks and is too late to make the model 'aware' of the ongoing task. To address both issues, we propose a prompt-driven Universal Segmentation model (UniSeg) for multi-task medical image segmentation across diverse modalities and domains. We first devise a learnable universal prompt to describe the correlations among all tasks and then convert this prompt and image features into a task-specific prompt, which is fed to the decoder as a part of its input. Thus, we make the model 'aware' of the ongoing task early and boost the task-specific training of the whole decoder. Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks. Moreover, UniSeg also beats other pre-trained models on two downstream datasets, providing the community with a high-quality pre-trained model for 3D medical image segmentation. Code and model are available at https://github.com/yeerwen/UniSeg.
Robust data-driven control for nonlinear systems using the Koopman operator
Authors: Robin Strässer, Julian Berberich, Frank Allgöwer
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Data-driven analysis and control of dynamical systems have gained a lot of interest in recent years. While the class of linear systems is well studied, theoretical results for nonlinear systems are still rare. In this paper, we present a data-driven controller design method for discrete-time control-affine nonlinear systems. Our approach relies on the Koopman operator, which is a linear but infinite-dimensional operator lifting the nonlinear system to a higher-dimensional space. Particularly, we derive a linear fractional representation of a lifted bilinear system representation based on measured data. Further, we restrict the lifting to finite dimensions, but account for the truncation error using a finite-gain argument. We derive a linear matrix inequality based design procedure to guarantee robust local stability for the resulting bilinear system for all error terms satisfying the finite-gain bound and, thus, also for the underlying nonlinear system. Finally, we apply the developed design method to the nonlinear Van der Pol oscillator.
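For readers new to Koopman-based identification, here is the standard finite-dimensional lifting (EDMD) that such approaches extend, with an illustrative monomial dictionary (the paper itself works with a bilinear lifted representation and error bounds beyond this sketch):

```python
import numpy as np

def lift(x):
    # Illustrative observable dictionary for a 2-state system (e.g. Van der Pol).
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2, x1**3])

def edmd(X, X_next):
    # X, X_next: (T, 2) snapshot pairs. Fit z_{t+1} ~= A z_t by least squares.
    Z = np.array([lift(x) for x in X])
    Zp = np.array([lift(x) for x in X_next])
    A, *_ = np.linalg.lstsq(Z, Zp, rcond=None)
    return A.T  # lifted, approximately linear transition matrix
```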
Automated Tuning of Nonlinear Kalman Filters for Optimal Trajectory Tracking Performance of AUVs
Authors: Maximilian Nitsch, David Stenger, Dirk Abel
Abstract
The performance of navigation algorithms significantly determines the trajectory tracking accuracy of the guidance, navigation, and control (GNC) system of an autonomous underwater vehicle (AUV). In closed-loop operation, the interaction among path planning, control, and navigation plays a crucial role in the tracking accuracy of the overall GNC system. A Doppler velocity log (DVL) is often used on AUVs to measure velocity over ground, positively affecting the closed-loop tracking error. However, a DVL may not be installed in miniaturized AUVs due to limited space and energy. In this paper, a navigation filter for an underactuated miniature AUV (nanoAUV) is considered that is mainly based on acoustic localization using a novel highly-miniaturized ultra-short baseline (USBL) system and a depth pressure sensor. The nanoAUV is being developed for subglacial lake exploration. We compare two unscented Kalman filters (UKFs) with different prediction models: the classical strapdown inertial navigation system (SINS) model and a hydrodynamic motion model (HMM). To enable a fair comparison, filter parameters are auto-tuned with Bayesian optimization (BO) for open- and closed-loop performance, which is novel in AUV navigation. The results indicate that BO performs similarly to particle swarm optimization (PSO) in sample efficiency for the proposed problem. To quantify the GNC tracking performance, we use extensive Monte Carlo simulations. The results suggest that with BO-tuned navigation filter parameters, the median tracking error is reduced by up to 50% compared to the default parametrization.
Towards Automated 3D Search Planning for Emergency Response Missions
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The ability to efficiently plan and execute automated and precise search missions using unmanned aerial vehicles (UAVs) during emergency response situations is imperative. Precise navigation between obstacles and time-efficient searching of 3D structures and buildings are essential for locating survivors and people in need in emergency response missions. In this work we address this challenging problem by proposing a unified search planning framework that automates the process of UAV-based search planning in 3D environments. Specifically, we propose a novel search planning framework which enables automated planning and execution of collision-free search trajectories in 3D by taking into account low-level mission constraints (e.g., the UAV dynamical and sensing model), mission objectives (e.g., the mission execution time and the UAV energy efficiency) and user-defined mission specifications (e.g., the 3D structures to be searched and minimum detection probability constraints). The capabilities and performance of the proposed approach are demonstrated through extensive simulated 3D search scenarios.
RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
Abstract
Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and object(s) by autonomously controlling the motion system of a tracker given observations. AOT has wide-ranging applications, such as in mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across different scenarios remains a challenge, especially in unstructured environments with cluttered obstacles and diverse layouts. We argue that constructing a state representation capable of modeling the geometry structure of the surroundings and the dynamics of the target is crucial for achieving this goal. To address this challenge, we present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. Additionally, we enhance the generalization of the policy network by training in an asymmetric dueling mechanism. We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments, particularly those with complex obstacles and layouts. We also demonstrate the successful transfer of RSPT to real-world settings. Project Website: https://sites.google.com/view/aot-rspt.
DATE: Domain Adaptive Product Seeker for E-commerce
Authors: Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Product Retrieval (PR) and Grounding (PG), which aim to seek image- and object-level products respectively according to a textual query, have recently attracted great interest for better shopping experiences. Owing to the lack of relevant datasets, we collect two large-scale benchmark datasets from the Taobao Mall and Live domains, with about 474k and 101k image-query pairs for PR, and manually annotate the object bounding boxes in each image for PG. As annotating boxes is expensive and time-consuming, we attempt to transfer knowledge from the annotated domain to the unannotated one for PG, to achieve unsupervised Domain Adaptation (PG-DA). We propose a Domain Adaptive Product Seeker (DATE) framework, regarding PR and PG as Product Seeking problems at different levels, to assist the query "date" the product. Concretely, we first design a semantics-aggregated feature extractor for each modality to obtain concentrated and comprehensive features for the following efficient retrieval and fine-grained grounding tasks. Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG. Besides, we devise a domain aligner for PG-DA to alleviate uni-modal marginal and multi-modal conditional distribution shift between source and target domains, and design a pseudo box generator to dynamically select reliable instances and generate bounding boxes for further knowledge transfer. Extensive experiments show that our DATE achieves satisfactory performance in fully-supervised PR, PG and unsupervised PG-DA. Our desensitized datasets will be publicly available at \url{https://github.com/Taobao-live/Product-Seeking}.
Sound Dynamic Deadlock Prediction in Linear Time
Authors: Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç, Mahesh Viswanathan
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Abstract
Deadlocks are one of the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions, and reason about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for detecting concurrency bugs such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high precision, and efficiency. In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable, in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm SyncPDOffline detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm SyncPDOnline predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting. We implemented both our algorithms and evaluated their ability to perform offline and online deadlock prediction on a large dataset of standard benchmarks.
Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation
Abstract
This paper presents a new approach to solve or approximate the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By (i) forming the grid dynamics as a system of differential algebraic equations (DAE) that naturally encode the non-convex power flow constraints, (ii) utilizing advanced DAE-Lyapunov theory, and (iii) designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach shows promise in obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing transient stability, while always yielding feasible solutions of the ACOPF with no constraint violations. As the studied approach yields slightly more expensive realtime generator setpoints, the corresponding price of realtime control and regulation is examined. Cost comparisons with the traditional ACOPF are also showcased, all via case studies on standard power networks.
Optimal Reads-From Consistency Checking for C11-Style Memory Models
Authors: Parosh Aziz Abdulla, Soham Chakraborty, Shankaranarayanan Krishna, Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Abstract
Over the years, several memory models have been proposed to capture the subtle concurrency semantics of C/C++. One of the most fundamental problems associated with a memory model M is consistency checking: given an execution X, is X consistent with M? This problem lies at the heart of numerous applications, including specification testing and litmus tests, stateless model checking, and dynamic analyses. As such, it has been explored extensively and its complexity is well-understood for traditional models like SC and TSO. However, less is known for the numerous model variants of C/C++, for which the problem becomes challenging due to the intricacies of their concurrency primitives. In this work we study the problem of consistency checking for popular variants of the C11 memory model, in particular, the RC20 model, its release-acquire (RA) fragment, the strong and weak variants of RA (SRA and WRA), as well as the Relaxed fragment of RC20. Motivated by applications in testing and model checking, we focus on reads-from consistency checking. The input is an execution X specifying a set of events, their program order and their reads-from relation, and the task is to decide the existence of a modification order on the writes of X that makes X consistent in a memory model. We draw a rich complexity landscape for this problem; our results include (i) nearly-linear-time algorithms for certain variants, which improve over prior results, (ii) fine-grained optimality results, as well as (iii) matching upper and lower bounds (NP-hardness) for other variants. To our knowledge, this is the first work to characterize the complexity of consistency checking for C11 memory models. We have implemented our algorithms inside the TruSt model checker and the C11Tester testing tool. Experiments on standard benchmarks show that our new algorithms improve consistency checking, often by a significant margin.
On the Importance of Contrastive Loss in Multimodal Learning
Authors: Yunwei Ren, Yuanzhi Li
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have received huge success in multimodal learning, where the model tries to minimize the distance between the representations of different views (e.g., image and its caption) of the same data point while keeping the representations of different data points away from each other. However, from a theoretical perspective, it is unclear how contrastive learning can learn the representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that the positive pairs will drive the model to align the representations at the cost of increasing the condition number, while the negative pairs will reduce the condition number, keeping the learned representations balanced.
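The symmetric contrastive objective under analysis is the standard CLIP-style loss; a self-contained sketch:

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Positive pairs (matched image/caption) are pulled together; all other
    # pairs in the batch act as negatives that keep representations apart.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature      # (B, B) pairwise similarities
    labels = torch.arange(len(img))         # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```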
Responsive Parallelism with Synchronization
Authors: Stefan K. Muller, Kyle Singer, Devyn Terra Keeney, Andrew Neth, Kunal Agrawal, I-Ting Angelina Lee, Umut A. Acar
Abstract
Many concurrent programs assign priorities to threads to improve responsiveness. When used in conjunction with synchronization mechanisms such as mutexes and condition variables, however, priorities can lead to priority inversions, in which high-priority threads are delayed by low-priority ones. Priority inversions in the use of mutexes are easily handled using dynamic techniques such as priority inheritance, but priority inversions in the use of condition variables are not well-studied and dynamic techniques are not suitable. In this work, we use a combination of static and dynamic techniques to prevent priority inversion in code that uses mutexes and condition variables. A type system ensures that condition variables are used safely, even while dynamic techniques change thread priorities at runtime to eliminate priority inversions in the use of mutexes. We prove the soundness of our system, using a model of priority inversions based on cost models for parallel programs. To show that the type system is practical to implement, we encode it within the type systems of Rust and C++, and show that the restrictions are not overly burdensome by writing sizeable case studies using these encodings, including porting the Memcached object server to use our C++ implementation.
The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration
Authors: Kin Man Lee, Arjun Krishna, Zulfiqar Zaidi, Rohan Paleja, Letian Chen, Erin Hedlund-Botti, Mariah Schrum, Matthew Gombolay
Abstract
As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) Proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) Conducting a user-study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust ($p<.001$), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Furthermore, interestingly, we find that when the robot vocalizes its intention to perform a task, it results in a significant decrease in team performance ($p=.037$) and perceived safety of the system ($p=.009$).
Keyword: efficient
Automatic Detection of Reactions to Music via Earable Sensing
Identifying Lebesgue-sampled Continuous-time Impulse Response Models: A Kernel-based Approach
Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification
Hardware-Aware Static Optimization of Hyperdimensional Computations
Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting
ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators
Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training
An Online Adaptation Strategy for Direct Data-driven Control
CAPOT: Creating Robust Dense Query Encoders using Post Training Contrastive Alignment
TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Does Prompt-Tuning Language Model Ensure Privacy?
Can we learn better with hard samples?
Continuous Input Embedding Size Search For Recommender Systems
Generative Recommendation: Towards Next-generation Recommender Paradigm
From Retrieval to Generation: Efficient and Effective Entity Set Expansion
A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction
Applicable Methodologies for the Mass Transfer Phenomenon in Tumble Dryers: A Review
CRISP: Curriculum inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning
ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
Towards Automated 3D Search Planning for Emergency Response Missions
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation
Qubo model for the Closest Vector Problem
FedDiSC: A Computation-efficient Federated Learning Framework for Power Systems Disturbance and Cyber Attack Discrimination
SCART: Simulation of Cyber Attacks for Real-Time
DATE: Domain Adaptive Product Seeker for E-commerce
Sound Dynamic Deadlock Prediction in Linear Time
On the Importance of Contrastive Loss in Multimodal Learning
Keyword: faster
Hardware-Aware Static Optimization of Hyperdimensional Computations
TopNet: Transformer-based Object Placement Network for Image Compositing
Scalable Causal Discovery with Score Matching
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Convex Minimization with Integer Minima in $\widetilde O(n^4)$ Time
Can we learn better with hard samples?
Pallet Detection from Synthetic Data Using Game Engines
Keyword: mobile
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Can we learn better with hard samples?
Cell-Edge Performance Booster in 6G: Cell-Free Massive MIMO vs. Reconfigurable Intelligent Surface
RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
Keyword: pruning
Scalable Causal Discovery with Score Matching
Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting
Keyword: voxel
On the Suitability of Representations for Quality Diversity Optimization of Shapes
Keyword: lidar
There is no result
Keyword: diffusion
Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
Training-Free Layout Control with Cross-Attention Guidance
RoSteALS: Robust Steganography using Autoencoder Latent Space
Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
Compressed Regression over Adaptive Networks
Keyword: dynamic
Adaptive Feature Fusion: Enhancing Generalization in Deep Learning Models
Hardware-Aware Static Optimization of Hyperdimensional Computations
Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting
Robust Decision-Focused Learning for Reward Transfer
Interpretable statistical representations of neural population dynamics and geometry
EZClone: Improving DNN Model Extraction Attack via Shape Distillation from GPU Execution Profiles
Runtime Variation in Big Data Analytics
Large-Scale Analysis of New Employee Network Dynamics
Generative Agents: Interactive Simulacra of Human Behavior
Detecting Chinese Fake News on Twitter during the COVID-19 Pandemic
UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
Robust data-driven control for nonlinear systems using the Koopman operator
Automated Tuning of Nonlinear Kalman Filters for Optimal Trajectory Tracking Performance of AUVs
Towards Automated 3D Search Planning for Emergency Response Missions
RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
DATE: Domain Adaptive Product Seeker for E-commerce
Sound Dynamic Deadlock Prediction in Linear Time
Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation
Optimal Reads-From Consistency Checking for C11-Style Memory Models
On the Importance of Contrastive Loss in Multimodal Learning
Responsive Parallelism with Synchronization
The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration