New submissions for Mon, 24 Jul 23

Keyword: efficient

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

Authors: Xuewei Wang, Qiang Jin, Shengyu Huang, Min Zhang, Xi Liu, Zhengli Zhao, Yukun Chen, Zhengyu Zhang, Jiyan Yang, Ellie Wen, Sagar Chordia, Wenlin Chen, Qin Huang
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11096
Pdf link: https://arxiv.org/pdf/2307.11096
Abstract Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.
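The "ads recall" notion in the abstract above (top ads in the final stage being ranked low in the early stage) can be illustrated with a small sketch. This is only an assumed reading of the metric, not the paper's definition: the function name `early_stage_recall`, the cut-offs `k_final` and `k_early`, and the toy scores are all hypothetical.

```python
def early_stage_recall(early_scores, final_scores, k_final, k_early):
    """Fraction of the final stage's top-k_final ads that the early stage
    also places in its own top-k_early (assumed definition, for illustration)."""
    if k_final < 1 or not final_scores:
        raise ValueError("need at least one final-stage ad and k_final >= 1")

    def top_k(scores, k):
        # Ad ids sorted by descending score, truncated to the first k.
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    final_top = top_k(final_scores, k_final)
    early_top = top_k(early_scores, k_early)
    return len(final_top & early_top) / len(final_top)


# Hypothetical scores for four ads from the two ranking stages.
early = {"ad1": 0.9, "ad2": 0.8, "ad3": 0.1, "ad4": 0.4}
final = {"ad1": 0.7, "ad2": 0.2, "ad3": 0.95, "ad4": 0.6}
print(early_stage_recall(early, final, k_final=2, k_early=2))
```

In this toy case the final stage's best ad (`ad3`) is ranked last by the early stage, so it would never reach the final stage; that is the inconsistency the abstract describes.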
Flatness-Aware Minimization for Domain Generalization
Authors: Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cui
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11108
Pdf link: https://arxiv.org/pdf/2307.11108
Abstract Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.
Comparison between transformers and convolutional models for fine-grained classification of insects
Authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11112
Pdf link: https://arxiv.org/pdf/2307.11112
Abstract Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class, because species often share morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild, giving experts the possibility to create improved distribution maps in all countries. We have billions of images that need to be automatically classified, and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how should one identify the algorithm to use? We focus on the Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-based model, EfficientNet, a fully convolutional-based model, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions, evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performance with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-based and fully transformer-based models on accuracy, and the fully transformer-based model outperforms the others on inference speed. These results show the transformer to be robust to the shortage of samples and to be faster at inference time.
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris
Subjects: Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2307.11124
Pdf link: https://arxiv.org/pdf/2307.11124
Abstract The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there has been a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology and principles, and it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2307.11128
Pdf link: https://arxiv.org/pdf/2307.11128
Abstract The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
Accurate error estimation for model reduction of nonlinear dynamical systems via data-enhanced error closure
Authors: Sridhar Chellappa, Lihong Feng, Peter Benner
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2307.11138
Pdf link: https://arxiv.org/pdf/2307.11138
Abstract Accurate error estimation is crucial in model order reduction, both to obtain small reduced-order models and to certify their accuracy when deployed in downstream applications such as digital twins. In existing a posteriori error estimation approaches, knowledge about the time integration scheme is mandatory, e.g., the residual-based error estimators proposed for the reduced basis method. This poses a challenge when automatic ordinary differential equation solver libraries are used to perform the time integration. To address this, we present a data-enhanced approach for a posteriori error estimation. Our new formulation enables residual-based error estimators to be independent of any time integration method. To achieve this, we introduce a corrected reduced-order model which takes into account a data-driven closure term for improved accuracy. The closure term, subject to mild assumptions, is related to the local truncation error of the corresponding time integration scheme. We propose efficient computational schemes for approximating the closure term, at the cost of a modest amount of training data. Furthermore, the new error estimator is incorporated within a greedy process to obtain parametric reduced-order models. Numerical results on three different systems show the accuracy of the proposed error estimation approach and its ability to produce ROMs that generalize well.
SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation
Authors: Zeinab Nezami, Evangelos Pournaras, Amir Borzouie, Jie Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.11181
Pdf link: https://arxiv.org/pdf/2307.11181
Abstract Smart mobility becomes paramount for meeting net-zero targets. However, autonomous, self-driving and electric vehicles require more than ever before an efficient, resilient and trustworthy computational offloading backbone that expands throughout the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides for the first time a modular end-to-end instrumentation for prototyping and optimizing placement of intelligence services on edge devices such as augmented reality and real-time traffic monitoring. SMOTEC supports a plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ and EPOS for an AI-based decentralized load balancing across edge-to-cloud. All components are orchestrated by the K3s lightweight Kubernetes. A proof-of-concept of self-optimized service placements for traffic monitoring from Munich demonstrates in practice the applicability and cost-effectiveness of SMOTEC.
Out-of-Order Sliding-Window Aggregation with Efficient Bulk Evictions and Insertions (Extended Version)
Authors: Kanat Tangwongsan, Martin Hirzel, Scott Schneider
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.11210
Pdf link: https://arxiv.org/pdf/2307.11210
Abstract Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or inserted one at a time, even when some of the insertions occur out-of-order. However, real-world streams are often not only out-of-order but also bursty, causing data items to be evicted or inserted in larger bulks. This paper introduces a new algorithm for sliding-window aggregation with bulk eviction and bulk insertion. For the special case of single insert and evict, our algorithm matches the theoretical complexity of the best previous out-of-order algorithms. For the case of bulk evict, our algorithm improves upon the theoretical complexity of the best previous algorithm for that case and also outperforms it in practice. For the case of bulk insert, there are no prior algorithms, and our algorithm improves upon the naive approach of emulating bulk insert with a loop over single inserts, both in theory and in practice. Overall, this paper makes high-performance algorithms for sliding window aggregation more broadly applicable by efficiently handling the ubiquitous cases of out-of-order data and bursts.
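For orientation, the naive baseline that the abstract above mentions (emulating bulk insert with a loop over single inserts) might look roughly like the sketch below. It is not the paper's algorithm: the class name `NaiveWindow`, the sum aggregate, and the sorted-list representation are assumptions chosen for brevity, and values are assumed to be finite numbers.

```python
import bisect


class NaiveWindow:
    """Out-of-order sliding window kept as a timestamp-sorted list (baseline sketch)."""

    def __init__(self):
        self._items = []  # (timestamp, value) pairs, sorted by timestamp

    def insert(self, timestamp, value):
        # Out-of-order arrivals are placed at their sorted position.
        bisect.insort(self._items, (timestamp, value))

    def bulk_insert(self, pairs):
        # Naive bulk insert: one single insert per item.
        for timestamp, value in pairs:
            self.insert(timestamp, value)

    def bulk_evict(self, cutoff):
        # Drop every item whose timestamp is <= cutoff in one slice deletion.
        idx = bisect.bisect_right(self._items, (cutoff, float("inf")))
        del self._items[:idx]

    def query(self):
        # Recompute the aggregate (here: sum) over the whole window.
        return sum(value for _, value in self._items)


window = NaiveWindow()
window.bulk_insert([(5, 1.0), (2, 3.0), (9, 4.0)])  # arrives out of order
window.insert(7, 2.0)
total_before = window.query()
window.bulk_evict(5)  # removes timestamps 2 and 5
total_after = window.query()
print(total_before, total_after)
```

Each single insert here shifts list elements and each query rescans the window, which is the kind of per-item cost a dedicated bulk algorithm aims to avoid.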
From Adaptive Query Release to Machine Unlearning
Authors: Enayat Ullah, Raman Arora
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.11228
Pdf link: https://arxiv.org/pdf/2307.11228
Abstract We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to \textit{dynamic} streams consisting of insertions and deletions.
Formal-Guided Fuzz Testing: Targeting Security Assurance from Specification to Implementation for 5G and Beyond
Authors: Jingda Yang, Sudhanshu Arya, Ying Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.11247
Pdf link: https://arxiv.org/pdf/2307.11247
Abstract Softwarization and virtualization in 5G and beyond necessitate thorough testing to ensure the security of critical infrastructure and networks, requiring the identification of vulnerabilities and unintended emergent behaviors from protocol designs to their software stack implementation. To provide an efficient and comprehensive solution, we propose a novel and first-of-its-kind approach that connects the strengths and coverage of formal and fuzzing methods to efficiently detect vulnerabilities across protocol logic and implementation stacks in a hierarchical manner. We design and implement formal verification to detect attack traces in critical protocols, which are used to guide subsequent fuzz testing and incorporate feedback from fuzz testing to broaden the scope of formal verification. This innovative approach significantly improves efficiency and enables the auto-discovery of vulnerabilities and unintended emergent behaviors from the 3GPP protocols to software stacks. Following this approach, we discover one identifier leakage model, one DoS attack model, and two eavesdrop attack models due to the absence of rudimentary MITM protection within the protocol, despite the existence of a Transport Layer Security (TLS) solution to this issue for over a decade. More remarkably, guided by the identified formal analysis and attack models, we exploit 61 vulnerabilities using fuzz testing demonstrated on srsRAN platforms. These identified vulnerabilities contribute to fortifying protocol-level assumptions and refining the search space. Compared to state-of-the-art fuzz testing, our united formal and fuzzing methodology enables auto-assurance by systematically discovering vulnerabilities. It significantly reduces computational complexity, transforming the non-practical exponential growth in computational cost into linear growth.
GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem
Authors: Clara Novoa, Apan Qasem
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2307.11248
Pdf link: https://arxiv.org/pdf/2307.11248
Abstract The Quadratic Assignment Problem (QAP) is an important combinatorial optimization problem with applications in many areas including logistics and manufacturing. QAP is known to be NP-hard, a computationally challenging problem, which requires the use of sophisticated heuristics in finding acceptable solutions for most real-world data sets. In this paper, we present GPU-accelerated implementations of a 2opt and a tabu search algorithm for solving the QAP. For both algorithms, we extract parallelism at multiple levels and implement novel code optimization techniques that fully utilize the GPU hardware. On a series of experiments on the well-known QAPLIB data sets, our solutions, on average run an order-of-magnitude faster than previous implementations and deliver up to a factor of 63 speedup on specific instances. The quality of the solutions produced by our implementations of 2opt and tabu is within 1.03% and 0.15% of the best known values. The experimental results also provide key insight into the performance characteristics of accelerated QAP solvers. In particular, the results reveal that both algorithmic choice and the shape of the input data sets are key factors in finding efficient implementations.
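As a point of reference for the abstract above, a CPU-only sketch of the QAP objective and a pairwise-exchange local search follows. It assumes that "2opt" refers to swapping the locations of two facilities and accepting improving swaps; it does not reproduce the paper's GPU implementation, and the names `qap_cost` and `two_opt` and the 3x3 matrices are hypothetical.

```python
from itertools import combinations


def qap_cost(flow, dist, perm):
    """QAP objective: sum over i, j of flow[i][j] * dist[perm[i]][perm[j]],
    where perm[i] is the location assigned to facility i."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def two_opt(flow, dist, perm):
    """Swap pairs of assignments, keeping a swap only if it lowers the cost,
    until a full pass finds no improvement (a local optimum)."""
    perm = list(perm)
    best = qap_cost(flow, dist, perm)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(perm)), 2):
            perm[i], perm[j] = perm[j], perm[i]
            cost = qap_cost(flow, dist, perm)
            if cost < best:
                best = cost
                improved = True
            else:
                perm[i], perm[j] = perm[j], perm[i]  # undo a non-improving swap
    return perm, best


# Hypothetical symmetric flow and distance matrices for three facilities.
flow = [[0, 5, 1], [5, 0, 2], [1, 2, 0]]
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
start = [2, 1, 0]
solution, cost = two_opt(flow, dist, start)
print(solution, cost)
```

This version recomputes the full objective after every swap; practical solvers typically evaluate only the cost change of a swap, and the independent swap evaluations are one natural source of parallelism.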
Reconfigurable cascaded thermal neuristors for neuromorphic computing
Authors: Erbin Qiu, Yuan-Hang Zhang, Massimiliano Di Ventra, Ivan K. Schuller
Subjects: Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)
Arxiv link: https://arxiv.org/abs/2307.11256
Pdf link: https://arxiv.org/pdf/2307.11256
Abstract While the complementary metal-oxide semiconductor (CMOS) technology is the mainstream for the hardware implementation of neural networks, we explore an alternative route based on a new class of spiking oscillators we call thermal neuristors, which operate and interact solely via thermal processes. Utilizing the insulator-to-metal transition in vanadium dioxide, we demonstrate a wide variety of reconfigurable electrical dynamics mirroring biological neurons. Notably, inhibitory functionality is achieved just in a single oxide device, and cascaded information flow is realized exclusively through thermal interactions. To elucidate the underlying mechanisms of the neuristors, a detailed theoretical model is developed, which accurately reflects the experimental results. This study establishes the foundation for scalable and energy-efficient thermal neural networks, fostering progress in brain-inspired computing.
Kernelized Offline Contextual Dueling Bandits
Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.11288
Pdf link: https://arxiv.org/pdf/2307.11288
Abstract Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.
Energy-Efficient Softwarized Networks: A Survey
Authors: Iwan Setiawan, Binayak Kar, Shan-Hsiang Shen
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.11301
Pdf link: https://arxiv.org/pdf/2307.11301
Abstract With the dynamic demands and stringent requirements of various applications, networks need to be high-performance, scalable, and adaptive to changes. Researchers and industries view network softwarization as the best enabler for the evolution of networking to tackle current and prospective challenges. Network softwarization must provide programmability and flexibility to network infrastructures and allow agile management, along with higher control for operators. While satisfying the demands and requirements of network services, energy cannot be overlooked, considering the effects on the sustainability of the environment and business. This paper discusses energy efficiency in modern and future networks with three network softwarization technologies: SDN, NFV, and NS, introduced in an energy-oriented context. With that framework in mind, we review the literature based on network scenarios, control/MANO layers, and energy-efficiency strategies. Following that, we compare the references regarding approach, evaluation method, criterion, and metric attributes to demonstrate the state-of-the-art. Last, we analyze the classified literature, summarize lessons learned, and present ten essential concerns to open discussions about future research opportunities on energy-efficient softwarized networks.
Quantum Software Analytics: Opportunities and Challenges
Authors: Thong Hoang, Hoa Khanh Dam, Tingting Bi, Qinghua Lu, Zhenchang Xing, Liming Zhu, Lam Duc Nguyen, Shiping Chen
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.11305
Pdf link: https://arxiv.org/pdf/2307.11305
Abstract Quantum computing systems depend on the principles of quantum mechanics to perform multiple challenging tasks more efficiently than their classical counterparts. In classical software engineering, the software life cycle is used to document and structure the processes of design, implementation, and maintenance of software applications. It helps stakeholders understand how to build an application. In this paper, we summarize a set of software analytics topics and techniques in the development life cycle that can be leveraged and integrated into quantum software application development. The results of this work can assist researchers and practitioners in better understanding the quantum-specific emerging development activities, challenges, and opportunities in the next generation of quantum software.
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport
Authors: Zezeng Li, ShengHao Li, Zhanpeng Wang, Na Lei, Zhongxuan Luo, Xianfeng Gu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11308
Pdf link: https://arxiv.org/pdf/2307.11308
Abstract Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps of the inverse diffusion trajectory to get a high-quality image. Recent progress in designing fast samplers for DPMs achieves a trade-off between sampling speed and sample quality by knowledge distillation or adjusting the variance schedule or the denoising equation. However, these samplers cannot be optimal in both aspects and often suffer from mode mixture in short steps. To tackle this problem, we innovatively regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose the DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and the white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give the error bound of the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source codes are available at https://github.com/cognaclee/DPM-OT
HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework
Authors: Kai Lei, Zhan Chen, Shuman Jia, Xiaoteng Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11323
Pdf link: https://arxiv.org/pdf/2307.11323
Abstract In the field of autonomous driving, 3D object detection is a very important perception module. Although the current SOTA algorithm combines Camera and Lidar sensors, limited by the high price of Lidar, the current mainstream landing schemes are pure Camera sensors or Camera+Radar sensors. In this study, we propose a new detection algorithm called HVDetFusion, which is a multi-modal detection algorithm that not only supports pure camera data as input for detection, but also can perform fusion input of radar data and camera data. The camera stream does not depend on the input of Radar data, thus addressing the downside of previous methods. In the pure camera stream, we modify the framework of Bevdet4D for better perception and more efficient inference, and this stream has the whole 3D detection output. Further, to incorporate the benefits of Radar signals, we use the prior information of different object positions to filter the false positive information of the original radar data, according to the positioning information and radial velocity information recorded by the radar sensors to supplement and fuse the BEV features generated by the original camera data, and the effect is further improved in the process of fusion training. Finally, HVDetFusion achieves the new state-of-the-art 67.4% NDS on the challenging nuScenes test set among all camera-radar 3D object detectors. The code is available at https://github.com/HVXLab/HVDetFusion
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
Authors: Wenbo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, Yuewen Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2307.11335
Pdf link: https://arxiv.org/pdf/2307.11335
Abstract Despite the tremendous progress in neural radiance fields (NeRF), we still face a dilemma of the trade-off between quality and efficiency, e.g., MipNeRF presents fine-detailed and anti-aliased renderings but takes days for training, while Instant-ngp can accomplish the reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions due to ignoring the sampling area. To this end, we propose a novel Tri-Mip encoding that enables both instant reconstruction and anti-aliased high-fidelity rendering for neural radiance fields. The key is to factorize the pre-filtered 3D feature spaces in three orthogonal mipmaps. In this way, we can efficiently perform 3D area sampling by taking advantage of 2D pre-filtered feature maps, which significantly elevates the rendering quality without sacrificing efficiency. To cope with the novel Tri-Mip representation, we propose a cone-casting rendering technique to efficiently sample anti-aliased 3D features with the Tri-Mip encoding considering both pixel imaging and observing distance. Extensive experiments on both synthetic and real-world datasets demonstrate our method achieves state-of-the-art rendering quality and reconstruction speed while maintaining a compact representation that reduces 25% model size compared against Instant-ngp.
Fundamental CRB-Rate Tradeoff in Multi-Antenna ISAC Systems with Information Multicasting and Multi-Target Sensing
Authors: Zixiang Ren, Yunfei Peng, Xianxin Song, Yuan Fang, Ling Qiu, Liang Liu, Derrick Wing Kwan Ng, Jie Xu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11337
Pdf link: https://arxiv.org/pdf/2307.11337
Abstract This paper investigates the performance tradeoff for a multi-antenna integrated sensing and communication (ISAC) system with simultaneous information multicasting and multi-target sensing, in which a multi-antenna base station (BS) sends the common information messages to a set of single-antenna communication users (CUs) and estimates the parameters of multiple sensing targets based on the echo signals concurrently. We consider two target sensing scenarios without and with prior target knowledge at the BS, in which the BS is interested in estimating the complete multi-target response matrix and the target reflection coefficients/angles, respectively. First, we consider the capacity-achieving transmission and characterize the fundamental tradeoff between the achievable rate and the multi-target estimation Cramér-Rao bound (CRB) accordingly.
Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Authors: Zinuo Cai, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Haibing Guan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.11339
Pdf link: https://arxiv.org/pdf/2307.11339
Abstract Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct responsibilities to handle serving requests, i.e. general-purpose CPUs for input preprocessing and domain-specific GPUs for forward computation. Recurrent neural networks play an essential role in handling temporal inputs and display distinctive computation characteristics because of their high inter-operator parallelism. Hence, we propose Chrion to optimize recurrent neural network inference by collaboratively utilizing CPUs and GPUs. We formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling problem of directed acyclic graphs on heterogeneous devices. Given an input model in the ONNX format and user-defined SLO requirement, Chrion first preprocesses the model by model parsing and profiling, and then partitions the graph to select execution devices for each operator. When an online request arrives, Chrion performs forward computation according to the graph partition by executing the operators on the CPU and GPU in parallel. Our experimental results show that the execution time can be reduced by 19.4% at most in the latency-optimal pattern and GPU memory footprint by 67.5% in the memory-optimal pattern compared with the execution on the GPU.
Tuning Pre-trained Model via Moment Probing
Authors: Mingze Gao, Qilong Wang, Zhenyi Lin, Pengfei Zhu, Qinghua Hu, Jingbo Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11342
Pdf link: https://arxiv.org/pdf/2307.11342
Abstract Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP) as a fundamental module is involved in exploiting the final representations for task-dependent classification. However, most of the existing methods focus on how to effectively introduce a few learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head based on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP performs a linear classifier on feature distribution, which provides stronger representation ability by exploiting richer statistical information inherent in features. Specifically, we represent feature distribution by its characteristic function, which is efficiently approximated by using first- and second-order moments of features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC$^3$) to compute second-order moments in an efficient and effective manner. By considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP$_{+}$. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at less training cost, while our MP$_{+}$ achieves state-of-the-art performance.
Sensing Aided Covert Communications: Turning Interference into Allies
Authors: Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11345
Pdf link: https://arxiv.org/pdf/2307.11345
Abstract In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended Kalman filtering technique is employed to predict its trajectory as well as the corresponding channels. Depending on the maneuvering altitude of adversary target, two channel models are considered, with the aim of maximizing the covert transmission rate by jointly designing the radar waveform and communication transmit beamforming vector based on the constructed channels. For the free-space propagation model, by decoupling the joint design, we propose an efficient algorithm to guarantee that the target cannot detect the transmission. For the Rician fading model, since the multi-path components cannot be estimated, a robust joint transmission scheme is proposed based on the property of the Kullback-Leibler divergence. The convergence behaviour, tracking MSE, false alarm and missed detection probabilities, and covert transmission rate are evaluated. Simulation results show that the proposed algorithms achieve accurate tracking. For both channel models, the proposed sensing-assisted covert transmission design is able to guarantee the covertness, and significantly outperforms the conventional schemes.
EV-Planner: Energy-Efficient Robot Navigation via Event-Based Physics-Guided Neuromorphic Planner
Authors: Sourav Sanyal, Rohan Kumar Manna, Kaushik Roy
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.11349
Pdf link: https://arxiv.org/pdf/2307.11349
Abstract Vision-based object tracking is an essential precursor to performing autonomous aerial navigation in order to avoid obstacles. Biologically inspired neuromorphic event cameras are emerging as a powerful alternative to frame-based cameras, due to their ability to asynchronously detect varying intensities (even in poor lighting conditions), high dynamic range, and robustness to motion blur. Spiking neural networks (SNNs) have gained traction for processing events asynchronously in an energy-efficient manner. On the other hand, physics-based artificial intelligence (AI) has gained prominence recently, as it enables embedding system knowledge via physical modeling inside traditional analog neural networks (ANNs). In this letter, we present an event-based physics-guided neuromorphic planner (EV-Planner) to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. We consider the task of autonomous drone navigation where the mission is to detect moving gates and fly through them while avoiding a collision. We use event cameras to perform object detection using a shallow spiking neural network in an unsupervised fashion. Utilizing the physical equations of the brushless DC motors present in the drone rotors, we train a lightweight energy-aware physics-guided neural network with depth inputs. This predicts the optimal flight time responsible for generating near-minimum energy paths. We spawn the drone in the Gazebo simulator and implement a sensor-fused vision-to-planning neuro-symbolic framework using Robot Operating System (ROS). Simulation results for safe collision-free flight trajectories are presented with performance analysis and potential future research directions.
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Authors: Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.11353
Pdf link: https://arxiv.org/pdf/2307.11353
Abstract Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. We consider the random feature setting where the attention layer has a large number of heads, with randomly sampled frozen query and key matrices, and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random feature attention with finitely many heads. Our results feature several implications unique to the attention structure compared with existing random features theory for neural networks, such as (1) Advantages in the sample complexity over standard two-layer random-feature networks; (2) Concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer; and (3) The effect of the sampling distribution of the query-key weight matrix (the product of the query and key matrix), where Gaussian random weights with a non-zero mean result in better sample complexities over the zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function.
A Fair and Memory/Time-efficient Hashmap
Authors: Abolfazl Asudeh, Nima Shahbazi, Stavros Sintos
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.11355
Pdf link: https://arxiv.org/pdf/2307.11355
Abstract There is a large amount of work constructing hashmaps to minimize the number of collisions. However, to the best of our knowledge no known hashing technique guarantees group fairness among different groups of items. We are given a set $P$ of $n$ tuples in $\mathbb{R}^d$, for a constant dimension $d$ and a set of groups $\mathcal{G}=\{\mathbf{g}_1,\ldots, \mathbf{g}_k\}$ such that every tuple belongs to a unique group. We formally define the fair hashing problem introducing the notions of single fairness ($Pr[h(p)=h(x)\mid p\in \mathbf{g}_i, x\in P]$ for every $i=1,\ldots, k$), pairwise fairness ($Pr[h(p)=h(q)\mid p,q\in \mathbf{g}_i]$ for every $i=1,\ldots, k$), and the well-known collision probability ($Pr[h(p)=h(q)\mid p,q\in P]$). The goal is to construct a hashmap such that the collision probability, the single fairness, and the pairwise fairness are close to $1/m$, where $m$ is the number of buckets in the hashmap. We propose two families of algorithms to design fair hashmaps. First, we focus on hashmaps with optimum memory consumption minimizing the unfairness. We model the input tuples as points in $\mathbb{R}^d$ and the goal is to find the vector $w$ such that the projection of $P$ onto $w$ creates an ordering that is convenient to split to create a fair hashmap. For each projection we design efficient algorithms that find near optimum partitions of exactly (or at most) $m$ buckets. Second, we focus on hashmaps with optimum fairness ($0$-unfairness), minimizing the memory consumption. We make the important observation that the fair hashmap problem is reduced to the necklace splitting problem. By carefully implementing algorithms for solving the necklace splitting problem, we propose faster algorithms constructing hashmaps with $0$-unfairness using $2(m-1)$ boundary points when $k=2$ and $k(m-1)(4+\log_2 (3mn))$ boundary points for $k>2$.
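Illustrative sketch (not from the paper): the three quantities defined in the abstract above can be evaluated exactly for a small, fixed bucket assignment. The function name `fairness_metrics` and the list-based input layout are hypothetical, and the sketch assumes that $p$, $q$ and $x$ are drawn independently and uniformly (with replacement) from their respective sets, which is one plausible reading of the definitions rather than a statement of the paper's exact sampling model.

```python
from collections import Counter
from fractions import Fraction


def fairness_metrics(buckets, groups):
    """Return (collision, single, pairwise) for a fixed hash assignment.

    buckets[j] is the bucket of item j and groups[j] is its group label.
    Assumption: items are drawn independently and uniformly with replacement.
    """
    n = len(buckets)
    bucket_sizes = Counter(buckets)             # bucket -> number of items
    group_sizes = Counter(groups)               # group -> number of items
    per_group = Counter(zip(groups, buckets))   # (group, bucket) -> count

    # Pr[h(p) = h(q) | p, q in P]
    collision = sum(Fraction(c, n) ** 2 for c in bucket_sizes.values())

    single, pairwise = {}, {}
    for g, size in group_sizes.items():
        # Pr[h(p) = h(x) | p in g, x in P]
        single[g] = sum(
            Fraction(per_group[(g, b)], size) * Fraction(c, n)
            for b, c in bucket_sizes.items()
        )
        # Pr[h(p) = h(q) | p, q in g]
        pairwise[g] = sum(
            Fraction(per_group[(g, b)], size) ** 2 for b in bucket_sizes
        )
    return collision, single, pairwise


if __name__ == "__main__":
    groups = ["a", "a", "b", "b"]
    print(fairness_metrics([0, 1, 0, 1], groups))  # each group split across both buckets
    print(fairness_metrics([0, 0, 1, 1], groups))  # each group packed into one bucket
```

With $m=2$ buckets, the first assignment gives $1/2$ for all three quantities. The second keeps the collision probability and single fairness at $1/2$ but raises the pairwise value to $1$ for both groups, which is the kind of gap from $1/m$ that the abstract calls unfairness.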
Random Separating Hyperplane Theorem and Learning Polytopes
Authors: Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG)
Arxiv link: https://arxiv.org/abs/2307.11371
Pdf link: https://arxiv.org/pdf/2307.11371
Abstract The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. RSH asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/poly(k)$ and margin at least $\Omega \left(\delta/\sqrt{d} \right)$. An immediate consequence of our result is the first near optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the "Hausdorff problem", of learning a unit diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge this is the first provable algorithm for the Hausdorff Problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
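Illustrative sketch (not from the paper): the statement that a random hyperplane separates $a$ from $K$ with some probability can be explored numerically. The sketch below assumes the random direction is a normalized standard Gaussian vector, which may differ from the distribution used in the paper, and all function names are hypothetical. It relies only on the fact that a linear function over a polytope attains its maximum and minimum at vertices, so checking the vertices is enough.

```python
import math
import random


def separation_gap(u, a, vertices):
    """Signed gap between point a and polytope K along unit direction u.

    A positive value means some hyperplane with normal u separates a from K.
    """
    proj_a = sum(ui * ai for ui, ai in zip(u, a))
    projs = [sum(ui * vi for ui, vi in zip(u, v)) for v in vertices]
    return max(proj_a - max(projs), min(projs) - proj_a)


def random_unit_vector(d, rng):
    """Normalized Gaussian vector (assumed sampling distribution)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def estimate_separation_probability(a, vertices, trials, rng):
    """Monte Carlo estimate of the fraction of random directions that separate."""
    d = len(a)
    hits = sum(
        separation_gap(random_unit_vector(d, rng), a, vertices) > 0
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    rng = random.Random(0)
    print(estimate_separation_probability([2.0, 0.0], triangle, 1000, rng))
    print(estimate_separation_probability([0.25, 0.25], triangle, 1000, rng))
```

A point inside the polytope is never separated, so the second estimate is zero. The theorem concerns how the first kind of estimate, together with the achieved gap, behaves as $k$ and $d$ grow.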
Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework
Authors: Simiao Zhang, Jitao Bai, Menghong Guan, Yihao Huang, Yueling Zhang, Jun Sun, Geguang Pu
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.11379
Pdf link: https://arxiv.org/pdf/2307.11379
Abstract Machine learning is widely used to make decisions with societal impact such as bank loan approval, criminal sentencing, and resume filtering. How to ensure its fairness while maintaining utility is a challenging but crucial issue. Fairness is a complex and context-dependent concept with over 70 different measurement metrics. Since existing regulations are often vague in terms of which metric to use and different organizations may prefer different fairness metrics, it is important to have means of improving fairness comprehensively. Existing mitigation techniques often target one specific fairness metric and have limitations in improving multiple notions of fairness simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility), a reinforcement learning-based framework, to efficiently improve the fairness-utility trade-off in machine learning classifiers. A comprehensive measurement that can simultaneously consider multiple fairness notions as well as utility is established, and new metrics are proposed based on an in-depth analysis of the relationship between different fairness metrics. The reward function of CFU is constructed with the comprehensive measurement and the new metrics. We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning models, and 15 fairness-utility measurements. The results demonstrate that CFU can improve the classifier on multiple fairness metrics without sacrificing its utility. It outperforms all state-of-the-art techniques and achieves a 37.5% improvement on average.
Direct and inverse modeling of soft robots by learning a condensed FEM model
Authors: Etienne Ménager, Tanguy Navez, Olivier Goury, Christian Duriez
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11408
Pdf link: https://arxiv.org/pdf/2307.11408
Abstract The Finite Element Method (FEM) is a powerful modeling tool for predicting the behavior of soft robots. However, its use for control can be difficult for non-specialists of numerical computation: it requires an optimization of the computation to make it real-time. In this paper, we propose a learning-based approach to obtain a compact but sufficiently rich mechanical representation. Our choice is based on nonlinear compliance data in the actuator/effector space provided by a condensation of the FEM model. We demonstrate that this compact model can be learned with a reasonable amount of data and, at the same time, be very efficient in terms of modeling, since we can deduce the direct and inverse kinematics of the robot. We also show how to couple some models learned individually in particular on an example of a gripper composed of two soft fingers. Other results are shown by comparing the inverse model derived from the full FEM model and the one from the compact learned version. This work opens new perspectives, namely for the embedded control of soft robots, but also for their design. These perspectives are also discussed in the paper.
Deep Directly-Trained Spiking Neural Networks for Object Detection
Authors: Qiaoyi Su, Yuhong Chou, Yifan Hu, Jianing Li, Shijie Mei, Ziyang Zhang, Guoqi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11411
Pdf link: https://arxiv.org/pdf/2307.11411
Abstract Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first attempt to train a deep SNN with surrogate gradients for object detection rather than ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly-trained SNN with low power consumption. Furthermore, we theoretically analyze and prove that EMS-ResNet can avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods (at least 500 time steps) with far fewer time steps (only 4 time steps). It is shown that our model could achieve comparable performance to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO Dataset and the event-based Gen1 Dataset.
A Video-based Detector for Suspicious Activity in Examination with OpenPose
Authors: Reuben Moyo, Stanley Ndebvu, Michael Zimba, Jimmy Mbelwa
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11413
Pdf link: https://arxiv.org/pdf/2307.11413
Abstract Examinations are a crucial part of the learning process, and academic institutions invest significant resources into maintaining their integrity by preventing cheating from students or facilitators. However, cheating has become rampant in examination setups, compromising their integrity. The traditional method of relying on invigilators to monitor every student is impractical and ineffective. To address this issue, there is a need to continuously record exam sessions to monitor students for suspicious activities. However, these recordings are often too lengthy for invigilators to analyze effectively, and fatigue may cause them to miss significant details. To widen the coverage, invigilators could use fixed overhead or wearable cameras. This paper introduces a framework that uses automation to analyze videos and detect suspicious activities during examinations efficiently and effectively. We utilized the OpenPose framework and Convolutional Neural Network (CNN) to identify students exchanging objects during exams. This detection system is vital in preventing cheating and promoting academic integrity, fairness, and quality education for institutions.
Bidding efficiently in Simultaneous Ascending Auctions with budget and eligibility constraints using Simultaneous Move Monte Carlo Tree Search
Authors: Alexandre Pacaud, Aurelien Bechler, Marceau Coupechoux
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2307.11428
Pdf link: https://arxiv.org/pdf/2307.11428
Abstract For decades, Simultaneous Ascending Auction (SAA) has been the most popular mechanism used for spectrum auctions. It has recently been employed by many countries for the allocation of 5G licences. Although SAA presents relatively simple rules, it induces a complex strategic game for which the optimal bidding strategy is unknown. Considering the fact that sometimes billions of euros are at stake in an SAA, establishing an efficient bidding strategy is crucial. In this work, we model the auction as an $n$-player simultaneous move game with complete information and propose the first efficient bidding algorithm that tackles simultaneously its four main strategic issues: the $\textit{exposure problem}$, the $\textit{own price effect}$, $\textit{budget constraints}$ and the $\textit{eligibility management problem}$. Our solution, called $SMS^\alpha$, is based on Simultaneous Move Monte Carlo Tree Search (SM-MCTS) and relies on a new method for the prediction of closing prices. By introducing scalarised rewards in $SMS^\alpha$, we give bidders the possibility to define their own level of risk-aversion. Through extensive numerical experiments on instances of realistic size, we show that $SMS^\alpha$ largely outperforms state-of-the-art algorithms, notably by achieving higher expected utility while taking fewer risks.
On the convergence order of the Euler scheme for scalar SDEs with Hölder-type diffusion coefficients
Authors: Annalena Mickel, Andreas Neuenkirch
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2307.11448
Pdf link: https://arxiv.org/pdf/2307.11448
Abstract We study the Euler scheme for scalar non-autonomous stochastic differential equations, whose diffusion coefficient is not globally Lipschitz but a fractional power of a globally Lipschitz function. We analyse the strong error and establish a criterion, which relates the convergence order of the Euler scheme to an inverse moment condition for the diffusion coefficient. Our result in particular applies to Cox-Ingersoll-Ross-, Chan-Karolyi-Longstaff-Sanders- or Wright-Fisher-type stochastic differential equations and thus provides a unifying framework.
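Illustrative sketch (not from the paper): an Euler scheme for the Cox-Ingersoll-Ross equation $dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t$, one of the examples named in the abstract above, whose diffusion coefficient is the power $1/2$ of a Lipschitz function. The discrete iterate can become negative, so the sketch evaluates the square root at $\max(x, 0)$. That truncation is an assumption made here to keep the code well defined and is not necessarily the variant analysed in the paper. The function and parameter names are hypothetical.

```python
import math
import random


def euler_cir(x0, kappa, theta, sigma, T, n_steps, rng):
    """Simulate one Euler path of the CIR equation on [0, T].

    Assumption: the square root is taken of max(x, 0) so that a negative
    iterate does not raise an error.
    """
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(max(x, 0.0)) * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    path = euler_cir(x0=0.04, kappa=2.0, theta=0.04, sigma=0.3,
                     T=1.0, n_steps=250, rng=rng)
    print(path[-1])
```

The strong error studied in the abstract compares such a path with the exact solution driven by the same Brownian increments as `n_steps` grows.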
Data-Driven Cooperative Adaptive Cruise Control for Unknown Nonlinear Vehicle Platoons
Authors: Jianglin Lan
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.11505
Pdf link: https://arxiv.org/pdf/2307.11505
Abstract This paper studies cooperative adaptive cruise control (CACC) for vehicle platoons with consideration of the unknown nonlinear vehicle dynamics that are normally ignored in the literature. A unified data-driven CACC design is proposed for platoons of pure automated vehicles (AVs) or of mixed AVs and human-driven vehicles (HVs). The CACC leverages online-collected sufficient data samples of vehicle accelerations, spacing and relative velocities. The data-driven control design is formulated as a semidefinite program (SDP) that can be solved efficiently using off-the-shelf solvers. The efficacy and advantage of the proposed CACC are demonstrated through a comparison with the classic adaptive cruise control (ACC) method on a platoon of pure AVs and a mixed platoon under a representative aggressive driving profile.
CORE: Cooperative Reconstruction for Multi-Agent Perception
Authors: Binglu Wang, Lei Zhang, Zhaozhong Wang, Yongqiang Zhao, Tianfei Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11514
Pdf link: https://arxiv.org/pdf/2307.11514
Abstract This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception. It addresses the task from a novel perspective of cooperative reconstruction, based on two key insights: 1) cooperating agents together provide a more holistic observation of the environment, and 2) the holistic observation can serve as valuable supervision to explicitly guide the model learning how to reconstruct the ideal observation based on collaboration. CORE instantiates the idea with three major components: a compressor for each agent to create more compact feature representation for efficient broadcasting, a lightweight attentive collaboration component for cross-agent message aggregation, and a reconstruction module to reconstruct the observation based on aggregated feature representations. This learning-to-reconstruct idea is task-agnostic, and offers clear and reasonable supervision to inspire more effective collaboration, eventually promoting perception tasks. We validate CORE on OPV2V, a large-scale multi-agent perception dataset, in two tasks, i.e., 3D object detection and semantic segmentation. Results demonstrate that the model achieves state-of-the-art performance on both tasks, and is more communication-efficient.
Model Reporting for Certifiable AI: A Proposal from Merging EU Regulation into AI Development
Authors: Danilo Brajovic, Niclas Renner, Vincent Philipp Goebels, Philipp Wagner, Benjamin Fresz, Martin Biller, Mara Klaeb, Janika Kutz, Jens Neuhuettler, Marco F. Huber
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11525
Pdf link: https://arxiv.org/pdf/2307.11525
Abstract Despite large progress in Explainable and Safe AI, practitioners suffer from a lack of regulation and standards for AI safety. In this work we merge recent regulation efforts by the European Union and first proposals for AI guidelines with recent trends in research: data and model cards. We propose the use of standardized cards to document AI applications throughout the development process. Our main contribution is the introduction of use-case and operation cards, along with updates for data and model cards to cope with regulatory requirements. We reference both recent research as well as the source of the regulation in our cards and provide references to additional support material and toolboxes whenever possible. The goal is to design cards that help practitioners develop safe AI systems throughout the development process, while enabling efficient third-party auditing of AI applications, being easy to understand, and building trust in the system. Our work incorporates insights from interviews with certification experts as well as developers and individuals working with the developed AI applications.
Solving Pallet loading Problem with Real-World Constraints
Authors: Marko Švaco, Filip Šuligoj, Bojan Šekoranja, Josip Vidaković, Pietro Kristović
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.11531
Pdf link: https://arxiv.org/pdf/2307.11531
Abstract Efficient cargo packing and transport unit stacking play a vital role in enhancing logistics efficiency and reducing costs. This article focuses on the challenging problem of loading transport units onto pallets, which belongs to the class of NP-hard problems. We propose a novel method for solving the pallet loading problem using a branch and bound algorithm, where there is a loading order of transport units. The derived algorithm considers only a heuristically favourable subset of possible positions of the transport units, which has a positive effect on computability. Furthermore, it is ensured that the pallet configuration meets real-world constraints, such as the stability of the position of transport units under the influence of transport inertial forces and gravity.
Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning
Authors: Yao Wen, Guopeng Zhang, Kezhi Wang, Kun Yang
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.11532
Pdf link: https://arxiv.org/pdf/2307.11532
Abstract To alleviate the shortage of computing power faced by clients in training deep neural networks (DNNs) using federated learning (FL), we leverage edge computing and split learning to propose a model-splitting allowed FL (SFL) framework, with the aim to minimize the training latency without loss of test accuracy. Under the synchronized global update setting, the latency to complete a round of global training is determined by the maximum latency for the clients to complete a local training session. Therefore, the training latency minimization problem (TLMP) is modelled as a minimizing-maximum problem. To solve this mixed integer nonlinear programming problem, we first propose a regression method to fit the quantitative relationship between the cut-layer and other parameters of an AI model, and thus transform the TLMP into a continuous problem. Considering that the two subproblems involved in the TLMP, namely, the cut-layer selection problem for the clients and the computing resource allocation problem for the parameter server, are relatively independent, an alternate-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution to the TLMP. Extensive experiments are performed on a popular DNN model, EfficientNetV2, using the MNIST dataset, and the results verify the validity and improved performance of the proposed SFL framework.
A reduced basis method for frictional contact problems formulated with Nitsche's method
Authors: Idrissa Niakh, Guillaume Drouet, Virginie Ehrlacher, Alexandre Ern
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.11541
Pdf link: https://arxiv.org/pdf/2307.11541
Abstract We develop an efficient reduced basis method for the frictional contact problem formulated using Nitsche's method. We focus on the regime of small deformations and on Tresca friction. The key idea ensuring the computational efficiency of the method is to treat the nonlinearity resulting from the contact and friction conditions by means of the Empirical Interpolation Method. The proposed algorithm is applied to the Hertz contact problem between two half-disks with parameter-dependent radius. We also highlight the benefits of the present approach with respect to the mixed (primal-dual) formulation.
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Authors: Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2307.11545
Pdf link: https://arxiv.org/pdf/2307.11545
Abstract Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities. In this paper, we investigate efficient tuning problems on referring image segmentation. We propose a novel adapter called Bridger to facilitate cross-modal information exchange and inject task-specific information into the pre-trained model. We also design a lightweight decoder for image segmentation. Our approach achieves comparable or superior performance with only 1.61% to 3.38% backbone parameter updates, evaluated on challenging benchmarks. The code is available at https://github.com/kkakkkka/ETRIS.
Feature Map Testing for Deep Neural Networks
Authors: Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11563
Pdf link: https://arxiv.org/pdf/2307.11563
Abstract Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps, activating which will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them (e.g., by retraining the model with test cases). Current test metrics, however, are primarily concerned with the neurons, which means that test cases that are discovered either by guided fuzzing or selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs from the feature map level. When testing is conducted, DeepFeature will scrutinize every internal feature map in the model and identify vulnerabilities that can be enhanced through repairing to increase the model's overall performance. Exhaustive experiments are conducted to demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults (comparing DeepFeature to coverage-guided selection techniques, the fault detection rate is increased by 49.32%); and (3) DeepFeature's fuzzer also outperforms current fuzzing techniques and generates valuable test cases more efficiently.
Subset Sampling and Its Extensions
Authors: Jinchao Huang, Sibo Wang
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.11585
Pdf link: https://arxiv.org/pdf/2307.11585
Abstract This paper studies the \emph{subset sampling} problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\textbf{p}$ that assigns each record $v\in\mathcal{S}$ a probability $\textbf{p}(v)$. A query returns a random subset $X$ of $\mathcal{S}$, where each record $v\in\mathcal{S}$ is sampled into $X$ independently with probability $\textbf{p}(v)$. The goal is to store $\mathcal{S}$ in a data structure to answer queries efficiently. If $\mathcal{S}$ fits in memory, the problem is interesting when $\mathcal{S}$ is dynamic. We develop a dynamic data structure with $\mathcal{O}(1+\mu_{\mathcal{S}})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(1)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time, where $\mu_{\mathcal{S}}=\sum_{v\in\mathcal{S}}\textbf{p}(v)$. The query time and space are optimal. If $\mathcal{S}$ does not fit in memory, the problem is difficult even if $\mathcal{S}$ is static. Under this scenario, we present an I/O-efficient algorithm that answers a \emph{query} in $\mathcal{O}\left((\log^*_B n)/B+(\mu_{\mathcal{S}}/B)\log_{M/B} (n/B)\right)$ amortized expected I/Os using $\mathcal{O}(n/B)$ space, where $M$ is the memory size, $B$ is the block size and $\log^*_B n$ is the number of iterative $\log_2(.)$ operations we need to perform on $n$ before going below $B$. In addition, when each record is associated with a real-valued key, we extend the \emph{subset sampling} problem to the \emph{range subset sampling} problem, in which we require that the keys of the sampled records fall within a specified input range $[a,b]$. For this extension, we provide a solution under the dynamic setting, with $\mathcal{O}(\log n+\mu_{\mathcal{S}\cap[a,b]})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(\log n)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time.
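As a point of reference for the query semantics in this abstract, here is a minimal Python sketch of the naive baseline. It scans every record, so a query costs O(n) time rather than the O(1+μ_S) expected time of the paper's data structure, which is not reproduced here. The function names (`subset_sample`, `expected_size`, `iterated_log`) are illustrative assumptions and do not come from the paper.

```python
import math
import random


def subset_sample(probs, rng):
    """Naive O(n) query: keep each record v independently with probability probs[v].

    `probs` maps a record to its sampling probability p(v) in [0, 1].
    `rng` is a random.Random instance, passed in so runs can be reproduced.
    """
    # rng.random() lies in [0, 1), so p = 1 always keeps v and p = 0 never does.
    return {v for v, p in probs.items() if rng.random() < p}


def expected_size(probs):
    """mu_S: the expected number of sampled records, i.e. the sum of all p(v)."""
    return sum(probs.values())


def iterated_log(n, B):
    """Count how many times log2 must be applied to n before the value drops below B.

    This mirrors the abstract's description of log^*_B n. B >= 1 is assumed so
    that every intermediate value stays in the domain of log2.
    """
    if B < 1:
        raise ValueError("B must be at least 1")
    count = 0
    x = float(n)
    while x >= B:
        x = math.log2(x)
        count += 1
    return count
```

For example, `subset_sample({"a": 0.9, "b": 0.1}, random.Random(7))` returns one random subset of `{"a", "b"}`, and `expected_size` gives the average size of such subsets over many queries.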
Transferability of Convolutional Neural Networks in Stationary Learning Tasks
Authors: Damian Owerko, Charilaos I. Kanatsoulis, Jennifer Bondarchuk, Donald J. Bucci Jr, Alejandro Ribeiro
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11588
Pdf link: https://arxiv.org/pdf/2307.11588
Abstract Recent advances in hardware and big data acquisition have accelerated the development of deep learning techniques. For an extended period of time, increasing the model complexity has led to performance improvements for various tasks. However, this trend is becoming unsustainable and there is a need for alternative, computationally lighter methods. In this paper, we introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. To accomplish this we investigate the properties of CNNs for tasks where the underlying signals are stationary. We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining. This claim is supported by our theoretical analysis, which provides a bound on the performance degradation. Additionally, we conduct thorough experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten. Thus, CNN architectures provide solutions to these problems at previously computationally intractable scales.
Data-based system representations from irregularly measured data
Authors: Mohammad Alsalti, Ivan Markovsky, Victor G. Lopez, Matthias A. Müller
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.11589
Pdf link: https://arxiv.org/pdf/2307.11589
Abstract Non-parametric representations of dynamical systems based on the image of a Hankel matrix of data are extensively used for data-driven control. However, if samples of data are missing, obtaining such representations becomes a difficult task. By exploiting the kernel structure of Hankel matrices of irregularly measured data generated by a linear time-invariant system, we provide computational methods for which any complete finite-length behavior of the system can be obtained. For the special case of periodically missing outputs, we provide conditions on the input such that the former result is guaranteed. We illustrate with an example how the resulting representation provides a more computationally efficient method for low-rank matrix completion when compared to an alternative method.
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
Authors: Zehan Zhu, Ye Tian, Yan Huang, Jinming Xu, Shibo He
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11617
Pdf link: https://arxiv.org/pdf/2307.11617
Abstract Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization. Different from existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across devices and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. More importantly, the proposed method utilizes two spanning-tree graphs for communication so long as both share at least one common root, enabling flexible designs in communication architectures. We show that R-FAST converges in expectation to a neighborhood of the optimum with a geometric rate for smooth and strongly convex objectives; and to a stationary point with a sublinear rate for general non-convex settings. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.
FEDD -- Fair, Efficient, and Diverse Diffusion-based Lesion Segmentation and Malignancy Classification
Authors: Héctor Carrión, Narges Norouzi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11654
Pdf link: https://arxiv.org/pdf/2307.11654
Abstract Skin diseases affect millions of people worldwide, across all ethnicities. Increasing diagnosis accessibility requires fair and accurate segmentation and classification of dermatology images. However, the scarcity of annotated medical images, especially for rare diseases and underrepresented skin tones, poses a challenge to the development of fair and accurate models. In this study, we introduce a Fair, Efficient, and Diverse Diffusion-based framework for skin lesion segmentation and malignancy classification. FEDD leverages semantically meaningful feature embeddings learned through a denoising diffusion probabilistic backbone and processes them via linear probes to achieve state-of-the-art performance on Diverse Dermatology Images (DDI). We achieve an improvement in intersection over union of 0.18, 0.13, 0.06, and 0.07 while using only 5%, 10%, 15%, and 20% labeled samples, respectively. Additionally, FEDD trained on 10% of DDI demonstrates malignancy classification accuracy of 81%, 14% higher compared to the state-of-the-art. We showcase high efficiency in data-constrained scenarios while providing fair performance for diverse skin tones and rare malignancy conditions. Our newly annotated DDI segmentation masks and training code can be found on https://github.com/hectorcarrion/fedd.
Improved Approximate Distance Oracles: Bypassing the Thorup-Zwick Bound in Dense Graphs
Authors: Davide Bilò, Shiri Chechik, Keerti Choudhary, Sarel Cohen, Tobias Friedrich, Martin Schirneck
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.11677
Pdf link: https://arxiv.org/pdf/2307.11677
Abstract Despite extensive research on distance oracles, there are still large gaps between the best constructions for spanners and distance oracles. Notably, there exist sparse spanners with a multiplicative stretch of $1+\varepsilon$ plus some additive stretch. A fundamental open problem is whether such a bound is achievable for distance oracles as well. Specifically, can we construct a distance oracle with multiplicative stretch better than 2, along with some additive stretch, while maintaining subquadratic space complexity? This question remains a crucial area of investigation, and finding a positive answer would be a significant step forward for distance oracles. Indeed, such oracles have been constructed for sparse graphs. However, in the more general case of dense graphs, it is currently unknown whether such oracles exist. In this paper, we contribute to the field by presenting the first distance oracles that achieve a multiplicative stretch of $1+\varepsilon$ along with a small additive stretch while maintaining subquadratic space complexity. Our results represent an advancement particularly for constructing efficient distance oracles for dense graphs. In addition, we present a whole family of oracles that, for any positive integer $k$, achieve a multiplicative stretch of $2k-1+\varepsilon$ using $o(n^{1+1/k})$ space.
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11704
Pdf link: https://arxiv.org/pdf/2307.11704
Abstract In this paper, we present JoinGym, an efficient and lightweight query optimization environment for reinforcement learning (RL). Join order selection (JOS) is a classic NP-hard combinatorial optimization problem from database query optimization and can serve as a practical testbed for the generalization capabilities of RL algorithms. We describe how to formulate each of the left-deep and bushy variants of the JOS problem as a Markov Decision Process (MDP), and we provide an implementation adhering to the standard Gymnasium API. We highlight that our implementation JoinGym is completely based on offline traces of all possible joins, which enables RL practitioners to easily and quickly test their methods on a realistic data management problem without needing to set up any systems. Moreover, we also provide all possible join traces on 3300 novel SQL queries generated from the IMDB dataset. Upon benchmarking popular RL algorithms, we find that at least one method can obtain near-optimal performance on train-set queries but its performance degrades by several orders of magnitude on test-set queries. This gap motivates further research for RL algorithms that generalize well in multi-task combinatorial optimization problems.
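To make the MDP framing of left-deep join order selection concrete, here is a toy Python sketch. It is not JoinGym and does not use the Gymnasium package: the class name `ToyLeftDeepJoinEnv`, the helper `run_order`, the reward (negative cardinality of each intermediate result) and the cardinality numbers are all hypothetical choices made for illustration. The only idea taken from the abstract is that the cost of every possible join is looked up in a precomputed table instead of being measured on a live database.

```python
class ToyLeftDeepJoinEnv:
    """Toy left-deep join ordering MDP driven by a precomputed cardinality table.

    State:  the frozenset of tables joined so far.
    Action: the next table to join into the left-deep plan.
    Reward: minus the cardinality of the new intermediate result (hypothetical choice).
    """

    def __init__(self, tables, cardinality):
        self.tables = tuple(tables)
        # Maps frozenset of joined tables -> row count, standing in for offline traces.
        self.cardinality = cardinality
        self.joined = frozenset()

    def reset(self):
        self.joined = frozenset()
        return self.joined

    def valid_actions(self):
        return [t for t in self.tables if t not in self.joined]

    def step(self, table):
        if table not in self.tables or table in self.joined:
            raise ValueError(f"invalid action: {table!r}")
        self.joined = self.joined | {table}
        # Picking the first table produces no join, so it carries no cost here.
        reward = -self.cardinality[self.joined] if len(self.joined) > 1 else 0
        done = len(self.joined) == len(self.tables)
        return self.joined, reward, done


def run_order(env, order):
    """Play a fixed join order and return the total reward (minus the total cost)."""
    env.reset()
    total = 0
    for table in order:
        _, reward, _ = env.step(table)
        total += reward
    return total


# Made-up cardinalities for a three-table query.
CARDS = {
    frozenset("AB"): 10,
    frozenset("AC"): 100,
    frozenset("BC"): 5,
    frozenset("ABC"): 20,
}
env = ToyLeftDeepJoinEnv("ABC", CARDS)
good = run_order(env, ["B", "C", "A"])  # joins the small B-C result first
bad = run_order(env, ["A", "C", "B"])   # builds the large A-C result first
```

In this toy setting an agent that learns to start with the selective B-C join collects a much higher return than one that starts with A-C, which is the kind of signal an RL method would be trained on.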
GP-Frontier for Local Mapless Navigation
Authors: Mahmoud Ali, Lantao Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.11717
Pdf link: https://arxiv.org/pdf/2307.11717
Abstract We propose a new frontier concept called the Gaussian Process Frontier (GP-Frontier) that can be used to locally navigate a robot towards a goal without building a map. The GP-Frontier is built on the uncertainty assessment of an efficient variant of sparse Gaussian Process. Based only on local ranging sensing measurement, the GP-Frontier can be used for navigation in both known and unknown environments. The proposed method is validated through intensive evaluations, and the results show that the GP-Frontier can navigate the robot in a safe and persistent way, i.e., the robot moves in the most open space (thus reducing the risk of collision) without relying on a map or a path planner.
Keyword: faster

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games
Authors: Jonas Gillberg, Joakim Bergdahl, Alessandro Sestini, Andrew Eakins, Linus Gisslen
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11105
Pdf link: https://arxiv.org/pdf/2307.11105
Abstract Going from research to production, especially for large and complex software systems, is fundamentally a hard problem. In large-scale game production, one of the main reasons is that the development environment can be very different from the final product. In this technical paper we describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots in order to increase its capacity. We report on how this reinforcement learning system was integrated with the aim to increase test coverage similar to [1] in a set of AAA games including Battlefield 2042 and Dead Space (2023). The aim of this technical paper is to show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter. Furthermore, to help the game industry to adopt this technology faster, we propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
Comparison between transformers and convolutional models for fine-grained classification of insects
Authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11112
Pdf link: https://arxiv.org/pdf/2307.11112
Abstract Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class, because species often share morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work collecting images of insects in the wild, giving experts the possibility to create improved distribution maps in all countries. We have billions of images that need to be automatically classified, and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the state of the art, the field of deep learning algorithms is extremely fruitful, so how should one identify the algorithm to use? We focus on the Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-based model; EfficientNet, a fully convolutional model; and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions, evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performance with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional and fully transformer-based models on accuracy, and the fully transformer-based model outperforms the others on inference speed. These results show the transformer to be robust to the shortage of samples and faster at inference time.
GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem
Authors: Clara Novoa, Apan Qasem
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2307.11248
Pdf link: https://arxiv.org/pdf/2307.11248
Abstract The Quadratic Assignment Problem (QAP) is an important combinatorial optimization problem with applications in many areas including logistics and manufacturing. QAP is known to be NP-hard, a computationally challenging problem, which requires the use of sophisticated heuristics in finding acceptable solutions for most real-world data sets. In this paper, we present GPU-accelerated implementations of a 2opt and a tabu search algorithm for solving the QAP. For both algorithms, we extract parallelism at multiple levels and implement novel code optimization techniques that fully utilize the GPU hardware. In a series of experiments on the well-known QAPLIB data sets, our solutions run, on average, an order of magnitude faster than previous implementations and deliver up to a factor of 63 speedup on specific instances. The quality of the solutions produced by our implementations of 2opt and tabu is within 1.03% and 0.15% of the best known values, respectively. The experimental results also provide key insight into the performance characteristics of accelerated QAP solvers. In particular, the results reveal that both algorithmic choice and the shape of the input data sets are key factors in finding efficient implementations.
A Fair and Memory/Time-efficient Hashmap
Authors: Abolfazl Asudeh, Nima Shahbazi, Stavros Sintos
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.11355
Pdf link: https://arxiv.org/pdf/2307.11355
Abstract There is a large amount of work constructing hashmaps to minimize the number of collisions. However, to the best of our knowledge no known hashing technique guarantees group fairness among different groups of items. We are given a set $P$ of $n$ tuples in $\mathbb{R}^d$, for a constant dimension $d$ and a set of groups $\mathcal{G}=\{\mathbf{g}_1,\ldots, \mathbf{g}_k\}$ such that every tuple belongs to a unique group. We formally define the fair hashing problem introducing the notions of single fairness ($Pr[h(p)=h(x)\mid p\in \mathbf{g}_i, x\in P]$ for every $i=1,\ldots, k$), pairwise fairness ($Pr[h(p)=h(q)\mid p,q\in \mathbf{g}_i]$ for every $i=1,\ldots, k$), and the well-known collision probability ($Pr[h(p)=h(q)\mid p,q\in P]$). The goal is to construct a hashmap such that the collision probability, the single fairness, and the pairwise fairness are close to $1/m$, where $m$ is the number of buckets in the hashmap. We propose two families of algorithms to design fair hashmaps. First, we focus on hashmaps with optimum memory consumption minimizing the unfairness. We model the input tuples as points in $\mathbb{R}^d$ and the goal is to find the vector $w$ such that the projection of $P$ onto $w$ creates an ordering that is convenient to split to create a fair hashmap. For each projection we design efficient algorithms that find near optimum partitions of exactly (or at most) $m$ buckets. Second, we focus on hashmaps with optimum fairness ($0$-unfairness), minimizing the memory consumption. We make the important observation that the fair hashmap problem is reduced to the necklace splitting problem. By carefully implementing algorithms for solving the necklace splitting problem, we propose faster algorithms constructing hashmaps with $0$-unfairness using $2(m-1)$ boundary points when $k=2$ and $k(m-1)(4+\log_2 (3mn))$ boundary points for $k>2$.
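The three probabilities defined in this abstract can be evaluated directly for a given assignment of tuples to buckets. The Python sketch below does so under one stated assumption: $p$ and $q$ (or $x$) are drawn independently and uniformly, with replacement, from their respective sets. The paper may use a different sampling convention. The function name `fairness_metrics` and its input layout are illustrative, and nothing here reproduces the paper's construction algorithms.

```python
from collections import Counter
from fractions import Fraction


def fairness_metrics(buckets, groups):
    """Empirical collision probability, single fairness and pairwise fairness.

    buckets[i] is the bucket h(p_i) of tuple i; groups[i] is its group label.
    Assumes pairs are drawn independently and uniformly with replacement.
    Returns (collision, single, pairwise); the last two are dicts keyed by group.
    """
    n = len(buckets)
    bucket_sizes = Counter(buckets)

    # Pr[h(p) = h(q) | p, q in P] = sum over buckets of (n_b / n)^2
    collision = sum(Fraction(c, n) ** 2 for c in bucket_sizes.values())

    single, pairwise = {}, {}
    for g in set(groups):
        # Number of tuples of group g that land in each bucket.
        in_group = Counter(b for b, gg in zip(buckets, groups) if gg == g)
        n_g = sum(in_group.values())
        # Pr[h(p) = h(x) | p in g, x in P]
        single[g] = sum(
            Fraction(c, n_g) * Fraction(bucket_sizes[b], n)
            for b, c in in_group.items()
        )
        # Pr[h(p) = h(q) | p, q in g]
        pairwise[g] = sum(Fraction(c, n_g) ** 2 for c in in_group.values())
    return collision, single, pairwise


# With m = 2 buckets, spreading each group evenly makes every metric equal to 1/m.
fair = fairness_metrics([0, 1, 0, 1], ["a", "a", "b", "b"])
# Putting each group in its own bucket keeps the collision probability at 1/2,
# but every pair from the same group now collides.
unfair = fairness_metrics([0, 0, 1, 1], ["a", "a", "b", "b"])
```

The second assignment shows why collision probability alone is not enough: it is unchanged at 1/2, while the pairwise fairness of each group rises to 1.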
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
Authors: Zehan Zhu, Ye Tian, Yan Huang, Jinming Xu, Shibo He
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11617
Pdf link: https://arxiv.org/pdf/2307.11617
Abstract Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization. Different from existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across devices and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. More importantly, the proposed method utilizes two spanning-tree graphs for communication so long as both share at least one common root, enabling flexible designs in communication architectures. We show that R-FAST converges in expectation to a neighborhood of the optimum with a geometric rate for smooth and strongly convex objectives; and to a stationary point with a sublinear rate for general non-convex settings. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.
Keyword: mobile

Adapting to Human Preferences to Lead or Follow in Human-Robot Collaboration: A System Evaluation
Authors: Ali Noormohammadi-Asl, Ali Ayub, Stephen L. Smith, Kerstin Dautenhahn
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.11192
Pdf link: https://arxiv.org/pdf/2307.11192
Abstract With the introduction of collaborative robots, humans and robots can now work together in close proximity and share the same workspace. However, this collaboration presents various challenges that need to be addressed to ensure seamless cooperation between the agents. This paper focuses on task planning for human-robot collaboration, taking into account the human's performance and their preference for following or leading. Unlike conventional task allocation methods, the proposed system allows both the robot and human to select and assign tasks to each other. Our previous studies evaluated the proposed framework in a computer simulation environment. This paper extends the research by implementing the algorithm in a real scenario where a human collaborates with a Fetch mobile manipulator robot. We briefly describe the experimental setup, procedure and implementation of the planned user study. As a first step, in this paper, we report on a system evaluation study where the experimenter enacted different possible behaviours in terms of leader/follower preferences that can occur in a user study. Results show that the robot can adapt and respond appropriately to different human agent behaviours, enacted by the experimenter. A future user study will evaluate the system with human participants.
Underwater 3D positioning on smart devices
Authors: Tuochao Chen, Justin Chan, Shyamnath Gollakota
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11263
Pdf link: https://arxiv.org/pdf/2307.11263
Abstract The emergence of water-proof mobile and wearable devices (e.g., Garmin Descent and Apple Watch Ultra) designed for underwater activities like professional scuba diving opens up opportunities for underwater networking and localization capabilities on these devices. Here, we present the first underwater acoustic positioning system for smart devices. Unlike conventional systems that use floating buoys as anchors at known locations, we design a system where a dive leader can compute the relative positions of all other divers, without any external infrastructure. Our intuition is that in a well-connected network of devices, if we compute the pairwise distances, we can determine the shape of the network topology. By incorporating orientation information about a single diver who is in the visual range of the leader device, we can then estimate the positions of all the remaining divers, even if they are not within sight. We address various practical problems including detecting erroneous distance estimates, addressing rotational and flipping ambiguities as well as designing a distributed timestamp protocol that scales linearly with the number of devices. Our evaluations show that our distributed system running on underwater deployments of 4-5 commodity smart devices can perform pairwise ranging and localization with median errors of 0.5-0.9 m and 0.9-1.6 m.
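The intuition that pairwise distances fix the shape of the network, but not its rotation or mirror image, can be seen in the smallest possible case. The Python sketch below is a deliberately simplified illustration with three devices in a 2D plane; the actual system works in 3D with more devices and noisy ranges, and its algorithm is not reproduced here. The function name `place_three` and the choice of coordinate frame are assumptions made for this example.

```python
import math


def place_three(d01, d02, d12):
    """Recover 2D positions of three devices from their pairwise distances.

    Device 0 is pinned at the origin and device 1 on the positive x-axis,
    which removes translation and rotation. Device 2 follows from the law
    of cosines. Distances are assumed to satisfy the triangle inequality.
    """
    if d01 <= 0:
        raise ValueError("d01 must be positive")
    x2 = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2 * d01)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    y2 = math.sqrt(max(d02 ** 2 - x2 ** 2, 0.0))
    # Negating y2 gives the mirrored layout, which has exactly the same
    # pairwise distances: this is the flipping ambiguity.
    return [(0.0, 0.0), (float(d01), 0.0), (x2, y2)]


points = place_three(3.0, 4.0, 5.0)
mirrored = [(x, -y) for x, y in points]
```

Both `points` and `mirrored` reproduce the measured distances, so some extra observation is needed to choose between them. In the abstract, that role is played by orientation information about one diver who is visible to the leader.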
Supporting Post-disaster Recovery with Agent-based Modeling in Multilayer Socio-physical Networks
Authors: Jiawei Xue, Sangung Park, Washim Uddin Mondal, Sandro Martinelli Reia, Tong Yao, Satish V. Ukkusuri
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.11464
Pdf link: https://arxiv.org/pdf/2307.11464
Abstract The examination of post-disaster recovery (PDR) in a socio-physical system enables us to elucidate the complex relationships between humans and infrastructures. Although existing studies have identified many patterns in the PDR process, they fall short of describing how individual recoveries contribute to the overall recovery of the system. To enhance the understanding of individual return behavior and the recovery of points of interest (POIs), we propose an agent-based model (ABM), called PostDisasterSim. We apply the model to analyze the recovery of five counties in Texas following Hurricane Harvey in 2017. Specifically, we construct a three-layer network comprising the human layer, the social infrastructure layer, and the physical infrastructure layer, using mobile phone location data and POI data. Based on prior studies and a household survey, we develop the ABM to simulate how evacuated individuals return to their homes and how social and physical infrastructures recover. By implementing the ABM, we unveil the heterogeneity in recovery dynamics in terms of agent types, housing types, household income levels, and geographical locations. Moreover, simulation results across nine scenarios quantitatively demonstrate the positive effects of social and physical infrastructure improvement plans. This study can assist disaster scientists in uncovering nuanced recovery patterns and policymakers in translating policies like resource allocation into practice.
Transferability of Convolutional Neural Networks in Stationary Learning Tasks
Authors: Damian Owerko, Charilaos I. Kanatsoulis, Jennifer Bondarchuk, Donald J. Bucci Jr, Alejandro Ribeiro
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11588
Pdf link: https://arxiv.org/pdf/2307.11588
Abstract Recent advances in hardware and big data acquisition have accelerated the development of deep learning techniques. For an extended period of time, increasing the model complexity has led to performance improvements for various tasks. However, this trend is becoming unsustainable and there is a need for alternative, computationally lighter methods. In this paper, we introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. To accomplish this we investigate the properties of CNNs for tasks where the underlying signals are stationary. We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining. This claim is supported by our theoretical analysis, which provides a bound on the performance degradation. Additionally, we conduct thorough experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten. Thus, CNN architectures provide solutions to these problems at previously computationally intractable scales.
Keyword: pruning

FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks
Authors: Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.11565
Pdf link: https://arxiv.org/pdf/2307.11565
Abstract Deep neural networks have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attacks, which are achieved by adding artificial patterns to specific training data. Existing defense strategies primarily focus on using reverse engineering to reproduce the backdoor trigger generated by attackers and subsequently repair the DNN model by adding the trigger into inputs and fine-tuning the model with ground-truth labels. However, when the trigger generated by the attackers is complex and invisible, the defender cannot successfully reproduce the trigger. Consequently, the DNN model will not be repaired since the trigger is not effectively removed. In this work, we propose Feature Map Testing (FMT). Different from existing defense strategies, which focus on reproducing backdoor triggers, FMT tries to detect the backdoor feature maps, which are trained to extract backdoor information from the inputs. After detecting these backdoor feature maps, FMT will erase them and then fine-tune the model with a secure subset of training data. Our experiments demonstrate that, first, compared to existing defense strategies, FMT can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers. Second, unlike conventional defense methods that tend to exhibit low Robust Accuracy (RA, i.e., the model's accuracy on the poisoned data), FMT achieves higher RA, indicating its superiority in maintaining model performance while mitigating the effects of backdoor attacks (e.g., FMT obtains 87.40% RA on CIFAR10). Third, compared to existing feature map pruning techniques, FMT can cover more backdoor feature maps (e.g., FMT removes 83.33% of backdoor feature maps from the model in the CIFAR10 & BadNet scenario).
3D Skeletonization of Complex Grapevines for Robotic Pruning
Authors: Eric Schneider, Sushanth Jayanth, Abhisesh Silwal, George Kantor
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11706
Pdf link: https://arxiv.org/pdf/2307.11706
Abstract Robotic pruning of dormant grapevines is an area of active research in order to promote vine balance and grape quality, but so far robotic efforts have largely focused on planar, simplified vines not representative of commercial vineyards. This paper aims to advance the robotic perception capabilities necessary for pruning in denser and more complex vine structures by extending plant skeletonization techniques. The proposed pipeline generates skeletal grapevine models that have lower reprojection error and higher connectivity than baseline algorithms. We also show how 3D and skeletal information enables prediction accuracy of pruning weight for dense vines surpassing prior work, where pruning weight is an important vine metric influencing pruning site selection.
Keyword: diffusion

Diffusion Sampling with Momentum for Mitigating Divergence Artifacts
Authors: Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11118
Pdf link: https://arxiv.org/pdf/2307.11118
Abstract Despite the remarkable success of diffusion models in image generation, slow sampling remains a persistent issue. To accelerate the sampling process, prior studies have reformulated diffusion sampling as an ODE/SDE and introduced higher-order numerical methods. However, these methods often produce divergence artifacts, especially with a low number of sampling steps, which limits the achievable acceleration. In this paper, we investigate the potential causes of these artifacts and suggest that the small stability regions of these methods could be the principal cause. To address this issue, we propose two novel techniques. The first technique involves the incorporation of Heavy Ball (HB) momentum, a well-known technique for improving optimization, into existing diffusion numerical methods to expand their stability regions. We also prove that the resulting methods have first-order convergence. The second technique, called Generalized Heavy Ball (GHVB), constructs a new high-order method that offers a variable trade-off between accuracy and artifact suppression. Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling. Our research provides novel insights into the design of numerical methods for future diffusion work.
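For readers unfamiliar with Heavy Ball momentum, the following Python sketch shows the classical optimization form of the update, x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1}), on a one-dimensional quadratic. It illustrates only the well-known optimizer that the abstract refers to; how the paper incorporates momentum into diffusion ODE/SDE solvers, and the GHVB method, are not reproduced here. The function name `heavy_ball` and the parameter values are assumptions chosen for the example.

```python
def heavy_ball(grad, x0, lr, beta, steps):
    """Classical (Polyak) heavy-ball iteration for minimizing a scalar function.

    Each step combines a gradient step with a fraction `beta` of the previous
    displacement. Setting beta = 0 recovers plain gradient descent.
    """
    x_prev, x = x0, x0
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


def quadratic_grad(x):
    """Gradient of f(x) = x^2, whose minimizer is x = 0."""
    return 2.0 * x


x_momentum = heavy_ball(quadratic_grad, x0=1.0, lr=0.1, beta=0.5, steps=200)
x_plain_one_step = heavy_ball(quadratic_grad, x0=1.0, lr=0.1, beta=0.0, steps=1)
```

With these settings the iterates contract toward the minimizer at 0. The momentum term reuses the previous step direction, which is the mechanism the abstract credits with expanding the stability regions of existing solvers.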
QDC: Quantum Diffusion Convolution Kernels on Graphs
Authors: Thomas Markovich
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2307.11234
Pdf link: https://arxiv.org/pdf/2307.11234
Abstract Graph convolutional neural networks (GCNs) operate by aggregating messages over local neighborhoods given the prediction task under interest. Many GCNs can be understood as a form of generalized diffusion of input features on the graph, and significant work has been dedicated to improving predictive accuracy by altering the ways of message passing. In this work, we propose a new convolution kernel that effectively rewires the graph according to the occupation correlations of the vertices by trading on the generalized diffusion paradigm for the propagation of a quantum particle over the graph. We term this new convolution kernel the Quantum Diffusion Convolution (QDC) operator. In addition, we introduce a multiscale variant that combines messages from the QDC operator and the traditional combinatorial Laplacian. To understand our method, we explore the spectral dependence of homophily and the importance of quantum dynamics in the construction of a bandpass filter. Through these studies, as well as experiments on a range of datasets, we observe that QDC improves predictive performance on the widely used benchmark datasets when compared to similar methods.
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport
Authors: Zezeng Li, ShengHao Li, Zhanpeng Wang, Na Lei, Zhongxuan Luo, Xianfeng Gu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.11308
Pdf link: https://arxiv.org/pdf/2307.11308
Abstract Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps of the inverse diffusion trajectory to get a high-quality image. Recent progress in designing fast samplers for DPMs achieves a trade-off between sampling speed and sample quality by knowledge distillation or adjusting the variance schedule or the denoising equation. However, these samplers cannot be optimal in both aspects and often suffer from mode mixture when few steps are used. To tackle this problem, we regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by the OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and the white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give the error bound of the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source codes are available at https://github.com/cognaclee/DPM-OT
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
Authors: Jian Ma, Junhao Liang, Chen Chen, Haonan Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11410
Pdf link: https://arxiv.org/pdf/2307.11410
Abstract Recent progress in personalized image generation using diffusion models has been significant. However, development in the area of open-domain and non-fine-tuning personalized image generation is proceeding rather slowly. In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that, in addition to not requiring test-time fine-tuning, also only requires a single reference image to support personalized generation of single- or multi-subject in any domain. Firstly, we construct an automatic data labeling tool and use the LAION-Aesthetics dataset to construct a large-scale dataset consisting of 76M images and their corresponding subject detection bounding boxes, segmentation masks and text descriptions. Secondly, we design a new unified framework that combines text and image semantics by incorporating coarse location and fine-grained reference image control to maximize subject fidelity and generalization. Furthermore, we also adopt an attention control mechanism to support multi-subject generation. Extensive qualitative and quantitative results demonstrate that our method outperforms other SOTA frameworks in single, multiple, and human customized image generation. Please refer to our project page: https://oppo-mente-lab.github.io/subject_diffusion/
On the convergence order of the Euler scheme for scalar SDEs with Hölder-type diffusion coefficients
Authors: Annalena Mickel, Andreas Neuenkirch
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2307.11448
Pdf link: https://arxiv.org/pdf/2307.11448
Abstract We study the Euler scheme for scalar non-autonomous stochastic differential equations, whose diffusion coefficient is not globally Lipschitz but a fractional power of a globally Lipschitz function. We analyse the strong error and establish a criterion, which relates the convergence order of the Euler scheme to an inverse moment condition for the diffusion coefficient. Our result in particular applies to Cox-Ingersoll-Ross-, Chan-Karolyi-Longstaff-Sanders- or Wright-Fisher-type stochastic differential equations and thus provides a unifying framework.
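As a concrete instance of the setting in the abstract above, the following sketch simulates a Cox-Ingersoll-Ross-type equation, whose diffusion coefficient is the square root (a fractional power) of a Lipschitz function, with an Euler-type scheme. The truncation at zero, the function name `euler_cir`, and all parameter values are assumptions made for illustration; the scheme analysed in the paper may differ in these details.

```python
import math
import random


def euler_cir(x0, kappa, theta, sigma, T, n_steps, rng):
    """Truncated Euler scheme for dX = kappa*(theta - X) dt + sigma*sqrt(X) dW."""
    h = T / n_steps
    x = x0
    for _ in range(n_steps):
        x_pos = max(x, 0.0)                # keep the square root well defined
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment ~ N(0, h)
        x = x + kappa * (theta - x_pos) * h + sigma * math.sqrt(x_pos) * dw
    return max(x, 0.0)


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [euler_cir(1.0, 2.0, 0.5, 0.3, 1.0, 200, rng) for _ in range(2000)]
print(sum(samples) / len(samples))  # Monte Carlo estimate of E[X_T]
```

Setting `sigma=0` reduces the loop to the deterministic Euler method for x' = κ(θ - x), which gives a quick sanity check against the closed-form solution θ + (x0 - θ)·exp(-κT).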
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
Authors: Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, Yuyang Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.11494
Pdf link: https://arxiv.org/pdf/2307.11494
Abstract Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact -- downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).
Mixbiotic society measures: Assessment of community well-going as living system
Authors: Takeshi Kato, Jyunichi Miyakoshi, Tadayuki Matsumura, Ryuji Mine, Hiroyuki Mizuno, Yasuo Deguchi
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.11594
Pdf link: https://arxiv.org/pdf/2307.11594
Abstract Social isolation is caused by the impoverishment of community (atomism) and fragmentation is caused by the enlargement of in-group (mobism), both of which can be viewed as social problems related to communication. To solve these problems, the philosophical world has proposed the concept of "mixbiotic society," in which individuals with freedom and diverse values mix and mingle to recognize each other's respective "fundamental incapability" and sublimate into solidarity. Based on this concept, this study proposes new mixbiotic society measures to evaluate dynamic communication patterns with reference to classification in cellular automata and particle reaction diffusion that simulate living phenomena. Specifically, the hypothesis of measures corresponding to the four classes was formulated, and the hypothesis was validated by simulating the generation and disappearance of communication. As a result, considering communication patterns as multidimensional vectors, it was found that the mean of Euclidean distance for "mobism," the variance of the relative change in distance for "atomism," the composite measure that multiplies the mean and variance of cosine similarity for "mixism," which corresponds to the well-going of mixbiotic society, and the almost zero measures for "nihilism," are suitable. Then, evaluating seven real-society datasets using these measures, we showed that the mixism measure is useful for assessing the livingness of communication, and that it is possible to typify communities based on plural measures. The measures established in this study are superior to conventional analysis in that they can evaluate dynamic patterns, they are simple to calculate, and their meanings are easy to interpret. As a future development, the mixbiotic society measures will be used in the fields of digital democracy and platform cooperativism toward a desirable society.
FEDD -- Fair, Efficient, and Diverse Diffusion-based Lesion Segmentation and Malignancy Classification
Authors: Héctor Carrión, Narges Norouzi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11654
Pdf link: https://arxiv.org/pdf/2307.11654
Abstract Skin diseases affect millions of people worldwide, across all ethnicities. Increasing diagnosis accessibility requires fair and accurate segmentation and classification of dermatology images. However, the scarcity of annotated medical images, especially for rare diseases and underrepresented skin tones, poses a challenge to the development of fair and accurate models. In this study, we introduce a Fair, Efficient, and Diverse Diffusion-based framework for skin lesion segmentation and malignancy classification. FEDD leverages semantically meaningful feature embeddings learned through a denoising diffusion probabilistic backbone and processes them via linear probes to achieve state-of-the-art performance on Diverse Dermatology Images (DDI). We achieve an improvement in intersection over union of 0.18, 0.13, 0.06, and 0.07 while using only 5%, 10%, 15%, and 20% labeled samples, respectively. Additionally, FEDD trained on 10% of DDI demonstrates malignancy classification accuracy of 81%, 14% higher compared to the state-of-the-art. We showcase high efficiency in data-constrained scenarios while providing fair performance for diverse skin tones and rare malignancy conditions. Our newly annotated DDI segmentation masks and training code can be found on https://github.com/hectorcarrion/fedd.
Keyword: adaptive

CSSL-RHA: Contrastive Self-Supervised Learning for Robust Handwriting Authentication
Authors: Jingyao Wang, Luntian Mou, Changwen Zheng, Wen Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2307.11100
Pdf link: https://arxiv.org/pdf/2307.11100
Abstract Handwriting authentication is a valuable tool used in various fields, such as fraud prevention and cultural heritage protection. However, it remains a challenging task due to the complex features, severe damage, and lack of supervision. In this paper, we propose a novel Contrastive Self-Supervised Learning framework for Robust Handwriting Authentication (CSSL-RHA) to address these issues. It can dynamically learn complex yet important features and accurately predict writer identities. Specifically, to remove the negative effects of imperfections and redundancy, we design an information-theoretic filter for pre-processing and propose a novel adaptive matching scheme to represent images as patches of local regions dominated by more important features. Through online optimization at inference time, the most informative patch embeddings are identified as the "most important" elements. Furthermore, we employ contrastive self-supervised training with a momentum-based paradigm to learn more general statistical structures of handwritten data without supervision. We conduct extensive experiments on five benchmark datasets and our manually annotated dataset EN-HA, which demonstrate the superiority of our CSSL-RHA compared to baselines. Additionally, we show that our proposed model can still effectively achieve authentication even under abnormal circumstances, such as data falsification and corruption.
SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation
Authors: Zeinab Nezami, Evangelos Pournaras, Amir Borzouie, Jie Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.11181
Pdf link: https://arxiv.org/pdf/2307.11181
Abstract Smart mobility becomes paramount for meeting net-zero targets. However, autonomous, self-driving and electric vehicles require more than ever before an efficient, resilient and trustworthy computational offloading backbone that expands throughout the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides for the first time a modular end-to-end instrumentation for prototyping and optimizing placement of intelligence services on edge devices such as augmented reality and real-time traffic monitoring. SMOTEC supports a plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ and EPOS for an AI-based decentralized load balancing across edge-to-cloud. All components are orchestrated by the K3s lightweight Kubernetes. A proof-of-concept of self-optimized service placements for traffic monitoring from Munich demonstrates in practice the applicability and cost-effectiveness of SMOTEC.
The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data
Authors: Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.11211
Pdf link: https://arxiv.org/pdf/2307.11211
Abstract Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction, and understanding of the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such adverse outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance when compared with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. Then LR and ML models, including random forests (RF) and extreme gradient boosting (XGBoost), were compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, while 0.32% (759) reported initial police interaction among 237,141 individuals. Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity=91%, AUC=90% for initial homelessness, and sensitivity=90%, AUC=89% for initial police interaction). Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.
From Adaptive Query Release to Machine Unlearning
Authors: Enayat Ullah, Raman Arora
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.11228
Pdf link: https://arxiv.org/pdf/2307.11228
Abstract We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to dynamic streams consisting of insertions and deletions.
Energy-Efficient Softwarized Networks: A Survey
Authors: Iwan Setiawan, Binayak Kar, Shan-Hsiang Shen
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.11301
Pdf link: https://arxiv.org/pdf/2307.11301
Abstract With the dynamic demands and stringent requirements of various applications, networks need to be high-performance, scalable, and adaptive to changes. Researchers and industries view network softwarization as the best enabler for the evolution of networking to tackle current and prospective challenges. Network softwarization must provide programmability and flexibility to network infrastructures and allow agile management, along with higher control for operators. While satisfying the demands and requirements of network services, energy cannot be overlooked, considering the effects on the sustainability of the environment and business. This paper discusses energy efficiency in modern and future networks with three network softwarization technologies: SDN, NFV, and NS, introduced in an energy-oriented context. With that framework in mind, we review the literature based on network scenarios, control/MANO layers, and energy-efficiency strategies. Following that, we compare the references regarding approach, evaluation method, criterion, and metric attributes to demonstrate the state-of-the-art. Last, we analyze the classified literature, summarize lessons learned, and present ten essential concerns to open discussions about future research opportunities on energy-efficient softwarized networks.
Neuromorphic Online Learning for Spatiotemporal Patterns with a Forward-only Timeline
Authors: Zhenhang Zhang, Jingang Jin, Haowen Fang, Qinru Qiu
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11314
Pdf link: https://arxiv.org/pdf/2307.11314
Abstract Spiking neural networks (SNNs) are bio-plausible computing models with high energy efficiency. The temporal dynamics of neurons and synapses enable them to detect temporal patterns and generate sequences. While Backpropagation Through Time (BPTT) is traditionally used to train SNNs, it is not suitable for online learning of embedded applications due to its high computation and memory cost as well as extended latency. Previous works have proposed online learning algorithms, but they often utilize highly simplified spiking neuron models without synaptic dynamics and reset feedback, resulting in subpar performance. In this work, we present Spatiotemporal Online Learning for Synaptic Adaptation (SOLSA), specifically designed for online learning of SNNs composed of Leaky Integrate and Fire (LIF) neurons with exponentially decayed synapses and soft reset. The algorithm not only learns the synaptic weight but also adapts the temporal filters associated to the synapses. Compared to the BPTT algorithm, SOLSA has much lower memory requirement and achieves a more balanced temporal workload distribution. Moreover, SOLSA incorporates enhancement techniques such as scheduled weight update, early stop training and adaptive synapse filter, which speed up the convergence and enhance the learning performance. When compared to other non-BPTT based SNN learning, SOLSA demonstrates an average learning accuracy improvement of 14.2%. Furthermore, compared to BPTT, SOLSA achieves a 5% higher average learning accuracy with a 72% reduction in memory cost.
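The neuron model named in the abstract above (a Leaky Integrate and Fire neuron with an exponentially decayed synapse and soft reset) can be sketched in discrete time as follows. This shows only the forward dynamics, not the SOLSA learning rule; the function name, the decay constants, and the threshold are hypothetical values chosen for illustration.

```python
def lif_soft_reset(in_spikes, weight=1.0, syn_decay=0.5, mem_decay=0.9, threshold=1.0):
    """Simulate one LIF neuron driven by a binary input spike train.

    All default parameters are illustrative assumptions.
    """
    current, voltage = 0.0, 0.0
    out_spikes = []
    for s in in_spikes:
        current = syn_decay * current + weight * s  # exponentially decaying synapse
        voltage = mem_decay * voltage + current     # leaky integration
        if voltage >= threshold:
            out_spikes.append(1)
            voltage -= threshold                    # soft reset: subtract, do not zero
        else:
            out_spikes.append(0)
    return out_spikes


print(lif_soft_reset([1, 0, 0, 1, 1, 1, 0, 0]))
```

The soft reset keeps any membrane potential above the threshold after a spike, so strong input can trigger spikes on consecutive steps, whereas a hard reset would discard that residual.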
Character Time-series Matching For Robust License Plate Recognition
Authors: Quang Huy Che, Tung Do Thanh, Cuong Truong Van
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.11336
Pdf link: https://arxiv.org/pdf/2307.11336
Abstract Automatic License Plate Recognition (ALPR) is becoming a popular study area and is applied in many fields such as transportation and smart cities. However, there are still several limitations when applying many current methods to practical problems due to the variation in real-world situations such as light changes, unclear License Plate (LP) characters, and image quality. Most recent ALPR algorithms process a single frame, which reduces accuracy when image quality is poor. This paper presents methods to improve license plate recognition accuracy by tracking the license plate in multiple frames. First, the Adaptive License Plate Rotation algorithm is applied to correctly align the detected license plate. Second, we propose a method called Character Time-series Matching to recognize license plate characters from many consecutive frames. The proposed method achieves high performance on the UFPR-ALPR dataset, reaching 96.7% accuracy in real time on an RTX A5000 GPU. We also deploy the algorithm for the Vietnamese ALPR system. The accuracy for license plate detection and character recognition are 0.881 and 0.979 $mAP^{test}$@.5 respectively. The source code is available at https://github.com/chequanghuy/Character-Time-series-Matching.git
Channel Estimation for RIS-Aided MIMO Systems: A Partially Decoupled Atomic Norm Minimization Approach
Authors: Y. Chu, Z. Wei, Z. Yang, D. W. K. Ng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.11403
Pdf link: https://arxiv.org/pdf/2307.11403
Abstract Channel estimation (CE) plays a key role in reconfigurable intelligent surface (RIS)-aided multiple-input multiple-output (MIMO) communication systems, while it poses a challenging task due to the passive nature of RIS and the cascaded channel structures. In this paper, a partially decoupled atomic norm minimization (PDANM) framework is proposed for CE of RIS-aided MIMO systems, which exploits the three-dimensional angular sparsity of the channel. In particular, PDANM partially decouples the differential angles at the RIS from other angles at the base station and user equipment, reducing the computational complexity compared with existing methods. A reweighted PDANM (RPDANM) algorithm is proposed to further improve CE accuracy, which iteratively refines CE through a specifically designed reweighing strategy. Building upon RPDANM, we propose an iterative approach named RPDANM with adaptive phase control (RPDANM-APC), which adaptively adjusts the RIS phases based on previously estimated channel parameters to facilitate CE, achieving superior CE accuracy while reducing training overhead. Numerical simulations demonstrate the superiority of our proposed approaches in terms of running time, CE accuracy, and training overhead. In particular, the RPDANM-APC approach can achieve higher CE accuracy than existing methods within less than 40 percent training overhead while reducing the running time by tens of times.
Adaptive ResNet Architecture for Distributed Inference in Resource-Constrained IoT Systems
Authors: Fazeela Mazhar Khan, Emna Baccour, Aiman Erbad, Mounir Hamdi
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.11499
Pdf link: https://arxiv.org/pdf/2307.11499
Abstract As deep neural networks continue to expand and become more complex, most edge devices are unable to handle their extensive processing requirements. Therefore, the concept of distributed inference is essential to distribute the neural network among a cluster of nodes. However, distribution may lead to additional energy consumption and dependency among devices that suffer from unstable transmission rates. Unstable transmission rates harm real-time performance of IoT devices, causing high latency, high energy usage, and potential failures. Hence, for dynamic systems, it is necessary to have a resilient DNN with an adaptive architecture that can downsize as per the available resources. This paper presents an empirical study that identifies the connections in ResNet that can be dropped without significantly impacting the model's performance to enable distribution in case of resource shortage. Based on the results, a multi-objective optimization problem is formulated to minimize latency and maximize accuracy as per available resources. Our experiments demonstrate that an adaptive ResNet architecture can reduce shared data, energy consumption, and latency throughout the distribution while maintaining high accuracy.
Data-Driven Cooperative Adaptive Cruise Control for Unknown Nonlinear Vehicle Platoons
Authors: Jianglin Lan
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.11505
Pdf link: https://arxiv.org/pdf/2307.11505
Abstract This paper studies cooperative adaptive cruise control (CACC) for vehicle platoons with consideration of the unknown nonlinear vehicle dynamics that are normally ignored in the literature. A unified data-driven CACC design is proposed for platoons of pure automated vehicles (AVs) or of mixed AVs and human-driven vehicles (HVs). The CACC leverages online-collected sufficient data samples of vehicle accelerations, spacing and relative velocities. The data-driven control design is formulated as a semidefinite program (SDP) that can be solved efficiently using off-the-shelf solvers. The efficacy and advantage of the proposed CACC are demonstrated through a comparison with the classic adaptive cruise control (ACC) method on a platoon of pure AVs and a mixed platoon under a representative aggressive driving profile.
An Efficient Interior-Point Method for Online Convex Optimization
Authors: Elad Hazan, Nimrod Megiddo
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.11668
Pdf link: https://arxiv.org/pdf/2307.11668
Abstract A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$ - which is the minimum possible up to a logarithmic term. In addition, the new algorithm is adaptive, in the sense that the regret bounds hold not only for the time periods $1,\ldots,T$ but also for every sub-interval $s,s+1,\ldots,t$. The running time of the algorithm matches that of newly introduced interior point algorithms for regret minimization: in $n$-dimensional space, during each iteration the new algorithm essentially solves a system of linear equations of order $n$, rather than solving some constrained convex optimization problem in $n$ dimensions and possibly many constraints.
Fast Adaptive Test-Time Defense with Robust Features
Authors: Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet, Debarghya Ghoshdastidar
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.11672
Pdf link: https://arxiv.org/pdf/2307.11672
Abstract Adaptive test-time defenses are used to improve the robustness of deep neural networks to adversarial examples. However, existing methods significantly increase the inference time due to additional optimization on the model parameters or the input at test time. In this work, we propose a novel adaptive test-time defense strategy that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models to the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically show that the top eigenspace of the feature matrix is more robust for a generalized additive model and support our argument for a large width neural network with the Neural Tangent Kernel (NTK) equivalence. We conduct extensive experiments on CIFAR-10 and CIFAR-100 datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, and observe that the proposed method outperforms existing adaptive test-time defenses at much lower computation costs.
A Reinforcement Learning Framework with Region-Awareness and Shared Path Experience for Efficient Routing in Networks-on-Chip
Authors: Kamil Khan, Sudeep Pasricha
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.11712
Pdf link: https://arxiv.org/pdf/2307.11712
Abstract Network-on-chip (NoC) architectures provide a scalable, high-performance, and reliable interconnect for emerging manycore systems. The routing policies used in NoCs have a significant impact on overall performance. Prior efforts have proposed reinforcement learning (RL)-based adaptive routing policies to avoid congestion and minimize latency in NoCs. The output quality of RL policies depends on selecting a representative cost function and an effective update mechanism. Unfortunately, existing RL policies for NoC routing fail to represent path contention and regional congestion in the cost function. Moreover, the experience of packet flows sharing the same route is not fully incorporated into the RL update mechanism. In this paper, we present a novel regional congestion-aware RL-based NoC routing policy called Q-RASP that is capable of sharing experience from packets using the same routes. Q-RASP improves average packet latency by up to 18.3% and reduces NoC energy consumption by up to 6.7% with minimal area overheads compared to state-of-the-art RL-based NoC routing implementations.
Differentially Private Heavy Hitter Detection using Federated Analytics
Authors: Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, Kunal Talwar
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.11749
Pdf link: https://arxiv.org/pdf/2307.11749
Abstract In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset (Caldas et al., 2018) on the task of learning the most frequent words.
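As a rough illustration of the prefix-tree idea mentioned in the abstract, the sketch below grows candidate prefixes one character per round and keeps only those whose (optionally noised) vote count clears a threshold. It is a simplified assumption-laden sketch, not the paper's algorithm: the function names, the one-vote-per-user random selection rule, the `$` end marker, and the Laplace noise on observed counts are all hypothetical choices, and the noise here is not calibrated to any differential privacy guarantee.

```python
import random
from collections import Counter

END = "$"  # hypothetical end-of-word marker


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def prefix_tree_heavy_hitters(user_words, max_len, threshold,
                              noise_scale=0.0, seed=0):
    """Return words whose every prefix clears the (noisy) vote threshold."""
    rng = random.Random(seed)
    alive = {""}   # prefixes that survived the previous round
    found = set()  # complete words discovered so far
    for length in range(1, max_len + 2):  # one extra round for the END marker
        votes = Counter()
        for words in user_words:
            # Each user votes for at most one word that extends a live prefix.
            eligible = sorted(
                w for w in set(words)
                if len(w) + 1 >= length and (w + END)[:length - 1] in alive
            )
            if eligible:
                votes[(rng.choice(eligible) + END)[:length]] += 1
        alive = set()
        for prefix, count in votes.items():
            if count + laplace(noise_scale, rng) >= threshold:
                if prefix.endswith(END):
                    found.add(prefix[:-1])
                else:
                    alive.add(prefix)
        if not alive:
            break
    return found


# Toy data: each user holds a single word; noise is disabled for clarity.
users = [["the"]] * 5 + [["cat"]] * 4 + [["dog"]] + [["then"]]
hitters = prefix_tree_heavy_hitters(users, max_len=4, threshold=3)
```

With noise disabled, only "the" and "cat" survive every round in this toy data; "dog" is pruned at its first character and "then" at its last. Pruning early is what keeps the candidate set small, at the cost of missing words whose prefixes are individually rare.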
Keyword: quantization

There are no results

A-suozhang / GetArxivDaily

New submissions for Mon, 24 Jul 23 #109

Keyword: efficient

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

Flatness-Aware Minimization for Domain Generalization

Comparison between transformers and convolutional models for fine-grained classification of insects

Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Accurate error estimation for model reduction of nonlinear dynamical systems via data-enhanced error closure

SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation

Out-of-Order Sliding-Window Aggregation with Efficient Bulk Evictions and Insertions (Extended Version)

From Adaptive Query Release to Machine Unlearning

Formal-Guided Fuzz Testing: Targeting Security Assurance from Specification to Implementation for 5G and Beyond

GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem

Reconfigurable cascaded thermal neuristors for neuromorphic computing

Kernelized Offline Contextual Dueling Bandits

Energy-Efficient Softwarized Networks: A Survey

Quantum Software Analytics: Opportunities and Challenges

DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework

Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields

Fundamental CRB-Rate Tradeoff in Multi-Antenna ISAC Systems with Information Multicasting and Multi-Target Sensing

Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

Tuning Pre-trained Model via Moment Probing

Sensing Aided Covert Communications: Turning Interference into Allies

EV-Planner: Energy-Efficient Robot Navigation via Event-Based Physics-Guided Neuromorphic Planner

What can a Single Attention Layer Learn? A Study Through the Random Features Lens

A Fair and Memory/Time-efficient Hashmap

Random Separating Hyperplane Theorem and Learning Polytopes

Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework

Direct and inverse modeling of soft robots by learning a condensed FEM model

Deep Directly-Trained Spiking Neural Networks for Object Detection

A Video-based Detector for Suspicious Activity in Examination with OpenPose

Bidding efficiently in Simultaneous Ascending Auctions with budget and eligibility constraints using Simultaneous Move Monte Carlo Tree Search

On the convergence order of the Euler scheme for scalar SDEs with Hölder-type diffusion coefficients

Data-Driven Cooperative Adaptive Cruise Control for Unknown Nonlinear Vehicle Platoons

CORE: Cooperative Reconstruction for Multi-Agent Perception

Model Reporting for Certifiable AI: A Proposal from Merging EU Regulation into AI Development

Solving Pallet loading Problem with Real-World Constraints

Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning

A reduced basis method for frictional contact problems formulated with Nitsche's method

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

Feature Map Testing for Deep Neural Networks

Subset Sampling and Its Extensions

Transferability of Convolutional Neural Networks in Stationary Learning Tasks

Data-based system representations from irregularly measured data

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture

FEDD -- Fair, Efficient, and Diverse Diffusion-based Lesion Segmentation and Malignancy Classification

Improved Approximate Distance Oracles: Bypassing the Thorup-Zwick Bound in Dense Graphs

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

GP-Frontier for Local Mapless Navigation

Keyword: faster

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games

Comparison between transformers and convolutional models for fine-grained classification of insects

GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem

A Fair and Memory/Time-efficient Hashmap

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture

Keyword: mobile

Adapting to Human Preferences to Lead or Follow in Human-Robot Collaboration: A System Evaluation

Underwater 3D positioning on smart devices

Supporting Post-disaster Recovery with Agent-based Modeling in Multilayer Socio-physical Networks

Transferability of Convolutional Neural Networks in Stationary Learning Tasks

Keyword: pruning

FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks

3D Skeletonization of Complex Grapevines for Robotic Pruning

Keyword: diffusion

Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

QDC: Quantum Diffusion Convolution Kernels on Graphs

DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning

On the convergence order of the Euler scheme for scalar SDEs with Hölder-type diffusion coefficients

Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

Mixbiotic society measures: Assessment of community well-going as living system

FEDD -- Fair, Efficient, and Diverse Diffusion-based Lesion Segmentation and Malignancy Classification

Keyword: adaptive

CSSL-RHA: Contrastive Self-Supervised Learning for Robust Handwriting Authentication

SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation

The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data

From Adaptive Query Release to Machine Unlearning

Energy-Efficient Softwarized Networks: A Survey