New submissions for Thu, 23 Nov 23

Keyword: efficient

Frequency Analysis with Multiple Kernels and Complete Dictionary

Authors: Cuiyun Lin, Tao Qian
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.12798
Pdf link: https://arxiv.org/pdf/2311.12798
Abstract In signal analysis, among the effort of seeking for efficient representations of a signal into the basic ones of meaningful frequencies, to extract principal frequency components, consecutively one after another or $n$ at one time, is a fundamental strategy. For this goal, we define the concept of mean-frequency and develop the related frequency decomposition with the complete Szeg\"o kernel dictionary, the latter consisting of the multiple kernels, being defined as the parameter-derivatives of the Szeg\"o kernels. Several major energy matching pursuit type sparse representations, including greedy algorithm (GA), orthogonal greedy algorithm (OGA), adaptive Fourier decomposition (AFD), pre-orthogonal adaptive Fourier decomposition (POAFD), $n$-Best approximation and unwinding Blaschke expansion, are analyzed and compared. Of which an order in re-construction efficiency between the mentioned algorithms is given based on detailed study of their respective remainders. The study spells out the natural connections between the multiple kernels and the related Laguerre system, and in particular shows that both, like the Fourier series, extract out the $O(n^{-\sigma})$ order convergence rate from the functions in the Hardy-Sobolev space of order $\sigma >0.$ Existence of the $n$-Best approximation with the complete Szeg\"o dictionary is proved and the related algorithm aspects are discussed. The included experiments form a significant integration part of the study, for they not only illustrate the theoretical results, but also provide cross comparison between various ways of combination between the matching pursuit algorithms and the dictionaries in use. Experiments show that the complete dictionary remarkably improves approximation efficiency.
High-Power and Safe RF Wireless Charging: Cautious Deployment and Operation
Authors: Onel L. A. López, Osmel M. Rosabal, Amirhossein Azarbahram, A. Basit Khattak, Mehdi Monemi, Richard D. Souza, Petar Popovski, Matti Latva-aho
Subjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.12809
Pdf link: https://arxiv.org/pdf/2311.12809
Abstract The wired charging and the need for battery replacements are critical barriers to unlimited, scalable, and sustainable mobile connectivity, motivating the interest in radio frequency (RF) wireless power transfer (WPT) technology. However, the inherently low end-to-end power transfer efficiency (PTE) and health/safety-related apprehensions about the technology are critical obstacles. Indeed, RF-WPT implementation and operation require efficient and cautious strategies and protocols, especially when targeting high-power charging, which constitutes the scope of this work. Herein, we overview the main factors affecting the end-to-end PTE of RF-WPT systems and their multiplicative effect and interdependencies. Moreover, we discuss key electromagnetic field (EMF) exposure metrics, safety limits, and approaches for efficient and EMF-aware deployment and operation. Quantitatively, we show that near-field RF charging may significantly reduce EMF exposure, and thus must be promoted. We also present our vision of a cyber-physical system for efficient and safe wireless charging, specify key components and their interrelation, and illustrate numerically the PTE attained by two modern low-power multi-antenna architectures in a simple setup. Throughout the paper, we highlight the need for high end-to-end PTE architectures and charging protocols transparently complying with EMF exposure regulations and outline relevant challenges and research directions. This work expands the vision and understanding of modern RF-WPT technology and constitutes a step towards making the technology attractive for worldwide commercial exploitation.
Evolution of Convolutional Neural Network (CNN): Compute vs Memory bandwidth for Edge AI
Authors: Dwith Chenna
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12816
Pdf link: https://arxiv.org/pdf/2311.12816
Abstract Convolutional Neural Networks (CNNs) have greatly influenced the field of Embedded Vision and Edge Artificial Intelligence (AI), enabling powerful machine learning capabilities on resource-constrained devices. This article explores the relationship between CNN compute requirements and memory bandwidth in the context of Edge AI. We delve into the historical progression of CNN architectures, from the early pioneering models to the current state-of-the-art designs, highlighting the advancements in compute-intensive operations. We examine the impact of increasing model complexity on both computational requirements and memory access patterns. The paper presents a comparison analysis of the evolving trade-off between compute demands and memory bandwidth requirements in CNNs. This analysis provides insights into designing efficient architectures and potential hardware accelerators in enhancing CNN performance on edge devices.
Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach
Authors: Binzhe Li, Bolin Chen, Zhao Wang, Shiqi Wang, Yan Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12817
Pdf link: https://arxiv.org/pdf/2311.12817
Abstract In this letter, we envision a new metaverse communication paradigm for virtual avatar faces, and develop the semantic face compression with compact 3D facial descriptors. The fundamental principle is that the communication of virtual avatar faces primarily emphasizes the conveyance of semantic information. In light of this, the proposed scheme offers the advantages of being highly flexible, efficient and semantically meaningful. The semantic face compression, which allows the communication of the descriptors for artificial intelligence based understanding, could facilitate numerous applications without the involvement of humans in metaverse. The promise of the proposed paradigm is also demonstrated by performance comparisons with the state-of-the-art video coding standard, Versatile Video Coding. A significant improvement in terms of rate-accuracy performance has been achieved. The proposed scheme is expected to enable numerous applications, such as digital human communication based on machine analysis, and to form the cornerstone of interaction and communication in the metaverse.
Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression
Authors: David Minnen, Nick Johnston
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2311.12821
Pdf link: https://arxiv.org/pdf/2311.12821
Abstract The rate-distortion performance of neural image compression models has exceeded the state-of-the-art for non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation is difficult due to the lack of standard benchmarking platforms and due to variations in hardware architectures and test environments. Through our rate-distortion-computation (RDC) study we demonstrate that neither floating-point operations (FLOPs) nor runtime are sufficient on their own to accurately rank neural compression methods. We also explore the RDC frontier, which leads to a family of model architectures with the best empirical trade-off between computational requirements and RD performance. Finally, we identify a novel neural compression architecture that yields state-of-the-art RD performance with rate savings of 23.1% over BPG (7.0% over VTM and 3.0% over ELIC) without requiring significantly more FLOPs than other learning-based codecs.
EWasteNet: A Two-Stream Data Efficient Image Transformer Approach for E-Waste Classification
Authors: Niful Islam, Md. Mehedi Hasan Jony, Emam Hasan, Sunny Sutradhar, Atikur Rahman, Md. Motaharul Islam
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12823
Pdf link: https://arxiv.org/pdf/2311.12823
Abstract Improper disposal of e-waste poses global environmental and health risks, raising serious concerns. The accurate classification of e-waste images is critical for efficient management and recycling. In this paper, we have presented a comprehensive dataset comprised of eight different classes of images of electronic devices named the E-Waste Vision Dataset. We have also presented EWasteNet, a novel two-stream approach for precise e-waste image classification based on a data-efficient image transformer (DeiT). The first stream of EWasteNet passes through a sobel operator that detects the edges while the second stream is directed through an Atrous Spatial Pyramid Pooling and attention block where multi-scale contextual information is captured. We train both of the streams simultaneously and their features are merged at the decision level. The DeiT is used as the backbone of both streams. Extensive analysis of the e-waste dataset indicates the usefulness of our method, providing 96% accuracy in e-waste classification. The proposed approach demonstrates significant usefulness in addressing the global concern of e-waste management. It facilitates efficient waste management and recycling by accurately classifying e-waste images, reducing health and safety hazards associated with improper disposal.
A PSO Based Method to Generate Actionable Counterfactuals for High Dimensional Data
Authors: Shashank Shekhar, Asif Salim, Adesh Bansode, Vivaswan Jinturkar, Anirudha Nayak
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2311.12825
Pdf link: https://arxiv.org/pdf/2311.12825
Abstract Counterfactual explanations (CFE) are methods that explain a machine learning model by giving an alternate class prediction of a data point with some minimal changes in its features. It helps the users to identify their data attributes that caused an undesirable prediction like a loan or credit card rejection. We describe an efficient and an actionable counterfactual (CF) generation method based on particle swarm optimization (PSO). We propose a simple objective function for the optimization of the instance-centric CF generation problem. The PSO brings in a lot of flexibility in terms of carrying out multi-objective optimization in large dimensions, capability for multiple CF generation, and setting box constraints or immutability of data attributes. An algorithm is proposed that incorporates these features and it enables greater control over the proximity and sparsity properties over the generated CFs. The proposed algorithm is evaluated with a set of action-ability metrics in real-world datasets, and the results were superior compared to that of the state-of-the-arts.
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets
Authors: Kaiyuan Tang, Chaoli Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12831
Pdf link: https://arxiv.org/pdf/2311.12831
Abstract Due to its conceptual simplicity and generality, compressive neural representation has emerged as a promising alternative to traditional compression methods for managing massive volumetric datasets. The state-of-the-art neural compression solution, neurcomp, however, utilizes a single large multilayer perceptron (MLP) to encode the global volume, incurring slow training and inference. This paper presents an efficient compressive neural representation (ECNR) solution that improves upon neurcomp to handle large-scale time-varying datasets. At the heart of our approach is a multiscale structure that uses the Laplacian pyramid for adaptive signal fitting via implicit neural representation. We leverage multiple small MLPs at each scale for fitting local content or residual blocks. By assigning similar blocks to the same MLP via size uniformization, we enable balanced parallelization among MLPs to significantly speed up training and inference. A deep compression strategy is then employed to compact the resulting model. We demonstrate the effectiveness of ECNR with multiple datasets and compare it with neurcomp and two state-of-the-art conventional compression methods (SZ3 and TTHRESH). Our results position ECNR as a promising alternative to neurcomp for scientific data compression.
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling
Authors: Amjad Yousef Majid, Eduard Marin
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12839
Pdf link: https://arxiv.org/pdf/2311.12839
Abstract In the rapidly evolving field of serverless computing, efficient function scheduling and resource scaling are critical for optimizing performance and cost. This paper presents a comprehensive review of the application of Deep Reinforcement Learning (DRL) techniques in these areas. We begin by providing an overview of serverless computing, highlighting its benefits and challenges, with a particular focus on function scheduling and resource scaling. We then delve into the principles of deep reinforcement learning (DRL) and its potential for addressing these challenges. A systematic review of recent studies applying DRL to serverless computing is presented, covering various algorithms, models, and performances. Our analysis reveals that DRL, with its ability to learn and adapt from an environment, shows promising results in improving the efficiency of function scheduling and resource scaling in serverless computing. However, several challenges remain, including the need for more realistic simulation environments, handling of cold starts, and the trade-off between learning time and scheduling performance. We conclude by discussing potential future directions for this research area, emphasizing the need for more robust DRL models, better benchmarking methods, and the exploration of multi-agent reinforcement learning for more complex serverless architectures. This review serves as a valuable resource for researchers and practitioners aiming to understand and advance the application of DRL in serverless computing.
Wafer Map Defect Patterns Semi-Supervised Classification Using Latent Vector Representation
Authors: Qiyu Wei, Wei Zhao, Xiaoyan Zheng, Zeng Zeng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2311.12840
Pdf link: https://arxiv.org/pdf/2311.12840
Abstract As the globalization of semiconductor design and manufacturing processes continues, the demand for defect detection during integrated circuit fabrication stages is becoming increasingly critical, playing a significant role in enhancing the yield of semiconductor products. Traditional wafer map defect pattern detection methods involve manual inspection using electron microscopes to collect sample images, which are then assessed by experts for defects. This approach is labor-intensive and inefficient. Consequently, there is a pressing need to develop a model capable of automatically detecting defects as an alternative to manual operations. In this paper, we propose a method that initially employs a pre-trained VAE model to obtain the fault distribution information of the wafer map. This information serves as guidance, combined with the original image set for semi-supervised model training. During the semi-supervised training, we utilize a teacher-student network for iterative learning. The model presented in this paper is validated on the benchmark dataset WM-811K wafer dataset. The experimental results demonstrate superior classification accuracy and detection performance compared to state-of-the-art models, fulfilling the requirements for industrial applications. Compared to the original architecture, we have achieved significant performance improvement.
A Novel Defocus-Blur Region Detection Approach Based on DCT Feature and PCNN Structure
Authors: Sadia Basar, Mushtaq Ali, Abdul Waheed, Muneer Ahmad, Mahdi H. Miraz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12845
Pdf link: https://arxiv.org/pdf/2311.12845
Abstract The motion or out-of-focus effect in digital images is the main reason for the blurred regions in defocused-blurred images. It may adversely affect various image features such as texture, pixel, and region. Therefore, it is important to detect in-focused objects in defocused-blurred images after the segmentation of blurred and non-blurred regions. The state-of-the-art techniques are prone to noisy pixels, and their local descriptors for developing segmentation metrics are also complex. To address these issues, this research, therefore, proposed a novel and hybrid-focused detection approach based on Discrete Cosine Transform (DCT) coefficients and PC Neural Net (PCNN) structure. The proposed approach partially resolves the limitations of the existing contrast schemes to detect in-focused smooth objects from the out-of-focused smooth regions in the defocus dataset. The visual and quantitative evaluation illustrates that the proposed approach outperformed in terms of accuracy and efficiency to referenced algorithms. The highest F-score of the proposed approach on Zhao's dataset is 0.7940 whereas on Shi's dataset is 0.9178.
Meticulously Selecting 1% of the Dataset for Pre-training! Generating Differentially Private Images Data with Semantics Query
Authors: Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12850
Pdf link: https://arxiv.org/pdf/2311.12850
Abstract Differential Privacy (DP) image data synthesis, which leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data, but suffer from problems of unstable training and massive computational resource demands. This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data, promoting the efficient creation of DP datasets with high fidelity and utility. PRIVIMAGE first establishes a semantic query function using a public dataset. Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of data from the public dataset with analogous semantics for pre-training. Finally, we pre-train an image generative model using the selected data and then fine-tune this model on the sensitive dataset using Differentially Private Stochastic Gradient Descent (DP-SGD). PRIVIMAGE allows us to train a lightly parameterized generative model, reducing the noise in the gradient during DP-SGD training and enhancing training stability. Extensive experiments demonstrate that PRIVIMAGE uses only 1% of the public dataset for pre-training and 7.6% of the parameters in the generative model compared to the state-of-the-art method, whereas achieves superior synthetic performance and conserves more computational resources. On average, PRIVIMAGE achieves 30.1% lower FID and 12.6% higher Classification Accuracy than the state-of-the-art method. The replication package and datasets can be accessed online.
A versatile circuit for emulating active biological dendrites applied to sound localisation and neuron imitation
Authors: Daniel John Mannion
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2311.12861
Pdf link: https://arxiv.org/pdf/2311.12861
Abstract Sophisticated machine learning struggles to transition onto battery-operated devices due to the high-power consumption of neural networks. Researchers have turned to neuromorphic engineering, inspired by biological neural networks, for more efficient solutions. While previous research focused on artificial neurons and synapses, an essential component has been overlooked: dendrites. Dendrites transmit inputs from synapses to the neuron's soma, applying both passive and active transformations. However, neuromorphic circuits replace these sophisticated computational channels with metallic interconnects. In this study, we introduce a versatile circuit that emulates a segment of a dendrite which exhibits gain, introduces delays, and performs integration. We show how sound localisation - a biological example of dendritic computation - is not possible with the existing passive dendrite circuits but can be achieved using this proposed circuit. We also find that dendrites can form bursting neurons. This significant discovery suggests the potential to fabricate neural networks solely comprised of dendrite circuits.
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Authors: Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2311.12862
Pdf link: https://arxiv.org/pdf/2311.12862
Abstract Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g.implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is 1.2-1.3x faster than SpConv v2 in mixed precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6x faster inference speed compared with state-of-the-art graph deep learning libraries.
The Case for Universal Basic Computing Power
Authors: Yue Zhu
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12872
Pdf link: https://arxiv.org/pdf/2311.12872
Abstract The Universal Basic Computing Power (UBCP) initiative ensures global, free access to a set amount of computing power specifically for AI research and development (R&D). This initiative comprises three key elements. First, UBCP must be cost free, with its usage limited to AI R&D and minimal additional conditions. Second, UBCP should continually incorporate the state of the art AI advancements, including efficiently distilled, compressed, and deployed training data, foundational models, benchmarks, and governance tools. Lastly, it's essential for UBCP to be universally accessible, ensuring convenience for all users. We urge major stakeholders in AI development large platforms, open source contributors, and policymakers to prioritize the UBCP initiative.
A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs
Authors: Jiageng Zhong, Ming Li, Yinliang Chen, Zihang Wei, Fan Yang, Haoran Shen
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12893
Pdf link: https://arxiv.org/pdf/2311.12893
Abstract For intelligent quadcopter UAVs, a robust and reliable autonomous planning system is crucial. Most current trajectory planning methods for UAVs are suitable for static environments but struggle to handle dynamic obstacles, which can pose challenges and even dangers to flight. To address this issue, this paper proposes a vision-based planning system that combines tracking and trajectory prediction of dynamic obstacles to achieve efficient and reliable autonomous flight. We use a lightweight object detection algorithm to identify dynamic obstacles and then use Kalman Filtering to track and estimate their motion states. During the planning phase, we not only consider static obstacles but also account for the potential movements of dynamic obstacles. For trajectory generation, we use a B-spline-based trajectory search algorithm, which is further optimized with various constraints to enhance safety and alignment with the UAV's motion characteristics. We conduct experiments in both simulation and real-world environments, and the results indicate that our approach can successfully detect and avoid obstacles in dynamic environments in real-time, offering greater reliability compared to existing approaches. Furthermore, with the advancements in Natural Language Processing (NLP) technology demonstrating exceptional zero-shot generalization capabilities, more user-friendly human-machine interactions have become feasible, and this study also explores the integration of autonomous planning systems with Large Language Models (LLMs).
Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval
Authors: Xiu-Shen Wei, Yang Shen, Xuhao Sun, Peng Wang, Yuxin Peng
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2311.12894
Pdf link: https://arxiv.org/pdf/2311.12894
Abstract Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images depicting the concept of interests (i.e., the same sub-category labels) highest based on the fine-grained details in the query. It is desirable to alleviate the challenges of both fine-grained nature of small inter-class variations with large intra-class variations and explosive growth of fine-grained data for such a practical task. In this paper, we propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes to not only make the retrieval process efficient, but also establish explicit correspondences between hash codes and visual attributes. Specifically, based on the captured visual representations by attention, we develop an encoder-decoder structure network of a reconstruction task to unsupervisedly distill high-level attribute-specific vectors from the appearance-specific visual representations without attribute annotations. Our models are also equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities. Then, driven by preserving original entities' similarity, the required hash codes can be generated from these attribute-specific vectors and thus become attribute-aware. Furthermore, to combat simplicity bias in deep hashing, we consider the model design from the perspective of the self-consistency principle and propose to further enhance models' self-consistency by equipping an additional image reconstruction path. Comprehensive quantitative experiments under diverse empirical settings on six fine-grained retrieval datasets and two generic retrieval datasets show the superiority of our models over competing methods.
An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes
Authors: Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.12897
Pdf link: https://arxiv.org/pdf/2311.12897
Abstract In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of $118$ frames per second (FPS) at a resolution of $1352 \times 1014$ with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios.
Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction
Authors: Xuanle Zhao, Yue Sun, Tielin Zhang, Bo Xu
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.12902
Pdf link: https://arxiv.org/pdf/2311.12902
Abstract Neural operators extend the capabilities of traditional neural networks by allowing them to handle mappings between function spaces for the purpose of solving partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by Green's function method and approximate operator kernel directly in the frequency domain. In this work, we focus on predicting multiscale dynamic spaces, which is equivalent to solving multiscale PDEs. Multiscale PDEs are characterized by rapid coefficient changes and solution space oscillations, which are crucial for modeling atmospheric convection and ocean circulation. To solve this problem, models should have the ability to capture rapid changes and process them at various scales. However, the FNO only approximates kernels in the low-frequency domain, which is insufficient when solving multiscale PDEs. To address this challenge, we propose a novel hierarchical neural operator that integrates improved Fourier layers with attention mechanisms, aiming to capture all details and handle them at various scales. These mechanisms complement each other in the frequency domain and encourage the model to solve multiscale problems. We perform experiments on dynamic spaces governed by forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations and some other physical scenarios, and reach superior performance in existing PDE benchmarks, especially equations characterized by rapid coefficient variations.
Q-Seg: Quantum Annealing-based Unsupervised Image Segmentation
Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Marlon Nuske, Matthias Klusch, Andreas Dengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2311.12912
Pdf link: https://arxiv.org/pdf/2311.12912
Abstract In this study, we present Q-Seg, a novel unsupervised image segmentation method based on quantum annealing, tailored for existing quantum hardware. We formulate the pixel-wise segmentation problem, which assimilates spectral and spatial information of the image, as a graph-cut optimization task. Our method efficiently leverages the interconnected qubit topology of the D-Wave Advantage device, offering superior scalability over existing quantum approaches and outperforming state-of-the-art classical methods. Our empirical evaluations on synthetic datasets reveal that Q-Seg offers better runtime performance against the classical optimizer Gurobi. Furthermore, we evaluate our method on segmentation of Earth Observation images, an area of application where the amount of labeled data is usually very limited. In this case, Q-Seg demonstrates near-optimal results in flood mapping detection with respect to classical supervised state-of-the-art machine learning methods. Also, Q-Seg provides enhanced segmentation for forest coverage compared to existing annotated masks. Thus, Q-Seg emerges as a viable alternative for real-world applications using available quantum hardware, particularly in scenarios where the lack of labeled data and computational runtime are critical.
DroneOptiNet: A Framework for Optimal Drone-based Load Redistribution Mechanism for 5G and Beyond Solar Small Cell Networks
Authors: Daksh Dave, Vinay Chamola, Sandeep Joshi, Sherali Zeadally
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2311.12944
Pdf link: https://arxiv.org/pdf/2311.12944
Abstract The power requirements posed by the fifth-generation and beyond cellular networks are an important constraint in network deployment and require energy-efficient solutions. In this work, we propose a novel user load transfer approach using airborne base stations (BS), mounted on drones, for reliable and secure power redistribution across the micro-grid network comprising green small cell BSs. Depending on the user density and the availability of an aerial BS, the energy requirement of a cell with an energy deficit is accommodated by migrating the aerial BS from a high-energy to a low-energy cell. The proposed hybrid drone-based framework integrates long short-term memory with unique cost functions using an evolutionary neural network for drones and BSs, and efficiently manages energy and load redistribution. The proposed algorithm reduces power outages at BSs and maintains consistent throughput stability, thereby demonstrating its capability to boost the reliability and robustness of wireless communication systems.
PINNs-Based Uncertainty Quantification for Transient Stability Analysis
Authors: Ren Wang, Ming Zhong, Kaidi Xu, Lola Giráldez Sánchez-Cortés, Ignacio de Cominges Guerra
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.12947
Pdf link: https://arxiv.org/pdf/2311.12947
Abstract This paper addresses the challenge of transient stability in power systems with missing parameters and uncertainty propagation in swing equations. We introduce a novel application of Physics-Informed Neural Networks (PINNs), specifically an Ensemble of PINNs (E-PINNs), to estimate critical parameters like rotor angle and inertia coefficient with enhanced accuracy and reduced computational load. E-PINNs capitalize on the underlying physical principles of swing equations to provide a robust solution. Our approach not only facilitates efficient parameter estimation but also quantifies uncertainties, delivering probabilistic insights into the system behavior. The efficacy of E-PINNs is demonstrated through the analysis of $1$-bus and $2$-bus systems, highlighting the model's ability to handle parameter variability and data scarcity. The study advances the application of machine learning in power system stability, paving the way for reliable and computationally efficient transient stability analysis.
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
Authors: Ahmed Sharshar, Aleksandr Matsun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12956
Pdf link: https://arxiv.org/pdf/2311.12956
Abstract In the realm of aerial image analysis, object detection plays a pivotal role, with significant implications for areas such as remote sensing, urban planning, and disaster management. This study addresses the inherent challenges in this domain, notably the detection of small objects, managing densely packed elements, and accounting for diverse orientations. We present an in-depth evaluation of an object detection model that integrates the Large Selective Kernel Network (LSKNet)as its backbone with the DiffusionDet head, utilizing the iSAID dataset for empirical analysis. Our approach encompasses the introduction of novel methodologies and extensive ablation studies. These studies critically assess various aspects such as loss functions, box regression techniques, and classification strategies to refine the model's precision in object detection. The paper details the experimental application of the LSKNet backbone in synergy with the DiffusionDet heads, a combination tailored to meet the specific challenges in aerial image object detection. The findings of this research indicate a substantial enhancement in the model's performance, especially in the accuracy-time tradeoff. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement, outperforming the RCNN model by 4.7% on the same dataset. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis, paving the way for more accurate and efficient object detection methodologies. The code is publicly available at https://github.com/SashaMatsun/LSKDiffDet
Terrestrial-Satellite Spectrum Sharing in the Upper Mid-Band with Interference Nulling
Authors: Seongjoon Kang, Giovanni Geraci, Marco Mezzavilla, Sundeep Rangan
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.12965
Pdf link: https://arxiv.org/pdf/2311.12965
Abstract The growing demand for broader bandwidth in cellular networks has turned the upper mid-band (7-24 GHz) into a focal point for expansion. However, the integration of terrestrial cellular and incumbent satellite services, particularly in the 12 GHz band, poses significant interference challenges. This paper investigates the interference dynamics in terrestrial-satellite coexistence scenarios and introduces a novel beamforming approach that leverages available ephemeris data for dynamic interference mitigation. By establishing spatial radiation nulls directed towards visible satellites, our technique ensures the protection of satellite uplink communications without markedly compromising terrestrial downlink quality. Through a practical case study, we demonstrate that our approach maintains the satellite uplink signal-to-noise ratio (SNR) degradation under 1 dB and incurs a median SNR penalty of only 0.1 dB for the terrestrial downlink. Our findings offer a promising pathway for efficient spectrum sharing in the upper mid-band, fostering a concurrent enhancement in both terrestrial and satellite network capacity.
Robustifying Generalizable Implicit Shape Networks with a Tunable Non-Parametric Model
Authors: Amine Ouasfi, Adnane Boukhayma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12967
Pdf link: https://arxiv.org/pdf/2311.12967
Abstract Feedforward generalizable models for implicit shape reconstruction from unoriented point cloud present multiple advantages, including high performance and inference speed. However, they still suffer from generalization issues, ranging from underfitting the input point cloud, to misrepresenting samples outside of the training data distribution, or with toplogies unseen at training. We propose here an efficient mechanism to remedy some of these limitations at test time. We combine the inter-shape data prior of the network with an intra-shape regularization prior of a Nystr\"om Kernel Ridge Regression, that we further adapt by fitting its hyperprameters to the current shape. The resulting shape function defined in a shape specific Reproducing Kernel Hilbert Space benefits from desirable stability and efficiency properties and grants a shape adaptive expressiveness-robustness trade-off. We demonstrate the improvement obtained through our method with respect to baselines and the state-of-the-art using synthetic and real data.
Volatility and irregularity Capturing in stock price indices using time series Generative adversarial networks (TimeGAN)
Authors: Leonard Mushunje, David Allen, Shelton Peiris
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2311.12987
Pdf link: https://arxiv.org/pdf/2311.12987
Abstract This paper captures irregularities in financial time series data, particularly stock prices, in the presence of COVID-19 shock. We conjectured that jumps and irregularities are embedded in stock data due to the pandemic shock, which brings forth irregular trends in the time series data. We put forward that efficient and robust forecasting methods are needed to predict stock closing prices in the presence of the pandemic shock. This piece of information is helpful to investors as far as confidence risk and return boost are concerned. Generative adversarial networks of a time series nature are used to provide new ways of modeling and learning the proper and suitable distribution for the financial time series data under complex setups. Ideally, these traditional models are liable to producing high forecasting errors, and they need to be more robust to capture dependency structures and other stylized facts like volatility in stock markets. The TimeGAN model is used, effectively dealing with this risk of poor forecasts. Using the DAX stock index from January 2010 to November 2022, we trained the LSTM, GRU, WGAN, and TimeGAN models as benchmarks and forecasting errors were noted, and our TimeGAN outperformed them all as indicated by a small forecasting error.
Fourier pseudospectral methods for the spatial variable-order fractional wave equations
Authors: Shiping Zhou, Xiaofei Zhao, Yanzhi Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13049
Pdf link: https://arxiv.org/pdf/2311.13049
Abstract In this paper, we propose Fourier pseudospectral methods to solve variable-order fractional viscoacoustic wave equations. Our approach involves a Fourier pseudospectral method for spatial discretization and an accelerated matrix-free technique for efficient computation and storage costs, with a computational cost of $\mathcal{O}(MN\log N)$ and storage cost $\mathcal{O}(MN)$ where $M\ll N$. For temporal discretization, we employ the Crank-Nicolson, leap-frog, and time-splitting schemes. Numerical experiments are conducted to assess their performance. The results demonstrate the advantages of our fast method, particularly in computational and storage costs, and its feasibility in high dimensions. The numerical findings reveal that all three temporal discretization methods exhibit second-order accuracy, while the Fourier pseudospectral spatial discretization showcases spectral accuracy.
Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization
Authors: James Kotary, Vincenzo Di Vito, Jacob Christopher, Pascal Van Hentenryck, Ferdinando Fioretto
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13087
Pdf link: https://arxiv.org/pdf/2311.13087
Abstract Many real-world decision processes are modeled by optimization problems whose defining parameters are unknown and must be inferred from observable data. The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. Recent works show that decision quality can be improved in this setting by solving and differentiating the optimization problem in the training loop, enabling end-to-end training with loss functions defined directly on the resulting decisions. However, this approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models. The approach is generic, and based on an adaptation of the Learning-to-Optimize paradigm, from which a rich variety of existing techniques can be employed. Experimental evaluations show the ability of several Learning-to-Optimize methods to provide efficient, accurate, and flexible solutions to an array of challenging Predict-Then-Optimize problems.
AC Power Flow Informed Parameter Learning for DC Power Flow Network Equivalents
Authors: Babak Taheri, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.13104
Pdf link: https://arxiv.org/pdf/2311.13104
Abstract This paper presents an algorithm to optimize the parameters of power systems equivalents to enhance the accuracy of the DC power flow approximation in reduced networks. Based on a zonal division of the network, the algorithm produces a reduced power system equivalent that captures inter-zonal flows with aggregated buses and equivalent transmission lines. The algorithm refines coefficient and bias parameters for the DC power flow model of the reduced network, aiming to minimize discrepancies between inter-zonal flows in DC and AC power flow results. Using optimization methods like BFGS, L-BFGS, and TNC in an offline training phase, these parameters boost the accuracy of online DC power flow computations. In contrast to existing network equivalencing methods, the proposed algorithm optimizes accuracy over a specified range of operation as opposed to only considering a single nominal point. Numerical tests demonstrate substantial accuracy improvements over traditional equivalencing and approximation methods.
Towards Better Parameter-Efficient Fine-Tuning for Large Language Models: A Position Paper
Authors: Chengyu Wang, Junbing Yan, Wei Zhang, Jun Huang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.13126
Pdf link: https://arxiv.org/pdf/2311.13126
Abstract This paper delves into the pressing need in Parameter-Efficient Fine-Tuning (PEFT) for Large Language Models (LLMs). While LLMs possess remarkable capabilities, their extensive parameter requirements and associated computational demands hinder their practicality and scalability for real-world applications. Our position paper highlights current states and the necessity of further studying into the topic, and recognizes significant challenges and open issues that must be addressed to fully harness the powerful abilities of LLMs. These challenges encompass novel efficient PEFT architectures, PEFT for different learning settings, PEFT combined with model compression techniques, and the exploration of PEFT for multi-modal LLMs. By presenting this position paper, we aim to stimulate further research and foster discussions surrounding more efficient and accessible PEFT for LLMs.
Enhancing Microgrid Resilience with Green Hydrogen Storage
Authors: Shreshtha Dhankar, Cong Chen, Lang Tong
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.13149
Pdf link: https://arxiv.org/pdf/2311.13149
Abstract We consider the problem of hydrogen storage integration in microgrids to improve the electricity supply resilience. Nonlinear effects from electrochemical models of electrolyzers and fuel cells for hydrogen storage are considered, making scheduling under the nonlinear model intractable and the conventional linear approximation infeasible. A piecewise linear model approximation with feasibility projection is proposed, resulting in a computationally efficient model predictive control for hydrogen storage operation. Several resilience performance measures, such as loss-of-load, duration-of-outage, and system cost, are used in performance evaluation. Simulations for the proposed optimization demonstrated a 13%-48% reduction in duration-of-outage, a 6.4%-21.7% reduction in system cost, and a 95% reduction in loss-of-load for critical loads compared to the scheduling algorithm involving linear model approximation. The performance gap of the proposed optimization to the benchmark involving the accurate nonlinear electrochemical model is less than 1% in most metrics.
Testing Closeness of Multivariate Distributions via Ramsey Theory
Authors: Ilias Diakonikolas, Daniel M. Kane, Sihan Liu
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13154
Pdf link: https://arxiv.org/pdf/2311.13154
Abstract We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions $\mathbf p, \mathbf q$ on $\mathbb R^d$, we want to distinguish between the case that $\mathbf p=\mathbf q$ versus $|\mathbf p-\mathbf q|_{Ak} > \epsilon$, where $|\mathbf p-\mathbf q|{A_k}$ denotes the generalized ${A}_k$ distance between $\mathbf p$ and $\mathbf q$ -- measuring the maximum discrepancy between the distributions over any collection of $k$ disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with {\em sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. In more detail, we provide a computationally efficient closeness tester with sample complexity $O\left((k^{6/7}/ \mathrm{poly}_d(\epsilon)) \log^d(k)\right)$. On the lower bound side, we establish a qualitatively matching sample complexity lower bound of $\Omega(k^{6/7}/\mathrm{poly}(\epsilon))$, even for $d=2$. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is $\Theta(k^{4/5}/\mathrm{poly}(\epsilon))$. This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not. As a corollary of our general $Ak$ tester, we obtain $d{\mathrm TV}$-closeness testers for pairs of $k$-histograms on $\mathbb R^d$ over a common unknown partition, and pairs of uniform distributions supported on the union of $k$ unknown disjoint axis-aligned rectangles. Both our algorithm and our lower bound make essential use of tools from Ramsey theory.
Top-$L$ Most Influential Community Detection Over Social Networks (Technical Report)
Authors: Nan Zhang, Yutong Ye, Xiang Lian, Mingsong Chen
Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2311.13162
Pdf link: https://arxiv.org/pdf/2311.13162
Abstract In many real-world applications such as social network analysis and online marketing/advertising, the \textit{community detection} is a fundamental task to identify communities (subgraphs) in social networks with high structural cohesiveness. While previous works focus on detecting communities alone, they do not consider the collective influences of users in these communities on other user nodes in social networks. Inspired by this, in this paper, we investigate the influence propagation from some \textit{seed communities} and their influential effects that result in the \textit{influenced communities}. We propose a novel problem, named \textit{\underline{Top-$L$} most \underline{I}nfluential \underline{C}ommunity \underline{DE}tection} (Top$L$-ICDE) over social networks, which aims to retrieve top-$L$ seed communities with the highest influences, having high structural cohesiveness, and containing user-specified query keywords. In order to efficiently tackle the Top$L$-ICDE problem, we design effective pruning strategies to filter out false alarms of seed communities and propose an effective index mechanism to facilitate efficient Top-$L$ community retrieval. We develop an efficient Top$L$-ICDE answering algorithm by traversing the index and applying our proposed pruning strategies. We also formulate and tackle a variant of Top$L$-ICDE, named \textit{diversified top-$L$ most influential community detection} (DTop$L$-ICDE), which returns a set of $L$ diversified communities with the highest diversity score (i.e., collaborative influences by $L$ communities). We prove that DTop$L$-ICDE is NP-hard, and propose an efficient greedy algorithm with our designed diversity score pruning. Through extensive experiments, we verify the efficiency and effectiveness of our proposed Top$L$-ICDE and DTop$L$-ICDE approaches over real/synthetic social networks under various parameter settings.
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.13171
Pdf link: https://arxiv.org/pdf/2311.13171
Abstract Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize to new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU. To address these issues, we present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models. ComPEFT employs sparsification and ternary quantization to reduce the size of the PEFT module without performing any additional retraining while preserving or enhancing model performance. In extensive evaluation across T5, T0, and LLaMA-based models with 200M - 65B parameters, ComPEFT achieves compression ratios of 8x - 50x. In particular, we show that ComPEFT improves with scale - stronger models exhibit higher compressibility and better performance. For example, we show that ComPEFT applied to LLaMA outperforms QLoRA by 4.16% on MMLU with a storage size reduction of up to 26x. In addition, we show that the compressed experts produced by ComPEFT maintain few-shot compositional generalization capabilities, facilitate efficient communication and computation, and exhibit enhanced performance when merged. Lastly, we provide an analysis of different method components, compare it with other PEFT methods, and test ComPEFT's efficacy for compressing the residual of full-finetuning. Our code is available at https://github.com/prateeky2806/compeft.
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Authors: Chenhao Li, Taishi Ono, Takeshi Uemori, Hajime Mihara, Alexander Gatto, Hajime Nagahara, Yuseke Moriuchi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13187
Pdf link: https://arxiv.org/pdf/2311.13187
Abstract Multi-view inverse rendering is the problem of estimating the scene parameters such as shapes, materials, or illuminations from a sequence of images captured under different viewpoints. Many approaches, however, assume single light bounce and thus fail to recover challenging scenarios like inter-reflections. On the other hand, simply extending those methods to consider multi-bounced light requires more assumptions to alleviate the ambiguity. To address this problem, we propose Neural Incident Stokes Fields (NeISF), a multi-view inverse rendering framework that reduces ambiguities using polarization cues. The primary motivation for using polarization cues is that it is the accumulation of multi-bounced light, providing rich information about geometry and material. Based on this knowledge, the proposed incident Stokes field efficiently models the accumulated polarization effect with the aid of an original physically-based differentiable polarimetric renderer. Lastly, experimental results show that our method outperforms the existing works in synthetic and real scenarios.
Optimal trajectory planning meets network-level routing: Integrated control framework for emerging mobility systems
Authors: Heeseung Bang, Andreas A. Malikopoulos
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.13193
Pdf link: https://arxiv.org/pdf/2311.13193
Abstract In this paper, we introduce a hierarchical decision-making framework for emerging mobility systems. Despite numerous studies focusing on optimizing vehicle flow, practical feasibility has often been overlooked. To address this gap, we present a route-recovery method and energy-optimal trajectory planning tailored for connected and automated vehicles (CAVs) to ensure the realization of optimal flow. Our approach identifies the optimal vehicle flow to minimize total travel time while considering consistent mobility demands in urban settings. We deploy a heuristic route-recovery algorithm that assigns routes to CAVs and departure/arrival time at each road segment. Furthermore, we propose an efficient coordination method that rapidly solves constrained optimization problems by flexibly piecing together unconstrained energy-optimal trajectories. The proposed method has the potential to effectively generate optimal vehicle flow, contributing to the reduction of travel time and energy consumption in urban areas.
Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models
Authors: Xiyu Qi, Yifan Wu, Yongqiang Mao, Wenhui Zhang, Yidan Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13200
Pdf link: https://arxiv.org/pdf/2311.13200
Abstract The Segment Anything Model (SAM) exhibits remarkable versatility and zero-shot learning abilities, owing largely to its extensive training data (SA-1B). Recognizing SAM's dependency on manual guidance given its category-agnostic nature, we identified unexplored potential within few-shot semantic segmentation tasks for remote sensing imagery. This research introduces a structured framework designed for the automation of few-shot semantic segmentation. It utilizes the SAM model and facilitates a more efficient generation of semantically discernible segmentation outcomes. Central to our methodology is a novel automatic prompt learning approach, leveraging prior guided masks to produce coarse pixel-wise prompts for SAM. Extensive experiments on the DLRSD datasets underline the superiority of our approach, outperforming other available few-shot methodologies.
Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach
Authors: Hasan Murad, Mohammed Eunus Ali
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13222
Pdf link: https://arxiv.org/pdf/2311.13222
Abstract Retrieving textual information from natural scene images is an active research area in the field of computer vision with numerous practical applications. Detecting text regions and extracting text from signboards is a challenging problem due to special characteristics like reflecting lights, uneven illumination, or shadows found in real-life natural scene images. With the advent of deep learning-based methods, different sophisticated techniques have been proposed for text detection and text recognition from the natural scene. Though a significant amount of effort has been devoted to extracting natural scene text for resourceful languages like English, little has been done for low-resource languages like Bangla. In this research work, we have proposed an end-to-end system with deep learning-based models for efficiently detecting, recognizing, correcting, and parsing address information from Bangla signboards. We have created manually annotated datasets and synthetic datasets to train signboard detection, address text detection, address text recognition, address text correction, and address text parser models. We have conducted a comparative study among different CTC-based and Encoder-Decoder model architectures for Bangla address text recognition. Moreover, we have designed a novel address text correction model using a sequence-to-sequence transformer-based network to improve the performance of Bangla address text recognition model by post-correction. Finally, we have developed a Bangla address text parser using the state-of-the-art transformer-based pre-trained language model.
NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments
Authors: Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13225
Pdf link: https://arxiv.org/pdf/2311.13225
Abstract Graph Neural Networks (GNNs) have demonstrated outstanding performance in various applications. Existing frameworks utilize CPU-GPU heterogeneous environments to train GNN models and integrate mini-batch and sampling techniques to overcome the GPU memory limitation. In CPU-GPU heterogeneous environments, we can divide sample-based GNN training into three steps: sample, gather, and train. Existing GNN systems use different task orchestrating methods to employ each step on CPU or GPU. After extensive experiments and analysis, we find that existing task orchestrating methods fail to fully utilize the heterogeneous resources, limited by inefficient CPU processing or GPU resource contention. In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch only offloads the training of frequently accessed vertices to the CPU and lets GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestrating method, fully overlapping different tasks on heterogeneous resources while strictly guaranteeing bounded staleness. The experimental results show that compared with the state-of-the-art GNN systems, NeutronOrch can achieve up to 4.61x performance speedup.
Robot at the Mirror: Learning to Imitate via Associating Self-supervised Models
Authors: Andrej Lúčny, Kristína Malinovská, Igor Farkaš
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13226
Pdf link: https://arxiv.org/pdf/2311.13226
Abstract We introduce an approach to building a custom model from ready-made self-supervised models via their associating instead of training and fine-tuning. We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives. To build our model, we first obtain features from the visual input and the postures of the robot's body via models prepared before the robot's operation. Then, we map their corresponding latent spaces by a sample-efficient robot's self-exploration at the mirror. In this way, the robot builds the solicited 3D pose detector, which quality is immediately perfect on the acquired samples instead of obtaining the quality gradually. The mapping, which employs associating the pairs of feature vectors, is then implemented in the same way as the key-value mechanism of the famous transformer models. Finally, deploying our model for imitation to a simulated robot allows us to study, tune up, and systematically evaluate its hyperparameters without the involvement of the human counterpart, advancing our previous research.
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
Authors: Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13230
Pdf link: https://arxiv.org/pdf/2311.13230
Abstract Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple responses from the LLM for consistency verification, making these methods costly and inefficient. In this paper, we propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs. Our approach imitates human focus in factuality checking from three aspects: 1) focus on the most informative and important keywords in the given text; 2) focus on the unreliable tokens in historical context which may lead to a cascade of hallucinations; and 3) focus on the token properties such as token type and token frequency. Experimental results on relevant datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance across all the evaluation metrics and eliminates the need for additional information.
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Authors: Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13231
Pdf link: https://arxiv.org/pdf/2311.13231
Abstract Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model. However, the extensive GPU memory requirement of the diffusion model's denoising process hinders the direct application of the DPO method. To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models. The theoretical analysis demonstrates that although D3PO omits training a reward model, it effectively functions as the optimal reward model trained using human feedback data to guide the learning process. This approach requires no training of a reward model, proving to be more direct, cost-effective, and minimizing computational overhead. In experiments, our method uses the relative scale of objectives as a proxy for human preference, delivering comparable results to methods using ground-truth rewards. Moreover, D3PO demonstrates the ability to reduce image distortion rates and generate safer images, overcoming challenges lacking robust reward models.
A model-free approach to fingertip slip and disturbance detection for grasp stability inference
Authors: Dounia Kitouni (ISIR), Mahdi Khoramshahi (ISIR), Veronique Perdereau (ISIR)
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13245
Pdf link: https://arxiv.org/pdf/2311.13245
Abstract Robotic capacities in object manipulation are incomparable to those of humans. Besides years of learning, humans rely heavily on the richness of information from physical interaction with the environment. In particular, tactile sensing is crucial in providing such rich feedback. Despite its potential contributions to robotic manipulation, tactile sensing is less exploited; mainly due to the complexity of the time series provided by tactile sensors. In this work, we propose a method for assessing grasp stability using tactile sensing. More specifically, we propose a methodology to extract task-relevant features and design efficient classifiers to detect object slippage with respect to individual fingertips. We compare two classification models: support vector machine and logistic regression. We use highly sensitive Uskin tactile sensors mounted on an Allegro hand to test and validate our method. Our results demonstrate that the proposed method is effective in slippage detection in an online fashion.
An $hp$-adaptive strategy based on locally predicted error reductions
Authors: Patrick Bammer, Andreas Schröder, Thomas P. Wihler
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13255
Pdf link: https://arxiv.org/pdf/2311.13255
Abstract We introduce a new $hp$-adaptive strategy for self-adjoint elliptic boundary value problems that does not rely on using classical a posteriori error estimators. Instead, our approach is based on a generally applicable prediction strategy for the reduction of the energy error that can be expressed in terms of local modifications of the degrees of freedom in the underlying discrete approximation space. The computations related to the proposed prediction strategy involve low-dimensional linear problems that are computationally inexpensive and highly parallelizable. The mathematical building blocks for this new concept are first developed on an abstract Hilbert space level, before they are employed within the specific context of $hp$-type finite element discretizations. For this particular framework, we discuss an explicit construction of $p$-enrichments and $hp$-refinements by means of an appropriate constraint coefficient technique that can be employed in any dimensions. The applicability and effectiveness of the resulting $hp$-adaptive strategy is illustrated with some $1$- and $2$-dimensional numerical examples.
Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective
Authors: Hao Yuan, Yajiong Liu, Yanfeng Zhang, Xin Ai, Qiange Wang, Chaoyi Chen, Yu Gu, Ge Yu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13279
Pdf link: https://arxiv.org/pdf/2311.13279
Abstract Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN training in data management, such as data partitioning, batch preparation for mini-batch training, and data transferring between CPUs and GPUs. These factors, which take up a large proportion of training time, make data management in GNN training more significant. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis and evaluation of the representative approaches. We conduct extensive experiments on various benchmark datasets and show many interesting and valuable results. We also provide some practical tips learned from these experiments, which are helpful for designing GNN training systems in the future.
Softmax Acceleration with Adaptive Numeric Format for both Training and Inference
Authors: Tianhua Xia, Sai Qian Zhang
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2311.13290
Pdf link: https://arxiv.org/pdf/2311.13290
Abstract The attention mechanism is a pivotal element within the Transformer architecture, making a substantial contribution to its exceptional performance. Within this attention mechanism, Softmax is an imperative component that enables the model to assess the degree of correlation between various segments of the input. Yet, prior research has shown that Softmax operations can significantly increase processing latency and energy consumption in the Transformer network due to their internal nonlinear operations and data dependencies. In this work, we proposed~\textit{Hyft}, a hardware efficient floating point Softmax accelerator for both training and inference. Hyft aims to reduce the implementation cost of different nonlinear arithmetic operations by adaptively converting intermediate results into the most suitable numeric format for each specific operation, leading to reconfigurable accelerator with hybrid numeric format. The evaluation results highlight that Hyft achieves a remarkable $15\times$ reduction in hardware resource utilization and a $20 \times$ reduction in processing latency, all while maintaining a negligible impact on Transformer accuracy.
Probabilistic Inference in Reinforcement Learning Done Right
Authors: Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13294
Pdf link: https://arxiv.org/pdf/2311.13294
Abstract A popular perspective in Reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
On the parallel solution of hydro-mechanical problems with fracture networks and contact conditions
Authors: Jan Stebel, Jakub Kružík, David Horák, Jan Březina, Michal Béreš
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13310
Pdf link: https://arxiv.org/pdf/2311.13310
Abstract The paper presents a numerical method for the simulation of flow and mechanics in fractured rock. The governing equations which couple the effects in the rock mass and in the fractures are obtained using the discrete fracture-matrix approach. The fracture flow is driven by the cubic law, and the non-penetration contact conditions prevent fractures from closing. A stable finite element discretization is proposed for the displacement-pressure-flux formulation. The resulting nonlinear algebraic system of equations and inequalities is decoupled using a robust iterative splitting into the linearized flow subproblem, and the quadratic programming problem for the mechanical part. The non-penetration conditions are solved by means of the MPGP algorithm. The capability of the numerical scheme is demonstrated on a benchmark problem for borehole excavation with hundreds of fractures in 3D. The paper's novelty consists in combination of three crucial ingredients: (i) application of discrete fracture matrix approach, (ii) robust iterative splitting of resulting nonlinear algebraic system working for real-world 3D problems and (iii) efficient solution of its mechanical quadratic programming part with large number of fractures in mutual contact by means of own solvers with known rate of convergence implemented into in-house PERMON library.
Timely and Efficient Information Delivery in Real-Time Industrial IoT Networks
Authors: Hossam Farag, Dejan Vukobratovic, Andrea Munari, Cedomir Stefanovic
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.13329
Pdf link: https://arxiv.org/pdf/2311.13329
Abstract Enabling real-time communication in Industrial Internet of Things (IIoT) networks is crucial to support autonomous, self-organized and re-configurable industrial automation for Industry 4.0 and the forthcoming Industry 5.0. In this paper, we consider a SIC-assisted real-time IIoT network, in which sensor nodes generate reports according to an event-generation probability that is specific for the monitored phenomena. The reports are delivered over a block-fading channel to a common Access Point (AP) in slotted ALOHA fashion, which leverages the imbalances in the received powers among the contending users and applies successive interference cancellation (SIC) to decode user packets from the collisions. We provide an extensive analytical treatment of the setup, deriving the Age of Information (AoI), throughput and deadline violation probability, when the AP has access to both the perfect as well as the imperfect channel-state information. We show that adopting SIC improves all the performance parameters with respect to the standard slotted ALOHA, as well as to an age-dependent access method. The analytical results agree with the simulation based ones, demonstrating that investing in the SIC capability at the receiver enables this simple access method to support timely and efficient information delivery in IIoT networks.
Automated generation of attack trees with optimal shape and labelling
Authors: Olga Gadyatskaya, Sjouke Mauw, Rolando Trujillo-Rasuac, Tim A. C.Willemse
Subjects: Cryptography and Security (cs.CR); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2311.13331
Pdf link: https://arxiv.org/pdf/2311.13331
Abstract The problem this article addresses is, given a formal specification of a system, how to produce an attack tree that correctly and clearly describes the ways the system can be attacked. Correctness means that the attacks displayed by the attack tree are indeed attacks in the system; clarity means that the tree is efficient in communicating the attack scenario. To pursue clarity, we introduce an attack-tree generation algorithm that minimises the tree size and the information length of its labels without sacrificing correctness. We achieve this by establishing a connection between the problem of factorising algebraic expressions and the problem of minimising the tree size. Notably, our generation algorithm can handle complex attacks that execute actions in parallel and sequentially. For completeness, we introduce a system model that integrates well with our generation approach, and validate the resulting framework via a running example.
REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
Authors: Francesco Corti, Balz Maag, Joachim Schauer, Ulrich Pferschy, Olga Saukh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13349
Pdf link: https://arxiv.org/pdf/2311.13349
Abstract Deep models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models, not capable to adapt at runtime. In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS use structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on six benchmark architectures trained on the Google Speech Commands, FMNIST and CIFAR10 datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS outstanding performance in terms of submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40$\mu$s, utilizing a 2-layer fully-connected network on Arduino Nano 33 BLE Sense.
Gradual Verification for Smart Contracts
Authors: Haojia Sun, Kunal Singh, Jan-Paul Ramos-Dávila, Jonathan Aldrich, Jenna DiVincenzo
Subjects: Cryptography and Security (cs.CR); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2311.13351
Pdf link: https://arxiv.org/pdf/2311.13351
Abstract Blockchains facilitate secure resource transactions through smart contracts, yet these digital agreements are prone to vulnerabilities, particularly when interacting with external contracts, leading to substantial monetary losses. Traditional verification techniques fall short in providing comprehensive security assurances, especially against re-entrancy attacks, due to the unavailable implementations of external contracts. This paper introduces an incremental approach: gradual verification. We combine static and dynamic verification techniques to enhance security, guarantee soundness and flexibility, and optimize resource usage in smart contract interactions. By implementing a prototype for gradually verifying Algorand smart contracts via the pyTEAL language, we demonstrate the effectiveness of our approach, contributing to the safe and efficient execution of smart contracts.
Deriving Comprehensible Theories from Probabilistic Circuits
Authors: Sieben Bocklandt, Wannes Meert, Koen Vanderstraeten, Wouter Pijpops, Kurt Jaspers
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13379
Pdf link: https://arxiv.org/pdf/2311.13379
Abstract The field of Explainable AI (XAI) is seeking to shed light on the inner workings of complex AI models and uncover the rationale behind their decisions. One of the models gaining attention are probabilistic circuits (PCs), which are a general and unified framework for tractable probabilistic models that support efficient computation of various probabilistic queries. Probabilistic circuits guarantee inference that is polynomial in the size of the circuit. In this paper, we improve the explainability of probabilistic circuits by computing a comprehensible, readable logical theory that covers the high-density regions generated by a PC. To achieve this, pruning approaches based on generative significance are used in a new method called PUTPUT (Probabilistic circuit Understanding Through Pruning Underlying logical Theories). The method is applied to a real world use case where music playlists are automatically generated and expressed as readable (database) queries. Evaluation shows that this approach can effectively produce a comprehensible logical theory that describes the high-density regions of a PC and outperforms state of the art methods when exploring the performance-comprehensibility trade-off.
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training
Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13381
Pdf link: https://arxiv.org/pdf/2311.13381
Abstract Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.
Simultaneous uniqueness and numerical inversion for an inverse problem in the time-domain diffuse optical tomography with fluorescence
Authors: Zhiyuan Li, Chunlong Sun
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2311.13391
Pdf link: https://arxiv.org/pdf/2311.13391
Abstract In this work, an inverse problem on the determination of multiple coefficients arising from the time-domain diffuse optical tomography with fluorescence (DOT-FDOT) is investigated. We simultaneously recover the distribution of background absorption coefficient, photon diffusion coefficient as well as the fluorescence absorption in biological tissue by the time-dependent boundary measurements. We build the uniqueness theorem of this multiple coefficients simultaneous inverse problem. After that, the numerical inversions are considered. We introduce an accelerated Landweber iterative algorithm and give several numerical examples illustrating the performance of the proposed inversion schemes.
A Comparative Analysis Between SciTokens, Verifiable Credentials, and Smart Contracts: Novel Approaches for Authentication and Secure Access to Scientific Data
Authors: Md Jobair Hossain Faruk, Bilash Saha, Jim Basney
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.13422
Pdf link: https://arxiv.org/pdf/2311.13422
Abstract Managing and exchanging sensitive information securely is a paramount concern for the scientific and cybersecurity community. The increasing reliance on computing workflows and digital data transactions requires ensuring that sensitive information is protected from unauthorized access, tampering, or misuse. This research paper presents a comparative analysis of three novel approaches for authenticating and securing access to scientific data: SciTokens, Verifiable Credentials, and Smart Contracts. The aim of this study is to investigate the strengths and weaknesses of each approach from trust, revocation, privacy, and security perspectives. We examine the technical features and privacy and security mechanisms of each technology and provide a comparative synthesis with the proposed model. Through our analysis, we demonstrate that each technology offers unique advantages and limitations, and the integration of these technologies can lead to more secure and efficient solutions for authentication and access to scientific data.
Learning-Based Relaxation of Completeness Requirements for Data Entry Forms
Authors: Hichem Belgacem, Xiaochen Li, Domenico Bianculli, Lionel C. Briand
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.13517
Pdf link: https://arxiv.org/pdf/2311.13517
Abstract Data entry forms use completeness requirements to specify the fields that are required or optional to fill for collecting necessary information from different types of users. However, some required fields may not be applicable for certain types of users anymore. Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete required fields. Since obsolete required fields usually have not-null validation checks before submitting the form, users have to enter meaningless values in such fields in order to complete the form submission. These meaningless values threaten the quality of the filled data. To avoid users filling meaningless values, existing techniques usually rely on manually written rules to identify the obsolete required fields and relax their completeness requirements. However, these techniques are ineffective and costly. In this paper, we propose LACQUER, a learning-based automated approach for relaxing the completeness requirements of data entry forms. LACQUER builds Bayesian Network models to automatically learn conditions under which users had to fill meaningless values. To improve its learning ability, LACQUER identifies the cases where a required field is only applicable for a small group of users, and uses SMOTE, an oversampling technique, to generate more instances on such fields for effectively mining dependencies on them. Our experimental results show that LACQUER can accurately relax the completeness requirements of required fields in data entry forms with precision values ranging between 0.76 and 0.90 on different datasets. LACQUER can prevent users from filling 20% to 64% of meaningless values, with negative predictive values between 0.72 and 0.91. Furthermore, LACQUER is efficient; it takes at most 839 ms to predict the completeness requirement of an instance.
Outerplanar and Forest Storyplans
Authors: Jiří Fiala, Oksana Firman, Giuseppe Liotta, Alexander Wolff, Johannes Zink
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2311.13523
Pdf link: https://arxiv.org/pdf/2311.13523
Abstract We study the problem of gradually representing a complex graph as a sequence of drawings of small subgraphs whose union is the complex graph. The sequence of drawings is called \emph{storyplan}, and each drawing in the sequence is called a \emph{frame}. In an outerplanar storyplan, every frame is outerplanar; in a forest storyplan, every frame is acyclic. We identify graph families that admit such storyplans and families for which such storyplans do not always exist. In the affirmative case, we present efficient algorithms that produce straight-line storyplans.
Leveraging CNNs and Ensemble Learning for Automated Disaster Image Classification
Authors: Archit Rathod, Veer Pariawala, Mokshit Surana, Kumkum Saxena
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13531
Pdf link: https://arxiv.org/pdf/2311.13531
Abstract Natural disasters act as a serious threat globally, requiring effective and efficient disaster management and recovery. This paper focuses on classifying natural disaster images using Convolutional Neural Networks (CNNs). Multiple CNN architectures were built and trained on a dataset containing images of earthquakes, floods, wildfires, and volcanoes. A stacked CNN ensemble approach proved to be the most effective, achieving 95% accuracy and an F1 score going up to 0.96 for individual classes. Tuning hyperparameters of individual models for optimization was critical to maximize the models' performance. The stacking of CNNs with XGBoost acting as the meta-model utilizes the strengths of the CNN and ResNet models to improve the overall accuracy of the classification. Results obtained from the models illustrated the potency of CNN-based models for automated disaster image classification. This lays the foundation for expanding these techniques to build robust systems for disaster response, damage assessment, and recovery management.
Medical Image Retrieval Using Pretrained Embeddings
Authors: Farnaz Khun Jush, Tuan Truong, Steffen Vogler, Matthias Lenga
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13547
Pdf link: https://arxiv.org/pdf/2311.13547
Abstract A wide range of imaging techniques and data formats available for medical images make accurate retrieval from image databases challenging. Efficient retrieval systems are crucial in advancing medical research, enabling large-scale studies and innovative diagnostic tools. Thus, addressing the challenges of medical image retrieval is essential for the continued enhancement of healthcare and research. In this study, we evaluated the feasibility of employing four state-of-the-art pretrained models for medical image retrieval at modality, body region, and organ levels and compared the results of two similarity indexing approaches. Since the employed networks take 2D images, we analyzed the impacts of weighting and sampling strategies to incorporate 3D information during retrieval of 3D volumes. We showed that medical image retrieval is feasible using pretrained networks without any additional training or fine-tuning steps. Using pretrained embeddings, we achieved a recall of 1 for various tasks at modality, body region, and organ level.
Transfer Learning-based Real-time Handgun Detection
Authors: Youssef Elmir, Sid Ahmed Laouar, Larbi Hamdaoui
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2311.13559
Pdf link: https://arxiv.org/pdf/2311.13559
Abstract Traditional surveillance systems rely on human attention, limiting their effectiveness. This study employs convolutional neural networks and transfer learning to develop a real-time computer vision system for automatic handgun detection. Comprehensive analysis of online handgun detection methods is conducted, emphasizing reducing false positives and learning time. Transfer learning is demonstrated as an effective approach. Despite technical challenges, the proposed system achieves a precision rate of 84.74%, demonstrating promising performance comparable to related works, enabling faster learning and accurate automatic handgun detection for enhanced security. This research advances security measures by reducing human monitoring dependence, showcasing the potential of transfer learning-based approaches for efficient and reliable handgun detection.
Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies
Authors: Shabnam Daghaghi, Benjamin Coleman, Benito Geordie, Anshumali Shrivastava
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13583
Pdf link: https://arxiv.org/pdf/2311.13583
Abstract Data sampling is an effective method to improve the training speed of neural networks, with recent results demonstrating that it can even break the neural scaling laws. These results critically rely on high-quality scores to estimate the importance of an input to the network. We observe that there are two dominant strategies: static sampling, where the scores are determined before training, and dynamic sampling, where the scores can depend on the model weights. Static algorithms are computationally inexpensive but less effective than their dynamic counterparts, which can cause end-to-end slowdown due to their need to explicitly compute losses. To address this problem, we propose a novel sampling distribution based on nonparametric kernel regression that learns an effective importance score as the neural network trains. However, nonparametric regression models are too computationally expensive to accelerate end-to-end training. Therefore, we develop an efficient sketch-based approximation to the Nadaraya-Watson estimator. Using recent techniques from high-dimensional statistics and randomized algorithms, we prove that our Nadaraya-Watson sketch approximates the estimator with exponential convergence guarantees. Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
A Survey of Serverless Machine Learning Model Inference
Authors: Kamil Kojs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13587
Pdf link: https://arxiv.org/pdf/2311.13587
Abstract Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant efforts in deploying these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs), ensuring reliability, minimal downtime, and optimizing operational costs of the underlying infrastructure. Large machine learning models often demand GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. By providing a novel taxonomy and summarizing recent trends, we hope that this survey could shed light on new optimization perspectives and motivate novel works in large-scale deep learning serving systems.
Risk-sensitive Markov Decision Process and Learning under General Utility Functions
Authors: Zhengqi Wu, Renyuan Xu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.13589
Pdf link: https://arxiv.org/pdf/2311.13589
Abstract Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendations, decision-makers often persist in heterogeneous risk preferences subject to outcome uncertainties, which can not be well-captured by the risk-neural framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To facilitate the Dynamic Programming Principle and Bellman equation, we enlarge the state space with an additional dimension that accounts for the cumulative reward. We propose a discretized approximation scheme to the MDP under enlarged state space, which is tractable and key for algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, we match the theoretical lower bounds for the risk-neutral setting.
Triangle-free $2$-matchings
Authors: Katarzyna Paluch
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2311.13590
Pdf link: https://arxiv.org/pdf/2311.13590
Abstract We consider the problem of finding a maximum size triangle-free $2$-matching in a graph $G$. A $2$-matching is any subset of the edges such that each vertex is incident to at most two edges from the subset. We present a fast combinatorial algorithm for the problem. Our algorithm and its analysis are dramatically simpler than the very complicated result by Hartvigsen from 1984. In the design of this algorithm we use several new concepts. It has been proven before that for any triangle-free $2$-matching $M$ which is not maximum the graph contains an $M$-augmenting path, whose application to $M$ results in a bigger triangle-free $2$-matching. It was not known how to efficiently find such a path. A new observation is that the search for an augmenting path $P$ can be restricted to so-called {\em amenable} paths that go through any triangle $t$ contained in $P \cup M$ a limited number of times. To find an augmenting path that is amenable and hence whose application does not create any triangle we forbid some edges to be followed by certain others. This operation can be thought of as using gadgets, in which some pairs of edges get disconnected. To be able to disconnect two edges we employ {\em half-edges}. A {\em half-edge} of edge $e$ is, informally speaking, a half of $e$ containing exactly one of its endpoints. This is another novel application of half-edges that were already been used for TSP and other matching problems. Additionally, gadgets are not fixed during any augmentation phase, but are dynamically changing according to the currently discovered state of reachability by amenable paths.
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Authors: Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13600
Pdf link: https://arxiv.org/pdf/2311.13600
Abstract Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io
Keyword: faster

Proposing an intelligent mesh smoothing method with graph neural networks
Authors: Zhichao Wang, Xinhai Chen, Junjun Yan, Jie Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12815
Pdf link: https://arxiv.org/pdf/2311.12815
Abstract In CFD, mesh smoothing methods are commonly utilized to refine the mesh quality to achieve high-precision numerical simulations. Specifically, optimization-based smoothing is used for high-quality mesh smoothing, but it incurs significant computational overhead. Pioneer works improve its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality meshes. However, they pose difficulty in smoothing the mesh nodes with varying degrees and also need data augmentation to address the node input sequence problem. Additionally, the required labeled high-quality meshes further limit the applicability of the proposed method. In this paper, we present GMSNet, a lightweight neural network model for intelligent mesh smoothing. GMSNet adopts graph neural networks to extract features of the node's neighbors and output the optimal node position. During smoothing, we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative volume elements. With a lightweight model, GMSNet can effectively smoothing mesh nodes with varying degrees and remain unaffected by the order of input data. A novel loss function, MetricLoss, is also developed to eliminate the need for high-quality meshes, which provides a stable and rapid convergence during training. We compare GMSNet with commonly used mesh smoothing methods on two-dimensional triangle meshes. The experimental results show that GMSNet achieves outstanding mesh smoothing performances with 5% model parameters of the previous model, and attains 8.62 times faster than optimization-based smoothing.
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Authors: Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2311.12862
Pdf link: https://arxiv.org/pdf/2311.12862
Abstract Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g.implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is 1.2-1.3x faster than SpConv v2 in mixed precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6x faster inference speed compared with state-of-the-art graph deep learning libraries.
Fast and Interpretable Mortality Risk Scores for Critical Care Patients
Authors: Chloe Qinyu Zhu, Muhang Tian, Lesia Semenova, Jiachang Liu, Jack Xu, Joseph Scarpa, Cynthia Rudin
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2311.13015
Pdf link: https://arxiv.org/pdf/2311.13015
Abstract Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems, and black box machine learning (ML) models. Both of these have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (including hand-tuning of logistic regression parameters) relies on humans to perform high-dimensional constrained optimization, which leads to a loss in performance. In this work, we bridge the gap between accurate black box models and hand-tuned interpretable models. We build on modern interpretable ML techniques to design accurate and interpretable mortality risk scores. We leverage the largest existing public ICU monitoring datasets, namely the MIMIC III and eICU datasets. By evaluating risk across medical centers, we are able to study generalization across domains. In order to customize our risk score models, we develop a new algorithm, GroupFasterRisk, which has several important benefits: (1) it uses hard sparsity constraint, allowing users to directly control the number of features; (2) it incorporates group sparsity to allow more cohesive models; (3) it allows for monotonicity correction on models for including domain knowledge; (4) it produces many equally-good models at once, which allows domain experts to choose among them. GroupFasterRisk creates its risk scores within hours, even on the large datasets we study here. GroupFasterRisk's risk scores perform better than risk scores currently used in hospitals, and have similar prediction performance to black box ML models (despite being much sparser). Because GroupFasterRisk produces a variety of risk scores and handles constraints, it allows design flexibility, which is the key enabler of practical and trustworthy model creation.
Optimal Transport with Cyclic Symmetry
Authors: Shoichiro Takeda, Yasunori Akagi, Naoki Marumo, Kenta Niwa
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13147
Pdf link: https://arxiv.org/pdf/2311.13147
Abstract We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the well-known entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT in synthetic/real-world data that has a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.
Hierarchical Matrix Factorization for Interpretable Collaborative Filtering
Authors: Kai Sugahara, Kazushi Okamoto
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.13277
Pdf link: https://arxiv.org/pdf/2311.13277
Abstract Matrix factorization (MF) is a simple collaborative filtering technique that achieves superior recommendation accuracy by decomposing the user-item rating matrix into user and item latent matrices. This approach relies on learning from user-item interactions, which may not effectively capture the underlying shared dependencies between users or items. Therefore, there is scope to explicitly capture shared dependencies to further improve recommendation accuracy and the interpretability of learning results by summarizing user-item interactions. Based on these insights, we propose "Hierarchical Matrix Factorization" (HMF), which incorporates clustering concepts to capture the hierarchy, where leaf nodes and other nodes correspond to users/items and clusters, respectively. Central to our approach, called hierarchical embeddings, is the additional decomposition of the user and item latent matrices (embeddings) into probabilistic connection matrices, which link the hierarchy, and a root cluster latent matrix. Thus, each node is represented by the weighted average of the embeddings of its parent clusters. The embeddings are differentiable, allowing simultaneous learning of interactions and clustering using a single gradient descent method. Furthermore, the obtained cluster-specific interactions naturally summarize user-item interactions and provide interpretability. Experimental results on rating and ranking predictions demonstrated the competitiveness of HMF over vanilla and hierarchical MF methods, especially its robustness in sparse interactions. Additionally, it was confirmed that the clustering integration of HMF has the potential for faster learning convergence and mitigation of overfitting compared to MF, and also provides interpretability through a cluster-centered case study.
Transfer Learning-based Real-time Handgun Detection
Authors: Youssef Elmir, Sid Ahmed Laouar, Larbi Hamdaoui
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2311.13559
Pdf link: https://arxiv.org/pdf/2311.13559
Abstract Traditional surveillance systems rely on human attention, limiting their effectiveness. This study employs convolutional neural networks and transfer learning to develop a real-time computer vision system for automatic handgun detection. Comprehensive analysis of online handgun detection methods is conducted, emphasizing reducing false positives and learning time. Transfer learning is demonstrated as an effective approach. Despite technical challenges, the proposed system achieves a precision rate of 84.74%, demonstrating promising performance comparable to related works, enabling faster learning and accurate automatic handgun detection for enhanced security. This research advances security measures by reducing human monitoring dependence, showcasing the potential of transfer learning-based approaches for efficient and reliable handgun detection.
Keyword: mobile

Reducing the Environmental Impact of Wireless Communication via Probabilistic Machine Learning
Authors: A. Ryo Koblitz, Lorenzo Maggi, Matthew Andrews
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12807
Pdf link: https://arxiv.org/pdf/2311.12807
Abstract Machine learning methods are increasingly adopted in communications problems, particularly those arising in next generation wireless settings. Though seen as a key climate mitigation and societal adaptation enabler, communications related energy consumption is high and is expected to grow in future networks in spite of anticipated efficiency gains in 6G due to exponential communications traffic growth. To make meaningful climate mitigation impact in the communications sector, a mindset shift away from maximizing throughput at all cost and towards prioritizing energy efficiency is needed. Moreover, this must be adopted in both existing (without incurring further embodied carbon costs through equipment replacement) and future network infrastructure, given the long development time of mobile generations. To that end, we present summaries of two such problems, from both current and next generation network specifications, where probabilistic inference methods were used to great effect: using Bayesian parameter tuning we are able to safely reduce the energy consumption of existing hardware on a live communications network by $11\%$ whilst maintaining operator specified performance envelopes; through spatiotemporal Gaussian process surrogate modeling we reduce the overhead in a next generation hybrid beamforming system by over $60\%$, greatly improving the networks' ability to target highly mobile users such as autonomous vehicles. The Bayesian paradigm is itself helpful in terms of energy usage, since training a Bayesian optimization model can require much less computation than, say, training a deep neural network.
High-Power and Safe RF Wireless Charging: Cautious Deployment and Operation
Authors: Onel L. A. López, Osmel M. Rosabal, Amirhossein Azarbahram, A. Basit Khattak, Mehdi Monemi, Richard D. Souza, Petar Popovski, Matti Latva-aho
Subjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.12809
Pdf link: https://arxiv.org/pdf/2311.12809
Abstract The wired charging and the need for battery replacements are critical barriers to unlimited, scalable, and sustainable mobile connectivity, motivating the interest in radio frequency (RF) wireless power transfer (WPT) technology. However, the inherently low end-to-end power transfer efficiency (PTE) and health/safety-related apprehensions about the technology are critical obstacles. Indeed, RF-WPT implementation and operation require efficient and cautious strategies and protocols, especially when targeting high-power charging, which constitutes the scope of this work. Herein, we overview the main factors affecting the end-to-end PTE of RF-WPT systems and their multiplicative effect and interdependencies. Moreover, we discuss key electromagnetic field (EMF) exposure metrics, safety limits, and approaches for efficient and EMF-aware deployment and operation. Quantitatively, we show that near-field RF charging may significantly reduce EMF exposure, and thus must be promoted. We also present our vision of a cyber-physical system for efficient and safe wireless charging, specify key components and their interrelation, and illustrate numerically the PTE attained by two modern low-power multi-antenna architectures in a simple setup. Throughout the paper, we highlight the need for high end-to-end PTE architectures and charging protocols transparently complying with EMF exposure regulations and outline relevant challenges and research directions. This work expands the vision and understanding of modern RF-WPT technology and constitutes a step towards making the technology attractive for worldwide commercial exploitation.
Personalization of Affective Models to Enable Neuropsychiatric Digital Precision Health Interventions: A Feasibility Study
Authors: Ali Kargarandehkordi, Matti Kaisti, Peter Washington
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12812
Pdf link: https://arxiv.org/pdf/2311.12812
Abstract Mobile digital therapeutics for autism spectrum disorder (ASD) often target emotion recognition and evocation, which is a challenge for children with ASD. While such mobile applications often use computer vision machine learning (ML) models to guide the adaptive nature of the digital intervention, a single model is usually deployed and applied to all children. Here, we explore the potential of model personalization, or training a single emotion recognition model per person, to improve the performance of these underlying emotion recognition models used to guide digital health therapies for children with ASD. We conducted experiments on the Emognition dataset, a video dataset of human subjects evoking a series of emotions. For a subset of 10 individuals in the dataset with a sufficient representation of at least two ground truth emotion labels, we trained a personalized version of three classical ML models on a set of 51 features extracted from each video frame. We measured the importance of each facial feature for all personalized models and observed differing ranked lists of top features across subjects, motivating the need for model personalization. We then compared the personalized models against a generalized model trained using data from all 10 participants. The mean F1-scores achieved by the personalized models were 90.48%, 92.66%, and 86.40%, respectively. By contrast, the mean F1-scores reached by non-personalized models trained on different human subjects and evaluated using the same test set were 88.55%, 91.78%, and 80.42%, respectively. The personalized models outperformed the generalized models for 7 out of 10 participants. PCA analyses on the remaining 3 participants revealed relatively facial configuration differences between emotion labels within each subject, suggesting that personalized ML will fail when the variation among data points within a subjects data is too low.
Fast Deterministic Rendezvous in Labeled Lines
Authors: Avery Miller, Andrzej Pelc
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.12976
Pdf link: https://arxiv.org/pdf/2311.12976
Abstract Two mobile agents, starting from different nodes of a network modeled as a graph, and woken up at possibly different times, have to meet at the same node. This problem is known as rendezvous. We consider deterministic distributed rendezvous in the infinite path. Each node has a distinct label which is a positive integer. The time of rendezvous is the number of rounds until meeting, counted from the starting round of the earlier agent. We consider three scenarios. In the first scenario, each agent knows its position in the line, i.e., each of them knows its initial distance from the smallest-labeled node, on which side of this node it is located, and the direction towards it. For this scenario, we give a rendezvous algorithm working in time $O(D)$, where $D$ is the initial distance between the agents. This complexity is clearly optimal. In the second scenario, each agent initially knows only the label of its starting node and the initial distance $D$ between the agents. In this scenario, we give a rendezvous algorithm working in time $O(D\log^\ell)$, where $\ell$ is the larger label of the starting nodes. We prove a matching lower bound $\Omega(D\log^\ell)$. Finally, in the most general scenario, where each agent initially knows only the label of its starting node, we give a rendezvous algorithm working in time $O(D^2(\log^*\ell)^3)$, which is at most cubic in the lower bound. All our results remain valid (with small changes) for arbitrary finite paths and for cycles. Our algorithms are drastically better than approaches that use graph exploration, whose running times depend on the graph's size or diameter. Our main methodological tool, and the main novelty of the paper, is a two way reduction: from fast colouring of the infinite labeled path using a constant number of colours in the LOCAL model to fast rendezvous in this path, and vice-versa.
FollowMe: a Robust Person Following Framework Based on Re-Identification and Gestures
Authors: Federico Rollo, Andrea Zunino, Gennaro Raiola, Fabio Amadio, Arash Ajoudani, Nikolaos Tsagarakis
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12992
Pdf link: https://arxiv.org/pdf/2311.12992
Abstract Human-robot interaction (HRI) has become a crucial enabler in houses and industries for facilitating operational flexibility. When it comes to mobile collaborative robots, this flexibility can be further increased due to the autonomous mobility and navigation capacity of the robotic agents, expanding their workspace and consequently, the personalizable assistance they can provide to the human operators. This however requires that the robot is capable of detecting and identifying the human counterpart in all stages of the collaborative task, and in particular while following a human in crowded workplaces. To respond to this need, we developed a unified perception and navigation framework, which enables the robot to identify and follow a target person using a combination of visual Re-Identification (Re-ID), hand gestures detection, and collision-free navigation. The Re-ID module can autonomously learn the features of a target person and use the acquired knowledge to visually re-identify the target. The navigation stack is used to follow the target avoiding obstacles and other individuals in the environment. Experiments are conducted with few subjects in a laboratory setting where some unknown dynamic obstacles are introduced.
REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
Authors: Francesco Corti, Balz Maag, Joachim Schauer, Ulrich Pferschy, Olga Saukh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13349
Pdf link: https://arxiv.org/pdf/2311.13349
Abstract Deep models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models, not capable to adapt at runtime. In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS use structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on six benchmark architectures trained on the Google Speech Commands, FMNIST and CIFAR10 datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS outstanding performance in terms of submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40$\mu$s, utilizing a 2-layer fully-connected network on Arduino Nano 33 BLE Sense.
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training
Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13381
Pdf link: https://arxiv.org/pdf/2311.13381
Abstract Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.
Keyword: pruning

Top-$L$ Most Influential Community Detection Over Social Networks (Technical Report)
Authors: Nan Zhang, Yutong Ye, Xiang Lian, Mingsong Chen
Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2311.13162
Pdf link: https://arxiv.org/pdf/2311.13162
Abstract In many real-world applications such as social network analysis and online marketing/advertising, the \textit{community detection} is a fundamental task to identify communities (subgraphs) in social networks with high structural cohesiveness. While previous works focus on detecting communities alone, they do not consider the collective influences of users in these communities on other user nodes in social networks. Inspired by this, in this paper, we investigate the influence propagation from some \textit{seed communities} and their influential effects that result in the \textit{influenced communities}. We propose a novel problem, named \textit{\underline{Top-$L$} most \underline{I}nfluential \underline{C}ommunity \underline{DE}tection} (Top$L$-ICDE) over social networks, which aims to retrieve top-$L$ seed communities with the highest influences, having high structural cohesiveness, and containing user-specified query keywords. In order to efficiently tackle the Top$L$-ICDE problem, we design effective pruning strategies to filter out false alarms of seed communities and propose an effective index mechanism to facilitate efficient Top-$L$ community retrieval. We develop an efficient Top$L$-ICDE answering algorithm by traversing the index and applying our proposed pruning strategies. We also formulate and tackle a variant of Top$L$-ICDE, named \textit{diversified top-$L$ most influential community detection} (DTop$L$-ICDE), which returns a set of $L$ diversified communities with the highest diversity score (i.e., collaborative influences by $L$ communities). We prove that DTop$L$-ICDE is NP-hard, and propose an efficient greedy algorithm with our designed diversity score pruning. Through extensive experiments, we verify the efficiency and effectiveness of our proposed Top$L$-ICDE and DTop$L$-ICDE approaches over real/synthetic social networks under various parameter settings.
AdaptiveFL: Adaptive Heterogeneous Federated Learning for Resource-Constrained AIoT Systems
Authors: Chentao Jia, Ming Hu, Zekai Chen, Yanxin Yang, Xiaofei Xie, Yang Liu, Mingsong Chen
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13166
Pdf link: https://arxiv.org/pdf/2311.13166
Abstract Although Federated Learning (FL) is promising to enable collaborative learning among Artificial Intelligence of Things (AIoT) devices, it suffers from the problem of low classification performance due to various heterogeneity factors (e.g., computing capacity, memory size) of devices and uncertain operating environments. To address these issues, this paper introduces an effective FL approach named AdaptiveFL based on a novel fine-grained width-wise model pruning strategy, which can generate various heterogeneous local models for heterogeneous AIoT devices. By using our proposed reinforcement learning-based device selection mechanism, AdaptiveFL can adaptively dispatch suitable heterogeneous models to corresponding AIoT devices on the fly based on their available resources for local training. Experimental results show that, compared to state-of-the-art methods, AdaptiveFL can achieve up to 16.83% inference improvements for both IID and non-IID scenarios.
Uncertainty Estimation in Multi-Agent Distributed Learning
Authors: Gleb Radchenko, Victoria Andrea Fill
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13356
Pdf link: https://arxiv.org/pdf/2311.13356
Abstract Traditionally, IoT edge devices have been perceived primarily as low-power components with limited capabilities for autonomous operations. Yet, with emerging advancements in embedded AI hardware design, a foundational shift paves the way for future possibilities. Thus, the aim of the KDT NEUROKIT2E project is to establish a new open-source framework to further facilitate AI applications on edge devices by developing new methods in quantization, pruning-aware training, and sparsification. These innovations hold the potential to expand the functional range of such devices considerably, enabling them to manage complex Machine Learning (ML) tasks utilizing local resources and laying the groundwork for innovative learning approaches. In the context of 6G's transformative potential, distributed learning among independent agents emerges as a pivotal application, attributed to 6G networks' support for ultra-reliable low-latency communication, enhanced data rates, and advanced edge computing capabilities. Our research focuses on the mechanisms and methodologies that allow edge network-enabled agents to engage in collaborative learning in distributed environments. Particularly, one of the key issues within distributed collaborative learning is determining the degree of confidence in the learning results, considering the spatio-temporal locality of data sets perceived by independent agents.
Deriving Comprehensible Theories from Probabilistic Circuits
Authors: Sieben Bocklandt, Wannes Meert, Koen Vanderstraeten, Wouter Pijpops, Kurt Jaspers
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13379
Pdf link: https://arxiv.org/pdf/2311.13379
Abstract The field of Explainable AI (XAI) is seeking to shed light on the inner workings of complex AI models and uncover the rationale behind their decisions. One of the models gaining attention are probabilistic circuits (PCs), which are a general and unified framework for tractable probabilistic models that support efficient computation of various probabilistic queries. Probabilistic circuits guarantee inference that is polynomial in the size of the circuit. In this paper, we improve the explainability of probabilistic circuits by computing a comprehensible, readable logical theory that covers the high-density regions generated by a PC. To achieve this, pruning approaches based on generative significance are used in a new method called PUTPUT (Probabilistic circuit Understanding Through Pruning Underlying logical Theories). The method is applied to a real world use case where music playlists are automatically generated and expressed as readable (database) queries. Evaluation shows that this approach can effectively produce a comprehensible logical theory that describes the high-density regions of a PC and outperforms state of the art methods when exploring the performance-comprehensibility trade-off.
Keyword: diffusion

Investigating Copyright Issues of Diffusion Models under Practical Scenarios
Authors: Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Haonan Wang, Kenji Kawaguchi
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.12803
Pdf link: https://arxiv.org/pdf/2311.12803
Abstract The issue of copyright in generative models, particularly diffusion models, has become a prominent concern in recent years. Previous studies have predominantly focused on copyright violation at the image level, where generative models replicate copyrighted images entirely. Furthermore, these earlier studies have examined copyright infringements mainly using prompts that are semantically similar to target topics. However, copyright infringement can be more nuanced than mere replication of whole images and can be triggered with prompts that are less directly related to copyright topics. In our work, we tackle the limitations of previous studies by delving into partial copyright infringement, which treats parts of images as copyrighted content, using prompts that are considerably different from copyrighted topics. We develop a data generation pipeline that facilitates the creation of datasets for copyright research in diffusion models. Using our pipeline, we create datasets containing copyright infringement samples for different diffusion models. We conduct evaluations on generated data under various criteria. Our results show the prevalence of generating copyright-infringing content across a range of diffusion models, including the latest Stable Diffusion XL.
Toward effective protection against diffusion based mimicry through score distillation
Authors: Haotian Xue, Chumeng Liang, Xiaoyu Wu, Yongxin Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12832
Pdf link: https://arxiv.org/pdf/2311.12832
Abstract While generative diffusion models excel in producing high-quality images, they can also be misused to mimic authorized images, posing a significant threat to AI systems. Efforts have been made to add calibrated perturbations to protect images from diffusion-based mimicry pipelines. However, most of the existing methods are too ineffective and even impractical to be used by individual users due to their high computation and memory requirements. In this work, we present novel findings on attacking latent diffusion models (LDM) and propose new plug-and-play strategies for more effective protection. In particular, we explore the bottleneck in attacking an LDM, discovering that the encoder module rather than the denoiser module is the vulnerable point. Based on this insight, we present our strategy using Score Distillation Sampling (SDS) to double the speed of protection and reduce memory occupation by half without compromising its strength. Additionally, we provide a robust protection strategy by counterintuitively minimizing the semantic loss, which can assist in generating more natural perturbations. Finally, we conduct extensive experiments to substantiate our findings and comprehensively evaluate our newly proposed strategies. We hope our insights and protective measures can contribute to better defense against malicious diffusion-based mimicry, advancing the development of secure AI systems. The code is available in https://github.com/xavihart/Diff-Protect
CopyScope: Model-level Copyright Infringement Quantification in the Diffusion Workflow
Authors: Junlei Zhou, Jiashi Gao, Ziwei Wang, Xuetao Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12847
Pdf link: https://arxiv.org/pdf/2311.12847
Abstract Web-based AI image generation has become an innovative art form that can generate novel artworks with the rapid development of the diffusion model. However, this new technique brings potential copyright infringement risks as it may incorporate the existing artworks without the owners' consent. Copyright infringement quantification is the primary and challenging step towards AI-generated image copyright traceability. Previous work only focused on data attribution from the training data perspective, which is unsuitable for tracing and quantifying copyright infringement in practice because of the following reasons: (1) the training datasets are not always available in public; (2) the model provider is the responsible party, not the image. Motivated by this, in this paper, we propose CopyScope, a new framework to quantify the infringement of AI-generated images from the model level. We first rigorously identify pivotal components within the AI image generation pipeline. Then, we propose to take advantage of Fr\'echet Inception Distance (FID) to effectively capture the image similarity that fits human perception naturally. We further propose the FID-based Shapley algorithm to evaluate the infringement contribution among models. Extensive experiments demonstrate that our work not only reveals the intricacies of infringement quantification but also effectively depicts the infringing models quantitatively, thus promoting accountability in AI image-generation tasks.
RAEDiff: Denoising Diffusion Probabilistic Models Based Reversible Adversarial Examples Self-Generation and Self-Recovery
Authors: Fan Xing, Xiaoyi Zhou, Xuefeng Fan, Zhuo Tian, Yan Zhao
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12858
Pdf link: https://arxiv.org/pdf/2311.12858
Abstract Collected and annotated datasets, which are obtained through extensive efforts, are effective for training Deep Neural Network (DNN) models. However, these datasets are susceptible to be misused by unauthorized users, resulting in infringement of Intellectual Property (IP) rights owned by the dataset creators. Reversible Adversarial Exsamples (RAE) can help to solve the issues of IP protection for datasets. RAEs are adversarial perturbed images that can be restored to the original. As a cutting-edge approach, RAE scheme can serve the purposes of preventing unauthorized users from engaging in malicious model training, as well as ensuring the legitimate usage of authorized users. Nevertheless, in the existing work, RAEs still rely on the embedded auxiliary information for restoration, which may compromise their adversarial abilities. In this paper, a novel self-generation and self-recovery method, named as RAEDiff, is introduced for generating RAEs based on a Denoising Diffusion Probabilistic Models (DDPM). It diffuses datasets into a Biased Gaussian Distribution (BGD) and utilizes the prior knowledge of the DDPM for generating and recovering RAEs. The experimental results demonstrate that RAEDiff effectively self-generates adversarial perturbations for DNN models, including Artificial Intelligence Generated Content (AIGC) models, while also exhibiting significant self-recovery capabilities.
Fine-Grained Open Domain Image Animation with Motion Guidance
Authors: Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12886
Pdf link: https://arxiv.org/pdf/2311.12886
Abstract Image animation is a key task in computer vision which aims to generate dynamic visual content from static image. Recent image animation methods employ neural based rendering technique to generate realistic animations. Despite these advancements, achieving fine-grained and controllable image animation guided by text remains challenging, particularly for open-domain images captured in diverse real environments. In this paper, we introduce an open domain image animation method that leverages the motion prior of video diffusion model. Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control the movable area and its motion speed. This results in enhanced alignment between the animated visual elements and the prompting text, thereby facilitating a fine-grained and interactive animation generation process for intricate motion sequences. We validate the effectiveness of our method through rigorous experiments on an open-domain dataset, with the results showcasing its superior performance. The source code and model will be made publicly available upon publication.
Text-Guided Texturing by Synchronized Multi-View Diffusion
Authors: Yuxin Liu, Minshan Xie, Hanyuan Liu, Tien-Tsin Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12891
Pdf link: https://arxiv.org/pdf/2311.12891
Abstract This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. Based on the pretrained text-to-image (T2I) diffusion model, existing methods usually employ a project-and-inpaint approach, in which a view of the given object is first generated and warped to another view for inpainting. But it tends to generate inconsistent texture due to the asynchronous diffusion of multiple views. We believe such asynchronous diffusion and insufficient information sharing among views are the root causes of the inconsistent artifact. In this paper, we propose a synchronized multi-view diffusion approach that allows the diffusion processes from different views to reach a consensus of the generated content early in the process, and hence ensures the texture consistency. To synchronize the diffusion, we share the denoised content among different views in each denoising step, specifically blending the latent content in the texture domain from views with overlap. Our method demonstrates superior performance in generating consistent, seamless, highly detailed textures, comparing to state-of-the-art methods.
Diffusion Model Alignment Using Direct Preference Optimization
Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12908
Pdf link: https://arxiv.org/pdf/2311.12908
Abstract Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality images and captions to improve visual appeal and text alignment. We propose Diffusion-DPO, a method to align diffusion models to human preferences by directly optimizing on human comparison data. Diffusion-DPO is adapted from the recently developed Direct Preference Optimization (DPO), a simpler alternative to RLHF which directly optimizes a policy that best satisfies human preferences under a classification objective. We re-formulate DPO to account for a diffusion model notion of likelihood, utilizing the evidence lower bound to derive a differentiable objective. Using the Pick-a-Pic dataset of 851K crowdsourced pairwise preferences, we fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO. Our fine-tuned base model significantly outperforms both base SDXL-1.0 and the larger SDXL-1.0 model consisting of an additional refinement model in human evaluation, improving visual appeal and prompt alignment. We also develop a variant that uses AI feedback and has comparable performance to training on human preferences, opening the door for scaling of diffusion model alignment methods.
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
Authors: Ahmed Sharshar, Aleksandr Matsun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12956
Pdf link: https://arxiv.org/pdf/2311.12956
Abstract In the realm of aerial image analysis, object detection plays a pivotal role, with significant implications for areas such as remote sensing, urban planning, and disaster management. This study addresses the inherent challenges in this domain, notably the detection of small objects, managing densely packed elements, and accounting for diverse orientations. We present an in-depth evaluation of an object detection model that integrates the Large Selective Kernel Network (LSKNet)as its backbone with the DiffusionDet head, utilizing the iSAID dataset for empirical analysis. Our approach encompasses the introduction of novel methodologies and extensive ablation studies. These studies critically assess various aspects such as loss functions, box regression techniques, and classification strategies to refine the model's precision in object detection. The paper details the experimental application of the LSKNet backbone in synergy with the DiffusionDet heads, a combination tailored to meet the specific challenges in aerial image object detection. The findings of this research indicate a substantial enhancement in the model's performance, especially in the accuracy-time tradeoff. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement, outperforming the RCNN model by 4.7% on the same dataset. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis, paving the way for more accurate and efficient object detection methodologies. The code is publicly available at https://github.com/SashaMatsun/LSKDiffDet
SD-NAE: Generating Natural Adversarial Examples with Stable Diffusion
Authors: Yueqian Lin, Jingyang Zhang, Yiran Chen, Hai Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.12981
Pdf link: https://arxiv.org/pdf/2311.12981
Abstract Robustly evaluating deep learning image classifiers is challenging due to some limitations of standard datasets. Natural Adversarial Examples (NAEs), arising naturally from the environment and capable of deceiving classifiers, are instrumental in identifying vulnerabilities in trained models. Existing works collect such NAEs by filtering from a huge set of real images, a process that is passive and lacks control. In this work, we propose to actively synthesize NAEs with the state-of-the-art Stable Diffusion. Specifically, our method formulates a controlled optimization process, where we perturb the token embedding that corresponds to a specified class to synthesize NAEs. The generation is guided by the gradient of loss from the target classifier so that the created image closely mimics the ground-truth class yet fools the classifier. Named SD-NAE (Stable Diffusion for Natural Adversarial Examples), our innovative method is effective in producing valid and useful NAEs, which is demonstrated through a meticulously designed experiment. Our work thereby provides a valuable method for obtaining challenging evaluation data, which in turn can potentially advance the development of more robust deep learning models. Code is available at https://github.com/linyueqian/SD-NAE.
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
Authors: Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2311.13073
Pdf link: https://arxiv.org/pdf/2311.13073
Abstract Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models achieved high-quality results over the last few years. However, video synthesis methods recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model. The first stage concerns keyframes synthesis to figure the storyline of a video, while the second one is devoted to interpolation frames generation to make movements of the scene and objects smooth. We compare several temporal conditioning approaches for keyframes generation. The results show the advantage of using separate temporal blocks over temporal layers in terms of metrics reflecting video generation quality aspects and human preference. The design of our interpolation model significantly reduces computational costs compared to other masked frame interpolation approaches. Furthermore, we evaluate different configurations of MoVQ-based video decoding scheme to improve consistency and achieve higher PSNR, SSIM, MSE, and LPIPS scores. Finally, we compare our pipeline with existing solutions and achieve top-2 scores overall and top-1 among open-source solutions: CLIPSIM = 0.2976 and FVD = 433.054. Project page: https://ai-forever.github.io/kandinsky-video/
On the Limitation of Diffusion Models for Synthesizing Training Datasets
Authors: Shin'ya Yamaguchi, Takuma Fukuda
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13090
Pdf link: https://arxiv.org/pdf/2311.13090
Abstract Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time steps starting the reverse process in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data are concentrated in modes of the training data distribution as the reverse step increases, and thus, they are difficult to cover the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate training data distribution perfectly, and there is room for the improvement of generative modeling in the replication of training datasets.
Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis
Authors: Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.13127
Pdf link: https://arxiv.org/pdf/2311.13127
Abstract Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos. Yet, these tools, in the wrong hands, can fabricate misleading or harmful content, endangering individuals. To address this problem, existing poisoning-based approaches perturb user images in an imperceptible way to render them "unlearnable" from malicious uses. We identify two limitations of these defending approaches: i) sub-optimal due to the hand-crafted heuristics for solving the intractable bilevel optimization and ii) lack of robustness against simple data transformations like Gaussian filtering. To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework with an additional transformation sampling process to craft transferable and robust perturbation. Specifically, we employ a pool of surrogate diffusion models to craft transferable and model-agnostic perturbation. Furthermore, by incorporating an additional transformation process, we design a simple denoising-error maximization loss that is sufficient for causing transformation-robust semantic distortion and degradation in a personalized generation. Extensive experiments on the VGGFace2 and CelebA-HQ datasets show that MetaCloak outperforms existing approaches. Notably, MetaCloak can successfully fool online training services like Replicate, in a black-box manner, demonstrating the effectiveness of MetaCloak in real-world scenarios. Our code is available at https://github.com/liuyixin-louis/MetaCloak.
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models
Authors: Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13141
Pdf link: https://arxiv.org/pdf/2311.13141
Abstract This is a technical report on the 360-degree panoramic image generation task based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic images capture the entire $360^\circ\times 180^\circ$ field of view. So the rightmost and the leftmost sides of the 360 panoramic image should be continued, which is the main challenge in this field. However, the current diffusion pipeline is not appropriate for generating such a seamless 360-degree panoramic image. To this end, we propose a circular blending strategy on both the denoising and VAE decoding stages to maintain the geometry continuity. Based on this, we present two models for \textbf{Text-to-360-panoramas} and \textbf{Single-Image-to-360-panoramas} tasks. The code has been released as an open-source project at \href{https://github.com/ArcherFMY/SD-T2I-360PanoImage}{https://github.com/ArcherFMY/SD-T2I-360PanoImage} and \href{https://www.modelscope.cn/models/damo/cv_diffusion_text-to-360panorama-image_generation/summary}{ModelScope}
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Authors: Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13231
Pdf link: https://arxiv.org/pdf/2311.13231
Abstract Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model. However, the extensive GPU memory requirement of the diffusion model's denoising process hinders the direct application of the DPO method. To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models. The theoretical analysis demonstrates that although D3PO omits training a reward model, it effectively functions as the optimal reward model trained using human feedback data to guide the learning process. This approach requires no training of a reward model, proving to be more direct, cost-effective, and minimizing computational overhead. In experiments, our method uses the relative scale of objectives as a proxy for human preference, delivering comparable results to methods using ground-truth rewards. Moreover, D3PO demonstrates the ability to reduce image distortion rates and generate safer images, overcoming challenges lacking robust reward models.
Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution
Authors: Yuxuan Zhou, Liangcai Gao, Zhi Tang, Baole Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13317
Pdf link: https://arxiv.org/pdf/2311.13317
Abstract Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR). Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance to address this issue. Nevertheless, they remain deficient when confronted with severely blurred images, due to their insufficient generation capability when little structural or semantic information can be extracted from original images. Therefore, we introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios. Moreover, we propose a Recognition-Guided Denoising Network, to guide the diffusion model generating LR-consistent results through succinct semantic guidance. Experiments on the TextZoom dataset demonstrate the superiority of RGDiffSR over prior state-of-the-art methods in both text recognition accuracy and image fidelity.
A highly efficient finite volume method with a diffusion control parameter for hyperbolic problems
Authors: Wassim Aboussi, Moussa Ziggaf, Imad Kissami, Mohamed Boubekeur
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13344
Pdf link: https://arxiv.org/pdf/2311.13344
Abstract This article proposes a highly accurate and conservative method for hyperbolic systems using the finite volume approach. This innovative scheme constructs the intermediate states at the interfaces of the control volumes using the method of characteristics. The approach is simple to implement, generates entropic solutions, and avoids solving Riemann problems. A diffusion control parameter is introduced to increase the accuracy of the scheme. Numerical examples are presented for the Euler equation for an ideal gas. The results demonstrate the method's ability to capture contact discontinuity and shock wave profiles with high accuracy and low cost as well as its robustness.
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Authors: Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13384
Pdf link: https://arxiv.org/pdf/2311.13384
Abstract With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domain, primarily due to their training strategies using 3D scan dataset that is far from the real-world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of point cloud to the desired view and provide the projection as a guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing a new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly-detailed compared to the previous 3D scene generation methods, with no constraint on domain of the target scene.
Simultaneous uniqueness and numerical inversion for an inverse problem in the time-domain diffuse optical tomography with fluorescence
Authors: Zhiyuan Li, Chunlong Sun
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2311.13391
Pdf link: https://arxiv.org/pdf/2311.13391
Abstract In this work, an inverse problem on the determination of multiple coefficients arising from the time-domain diffuse optical tomography with fluorescence (DOT-FDOT) is investigated. We simultaneously recover the distribution of background absorption coefficient, photon diffusion coefficient as well as the fluorescence absorption in biological tissue by the time-dependent boundary measurements. We build the uniqueness theorem of this multiple coefficients simultaneous inverse problem. After that, the numerical inversions are considered. We introduce an accelerated Landweber iterative algorithm and give several numerical examples illustrating the performance of the proposed inversion schemes.
Guided Flows for Generative Modeling and Decision Making
Authors: Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13443
Pdf link: https://arxiv.org/pdf/2311.13443
Abstract Classifier-free guidance is a key component for improving the performance of conditional generative models for many downstream tasks. It drastically improves the quality of samples produced, but has so far only been used for diffusion models. Flow Matching (FM), an alternative simulation-free approach, trains Continuous Normalizing Flows (CNFs) based on regressing vector fields. It remains an open question whether classifier-free guidance can be performed for Flow Matching models, and to what extent does it improve performance. In this paper, we explore the usage of Guided Flows for a variety of downstream applications involving conditional image generation, speech synthesis, and reinforcement learning. In particular, we are the first to apply flow models to the offline reinforcement learning setting. We also show that Guided Flows significantly improves the sample quality in image generation and zero-shot text-to-speech synthesis, and can make use of drastically low amounts of computation without affecting the agent's overall performance.
DiffusionMat: Alpha Matting as Sequential Refinement Learning
Authors: Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13535
Pdf link: https://arxiv.org/pdf/2311.13535
Abstract In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. Diverging from conventional methods that utilize trimaps merely as loose guidance for alpha matte prediction, our approach treats image matting as a sequential refinement learning process. This process begins with the addition of noise to trimaps and iteratively denoises them using a pre-trained diffusion model, which incrementally guides the prediction towards a clean alpha matte. The key innovation of our framework is a correction module that adjusts the output at each denoising step, ensuring that the final result is consistent with the input image's structures. We also introduce the Alpha Reliability Propagation, a novel technique designed to maximize the utility of available guidance by selectively enhancing the trimap regions with confident alpha information, thus simplifying the correction task. To train the correction module, we devise specialized loss functions that target the accuracy of the alpha matte's edges and the consistency of its opaque and transparent regions. We evaluate our model across several image matting benchmarks, and the results indicate that DiffusionMat consistently outperforms existing methods. Project page at~\url{https://cnnlstm.github.io/DiffusionMat
ADriver-I: A General World Model for Autonomous Driving
Authors: Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.13549
Pdf link: https://arxiv.org/pdf/2311.13549
Abstract Typically, autonomous driving adopts a modular design, which divides the full stack into perception, prediction, planning and control parts. Though interpretable, such modular design tends to introduce a substantial amount of redundancy. Recently, multimodal large language models (MLLM) and diffusion techniques have demonstrated their superior performance on comprehension and generation ability. In this paper, we first introduce the concept of interleaved vision-action pair, which unifies the format of visual features and control signals. Based on the vision-action pairs, we construct a general world model based on MLLM and diffusion model for autonomous driving, termed ADriver-I. It takes the vision-action pairs as inputs and autoregressively predicts the control signal of the current frame. The generated control signals together with the historical vision-action pairs are further conditioned to predict the future frames. With the predicted next frame, ADriver-I performs further control signal prediction. Such a process can be repeated infinite times, ADriver-I achieves autonomous driving in the world created by itself. Extensive experiments are conducted on nuScenes and our large-scale private datasets. ADriver-I shows impressive performance compared to several constructed baselines. We hope our ADriver-I can provide some new insights for future autonomous driving and embodied intelligence.
WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space
Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13570
Pdf link: https://arxiv.org/pdf/2311.13570
Abstract Modern learning-based approaches to 3D-aware image synthesis achieve high photorealism and 3D-consistent viewpoint changes for the generated images. Existing approaches represent instances in a shared canonical space. However, for in-the-wild datasets a shared canonical system can be difficult to define or might not even exist. In this work, we instead model instances in view space, alleviating the need for posed images and learned camera distributions. We find that in this setting, existing GAN-based methods are prone to generating flat geometry and struggle with distribution coverage. We hence propose WildFusion, a new approach to 3D-aware image synthesis based on latent diffusion models (LDMs). We first train an autoencoder that infers a compressed latent representation, which additionally captures the images' underlying 3D structure and enables not only reconstruction but also novel view synthesis. To learn a faithful 3D representation, we leverage cues from monocular depth prediction. Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods. Importantly, our 3D-aware LDM is trained without any direct supervision from multiview images or 3D geometry and does not require posed images or learned pose or camera distributions. It directly learns a 3D representation without relying on canonical camera coordinates. This opens up promising research avenues for scalable 3D-aware image synthesis and 3D content creation from in-the-wild image data. See https://katjaschwarz.github.io/wildfusion for videos of our 3D results.
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates
Authors: Stefano Bruno, Ying Zhang, Dong-Young Lim, Ömer Deniz Akyildiz, Sotirios Sabanis
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13584
Pdf link: https://arxiv.org/pdf/2311.13584
Abstract We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly logconcave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of our approach. In this case, explicit estimates are provided for the associated optimization problem, i.e. score approximation, while these are combined with the corresponding sampling estimates. As a result, we obtain the best known upper bound estimates in terms of key quantities of interest, such as the dimension and rates of convergence, for the Wasserstein-2 distance between the data distribution (Gaussian with unknown mean) and our sampling algorithm. Beyond the motivating example and in order to allow for the use of a diverse range of stochastic optimizers, we present our results using an $L^2$-accurate score estimation assumption, which crucially is formed under an expectation with respect to the stochastic optimizer and our novel auxiliary process that uses only known information. This approach yields the best known convergence rate for our sampling algorithm.
Keyword: adaptive

Frequency Analysis with Multiple Kernels and Complete Dictionary
Authors: Cuiyun Lin, Tao Qian
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.12798
Pdf link: https://arxiv.org/pdf/2311.12798
Abstract In signal analysis, among the effort of seeking for efficient representations of a signal into the basic ones of meaningful frequencies, to extract principal frequency components, consecutively one after another or $n$ at one time, is a fundamental strategy. For this goal, we define the concept of mean-frequency and develop the related frequency decomposition with the complete Szeg\"o kernel dictionary, the latter consisting of the multiple kernels, being defined as the parameter-derivatives of the Szeg\"o kernels. Several major energy matching pursuit type sparse representations, including greedy algorithm (GA), orthogonal greedy algorithm (OGA), adaptive Fourier decomposition (AFD), pre-orthogonal adaptive Fourier decomposition (POAFD), $n$-Best approximation and unwinding Blaschke expansion, are analyzed and compared. Of which an order in re-construction efficiency between the mentioned algorithms is given based on detailed study of their respective remainders. The study spells out the natural connections between the multiple kernels and the related Laguerre system, and in particular shows that both, like the Fourier series, extract out the $O(n^{-\sigma})$ order convergence rate from the functions in the Hardy-Sobolev space of order $\sigma >0.$ Existence of the $n$-Best approximation with the complete Szeg\"o dictionary is proved and the related algorithm aspects are discussed. The included experiments form a significant integration part of the study, for they not only illustrate the theoretical results, but also provide cross comparison between various ways of combination between the matching pursuit algorithms and the dictionaries in use. Experiments show that the complete dictionary remarkably improves approximation efficiency.
Personalization of Affective Models to Enable Neuropsychiatric Digital Precision Health Interventions: A Feasibility Study
Authors: Ali Kargarandehkordi, Matti Kaisti, Peter Washington
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12812
Pdf link: https://arxiv.org/pdf/2311.12812
Abstract Mobile digital therapeutics for autism spectrum disorder (ASD) often target emotion recognition and evocation, which is a challenge for children with ASD. While such mobile applications often use computer vision machine learning (ML) models to guide the adaptive nature of the digital intervention, a single model is usually deployed and applied to all children. Here, we explore the potential of model personalization, or training a single emotion recognition model per person, to improve the performance of these underlying emotion recognition models used to guide digital health therapies for children with ASD. We conducted experiments on the Emognition dataset, a video dataset of human subjects evoking a series of emotions. For a subset of 10 individuals in the dataset with a sufficient representation of at least two ground truth emotion labels, we trained a personalized version of three classical ML models on a set of 51 features extracted from each video frame. We measured the importance of each facial feature for all personalized models and observed differing ranked lists of top features across subjects, motivating the need for model personalization. We then compared the personalized models against a generalized model trained using data from all 10 participants. The mean F1-scores achieved by the personalized models were 90.48%, 92.66%, and 86.40%, respectively. By contrast, the mean F1-scores reached by non-personalized models trained on different human subjects and evaluated using the same test set were 88.55%, 91.78%, and 80.42%, respectively. The personalized models outperformed the generalized models for 7 out of 10 participants. PCA analyses on the remaining 3 participants revealed relatively facial configuration differences between emotion labels within each subject, suggesting that personalized ML will fail when the variation among data points within a subjects data is too low.
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets
Authors: Kaiyuan Tang, Chaoli Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.12831
Pdf link: https://arxiv.org/pdf/2311.12831
Abstract Due to its conceptual simplicity and generality, compressive neural representation has emerged as a promising alternative to traditional compression methods for managing massive volumetric datasets. The state-of-the-art neural compression solution, neurcomp, however, utilizes a single large multilayer perceptron (MLP) to encode the global volume, incurring slow training and inference. This paper presents an efficient compressive neural representation (ECNR) solution that improves upon neurcomp to handle large-scale time-varying datasets. At the heart of our approach is a multiscale structure that uses the Laplacian pyramid for adaptive signal fitting via implicit neural representation. We leverage multiple small MLPs at each scale for fitting local content or residual blocks. By assigning similar blocks to the same MLP via size uniformization, we enable balanced parallelization among MLPs to significantly speed up training and inference. A deep compression strategy is then employed to compact the resulting model. We demonstrate the effectiveness of ECNR with multiple datasets and compare it with neurcomp and two state-of-the-art conventional compression methods (SZ3 and TTHRESH). Our results position ECNR as a promising alternative to neurcomp for scientific data compression.
Robustifying Generalizable Implicit Shape Networks with a Tunable Non-Parametric Model
Authors: Amine Ouasfi, Adnane Boukhayma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.12967
Pdf link: https://arxiv.org/pdf/2311.12967
Abstract Feedforward generalizable models for implicit shape reconstruction from unoriented point cloud present multiple advantages, including high performance and inference speed. However, they still suffer from generalization issues, ranging from underfitting the input point cloud, to misrepresenting samples outside of the training data distribution, or with toplogies unseen at training. We propose here an efficient mechanism to remedy some of these limitations at test time. We combine the inter-shape data prior of the network with an intra-shape regularization prior of a Nystr\"om Kernel Ridge Regression, that we further adapt by fitting its hyperprameters to the current shape. The resulting shape function defined in a shape specific Reproducing Kernel Hilbert Space benefits from desirable stability and efficiency properties and grants a shape adaptive expressiveness-robustness trade-off. We demonstrate the improvement obtained through our method with respect to baselines and the state-of-the-art using synthetic and real data.
Composite Adaptive Lyapunov-Based Deep Neural Network (Lb-DNN) Controller
Authors: Omkar Sudhir Patil, Emily J. Griffis, Wanjiku A. Makumi, Warren E. Dixon
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.13056
Pdf link: https://arxiv.org/pdf/2311.13056
Abstract Recent advancements in adaptive control have equipped deep neural network (DNN)-based controllers with Lyapunov-based adaptation laws that work across a range of DNN architectures to uniquely enable online learning. However, the adaptation laws are based on tracking error, and offer convergence guarantees on only the tracking error without providing conclusions on the parameter estimation performance. Motivated to provide guarantees on the DNN parameter estimation performance, this paper provides the first result on composite adaptation for adaptive Lyapunov-based DNN controllers, which uses the Jacobian of the DNN and a prediction error of the dynamics that is computed using a novel method involving an observer of the dynamics. A Lyapunov-based stability analysis is performed which guarantees the tracking, observer, and parameter estimation errors are uniformly ultimately bounded (UUB), with stronger performance guarantees when the DNN's Jacobian satisfies the persistence of excitation (PE) condition. Comparative simulation results demonstrate a significant performance improvement with the developed composite adaptive Lb-DNN controller in comparison to the tracking error-based Lb-DNN.
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Authors: Yutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, Yin Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13099
Pdf link: https://arxiv.org/pdf/2311.13099
Abstract We show that physics-based simulations can be seamlessly integrated with NeRF to generate high-quality elastodynamics of real-world objects. Unlike existing methods, we discretize nonlinear hyperelasticity in a meshless way, obviating the necessity for intermediate auxiliary shape proxies like a tetrahedral mesh or voxel grid. A quadratic generalized moving least square (Q-GMLS) is employed to capture nonlinear dynamics and large deformation on the implicit model. Such meshless integration enables versatile simulations of complex and codimensional shapes. We adaptively place the least-square kernels according to the NeRF density field to significantly reduce the complexity of the nonlinear simulation. As a result, physically realistic animations can be conveniently synthesized using our method for a wide range of hyperelastic materials at an interactive rate. For more information, please visit our project page at https://fytalon.github.io/pienerf/.
Fast Parallel Algorithms for Submodular $p$-Superseparable Maximization
Authors: Philip Cervenjak, Junhao Gan, Anthony Wirth
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.13123
Pdf link: https://arxiv.org/pdf/2311.13123
Abstract Maximizing a non-negative, monontone, submodular function $f$ over $n$ elements under a cardinality constraint $k$ (SMCC) is a well-studied NP-hard problem. It has important applications in, e.g., machine learning and influence maximization. Though the theoretical problem admits polynomial-time approximation algorithms, solving it in practice often involves frequently querying submodular functions that are expensive to compute. This has motivated significant research into designing parallel approximation algorithms in the adaptive complexity model; adaptive complexity (adaptivity) measures the number of sequential rounds of $\text{poly}(n)$ function queries an algorithm requires. The state-of-the-art algorithms can achieve $(1-\frac{1}{e}-\varepsilon)$-approximate solutions with $O(\frac{1}{\varepsilon^2}\log n)$ adaptivity, which approaches the known adaptivity lower-bounds. However, the $O(\frac{1}{\varepsilon^2} \log n)$ adaptivity only applies to maximizing worst-case functions that are unlikely to appear in practice. Thus, in this paper, we consider the special class of $p$-superseparable submodular functions, which places a reasonable constraint on $f$, based on the parameter $p$, and is more amenable to maximization, while also having real-world applicability. Our main contribution is the algorithm LS+GS, a finer-grained version of the existing LS+PGB algorithm, designed for instances of SMCC when $f$ is $p$-superseparable; it achieves an expected $(1-\frac{1}{e}-\varepsilon)$-approximate solution with $O(\frac{1}{\varepsilon^2}\log(p k))$ adaptivity independent of $n$. Additionally, unrelated to $p$-superseparability, our LS+GS algorithm uses only $O(\frac{n}{\varepsilon} + \frac{\log n}{\varepsilon^2})$ oracle queries, which has an improved dependence on $\varepsilon^{-1}$ over the state-of-the-art LS+PGB; this is achieved through the design of a novel thresholding subroutine.
Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning
Authors: Dengke Yan, Ming Hu, Zeke Xia, Yanxin Yang, Jun Xia, Xiaofei Xie, Mingsong Chen
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13163
Pdf link: https://arxiv.org/pdf/2311.13163
Abstract Due to its advantages in resource constraint scenarios, Split Federated Learning (SFL) is promising in AIoT systems. However, due to data heterogeneity and stragglers, SFL suffers from the challenges of low inference accuracy and low efficiency. To address these issues, this paper presents a novel SFL approach, named Sliding Split Federated Learning (S$^2$FL), which adopts an adaptive sliding model split strategy and a data balance-based training mechanism. By dynamically dispatching different model portions to AIoT devices according to their computing capability, S$^2$FL can alleviate the low training efficiency caused by stragglers. By combining features uploaded by devices with different data distributions to generate multiple larger batches with a uniform distribution for back-propagation, S$^2$FL can alleviate the performance degradation caused by data heterogeneity. Experimental results demonstrate that, compared to conventional SFL, S$^2$FL can achieve up to 16.5\% inference accuracy improvement and 3.54X training acceleration.
AdaptiveFL: Adaptive Heterogeneous Federated Learning for Resource-Constrained AIoT Systems
Authors: Chentao Jia, Ming Hu, Zekai Chen, Yanxin Yang, Xiaofei Xie, Yang Liu, Mingsong Chen
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.13166
Pdf link: https://arxiv.org/pdf/2311.13166
Abstract Although Federated Learning (FL) is promising to enable collaborative learning among Artificial Intelligence of Things (AIoT) devices, it suffers from the problem of low classification performance due to various heterogeneity factors (e.g., computing capacity, memory size) of devices and uncertain operating environments. To address these issues, this paper introduces an effective FL approach named AdaptiveFL based on a novel fine-grained width-wise model pruning strategy, which can generate various heterogeneous local models for heterogeneous AIoT devices. By using our proposed reinforcement learning-based device selection mechanism, AdaptiveFL can adaptively dispatch suitable heterogeneous models to corresponding AIoT devices on the fly based on their available resources for local training. Experimental results show that, compared to state-of-the-art methods, AdaptiveFL can achieve up to 16.83% inference improvements for both IID and non-IID scenarios.
Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation
Authors: Chung Park, Taesan Kim, Taekyoon Choi, Junui Hong, Yelim Yu, Mincheol Cho, Kyunam Lee, Sungil Ryu, Hyungjun Yoon, Minsung Choi, Jaegul Choo
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.13188
Pdf link: https://arxiv.org/pdf/2311.13188
Abstract This paper investigates Cross-Domain Sequential Recommendation (CDSR), a promising method that uses information from multiple domains (more than three) to generate accurate and diverse recommendations, and takes into account the sequential nature of user interactions. The effectiveness of these systems often depends on the complex interplay among the multiple domains. In this dynamic landscape, the problem of negative transfer arises, where heterogeneous knowledge between dissimilar domains leads to performance degradation due to differences in user preferences across these domains. As a remedy, we propose a new CDSR framework that addresses the problem of negative transfer by assessing the extent of negative transfer from one domain to another and adaptively assigning low weight values to the corresponding prediction losses. To this end, the amount of negative transfer is estimated by measuring the marginal contribution of each domain to model performance based on a cooperative game theory. In addition, a hierarchical contrastive learning approach that incorporates information from the sequence of coarse-level categories into that of fine-level categories (e.g., item level) when implementing contrastive learning was developed to mitigate negative transfer. Despite the potentially low relevance between domains at the fine-level, there may be higher relevance at the category level due to its generalised and broader preferences. We show that our model is superior to prior works in terms of model performance on two real-world datasets across ten different domains.
Breast Cancer classification by adaptive weighted average ensemble of previously trained models
Authors: Mosab S. M. Farea, zhe chen
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13206
Pdf link: https://arxiv.org/pdf/2311.13206
Abstract Breast cancer is a serious disease that inflicts millions of people each year, and the number of cases is increasing. Early detection is the best way to reduce the impact of the disease. Researchers have developed many techniques to detect breast cancer, including the use of histopathology images in CAD systems. This research proposes a technique that combine already fully trained model using adaptive average ensemble, this is different from the literature which uses average ensemble before training and the average ensemble is trained simultaneously. Our approach is different because it used adaptive average ensemble after training which has increased the performance of evaluation metrics. It averages the outputs of every trained model, and every model will have weight according to its accuracy. The accuracy in the adaptive weighted ensemble model has achieved 98% where the accuracy has increased by 1 percent which is better than the best participating model in the ensemble which was 97%. Also, it decreased the numbers of false positive and false negative and enhanced the performance metrics.
Test-time Adaptive Vision-and-Language Navigation
Authors: Junyu Gao, Xuan Yao, Changsheng Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.13209
Pdf link: https://arxiv.org/pdf/2311.13209
Abstract Vision-and-Language Navigation (VLN) has witnessed significant advancements in recent years, largely attributed to meticulously curated datasets and proficiently trained models. Nevertheless, when tested in diverse environments, the trained models inevitably encounter significant shifts in data distribution, highlighting that relying solely on pre-trained and fixed navigation models is insufficient. To enhance models' generalization ability, test-time adaptation (TTA) demonstrates significant potential in the computer vision field by leveraging unlabeled test samples for model updates. However, simply applying existing TTA methods to the VLN task cannot well handle the adaptability-stability dilemma of VLN models, i.e., frequent updates can result in drastic changes in model parameters, while occasional updates can make the models ill-equipped to handle dynamically changing environments. Therefore, we propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for VLN by performing decomposition-accumulation analysis for both gradients and parameters in a unified framework. Specifically, in the fast update phase, gradients generated during the recent multi-step navigation process are decomposed into components with varying levels of consistency. Then, these components are adaptively accumulated to pinpoint a concordant direction for fast model adaptation. In the slow update phase, historically recorded parameters are gathered, and a similar decomposition-accumulation analysis is conducted to revert the model to a stable state. Extensive experiments show that our method obtains impressive performance gains on four popular benchmarks.
Asymptotically compatible energy and dissipation law of the nonuniform L2-$1_σ$ scheme for time fractional Allen-Cahn model
Authors: Hong-lin Liao, Xiaohan Zhu, Hong Sun
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13216
Pdf link: https://arxiv.org/pdf/2311.13216
Abstract We build an asymptotically compatible energy of the variable-step L2-$1_{\sigma}$ scheme for the time-fractional Allen-Cahn model with the Caputo's fractional derivative of order $\alpha\in(0,1)$, under a weak step-ratio constraint $\tauk/\tau{k-1}\geq r_{\star}(\alpha)$ for $k\ge2$, where $\tauk$ is the $k$-th time-step size and $r{\star}(\alpha)\in(0.3865,0.4037)$ for $\alpha\in(0,1)$. It provides a positive answer to the open problem in [J. Comput. Phys., 414:109473], and, to the best of our knowledge, it is the first second-order nonuniform time-stepping scheme to preserve both the maximum bound principle and the energy dissipation law of time-fractional Allen-Cahn model. The compatible discrete energy is constructed via a novel discrete gradient structure of the second-order L2-$1_{\sigma}$ formula by a local-nonlocal splitting technique. It splits the discrete fractional derivative into two parts: one is a local term analogue to the trapezoid rule of the first derivative and the other is a nonlocal summation analogue to the L1 formula of Caputo derivative. Numerical examples with an adaptive time-stepping strategy are provided to show the effectiveness of our scheme and the asymptotic properties of the associated modified energy.
Robust Outlier Bound Condition to Phase Retrieval with Adversarial Sparse Outliers
Authors: Gao Huang, Song Li, Hang Xu
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2311.13219
Pdf link: https://arxiv.org/pdf/2311.13219
Abstract We consider the problem of recovering an unknown signal $\pmb{x}0\in \mathbb{R}^{n}$ from phaseless measurements. In this paper, we study the convex phase retrieval problem via PhaseLift from linear Gaussian measurements perturbed by $\ell{1}$-bounded noise and sparse outliers that can change an adversarially chosen $s$-fraction of the measurement vector. We show that the Robust-PhaseLift model can successfully reconstruct the ground-truth up to global phase for any $s< s^{}\approx 0.1185$ with $\mathcal{O}(n)$ measurements, even in the case where the sparse outliers may depend on the measurement and the observation. The recovery guarantees are based on the robust outlier bound condition and the analysis of the product of two Gaussian variables. Moreover, we construct adaptive counterexamples to show that the Robust-PhaseLift model fails when $s> s^{}$ with high probability.
DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency
Authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, Dacheng Tao, Tianyou Chai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2311.13254
Pdf link: https://arxiv.org/pdf/2311.13254
Abstract Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts present a challenge in effectively learning invariant spatio-temporal features across the labeled source domain and unlabeled target domain for video semantic segmentation. To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features. Firstly, we perform bidirectional spatio-temporal fusion at the image sequence level and shallow feature level, leading to the construction of two fused intermediate video domains. This prompts the video semantic segmentation model to consistently learn spatio-temporal features of shared patch sequences which are influenced by domain-specific contexts, thereby mitigating the feature gap between the source and target domain. Secondly, we propose a category-aware feature alignment module to promote the consistency of spatio-temporal features, facilitating adaptation to the target domain. Specifically, we adaptively aggregate the domain-specific deep features of each category along spatio-temporal dimensions, which are further constrained to achieve cross-domain intra-class feature alignment and inter-class feature separation. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art mIOUs on multiple challenging benchmarks. Furthermore, we extend the proposed DA-STC to the image domain, where it also exhibits superior performance for domain adaptive semantic segmentation. The source code and models will be made available at \url{https://github.com/ZHE-SAPI/DA-STC}.
An $hp$-adaptive strategy based on locally predicted error reductions
Authors: Patrick Bammer, Andreas Schröder, Thomas P. Wihler
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.13255
Pdf link: https://arxiv.org/pdf/2311.13255
Abstract We introduce a new $hp$-adaptive strategy for self-adjoint elliptic boundary value problems that does not rely on using classical a posteriori error estimators. Instead, our approach is based on a generally applicable prediction strategy for the reduction of the energy error that can be expressed in terms of local modifications of the degrees of freedom in the underlying discrete approximation space. The computations related to the proposed prediction strategy involve low-dimensional linear problems that are computationally inexpensive and highly parallelizable. The mathematical building blocks for this new concept are first developed on an abstract Hilbert space level, before they are employed within the specific context of $hp$-type finite element discretizations. For this particular framework, we discuss an explicit construction of $p$-enrichments and $hp$-refinements by means of an appropriate constraint coefficient technique that can be employed in any dimensions. The applicability and effectiveness of the resulting $hp$-adaptive strategy is illustrated with some $1$- and $2$-dimensional numerical examples.
Softmax Acceleration with Adaptive Numeric Format for both Training and Inference
Authors: Tianhua Xia, Sai Qian Zhang
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2311.13290
Pdf link: https://arxiv.org/pdf/2311.13290
Abstract The attention mechanism is a pivotal element within the Transformer architecture, making a substantial contribution to its exceptional performance. Within this attention mechanism, Softmax is an imperative component that enables the model to assess the degree of correlation between various segments of the input. Yet, prior research has shown that Softmax operations can significantly increase processing latency and energy consumption in the Transformer network due to their internal nonlinear operations and data dependencies. In this work, we proposed~\textit{Hyft}, a hardware efficient floating point Softmax accelerator for both training and inference. Hyft aims to reduce the implementation cost of different nonlinear arithmetic operations by adaptively converting intermediate results into the most suitable numeric format for each specific operation, leading to reconfigurable accelerator with hybrid numeric format. The evaluation results highlight that Hyft achieves a remarkable $15\times$ reduction in hardware resource utilization and a $20 \times$ reduction in processing latency, all while maintaining a negligible impact on Transformer accuracy.
A Novel Dynamic Event-triggered Mechanism for Dynamic Average Consensus
Authors: Tao Xu, Zhisheng Duan, Guanghui Wen, Zhiyong Sun
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.13371
Pdf link: https://arxiv.org/pdf/2311.13371
Abstract This paper studies a challenging issue introduced in a recent survey, namely designing a distributed event-based scheme to solve the dynamic average consensus (DAC) problem. First, a robust adaptive distributed event-based DAC algorithm is designed without imposing specific initialization criteria to perform estimation task under intermittent communication. Second, a novel adaptive distributed dynamic event-triggered mechanism is proposed to determine the triggering time when neighboring agents broadcast information to each other. Compared to the existing event-triggered mechanisms, the novelty of the proposed dynamic event-triggered mechanism lies in that it guarantees the existence of a positive and uniform minimum inter-event interval without sacrificing any accuracy of the estimation, which is much more practical than only ensuring the exclusion of the Zeno behavior or the boundedness of the estimation error. Third, a composite adaptive law is developed to update the adaptive gain employed in the distributed event-based DAC algorithm and dynamic event-triggered mechanism. Using the composite adaptive update law, the distributed event-based solution proposed in our work is implemented without requiring any global information. Finally, numerical simulations are provided to illustrate the effectiveness of the theoretical results.
Analyzing the Evolution and Maintenance of ML Models on Hugging Face
Authors: Joel Castaño, Silverio Martínez-Fernández, Xavier Franch, Justus Bogner
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.13380
Pdf link: https://arxiv.org/pdf/2311.13380
Abstract Hugging Face (HF) has established itself as a crucial platform for the development and sharing of machine learning (ML) models. This repository mining study, which delves into more than 380,000 models using data gathered via the HF Hub API, aims to explore the community engagement, evolution, and maintenance around models hosted on HF, aspects that have yet to be comprehensively explored in the literature. We first examine the overall growth and popularity of HF, uncovering trends in ML domains, framework usage, authors grouping and the evolution of tags and datasets used. Through text analysis of model card descriptions, we also seek to identify prevalent themes and insights within the developer community. Our investigation further extends to the maintenance aspects of models, where we evaluate the maintenance status of ML models, classify commit messages into various categories (corrective, perfective, and adaptive), analyze the evolution across development stages of commits metrics and introduce a new classification system that estimates the maintenance status of models based on multiple attributes. This study aims to provide valuable insights about ML model maintenance and evolution that could inform future model development, maintenance, and community engagement strategies on community-driven platforms like HF.
Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates
Authors: Michael Menart, Enayat Ullah, Raman Arora, Raef Bassily, Cristóbal Guzmán
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13447
Pdf link: https://arxiv.org/pdf/2311.13447
Abstract We study private empirical risk minimization (ERM) problem for losses satisfying the $(\gamma,\kappa)$-Kurdyka-{\L}ojasiewicz (KL) condition. The Polyak-{\L}ojasiewicz (PL) condition is a special case of this condition when $\kappa=2$. Specifically, we study this problem under the constraint of $\rho$ zero-concentrated differential privacy (zCDP). When $\kappa\in[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance reduced gradient descent that achieves the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $\kappa \geq 2$ and the loss is instead Lipschitz and weakly convex, we show it is possible to achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2\kappa}{4-\kappa}}\big)$ adaptively, which is nearly optimal when $\kappa = 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives with rate as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.
Multi-Objective Bayesian Optimization with Active Preference Learning
Authors: Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shinya Suzuki, Shion Takeno, Ichiro Takeuchi, Masayuki Karasuyama
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.13460
Pdf link: https://arxiv.org/pdf/2311.13460
Abstract There are a lot of real-world black-box optimization problems that need to optimize multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires the prohibitive search cost, while in many practical scenarios, the decision maker (DM) only needs a specific solution among the set of the Pareto optimal solutions. We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in the MOO with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated by an interactive manner based on the two types of supervisions called the pairwise preference and improvement request. To explore the most preferred solution, we define an acquisition function in which the uncertainty both in the objective functions and the DM preference is incorporated. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of our proposed method through the benchmark function optimization and the hyper-parameter optimization problems for machine learning models.
Keyword: quantization

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.13171
Pdf link: https://arxiv.org/pdf/2311.13171
Abstract Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize to new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU. To address these issues, we present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models. ComPEFT employs sparsification and ternary quantization to reduce the size of the PEFT module without performing any additional retraining while preserving or enhancing model performance. In extensive evaluation across T5, T0, and LLaMA-based models with 200M - 65B parameters, ComPEFT achieves compression ratios of 8x - 50x. In particular, we show that ComPEFT improves with scale - stronger models exhibit higher compressibility and better performance. For example, we show that ComPEFT applied to LLaMA outperforms QLoRA by 4.16% on MMLU with a storage size reduction of up to 26x. In addition, we show that the compressed experts produced by ComPEFT maintain few-shot compositional generalization capabilities, facilitate efficient communication and computation, and exhibit enhanced performance when merged. Lastly, we provide an analysis of different method components, compare it with other PEFT methods, and test ComPEFT's efficacy for compressing the residual of full-finetuning. Our code is available at https://github.com/prateeky2806/compeft.
Uncertainty Estimation in Multi-Agent Distributed Learning
Authors: Gleb Radchenko, Victoria Andrea Fill
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.13356
Pdf link: https://arxiv.org/pdf/2311.13356
Abstract Traditionally, IoT edge devices have been perceived primarily as low-power components with limited capabilities for autonomous operations. Yet, with emerging advancements in embedded AI hardware design, a foundational shift paves the way for future possibilities. Thus, the aim of the KDT NEUROKIT2E project is to establish a new open-source framework to further facilitate AI applications on edge devices by developing new methods in quantization, pruning-aware training, and sparsification. These innovations hold the potential to expand the functional range of such devices considerably, enabling them to manage complex Machine Learning (ML) tasks utilizing local resources and laying the groundwork for innovative learning approaches. In the context of 6G's transformative potential, distributed learning among independent agents emerges as a pivotal application, attributed to 6G networks' support for ultra-reliable low-latency communication, enhanced data rates, and advanced edge computing capabilities. Our research focuses on the mechanisms and methodologies that allow edge network-enabled agents to engage in collaborative learning in distributed environments. Particularly, one of the key issues within distributed collaborative learning is determining the degree of confidence in the learning results, considering the spatio-temporal locality of data sets perceived by independent agents.

A-suozhang / GetArxivDaily

New submissions for Thu, 23 Nov 23 #210

Keyword: efficient

Frequency Analysis with Multiple Kernels and Complete Dictionary

High-Power and Safe RF Wireless Charging: Cautious Deployment and Operation

Evolution of Convolutional Neural Network (CNN): Compute vs Memory bandwidth for Edge AI

Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach

Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression

EWasteNet: A Two-Stream Data Efficient Image Transformer Approach for E-Waste Classification

A PSO Based Method to Generate Actionable Counterfactuals for High Dimensional Data

ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets

A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling

Wafer Map Defect Patterns Semi-Supervised Classification Using Latent Vector Representation

A Novel Defocus-Blur Region Detection Approach Based on DCT Feature and PCNN Structure

Meticulously Selecting 1% of the Dataset for Pre-training! Generating Differentially Private Images Data with Semantics Query

A versatile circuit for emulating active biological dendrites applied to sound localisation and neuron imitation

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

The Case for Universal Basic Computing Power

A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs

Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval

An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes

Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction

Q-Seg: Quantum Annealing-based Unsupervised Image Segmentation

DroneOptiNet: A Framework for Optimal Drone-based Load Redistribution Mechanism for 5G and Beyond Solar Small Cell Networks

PINNs-Based Uncertainty Quantification for Transient Stability Analysis

Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection

Terrestrial-Satellite Spectrum Sharing in the Upper Mid-Band with Interference Nulling

Robustifying Generalizable Implicit Shape Networks with a Tunable Non-Parametric Model

Volatility and irregularity Capturing in stock price indices using time series Generative adversarial networks (TimeGAN)

Fourier pseudospectral methods for the spatial variable-order fractional wave equations

Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization

AC Power Flow Informed Parameter Learning for DC Power Flow Network Equivalents

Towards Better Parameter-Efficient Fine-Tuning for Large Language Models: A Position Paper

Enhancing Microgrid Resilience with Green Hydrogen Storage

Testing Closeness of Multivariate Distributions via Ramsey Theory

Top-$L$ Most Influential Community Detection Over Social Networks (Technical Report)

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

NeISF: Neural Incident Stokes Field for Geometry and Material Estimation

Optimal trajectory planning meets network-level routing: Integrated control framework for emerging mobility systems

Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Robot at the Mirror: Learning to Imitate via Associating Self-supervised Models

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

A model-free approach to fingertip slip and disturbance detection for grasp stability inference

An $hp$-adaptive strategy based on locally predicted error reductions

Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective

Softmax Acceleration with Adaptive Numeric Format for both Training and Inference

Probabilistic Inference in Reinforcement Learning Done Right

On the parallel solution of hydro-mechanical problems with fracture networks and contact conditions

Timely and Efficient Information Delivery in Real-Time Industrial IoT Networks

Automated generation of attack trees with optimal shape and labelling

REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

Gradual Verification for Smart Contracts

Deriving Comprehensible Theories from Probabilistic Circuits

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

Simultaneous uniqueness and numerical inversion for an inverse problem in the time-domain diffuse optical tomography with fluorescence

A Comparative Analysis Between SciTokens, Verifiable Credentials, and Smart Contracts: Novel Approaches for Authentication and Secure Access to Scientific Data

Learning-Based Relaxation of Completeness Requirements for Data Entry Forms

Outerplanar and Forest Storyplans

Leveraging CNNs and Ensemble Learning for Automated Disaster Image Classification

Medical Image Retrieval Using Pretrained Embeddings

Transfer Learning-based Real-time Handgun Detection

Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies

A Survey of Serverless Machine Learning Model Inference

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Triangle-free $2$-matchings

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Keyword: faster

Proposing an intelligent mesh smoothing method with graph neural networks

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Fast and Interpretable Mortality Risk Scores for Critical Care Patients

Optimal Transport with Cyclic Symmetry

Hierarchical Matrix Factorization for Interpretable Collaborative Filtering

Transfer Learning-based Real-time Handgun Detection

Keyword: mobile

Reducing the Environmental Impact of Wireless Communication via Probabilistic Machine Learning

High-Power and Safe RF Wireless Charging: Cautious Deployment and Operation

Personalization of Affective Models to Enable Neuropsychiatric Digital Precision Health Interventions: A Feasibility Study