New submissions for Wed, 8 Nov 23

Keyword: efficient

HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis

Authors: Hanchen Ye, Hyegang Jun, Deming Chen
Subjects: Hardware Architecture (cs.AR); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2311.03379
Pdf link: https://arxiv.org/pdf/2311.03379
Abstract Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy as a design methodology for generating efficient dataflow architectures within a short development cycle. However, existing HLS tools rely on developers to explore the vast dataflow design space, ultimately leading to suboptimal designs. This phenomenon is especially concerning as the size of the HLS design grows. To tackle these challenges, we introduce HIDA, a new scalable and hierarchical HLS framework that can systematically convert an algorithmic description into a dataflow implementation on hardware. We first propose a collection of efficient and versatile dataflow representations for modeling the hierarchical dataflow structure. Capitalizing on these representations, we develop an automated optimizer that decomposes the dataflow optimization problem into multiple levels based on the inherent dataflow hierarchy. Using FPGAs as an evaluation platform, working with a set of neural networks modeled in PyTorch, HIDA achieves up to 8.54$\times$ higher throughput compared to the state-of-the-art (SOTA) HLS optimization tool. Furthermore, despite being fully automated and able to handle various applications, HIDA achieves 1.29$\times$ higher throughput over the SOTA RTL-based neural network accelerators on an FPGA.
A Simple and Efficient Baseline for Data Attribution on Images
Authors: Vasu Singla, Pedro Sandoval-Segura, Micah Goldblum, Jonas Geiping, Tom Goldstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03386
Pdf link: https://arxiv.org/pdf/2311.03386
Abstract Data attribution methods play a crucial role in understanding machine learning models, providing insight into which training data points are most responsible for model outputs during deployment. However, current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions. These approaches therefore come at a high computational cost, are memory intensive, and are hard to scale to large models or datasets. In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution. Our method is model-agnostic and scales easily to large datasets. We show results on CIFAR-10 and ImageNet, achieving strong performance that rivals or outperforms state-of-the-art approaches at a fraction of the compute or memory cost. Contrary to prior work, our results reinforce the intuition that a model's prediction on one image is most impacted by visually similar training samples. Our approach serves as a simple and efficient baseline for data attribution on images.
FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge
Authors: Azzam Alhussain, Mingjie Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2311.03390
Pdf link: https://arxiv.org/pdf/2311.03390
Abstract Accelerating Human Action Recognition (HAR) efficiently for real-time surveillance and robotic systems on edge chips remains a challenging research field, given its high computational and memory requirements. This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture. Our network accelerator was trained on UCF101 and UCF24 datasets and implemented on edge SoC-FPGA. Our development uses partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off. We also fused all convolutional, batch-norm, and ReLU operations into a single homogeneous layer and utilized the Lucas-Kanade motion flow method to enable a high parallelism accelerator design and optimized on-chip engine computing.Furthermore, our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104, which is 1.7x - 1.9x higher than the prior research. Lastly, the designed framework was benchmarked against several hardware chips for higher throughput and performance measurements and is now available as an open-source project on GitHub for training and implementation on edge platforms.
Training Multi-layer Neural Networks on Ising Machine
Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2311.03408
Pdf link: https://arxiv.org/pdf/2311.03408
Abstract As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This paper proposes an Ising learning algorithm to train quantized neural network (QNN), by incorporating two essential techinques, namely binary representation of topological network and order reduction of loss function. As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines, providing an alternative to gradient-based backpropagation. Firstly, training QNN is formulated as a quadratic constrained binary optimization (QCBO) problem by representing neuron connection and activation function as equality constraints. All quantized variables are encoded by binary bits based on binary encoding protocol. Secondly, QCBO is converted to a quadratic unconstrained binary optimization (QUBO) problem, that can be efficiently solved on Ising machines. The conversion leverages both penalty function and Rosenberg order reduction, who together eliminate equality constraints and reduce high-order loss function into a quadratic one. With some assumptions, theoretical analysis shows the space complexity of our algorithm is $\mathcal{O}(H^2L + HLN\log H)$, quantifying the required number of Ising spins. Finally, the algorithm effectiveness is validated with a simulated Ising machine on MNIST dataset. After annealing 700 ms, the classification accuracy achieves 98.3%. Among 100 runs, the success probability of finding the optimal solution is 72%. Along with the increasing number of spins on Ising machine, our algorithm has the potential to train deeper neural networks.
PowerFlowNet: Leveraging Message Passing GNNs for Improved Power Flow Approximation
Authors: Nan Lin, Stavros Orfanoudakis, Nathan Ordonez Cardenas, Juan S. Giraldo, Pedro P. Vergara
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03415
Pdf link: https://arxiv.org/pdf/2311.03415
Abstract Accurate and efficient power flow (PF) analysis is crucial in modern electrical networks' efficient operation and planning. Therefore, there is a need for scalable algorithms capable of handling large-scale power networks that can provide accurate and fast solutions. Graph Neural Networks (GNNs) have emerged as a promising approach for enhancing the speed of PF approximations by leveraging their ability to capture distinctive features from the underlying power network graph. In this study, we introduce PowerFlowNet, a novel GNN architecture for PF approximation that showcases similar performance with the traditional Newton-Raphson method but achieves it 4 times faster in the simple IEEE 14-bus system and 145 times faster in the realistic case of the French high voltage network (6470rte). Meanwhile, it significantly outperforms other traditional approximation methods, such as the DC relaxation method, in terms of performance and execution time; therefore, making PowerFlowNet a highly promising solution for real-world PF analysis. Furthermore, we verify the efficacy of our approach by conducting an in-depth experimental evaluation, thoroughly examining the performance, scalability, interpretability, and architectural dependability of PowerFlowNet. The evaluation provides insights into the behavior and potential applications of GNNs in power system analysis.
Federated Learning for Clinical Structured Data: A Benchmark Comparison of Engineering and Statistical Approaches
Authors: Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D'Agostino, Xin Li, Yilin Ning, Yuqing Shang, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.03417
Pdf link: https://arxiv.org/pdf/2311.03417
Abstract Federated learning (FL) has shown promising potential in safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also explored similar privacy-preserving algorithms. Statistical FL algorithms, however, remain considerably less recognized than their engineering counterparts. Our goal was to bridge the gap by presenting the first comprehensive comparison of FL frameworks from both engineering and statistical domains. We evaluated five FL frameworks using both simulated and real-world data. The results indicate that statistical FL algorithms yield less biased point estimates for model coefficients and offer convenient confidence interval estimations. In contrast, engineering-based methods tend to generate more accurate predictions, sometimes surpassing central pooled and statistical FL models. This study underscores the relative strengths and weaknesses of both types of methods, emphasizing the need for increased awareness and their integration in future FL applications.
Orion: A Fully Homomorphic Encryption Compiler for Private Deep Neural Network Inference
Authors: Austin Ebel, Karthik Garimella, Brandon Reagen
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.03470
Pdf link: https://arxiv.org/pdf/2311.03470
Abstract Fully Homomorphic Encryption (FHE) has the potential to substantially improve privacy and security by enabling computation on encrypted data. This is especially true with deep learning, as today many popular user services are powered by neural networks. One of the major challenges facing wide-scale deployment of FHE-secured neural inference is effectively mapping them to the FHE domain. FHE poses many programming challenges including packing large vectors, handling expensive rotations, and correctly implementing complex strided convolutions. This makes programming FHE inferences prone to poor performance and errors. In this paper we overcome these challenges with Orion, an automated optimizing FHE compiler for neural inference. Orion automatically maps PyTorch-specified networks to FHE, handling common layer types and arbitrary tensor shapes and strides. Moreover, we develop novel optimizations that balance dense FHE vector packing, efficient rotations, and minimize operations to improve performance. We have implemented Orion, which will be open sourced, and evaluated it on common benchmarks used by the FHE deep learning community. We compare Orion to multiple state-of-the-art solutions and report iso-accuracy speedups ranging from 2.7$\times$ to 20.5$\times$.
In-Context Exemplars as Clues to Retrieving from Large Associative Memory
Authors: Jiachen Zhao
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03498
Pdf link: https://arxiv.org/pdf/2311.03498
Abstract Recently, large language models (LLMs) have made remarkable progress in natural language processing. The most representative ability of LLMs is in-context learning (ICL), which enables LLMs to learn patterns from in-context exemplars without training. The performance of ICL greatly depends on the exemplars used. However, how to choose exemplars remains unclear due to the lack of understanding of how in-context learning works. In this paper, we present a novel perspective on ICL by conceptualizing it as contextual retrieval from a model of associative memory. We establish a theoretical framework of ICL based on Hopfield Networks. Based on our framework, we look into how in-context exemplars influence the performance of ICL and propose more efficient active exemplar selection. Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval, with potential implications for advancing the understanding of LLMs.
MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network
Authors: Karthik Sivarama Krishnan, Koushik Sivarama Krishnan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2311.03509
Pdf link: https://arxiv.org/pdf/2311.03509
Abstract In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.
Brain Networks and Intelligence: A Graph Neural Network Based Approach to Resting State fMRI Data
Authors: Bishal Thapaliya, Esra Akbas, Jiayu Chen, Raam Sapkota, Bhaskar Ray, Pranav Suresh, Vince Calhoun, Jingyu Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2311.03520
Pdf link: https://arxiv.org/pdf/2311.03520
Abstract Resting-state functional magnetic resonance imaging (rsfMRI) is a powerful tool for investigating the relationship between brain function and cognitive processes as it allows for the functional organization of the brain to be captured without relying on a specific task or stimuli. In this paper, we present a novel modeling architecture called BrainRGIN for predicting intelligence (fluid, crystallized, and total intelligence) using graph neural networks on rsfMRI derived static functional network connectivity matrices. Extending from the existing graph convolution networks, our approach incorporates a clustering-based embedding and graph isomorphism network in the graph convolutional layer to reflect the nature of the brain sub-network organization and efficient network expression, in combination with TopK pooling and attention-based readout functions. We evaluated our proposed architecture on a large dataset, specifically the Adolescent Brain Cognitive Development Dataset, and demonstrated its effectiveness in predicting individual differences in intelligence. Our model achieved lower mean squared errors and higher correlation scores than existing relevant graph architectures and other traditional machine learning models for all of the intelligence prediction tasks. The middle frontal gyrus exhibited a significant contribution to both fluid and crystallized intelligence, suggesting their pivotal role in these cognitive processes. Total composite scores identified a diverse set of brain regions to be relevant which underscores the complex nature of total intelligence.
Towards Automated Negative Sampling in Implicit Recommendation
Authors: Fuyuan Lyu, Yaochen Hu, Xing Tang, Yingxue Zhang, Ruiming Tang, Xue Liu
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.03526
Pdf link: https://arxiv.org/pdf/2311.03526
Abstract Negative sampling methods are vital in implicit recommendation models as they allow us to obtain negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negative samples in various ways. These studies are orthogonal to the recommendation model and implicit datasets. However, such an idea contradicts the common belief in AutoML that the model and dataset should be matched. Empirical experiments suggest that the best-performing negative sampler depends on the implicit dataset and the specific recommendation model. Hence, we propose a hypothesis that the negative sampler should align with the capacity of the recommendation models as well as the statistics of the datasets to achieve optimal performance. A mismatch between these three would result in sub-optimal outcomes. An intuitive idea to address the mismatch problem is to exhaustively select the best-performing negative sampler given the model and dataset. However, such an approach is computationally expensive and time-consuming, leaving the problem unsolved. In this work, we propose the AutoSample framework that adaptively selects the best-performing negative sampler among candidates. Specifically, we propose a loss-to-instance approximation to transform the negative sampler search task into the learning task over a weighted sum, enabling end-to-end training of the model. We also designed an adaptive search algorithm to extensively and efficiently explore the search space. A specific initialization approach is also obtained to better utilize the obtained model parameters during the search stage, which is similar to curriculum learning and leads to better performance and less computation resource consumption. We evaluate the proposed framework on four benchmarks over three models. Extensive experiments demonstrate the effectiveness and efficiency of our proposed framework.
PcLast: Discovering Plannable Continuous Latent States
Authors: Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.03534
Pdf link: https://arxiv.org/pdf/2311.03534
Abstract Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations. While compact latent representations, typically learned from variational autoencoders or inverse dynamics, enable goal-conditioned planning they ignore state affordances, thus hampering their sample-efficient planning capabilities. In this paper, we learn a representation that associates reachable states together for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information); and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sampling efficiency, and yields layered state abstractions that enable computationally efficient hierarchical planning.
Indexing Techniques for Graph Reachability Queries
Authors: Chao Zhang, Angela Bonifati, M. Tamer Özsu
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2311.03542
Pdf link: https://arxiv.org/pdf/2311.03542
Abstract We survey graph reachability indexing techniques for efficient processing of graph reachability queries in two types of popular graph models: plain graphs and edge-labeled graphs. Reachability queries are fundamental in graph processing, and reachability indexes are specialized data structures tailored for speeding up such queries. Work on this topic goes back four decades -- we include 33 of the proposed techniques. Plain graphs contain only vertices and edges, with reachability queries checking path existence between a source and target vertex. Edge-labeled graphs, in contrast, augment plain graphs by adding edge labels. Reachability queries in edge-labeled graphs incorporate path constraints based on edge labels, assessing both path existence and compliance with constraints. We categorize techniques in both plain and edge-labeled graphs and discuss the approaches according to this classification, using existing techniques as exemplars. We discuss the main challenges within each class and how these might be addressed in other approaches. We conclude with a discussion of the open research challenges and future research directions, along the lines of integrating reachability indexes into graph data management systems. This survey serves as a comprehensive resource for researchers and practitioners interested in the advancements, techniques, and challenges on reachability indexing in graph analytics.
Time-optimal Design and Control of Electric Race Cars Equipped with Multi-speed Transmissions
Authors: Camiel Cartignij, Mauro Salazar
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03545
Pdf link: https://arxiv.org/pdf/2311.03545
Abstract This paper presents a framework to jointly optimize the design and control of an electric race car equipped with a multiple-gear transmission, specifically accounting for the discrete gearshift dynamics. We formulate the problem as a mixed-integer optimal control problem, and deal with its complexity by combining convex optimization and Pontryagin's Minimum Principle in a computationally efficient iterative algorithm satisfying necessary conditions for optimality upon convergence. Finally, we leverage our framework to compute the achievable lap time of a race car equipped with a fixed-gear transmission, a continuously variable transmission and a multiple-gear transmission with 2 to 4 speeds, revealing that a multiple-gear transmission can strike the best trade-off in terms of electric motor control, and transmission weight and efficiency, ultimately yielding the overall best lap time.
Scalable and Efficient Continual Learning from Demonstration via Hypernetwork-generated Stable Dynamics Model
Authors: Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez, Justus Piater
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.03600
Pdf link: https://arxiv.org/pdf/2311.03600
Abstract Learning from demonstration (LfD) provides an efficient way to train robots. The learned motions should be convergent and stable, but to be truly effective in the real world, LfD-capable robots should also be able to remember multiple motion skills. Multi-skill retention is a capability missing from existing stable-LfD approaches. On the other hand, recent work on continual-LfD has shown that hypernetwork-generated neural ordinary differential equation solvers, can learn multiple LfD tasks sequentially, but this approach lacks stability guarantees. We propose an approach for stable continual-LfD in which a hypernetwork generates two networks: a trajectory learning dynamics model, and a trajectory stabilizing Lyapunov function. The introduction of stability not only generates stable trajectories but also greatly improves continual learning performance, especially in the size-efficient chunked hypernetworks. With our approach, we can continually train a single model to predict the position and orientation trajectories of the robot's end-effector simultaneously for multiple real world tasks without retraining on past demonstrations. We also propose stochastic regularization with a single randomly sampled regularization term in hypernetworks, which reduces the cumulative training time cost for $N$ tasks from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ without any loss in performance in real-world tasks. We empirically evaluate our approach on the popular LASA dataset, on high-dimensional extensions of LASA (including up to 32 dimensions) to assess scalability, and on a novel extended robotic task dataset (RoboTasks9) to assess real-world performance. In trajectory error metrics, stability metrics and continual learning metrics our approach performs favorably, compared to other baselines. Code and datasets will be shared after submission.
CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers
Authors: Jieming Bian, Shaolei Ren, Jie Xu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.03615
Pdf link: https://arxiv.org/pdf/2311.03615
Abstract Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real Transfer
Authors: Jun Yamada, Marc Rigter, Jack Collins, Ingmar Posner
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03622
Pdf link: https://arxiv.org/pdf/2311.03622
Abstract Model-based RL is a promising approach for real-world robotics due to its improved sample efficiency and generalization capabilities compared to model-free RL. However, effective model-based RL solutions for vision-based real-world applications require bridging the sim-to-real gap for any world model learnt. Due to its significant computational cost, standard domain randomisation does not provide an effective solution to this problem. This paper proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer) to achieve efficient sim-to-real transfer of vision-based model-based RL using distillation. Specifically, TWIST leverages state observations as readily accessible, privileged information commonly garnered from a simulator to significantly accelerate sim-to-real transfer. Specifically, a teacher world model is trained efficiently on state information. At the same time, a matching dataset is collected of domain-randomised image observations. The teacher world model then supervises a student world model that takes the domain-randomised image observations as input. By distilling the learned latent dynamics model from the teacher to the student model, TWIST achieves efficient and effective sim-to-real transfer for vision-based model-based RL tasks. Experiments in simulated and real robotics tasks demonstrate that our approach outperforms naive domain randomisation and model-free methods in terms of sample efficiency and task performance of sim-to-real transfer.
Stochastic convergence of regularized solutions for backward heat conduction problems
Authors: Zhongjian Wang, Wenlong Zhang, Zhiwen Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.03623
Pdf link: https://arxiv.org/pdf/2311.03623
Abstract In this paper, we study the stochastic convergence of regularized solutions for backward heat conduction problems. These problems are recognized as ill-posed due to the exponential decay of eigenvalues associated with the forward problems. We derive an error estimate for the least-squares regularized minimization problem within the framework of stochastic convergence. Our analysis reveals that the optimal error of the Tikhonov-type least-squares optimization problem depends on the noise level, the number of sensors, and the underlying ground truth. Moreover, we propose a self-adaptive algorithm to identify the optimal regularization parameter for the optimization problem without requiring knowledge of the noise level or any other prior information, which will be very practical in applications. We present numerical examples to demonstrate the accuracy and efficiency of our proposed method. These numerical results show that our method is efficient in solving backward heat conduction problems.
Reinforcement Twinning: from digital twins to model-based reinforcement learning
Authors: Lorenzo Schena, Pedro Marques, Romain Poletti, Samuel Ahizi, Jan Van den Berghe, Miguel A. Mendez
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03628
Pdf link: https://arxiv.org/pdf/2311.03628
Abstract We propose a novel framework for simultaneously training the digital twin of an engineering system and an associated control agent. The training of the twin combines methods from data assimilation and system identification, while the training of the control agent combines model-based optimal control and model-free reinforcement learning. The combined training of the control agent is achieved by letting it evolve independently along two paths (one driven by a model-based optimal control and another driven by reinforcement learning) and using the virtual environment offered by the digital twin as a playground for confrontation and indirect interaction. This interaction occurs as an "expert demonstrator", where the best policy is selected for the interaction with the real environment and "taught" to the other if the independent training stagnates. We refer to this framework as Reinforcement Twinning (RT). The framework is tested on three vastly different engineering systems and control tasks, namely (1) the control of a wind turbine subject to time-varying wind speed, (2) the trajectory control of flapping-wing micro air vehicles (FWMAVs) subject to wind gusts, and (3) the mitigation of thermal loads in the management of cryogenic storage tanks. The test cases are implemented using simplified models for which the ground truth on the closure law is available. The results show that the adjoint-based training of the digital twin is remarkably sample-efficient and completed within a few iterations. Concerning the control agent training, the results show that the model-based and the model-free control training benefit from the learning experience and the complementary learning approach of each other. The encouraging results open the path towards implementing the RT framework on real systems.
Novel data structures for label based queries specifically efficient for billion+ property graph networks using Kinetica-Graph
Authors: Bilge Kaan Karamete, Eli Glaser
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2311.03631
Pdf link: https://arxiv.org/pdf/2311.03631
Abstract This paper discusses a novel data structure that efficiently implements label based graph queries particularly for very large graphs. The major issues in large graph databases is the memory foot-print of label based property associations to graph entities and subsequent query speeds. To this end, unlike the available graph databases, that use key-value pairs using map like associative containers, we have devised a novel data structure that is superior in its memory foot-print as well as its fast search characteristics without any compromise on the number of labels that can be associated to graph nodes and edges. We will demonstrate the power of this novel unconventional data structure over billion plus graphs within the context.
A Physics-Guided Bi-Fidelity Fourier-Featured Operator Learning Framework for Predicting Time Evolution of Drag and Lift Coefficients
Authors: Amirhossein Mollaali, Izzet Sahin, Iqrar Raza, Christian Moya, Guillermo Paniagua, Guang Lin
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2311.03639
Pdf link: https://arxiv.org/pdf/2311.03639
Abstract In the pursuit of accurate experimental and computational data while minimizing effort, there is a constant need for high-fidelity results. However, achieving such results often requires significant computational resources. To address this challenge, this paper proposes a deep operator learning-based framework that requires a limited high-fidelity dataset for training. We introduce a novel physics-guided, bi-fidelity, Fourier-featured Deep Operator Network (DeepONet) framework that effectively combines low and high-fidelity datasets, leveraging the strengths of each. In our methodology, we began by designing a physics-guided Fourier-featured DeepONet, drawing inspiration from the intrinsic physical behavior of the target solution. Subsequently, we train this network to primarily learn the low-fidelity solution, utilizing an extensive dataset. This process ensures a comprehensive grasp of the foundational solution patterns. Following this foundational learning, the low-fidelity deep operator network's output is enhanced using a physics-guided Fourier-featured residual deep operator network. This network refines the initial low-fidelity output, achieving the high-fidelity solution by employing a small high-fidelity dataset for training. Notably, in our framework, we employ the Fourier feature network as the Trunk network for the DeepONets, given its proficiency in capturing and learning the oscillatory nature of the target solution with high precision. We validate our approach using a well-known 2D benchmark cylinder problem, which aims to predict the time trajectories of lift and drag coefficients. The results highlight that the physics-guided Fourier-featured deep operator network, serving as a foundational building block of our framework, possesses superior predictive capability for the lift and drag coefficients compared to its data-driven counterparts.
Instruct Me More! Random Prompting for Visual In-Context Learning
Authors: Jiahao Zhang, Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03648
Pdf link: https://arxiv.org/pdf/2311.03648
Abstract Large-scale models trained on extensive datasets, have emerged as the preferred approach due to their high generalizability across various tasks. In-context learning (ICL), a popular strategy in natural language processing, uses such models for different tasks by providing instructive prompts but without updating model parameters. This idea is now being explored in computer vision, where an input-output image pair (called an in-context pair) is supplied to the model with a query image as a prompt to exemplify the desired output. The efficacy of visual ICL often depends on the quality of the prompts. We thus introduce a method coined Instruct Me More (InMeMo), which augments in-context pairs with a learnable perturbation (prompt), to explore its potential. Our experiments on mainstream tasks reveal that InMeMo surpasses the current state-of-the-art performance. Specifically, compared to the baseline without learnable prompt, InMeMo boosts mIoU scores by 7.35 and 15.13 for foreground segmentation and single object detection tasks, respectively. Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training. Code is available at https://github.com/Jackieam/InMeMo.
On the Performance of LoRa Empowered Communication for Wireless Body Area Networks
Authors: Minling Zhang, Guofa Cai, Zhiping Xu, Jiguang He, Markku Juntti
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.03653
Pdf link: https://arxiv.org/pdf/2311.03653
Abstract To remotely monitor the physiological status of the human body, long range (LoRa) communication has been considered as an eminently suitable candidate for wireless body area networks (WBANs). Typically, a Rayleigh-lognormal fading channel is encountered by the LoRa links of the WBAN. In this context, we characterize the performance of the LoRa system in WBAN scenarios with an emphasis on the physical (PHY) layer and medium access control (MAC) layer in the face of Rayleigh-lognormal fading channels and the same spreading factor interference. Specifically, closed-form approximate bit error probability (BEP) expressions are derived for the LoRa system. The results show that increasing the SF and reducing the interference efficiently mitigate the shadowing effects. Moreover, in the quest for the most suitable MAC protocol for LoRa based WBANs, three MAC protocols are critically appraised, namely the pure ALOHA, slotted ALOHA, and carrier-sense multiple access. The coverage probability, energy efficiency, throughput, and system delay of the three MAC protocols are analyzed in Rayleigh-lognormal fading channel. Furthermore, the performance of the equal-interval-based and equal-area-based schemes is analyzed to guide the choice of the SF. Our simulation results confirm the accuracy of the mathematical analysis and provide some useful insights for the future design of LoRa based WBANs.
Faster Algorithms for Cycle Hitting Problems on Disk Graphs
Authors: Shinwoo An, Kyungjin Cho, Eunjin Oh
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.03665
Pdf link: https://arxiv.org/pdf/2311.03665
Abstract In this paper, we consider three hitting problems on a disk intersection graph: Triangle Hitting Set, Feedback Vertex Set, and Odd Cycle Transversal. Given a disk intersection graph $G$, our goal is to compute a set of vertices hitting all triangles, all cycles, or all odd cycles, respectively. Our algorithms run in time $2^{\tilde O(k^{4/5})}n^{O(1)}$, $2^{\tilde O(k^{9/10})}n^{O(1)}$, and $2^{\tilde O(k^{19/20})}n^{O(1)}$, respectively, where $n$ denotes the number of vertices of $G$. These do not require a geometric representation of a disk graph. If a geometric representation of a disk graph is given as input, we can solve these problems more efficiently. In this way, we improve the algorithms for those three problem by Lokshtanov et al. [SODA 2022].
Incentive Design for Eco-driving in Urban Transportation Networks
Authors: M. Umar B. Niazi, Jung-Hoon Cho, Munther A. Dahleh, Roy Dong, Cathy Wu
Subjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.03682
Pdf link: https://arxiv.org/pdf/2311.03682
Abstract Eco-driving emerges as a cost-effective and efficient strategy to mitigate greenhouse gas emissions in urban transportation networks. Acknowledging the persuasive influence of incentives in shaping driver behavior, this paper presents the `eco-planner,' a digital platform devised to promote eco-driving practices in urban transportation. At the outset of their trips, users provide the platform with their trip details and travel time preferences, enabling the eco-planner to formulate personalized eco-driving recommendations and corresponding incentives, while adhering to its budgetary constraints. Upon trip completion, incentives are transferred to users who comply with the recommendations and effectively reduce their emissions. By comparing our proposed incentive mechanism with a baseline scheme that offers uniform incentives to all users, we demonstrate that our approach achieves superior emission reductions and increased user compliance with a smaller budget.
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu
Subjects: Performance (cs.PF); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03687
Pdf link: https://arxiv.org/pdf/2311.03687
Abstract Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes , i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B) on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, together with hardware platforms in choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses discover potential opportunities for future work to further optimize the runtime performance of LLMs.
Efficient Bottom-Up Synthesis for Programs with Local Variables
Authors: Xiang Li, Xiangyu Zhou, Rui Dong, Yihong Zhang, Xinyu Wang
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.03705
Pdf link: https://arxiv.org/pdf/2311.03705
Abstract We propose a new synthesis algorithm that can efficiently search programs with local variables (e.g., those introduced by lambdas). Prior bottom-up synthesis algorithms are not able to evaluate programs with free local variables, and therefore cannot effectively reduce the search space of such programs (e.g., using standard observational equivalence reduction techniques), making synthesis slow. Our algorithm can reduce the space of programs with local variables. The key idea, dubbed lifted interpretation, is to lift up the program interpretation process, from evaluating one program at a time to simultaneously evaluating all programs from a grammar. Lifted interpretation provides a mechanism to systematically enumerate all binding contexts for local variables, thereby enabling us to evaluate and reduce the space of programs with local variables. Our ideas are instantiated in the domain of web automation. The resulting tool, Arborist, can automate a significantly broader range of challenging tasks more efficiently than state-of-the-art techniques including WebRobot and Helena.
Contributions of Individual Generators to Nodal Carbon Emissions
Authors: Yize Chen, Deepyjoti Deka, Yuanyuan Shi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03712
Pdf link: https://arxiv.org/pdf/2311.03712
Abstract Recent shifts toward sustainable energy systems have witnessed the fast deployment of carbon-free and carbon-efficient generations across the power networks. However, the benefits of carbon reduction are not experienced evenly throughout the grid. Each generator can have distinct carbon emission rates. Due to the existence of physical power flows, nodal power consumption is met by a combination of a set of generators, while such combination is determined by network topology, generators' characteristics and power demand. This paper describes a technique based on physical power flow model, which can efficiently compute the nodal carbon emissions contributed by each single generator given the generation and power flow information. We also extend the technique to calculate both the nodal average carbon emission and marginal carbon emission rates. Simulation results validate the effectiveness of the calculations, while our technique provides a fundamental tool for applications such as carbon auditing, carbon-oriented demand management and future carbon-oriented capacity expansion.
Loss Balancing for Fair Supervised Learning
Authors: Mohammad Mahdi Khalili, Xueru Zhang, Mahed Abroshan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.03714
Pdf link: https://arxiv.org/pdf/2311.03714
Abstract Supervised learning models have been used in various domains such as lending, college admission, face recognition, natural language processing, etc. However, they may inherit pre-existing biases from training data and exhibit discrimination against protected social groups. Various fairness notions have been proposed to address unfairness issues. In this work, we focus on Equalized Loss (EL), a fairness notion that requires the expected loss to be (approximately) equalized across different groups. Imposing EL on the learning process leads to a non-convex optimization problem even if the loss function is convex, and the existing fair learning algorithms cannot properly be adopted to find the fair predictor under the EL constraint. This paper introduces an algorithm that can leverage off-the-shelf convex programming tools (e.g., CVXPY) to efficiently find the global optimum of this non-convex optimization. In particular, we propose the ELminimizer algorithm, which finds the optimal fair predictor under EL by reducing the non-convex optimization to a sequence of convex optimization problems. We theoretically prove that our algorithm finds the global optimal solution under certain conditions. Then, we support our theoretical results through several empirical studies.
Learning to Learn for Few-shot Continual Active Learning
Authors: Stella Ho, Ming Liu, Shang Gao, Longxiang Gao
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.03732
Pdf link: https://arxiv.org/pdf/2311.03732
Abstract Continual learning strives to ensure stability in solving previously seen tasks while demonstrating plasticity in a novel domain. Recent advances in CL are mostly confined to a supervised learning setting, especially in NLP domain. In this work, we consider a few-shot continual active learning (CAL) setting where labeled data is inadequate, and unlabeled data is abundant but with a limited annotation budget. We propose a simple but efficient method, called Meta-Continual Active Learning. Specifically, we employ meta-learning and experience replay to address the trade-off between stability and plasticity. As a result, it finds an optimal initialization that efficiently utilizes annotated information for fast adaptation while preventing catastrophic forgetting of past tasks. We conduct extensive experiments to validate the effectiveness of the proposed method and analyze the effect of various active learning strategies and memory sample selection methods in a few-shot CAL setup. Our experiment results demonstrate that random sampling is the best default strategy for both active learning and memory sample selection to solve few-shot CAL problems.
Improved weight initialization for deep and narrow feedforward neural network
Authors: Hyunwoo Lee, Yunho Kim, Seungyeop Yang, Hayoung Choi
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2311.03733
Pdf link: https://arxiv.org/pdf/2311.03733
Abstract Appropriate weight initialization settings, along with the ReLU activation function, have been a cornerstone of modern deep learning, making it possible to train and deploy highly effective and efficient neural network models across diverse artificial intelligence. The problem of dying ReLU, where ReLU neurons become inactive and yield zero output, presents a significant challenge in the training of deep neural networks with ReLU activation function. Theoretical research and various methods have been introduced to address the problem. However, even with these methods and research, training remains challenging for extremely deep and narrow feedforward networks with ReLU activation function. In this paper, we propose a new weight initialization method to address this issue. We prove the properties of the proposed initial weight matrix and demonstrate how these properties facilitate the effective propagation of signal vectors. Through a series of experiments and comparisons with existing methods, we demonstrate the effectiveness of the new initialization method.
Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning
Authors: Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Peng Shi, Wenpeng Yin, Rui Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.03748
Pdf link: https://arxiv.org/pdf/2311.03748
Abstract Unified Sequence Labeling that articulates different sequence labeling problems such as Named Entity Recognition, Relation Extraction, Semantic Role Labeling, etc. in a generalized sequence-to-sequence format opens up the opportunity to make the maximum utilization of large language model knowledge toward structured prediction. Unfortunately, this requires formatting them into specialized augmented format unknown to the base pretrained language model (PLMs) necessitating finetuning to the target format. This significantly bounds its usefulness in data-limited settings where finetuning large models cannot properly generalize to the target format. To address this challenge and leverage PLM knowledge effectively, we propose FISH-DIP, a sample-aware dynamic sparse finetuning strategy that selectively focuses on a fraction of parameters, informed by feedback from highly regressing examples, during the fine-tuning process. By leveraging the dynamism of sparsity, our approach mitigates the impact of well-learned samples and prioritizes underperforming instances for improvement in generalization. Across five tasks of sequence labeling, we demonstrate that FISH-DIP can smoothly optimize the model in low resource settings offering upto 40% performance improvements over full fine-tuning depending on target evaluation settings. Also, compared to in-context learning and other parameter-efficient fine-tuning approaches, FISH-DIP performs comparably or better, notably in extreme low-resource settings.
Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition
Authors: Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.03761
Pdf link: https://arxiv.org/pdf/2311.03761
Abstract The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that involve replacing detail coefficients decomposed by discrete wavelet transform for reconstructing to generate new samples and expand the training set. Different generation methods are used to generate replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods.
Asymptotically Steerable Finite Fourier-Bessel Transforms and Closure under Convolution
Authors: Arash Ghaani Farashahi, Gregory S. Chirikjian
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA)
Arxiv link: https://arxiv.org/abs/2311.03772
Pdf link: https://arxiv.org/pdf/2311.03772
Abstract This paper develops a constructive numerical scheme for Fourier-Bessel approximations on disks compatible with convolutions supported on disks. We address accurate finite Fourier-Bessel transforms (FFBT) and inverse finite Fourier-Bessel transforms (iFFBT) of functions on disks using the discrete Fourier Transform (DFT) on Cartesian grids. Whereas the DFT and its fast implementation (FFT) are ubiquitous and are powerful for computing convolutions, they are not exactly steerable under rotations. In contrast, Fourier-Bessel expansions are steerable, but lose both this property and the preservation of band limits under convolution. This work captures the best features of both as the band limit is allowed to increase. The convergence/error analysis and asymptotic steerability of FFBT/ iFFBT are investigated. Conditions are established for the FFBT to converge to the Fourier-Bessel coefficient and for the iFFBT to uniformly approximate the Fourier-Bessel partial sums. The matrix form of the finite transforms is discussed. The implementation of the discrete method to compute numerical approximation of convolutions of compactly supported functions on disks is considered as well.
Multi-Beam Forming with Movable-Antenna Array
Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.03775
Pdf link: https://arxiv.org/pdf/2311.03775
Abstract Conventional multi-beam forming with fixed-position antenna (FPA) arrays needs to trade-off between maximizing the beamforming gain over desired directions and minimizing the interference power over undesired directions. In this letter, we study the enhanced multi-beam forming with a linear movable-antenna (MA) array by exploiting the new degrees of freedom (DoFs) via antennas' position optimization. Specifically, we jointly optimize the antenna position vector (APV) and antenna weight vector (AWV) to maximize the minimum beamforming gain over multiple desired directions, subject to a given constraint on the maximum interference power over undesired directions. We propose an efficient alternating optimization algorithm to find a suboptimal solution by iteratively optimizing one of the APV and AWV with the other being fixed. Numerical results show that the proposed multi-beam forming design with MA arrays can significantly outperform that with the traditional FPA arrays and other benchmark schemes in terms of both beamforming gain and interference suppression.
CapST: An Enhanced and Lightweight Method for Deepfake Video Classification
Authors: Wasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang, Gaddisa Olani Ganfure, Sarwar Khan, Sahibzada Adil Shahzad
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03782
Pdf link: https://arxiv.org/pdf/2311.03782
Abstract The proliferation of deepfake videos, synthetic media produced through advanced Artificial Intelligence techniques has raised significant concerns across various sectors, encompassing realms such as politics, entertainment, and security. In response, this research introduces an innovative and streamlined model designed to classify deepfake videos generated by five distinct encoders adeptly. Our approach not only achieves state of the art performance but also optimizes computational resources. At its core, our solution employs part of a VGG19bn as a backbone to efficiently extract features, a strategy proven effective in image-related tasks. We integrate a Capsule Network coupled with a Spatial Temporal attention mechanism to bolster the model's classification capabilities while conserving resources. This combination captures intricate hierarchies among features, facilitating robust identification of deepfake attributes. Delving into the intricacies of our innovation, we introduce an existing video level fusion technique that artfully capitalizes on temporal attention mechanisms. This mechanism serves to handle concatenated feature vectors, capitalizing on the intrinsic temporal dependencies embedded within deepfake videos. By aggregating insights across frames, our model gains a holistic comprehension of video content, resulting in more precise predictions. Experimental results on an extensive benchmark dataset of deepfake videos called DFDM showcase the efficacy of our proposed method. Notably, our approach achieves up to a 4 percent improvement in accurately categorizing deepfake videos compared to baseline models, all while demanding fewer computational resources.
Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models
Authors: Yichao Cao, Qingfei Tang, Xiu Su, Chen Song, Shan You, Xiaobo Lu, Chang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03799
Pdf link: https://arxiv.org/pdf/2311.03799
Abstract Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting $<human, action, object>$ triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in recognizing interactions within an open world context. This study explores the universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed as \emph{\textbf{UniHOI}}. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HO Prompt-guided Decoder (HOPD), facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize a LLM (\emph{i.e.} GPT) for interaction interpretation, generating a richer linguistic understanding for complex HOIs. For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence. Our efficient architecture design and learning methods effectively unleash the potential of the VL foundation models and LLMs, allowing UniHOI to surpass all existing methods with a substantial margin, under both supervised and zero-shot settings. The code and pre-trained weights are available at: \url{https://github.com/Caoyichao/UniHOI}.
Data-informed uncertainty quantification for laser-based powder bed fusion additive manufacturing
Authors: Mihaela Chiappetta, Chiara Piazzola, Lorenzo Tamellini, Alessandro Reali, Ferdinando Auricchio, Massimo Carraturo
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2311.03823
Pdf link: https://arxiv.org/pdf/2311.03823
Abstract We present an efficient approach to quantify the uncertainties associated with the numerical simulations of the laser-based powder bed fusion of metal processes. Our study focuses on a thermomechanical model of an Inconel 625 cantilever beam, based on the AMBench2018-01 benchmark proposed by the National Institute of Standards and Technology (NIST). The proposed approach consists of a forward uncertainty quantification analysis of the residual strain of the cantilever beam given the uncertainty on some of the parameters of the numerical simulation, namely the powder convection coefficient and the activation temperature. The uncertainty on such parameters is modeled by a data-informed probability density function obtained by a Bayesian inversion procedure, based on the displacement experimental data provided by NIST. To overcome the computational challenges of both the Bayesian inversion and the forward uncertainty quantification analysis we employ a multi-fidelity surrogate modeling technique, specifically the multi-index stochastic collocation method. The proposed approach allows us to achieve a 33\% reduction in the uncertainties on the prediction of residual strains compared with what we would get basing the forward UQ analysis on a-priori ranges for the uncertain parameters, and in particular the mode of the probability density function of such quantities (i.e., its ``most likely value'', roughly speaking) results to be in good agreement with the experimental data provided by NIST, even though only displacement data were used for the Bayesian inversion procedure.
On Deep Reinforcement Learning for Traffic Steering Intelligent ORAN
Authors: Fatemeh Kavehmadavani, Van-Dinh Nguyen, Thang X. Vu, Symeon Chatzinotas
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.03853
Pdf link: https://arxiv.org/pdf/2311.03853
Abstract This paper aims to develop the intelligent traffic steering (TS) framework, which has recently been considered as one of the key developments of 3GPP for advanced 5G. Since achieving key performance indicators (KPIs) for heterogeneous services may not be possible in the monolithic architecture, a novel deep reinforcement learning (DRL)-based TS algorithm is proposed at the non-real-time (non-RT) RAN intelligent controller (RIC) within the open radio access network (ORAN) architecture. To enable ORAN's intelligence, we distribute traffic load onto appropriate paths, which helps efficiently allocate resources to end users in a downlink multi-service scenario. Our proposed approach employs a three-step hierarchical process that involves heuristics, machine learning, and convex optimization to steer traffic flows. Through system-level simulations, we show the superior performance of the proposed intelligent TS scheme, surpassing established benchmark systems by 45.50%.
Design and Experimental Verification of a Jumping Legged Robot for Martian Lava Tube Exploration
Authors: Jørgen Anker Olsen, Kostas Alexis
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.03854
Pdf link: https://arxiv.org/pdf/2311.03854
Abstract The potential of Martian lava tubes for resource extraction and habitat sheltering highlights the need for robots capable to undertake the grueling task of their exploration. Driven by this motivation, in this work we introduce a legged robot system optimized for jumping in the low gravity of Mars, designed with leg configurations adaptable to both bipedal and quadrupedal systems. This design utilizes torque-controlled actuators coupled with springs for high-power jumping, robust locomotion, and an energy-efficient resting pose. Key design features include a 5-bar mechanism as leg concept, combined with springs connected by a high-strength cord. The selected 5-bar link lengths and spring stiffness were optimized for maximizing the jump height in Martian gravity and realized as a robot leg. Two such legs combined with a compact body allowed jump testing of a bipedal prototype. The robot is 0.472 m tall and weighs 7.9 kg. Jump testing with significant safety margins resulted in a measured jump height of 1.141 m in Earth's gravity, while a total of 4 jumping experiments are presented. Simulations utilizing the full motor torque and kinematic limits of the design resulted in a maximum possible jump height of 1.52 m in Earth's gravity and 3.63 m in Mars' gravity, highlighting the versatility of jumping as a form of locomotion and overcoming obstacles in lower gravity.
Hypergraphs with node attributes: structure and inference
Authors: Anna Badalyan, Nicolò Ruggeri, Caterina De Bacco
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.03857
Pdf link: https://arxiv.org/pdf/2311.03857
Abstract Many networked datasets with units interacting in groups of two or more, encoded with hypergraphs, are accompanied by extra information about nodes, such as the role of an individual in a workplace. Here we show how these node attributes can be used to improve our understanding of the structure resulting from higher-order interactions. We consider the problem of community detection in hypergraphs and develop a principled model that combines higher-order interactions and node attributes to better represent the observed interactions and to detect communities more accurately than using either of these types of information alone. The method learns automatically from the input data the extent to which structure and attributes contribute to explain the data, down weighing or discarding attributes if not informative. Our algorithmic implementation is efficient and scales to large hypergraphs and interactions of large numbers of units. We apply our method to a variety of systems, showing strong performance in hyperedge prediction tasks and in selecting community divisions that correlate with attributes when these are informative, but discarding them otherwise. Our approach illustrates the advantage of using informative node attributes when available with higher-order data.
FD-MIA: Efficient Attacks on Fairness-enhanced Models
Authors: Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.03865
Pdf link: https://arxiv.org/pdf/2311.03865
Abstract Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used during training by analyzing the model's prediction scores. However, our investigations reveal that these score-based MIAs are ineffective when targeting fairness-enhanced models in binary classifications. The attack models trained to launch the MIAs degrade into simplistic threshold models, resulting in lower attack performance. Meanwhile, we observe that fairness methods often lead to prediction performance degradation for the majority subgroups of the training data. This raises the barrier to successful attacks and widens the prediction gaps between member and non-member data. Building upon these insights, we propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results (FD-MIA). It leverages the difference in the predictions from both the original and fairness-enhanced models and exploits the observed prediction gaps as attack clues. We also explore potential strategies for mitigating privacy leakages. Extensive experiments validate our findings and demonstrate the efficacy of the proposed method.
A Comparative Study of Knowledge Transfer Methods for Misaligned Urban Building Labels
Authors: Bipul Neupane, Jagannath Aryal, Abbas Rajabifard
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03867
Pdf link: https://arxiv.org/pdf/2311.03867
Abstract Misalignment in Earth observation (EO) images and building labels impact the training of accurate convolutional neural networks (CNNs) for semantic segmentation of building footprints. Recently, three Teacher-Student knowledge transfer methods have been introduced to address this issue: supervised domain adaptation (SDA), knowledge distillation (KD), and deep mutual learning (DML). However, these methods are merely studied for different urban buildings (low-rise, mid-rise, high-rise, and skyscrapers), where misalignment increases with building height and spatial resolution. In this study, we present a workflow for the systematic comparative study of the three methods. The workflow first identifies the best (with the highest evaluation scores) hyperparameters, lightweight CNNs for the Student (among 43 CNNs from Computer Vision), and encoder-decoder networks (EDNs) for both Teachers and Students. Secondly, three building footprint datasets are developed to train and evaluate the identified Teachers and Students in the three transfer methods. The results show that U-Net with VGG19 (U-VGG19) is the best Teacher, and U-EfficientNetv2B3 and U-EfficientNet-lite0 are among the best Students. With these Teacher-Student pairs, SDA could yield upto 0.943, 0.868, 0.912, and 0.697 F1 scores in the low-rise, mid-rise, high-rise, and skyscrapers respectively. KD and DML provide model compression of upto 82%, despite marginal loss in performance. This new comparison concludes that SDA is the most effective method to address the misalignment problem, while KD and DML can efficiently compress network size without significant loss in performance. The 158 experiments and datasets developed in this study will be valuable to minimise the misaligned labels.
Mini but Mighty: Finetuning ViTs with Mini Adapters
Authors: Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.03873
Pdf link: https://arxiv.org/pdf/2311.03873
Abstract Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimension of adapters is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares the neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across the three dataset benchmarks DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer
Authors: Chi-Chih Chang, Yuan-Yao Sung, Shixing Yu, Ning-Chi Huang, Diana Marculescu, Kai-Chiang Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03912
Pdf link: https://arxiv.org/pdf/2311.03912
Abstract Vision Transformers (ViT) have recently demonstrated success across a myriad of computer vision tasks. However, their elevated computational demands pose significant challenges for real-world deployment. While low-rank approximation stands out as a renowned method to reduce computational loads, efficiently automating the target rank selection in ViT remains a challenge. Drawing from the notable similarity and alignment between the processes of rank selection and One-Shot NAS, we introduce FLORA, an end-to-end automatic framework based on NAS. To overcome the design challenge of supernet posed by vast search space, FLORA employs a low-rank aware candidate filtering strategy. This method adeptly identifies and eliminates underperforming candidates, effectively alleviating potential undertraining and interference among subnetworks. To further enhance the quality of low-rank supernets, we design a low-rank specific training paradigm. First, we propose weight inheritance to construct supernet and enable gradient sharing among low-rank modules. Secondly, we adopt low-rank aware sampling to strategically allocate training resources, taking into account inherited information from pre-trained models. Empirical results underscore FLORA's efficacy. With our method, a more fine-grained rank configuration can be generated automatically and yield up to 33% extra FLOPs reduction compared to a simple uniform configuration. More specific, FLORA-DeiT-B/FLORA-Swin-B can save up to 55%/42% FLOPs almost without performance degradtion. Importantly, FLORA boasts both versatility and orthogonality, offering an extra 21%-26% FLOPs reduction when integrated with leading compression techniques or compact hybrid structures. Our code is publicly available at https://github.com/shadowpa0327/FLORA.
Hardware Aware Evolutionary Neural Architecture Search using Representation Similarity Metric
Authors: Nilotpal Sinha, Abd El Rahman Shabayek, Anis Kacem, Peyman Rostami, Carl Shneider, Djamila Aouada
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2311.03923
Pdf link: https://arxiv.org/pdf/2311.03923
Abstract Hardware-aware Neural Architecture Search (HW-NAS) is a technique used to automatically design the architecture of a neural network for a specific task and target hardware. However, evaluating the performance of candidate architectures is a key challenge in HW-NAS, as it requires significant computational resources. To address this challenge, we propose an efficient hardware-aware evolution-based NAS approach called HW-EvRSNAS. Our approach re-frames the neural architecture search problem as finding an architecture with performance similar to that of a reference model for a target hardware, while adhering to a cost constraint for that hardware. This is achieved through a representation similarity metric known as Representation Mutual Information (RMI) employed as a proxy performance evaluator. It measures the mutual information between the hidden layer representations of a reference model and those of sampled architectures using a single training batch. We also use a penalty term that penalizes the search process in proportion to how far an architecture's hardware cost is from the desired hardware cost threshold. This resulted in a significantly reduced search time compared to the literature that reached up to 8000x speedups resulting in lower CO2 emissions. The proposed approach is evaluated on two different search spaces while using lower computational resources. Furthermore, our approach is thoroughly examined on six different edge devices under various hardware cost constraints.
On the Coupling of Hamilton's Principle and Thermodynamic Extremal Principles
Authors: Klaus Hackl, Jiří Svoboda, Franz Dieter Fischer
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2311.03926
Pdf link: https://arxiv.org/pdf/2311.03926
Abstract Extremal principles can generally be divided into two rather distinct classes. There are, on the one hand side, formulations based on the Lagrangian or Hamiltonian mechanics, respectively, dealing with time dependent problems, but essentially resting on conservation of energy and thus being not applicable to dissipative systems in a consistent way. On the other hand, there are formulations based essentially on maximizing the dissipation, working efficiently for the description of dissipative systems, but being not suitable for including inertia effects. Many at-tempts can be found in the literature to overcome this split into incompatible principles. How-ever, essentially all of them possess an unnatural appearance. In this work, we suggest a solution to this dilemma resting on an additional assumption based on the thermodynamic driving forces involved. Applications to a simple dissipative structure and a material with varying mass demonstrate the capability of the proposed approach.
Adaptive 3D Geometry-based Stochastic Channel Prediction for 3D DL Selection
Authors: Mervat Zarour, Qiuheng Zhou, Sergiy Melnyk, Hans D. Schotten
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.03975
Pdf link: https://arxiv.org/pdf/2311.03975
Abstract This paper addresses the challenges of mobile user requirements in shadowing and multi-fading environments, focusing on the Downlink (DL) radio node selection based on Uplink (UL) channel estimation. One of the key issues tackled in this research is the prediction performance in scenarios where estimated channels are integrated. An adaptive deep learning approach is proposed to improve performance, offering a compelling alternative to traditional interpolation techniques for air-to-ground link selection on demand. Moreover, our study considers a 3D channel model, which provides a more realistic and accurate representation than 2D models, particularly in the context of 3D network node distributions. This consideration becomes crucial in addressing the complex multipath fading effects within geometric stochastic 3D 3GPP channel models in urban environments. Furthermore, our research emphasises the need for adaptive prediction mechanisms that carefully balance the trade-off between DL link forecasted frequency response accuracy and the complexity requirements associated with estimation and prediction. This paper contributes to advancing 3D radio resource management by addressing these challenges, enabling more efficient and reliable communication for energy-constrained flying network nodes in dynamic environments.
Learned Causal Method Prediction
Authors: Shantanu Gupta, Cheng Zhang, Agrin Hilmkil
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2311.03989
Pdf link: https://arxiv.org/pdf/2311.03989
Abstract For a given causal question, it is important to efficiently decide which causal inference method to use for a given dataset. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground truth causal quantities are unobserved.In this work, we propose CAusal Method Predictor (CAMP), a framework for predicting the best method for a given dataset. To this end, we generate datasets from a diverse set of synthetic causal models, score the candidate methods, and train a model to directly predict the highest-scoring method for that dataset. Next, by formulating a self-supervised pre-training objective centered on dataset assumptions relevant for causal inference, we significantly reduce the need for costly labeled data and enhance training efficiency. Our strategy learns to map implicit dataset properties to the best method in a data-driven manner. In our experiments, we focus on method prediction for causal discovery. CAMP outperforms selecting any individual candidate method and demonstrates promising generalization to unseen semi-synthetic and real-world benchmarks.
A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems
Authors: Cheng Yin, Yi Chen
Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.04014
Pdf link: https://arxiv.org/pdf/2311.04014
Abstract This paper introduces a novel operator, termed the Y operator, to elevate control performance in Actor-Critic(AC) based reinforcement learning for systems governed by stochastic differential equations(SDEs). The Y operator ingeniously integrates the stochasticity of a class of child-mother system into the Critic network's loss function, yielding substantial advancements in the control performance of RL algorithms.Additionally, the Y operator elegantly reformulates the challenge of solving partial differential equations for the state-value function into a parallel problem for the drift and diffusion functions within the system's SDEs.A rigorous mathematical proof confirms the operator's validity.This transformation enables the Y Operator-based Reinforcement Learning(YORL) framework to efficiently tackle optimal control problems in both model-based and data-driven systems.The superiority of YORL is demonstrated through linear and nonlinear numerical examples showing its enhanced performance over existing methods post convergence.
Over-the-Air Computation Empowered Federated Learning: A Joint Uplink-Downlink Design
Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.04059
Pdf link: https://arxiv.org/pdf/2311.04059
Abstract In this paper, we investigate the communication designs of over-the-air computation (AirComp) empowered federated learning (FL) systems considering uplink model aggregation and downlink model dissemination jointly. We first derive an upper bound on the expected difference between the training loss and the optimal loss, which reveals that optimizing the FL performance is equivalent to minimizing the distortion in the received global gradient vector at each edge node. As such, we jointly optimize each edge node transmit and receive equalization coefficients along with the edge server forwarding matrix to minimize the maximum gradient distortion across all edge nodes. We further utilize the MNIST dataset to evaluate the performance of the considered FL system in the context of the handwritten digit recognition task. Experiment results show that deploying multiple antennas at the edge server significantly reduces the distortion in the received global gradient vector, leading to a notable improvement in recognition accuracy compared to the single antenna case.
Implementation and Comparison of Methods to Extract Reliability KPIs out of Textual Wind Turbine Maintenance Work Orders
Authors: Marc-Alexander Lutz, Bastian Schäfermeier, Rachael Sexton, Michael Sharp, Alden Dima, Stefan Faulstich, Jagan Mohini Aluri
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04064
Pdf link: https://arxiv.org/pdf/2311.04064
Abstract Maintenance work orders are commonly used to document information about wind turbine operation and maintenance. This includes details about proactive and reactive wind turbine downtimes, such as preventative and corrective maintenance. However, the information contained in maintenance work orders is often unstructured and difficult to analyze, making it challenging for decision-makers to use this information for optimizing operation and maintenance. To address this issue, this work presents three different approaches to calculate reliability key performance indicators from maintenance work orders. The first approach involves manual labeling of the maintenance work orders by domain experts, using the schema defined in an industrial guideline to assign the label accordingly. The second approach involves the development of a model that automatically labels the maintenance work orders using text classification methods. The third technique uses an AI-assisted tagging tool to tag and structure the raw maintenance information contained in the maintenance work orders. The resulting calculated reliability key performance indicator of the first approach are used as a benchmark for comparison with the results of the second and third approaches. The quality and time spent are considered as criteria for evaluation. Overall, these three methods make extracting maintenance information from maintenance work orders more efficient, enable the assessment of reliability key performance indicators and therefore support the optimization of wind turbine operation and maintenance.
Energy-based Calibrated VAE with Test Time Free Lunch
Authors: Yihong Luo, Siya Qiu, Xingjian Tao, Yujun Cai, Jing Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04071
Pdf link: https://arxiv.org/pdf/2311.04071
Abstract In this paper, we propose a novel Energy-Calibrated Generative Model that utilizes a Conditional EBM for enhancing Variational Autoencoders (VAEs). VAEs are sampling efficient but often suffer from blurry generation results due to the lack of training in the generative direction. On the other hand, Energy-Based Models (EBMs) can generate high-quality samples but require expensive Markov Chain Monte Carlo (MCMC) sampling. To address these issues, we introduce a Conditional EBM for calibrating the generative direction during training, without requiring it for test time sampling. Our approach enables the generative model to be trained upon data and calibrated samples with adaptive weight, thereby enhancing efficiency and effectiveness without necessitating MCMC sampling in the inference phase. We also show that the proposed approach can be extended to calibrate normalizing flows and variational posterior. Moreover, we propose to apply the proposed method to zero-shot image restoration via neural transport prior and range-null theory. We demonstrate the effectiveness of the proposed method through extensive experiments in various applications, including image generation and zero-shot image restoration. Our method shows state-of-the-art performance over single-step non-adversarial generation.
Deep Neural Network based Optimal Control of Greenhouses
Authors: Kiran Kumar Sathyanarayanan, Philipp Sauerteig, Stefan Streif
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.04077
Pdf link: https://arxiv.org/pdf/2311.04077
Abstract Automatic control of greenhouse crop production is of great interest owing to the increasing energy and labor costs. Hierarchical Model Predictive Control (HMPC) is a multi-level control strategy for regulating environmental conditions in a greenhouse through energy-efficient operation and resource utilization. We suggest in this work to use two-level HMPC, where the upper level generates suitable reference trajectories based on day-ahead predictions. These references are tracked down in the lower level using Nonlinear Model Predictive Control (NMPC). In order to apply HMPC, a model of the crop dynamics is essential. However, the complex nature of the underlying model including discontinuities and nonlinearities results in intractable computational complexity and long sampling times. In this paper, we propose to use NMPC as a data generator to learn the tracking control policy using deep neural networks. Then, the references are tracked using the trained Deep Neural Network (DNN) to reduce the computational burden. The efficiency of our approach under real-time disturbances is demonstrated by means of a simulation study.
A Nearly Linear-Time Distributed Algorithm for Exact Maximum Matching
Authors: Taisuke Izumi Naoki Kitamura Yutaro Yamaguchi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.04140
Pdf link: https://arxiv.org/pdf/2311.04140
Abstract In this paper, we propose a randomized $\tilde{O}(\Mmax)$-round algorithm for the maximum cardinality matching problem in the CONGEST model, where $\Mmax$ means the maximum size of a matching of the input graph $G$. The proposed algorithm substantially improves the current best worst-case running time. The key technical ingredient is a new randomized algorithm of finding an augmenting path of length $\ell$ with high probability within $\tilde{O}(\ell)$ rounds, which positively settles an open problem left in the prior work by Ahmadi and Kuhn [DISC'20]. The idea of our augmenting path algorithm is based on a recent result by Kitamura and Izumi [IEICE Trans.'22], which efficiently identifies a sparse substructure of the input graph containing an augmenting path, following a new concept called \emph{alternating base trees}. Their algorithm, however, resorts to a centralized approach of collecting the entire information of the substructure into a single vertex for constructing an augmenting path. The technical highlight of this paper is to provide a fully-decentralized counterpart of such a centralized method. To develop the algorithm, we prove several new structural properties of alternating base trees, which are of independent interest.
What is Lost in Knowledge Distillation?
Authors: Manas Mohanty, Tanya Roosta, Peyman Passban
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.04142
Pdf link: https://arxiv.org/pdf/2311.04142
Abstract Deep neural networks (DNNs) have improved NLP tasks significantly, but training and maintaining such networks could be costly. Model compression techniques, such as, knowledge distillation (KD), have been proposed to address the issue; however, the compression process could be lossy. Motivated by this, our work investigates how a distilled student model differs from its teacher, if the distillation process causes any information losses, and if the loss follows a specific pattern. Our experiments aim to shed light on the type of tasks might be less or more sensitive to KD by reporting data points on the contribution of different factors, such as the number of layers or attention heads. Results such as ours could be utilized when determining effective and efficient configurations to achieve optimal information transfers between larger (teacher) and smaller (student) models.
A new fast numerical method for the generalized Rosen-Zener model
Authors: Christian Bonhomme, Stefano Pozza, Niel Van Buggenhout
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.04144
Pdf link: https://arxiv.org/pdf/2311.04144
Abstract In quantum mechanics, the Rosen-Zener model represents a two-level quantum system. Its generalization to multiple degenerate sets of states leads to larger non-autonomous linear system of ordinary differential equations (ODEs). We propose a new method for computing the solution operator of this system of ODEs. This new method is based on a recently introduced expression of the solution in terms of an infinite matrix equation, which can be efficiently approximated by combining truncation, fixed point iterations, and low-rank approximation. This expression is possible thanks to the so-called $\star$-product approach for linear ODEs. In the numerical experiments, the new method's computing time scales linearly with the model's size. We provide a first partial explanation of this linear behavior.
Computing Approximate $\ell_p$ Sensitivities
Authors: Swati Padmanabhan, David P. Woodruff, Qiuyi (Richard)Zhang
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.04158
Pdf link: https://arxiv.org/pdf/2311.04158
Abstract Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, which we show is equivalent to approximate $\ell_p$ regression, are known for only the $\ell_2$ setting, in which they are termed leverage scores. In this work, we provide efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, we compute $\alpha$-approximation to its $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations. For estimating the total $\ell_p$ sensitivity (i.e. the sum of $\ell_p$ sensitivities), we provide an algorithm based on importance sampling of $\ell_p$ Lewis weights, which computes a constant factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, we estimate the maximum $\ell_1$ sensitivity, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. We generalize all these results to $\ell_p$ norms for $p > 1$. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
Measure transport via polynomial density surrogates
Authors: Josephine Westermann, Jakob Zech
Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2311.04172
Pdf link: https://arxiv.org/pdf/2311.04172
Abstract We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $\pi$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $\pi$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Authors: Yuiga Wada, Kanta Kaneda, Komei Sugiura
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.04192
Pdf link: https://arxiv.org/pdf/2311.04192
Abstract Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and METEOR. However, such n-gram-based metrics have been shown to correlate poorly with human evaluation, leading to the proposal of alternative metrics such as SPICE for English; however, no equivalent metrics have been established for other languages. Therefore, in this study, we propose an automatic evaluation metric called JaSPICE, which evaluates Japanese captions based on scene graphs. The proposed method generates a scene graph from dependencies and the predicate-argument structure, and extends the graph using synonyms. We conducted experiments employing 10 image captioning models trained on STAIR Captions and PFN-PIC and constructed the Shichimi dataset, which contains 103,170 human evaluations. The results showed that our metric outperformed the baseline metrics for the correlation coefficient with the human evaluation.
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.04193
Pdf link: https://arxiv.org/pdf/2311.04193
Abstract Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks, ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and their representations retain task-relevant information like target object recognition while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io.
Quantization-aware Neural Architectural Search for Intrusion Detection
Authors: Rabin Yu Acharya, Laurens Le Jeune, Nele Mentens, Fatemeh Ganji, Domenic Forte
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.04194
Pdf link: https://arxiv.org/pdf/2311.04194
Abstract Deploying machine learning-based intrusion detection systems (IDSs) on hardware devices is challenging due to their limited computational resources, power consumption, and network connectivity. Hence, there is a significant need for robust, deep learning models specifically designed with such constraints in mind. In this paper, we present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs but can efficiently analyze network data for intrusion at high accuracy. In this regard, the number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
Authors: Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04205
Pdf link: https://arxiv.org/pdf/2311.04205
Abstract Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can make LLMs interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named `Rephrase and Respond' (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, where a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This facilitates the effective utilization of rephrased questions generated by one LLM with another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range to tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on a fair evaluation of LLM capabilities. Data and codes are available at https://github.com/uclaml/Rephrase-and-Respond.
Deep Hashing via Householder Quantization
Authors: Lucas R. Schwengber, Lucas Resende, Paulo Orenstein, Roberto I. Oliveira
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.04207
Pdf link: https://arxiv.org/pdf/2311.04207
Abstract Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a subsequent costly binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are grouped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binarized entries, e.g., -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem in two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices to efficiently leverage stochastic gradient descent. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets, and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms.
Keyword: faster

PowerFlowNet: Leveraging Message Passing GNNs for Improved Power Flow Approximation
Authors: Nan Lin, Stavros Orfanoudakis, Nathan Ordonez Cardenas, Juan S. Giraldo, Pedro P. Vergara
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03415
Pdf link: https://arxiv.org/pdf/2311.03415
Abstract Accurate and efficient power flow (PF) analysis is crucial in modern electrical networks' efficient operation and planning. Therefore, there is a need for scalable algorithms capable of handling large-scale power networks that can provide accurate and fast solutions. Graph Neural Networks (GNNs) have emerged as a promising approach for enhancing the speed of PF approximations by leveraging their ability to capture distinctive features from the underlying power network graph. In this study, we introduce PowerFlowNet, a novel GNN architecture for PF approximation that showcases similar performance with the traditional Newton-Raphson method but achieves it 4 times faster in the simple IEEE 14-bus system and 145 times faster in the realistic case of the French high voltage network (6470rte). Meanwhile, it significantly outperforms other traditional approximation methods, such as the DC relaxation method, in terms of performance and execution time; therefore, making PowerFlowNet a highly promising solution for real-world PF analysis. Furthermore, we verify the efficacy of our approach by conducting an in-depth experimental evaluation, thoroughly examining the performance, scalability, interpretability, and architectural dependability of PowerFlowNet. The evaluation provides insights into the behavior and potential applications of GNNs in power system analysis.
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
Authors: Farnoosh Javadi, Walid Ahmed, Habib Hajimolahoseini, Foozhan Ataiefard, Mohammad Hassanpour, Saina Asani, Austin Wen, Omar Mohamed Awad, Kangling Liu, Yang Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03426
Pdf link: https://arxiv.org/pdf/2311.03426
Abstract Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which generalizes query, key, and value grouping techniques. GQKVA is designed to speed up transformer pre-training while reducing the model size. Our experiments with various GQKVA variants highlight a clear trade-off between performance and model size, allowing for customized choices based on resource and time limitations. Our findings also indicate that the conventional multi-head attention approach is not always the best choice, as there are lighter and faster alternatives available. We tested our method on ViT, which achieved an approximate 0.3% increase in accuracy while reducing the model size by about 4% in the task of image classification. Additionally, our most aggressive model reduction experiment resulted in a reduction of approximately 15% in model size, with only around a 1% drop in accuracy.
Asynchronous Local Computations in Distributed Bayesian Learning
Authors: Kinjal Bhar, He Bai, Jemin George, Carl Busart
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2311.03496
Pdf link: https://arxiv.org/pdf/2311.03496
Abstract Due to the expanding scope of machine learning (ML) to the fields of sensor networking, cooperative robotics and many other multi-agent systems, distributed deployment of inference algorithms has received a lot of attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. There are two competing aspects in such algorithms, namely, intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain circumstances need frugal use of communication channels as they are either unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via unadjusted Langevin algorithm (ULA) MCMC is utilized. The communication is assumed to be over a connected graph (e.g., as in decentralized learning), however, the results can be extended to coordinated communication where there is a central server (e.g., federated learning). We theoretically quantify the convergence rates in the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real world data sets to train ML models to perform classification tasks. We observe faster initial convergence and improved performance accuracy, especially in the low data range. We achieve on average 78% and over 90% classification accuracy respectively on the Gamma Telescope and mHealth data sets from the UCI ML repository.
Sketching methods with small window guarantee using minimum decycling sets
Authors: Guillaume Marçais, Dan DeBlasio, Carl Kingsford
Subjects: Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN)
Arxiv link: https://arxiv.org/abs/2311.03592
Pdf link: https://arxiv.org/pdf/2311.03592
Abstract Most sequence sketching methods work by selecting specific $k$-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Estimating sequence similarity is much faster using sketches than using sequence alignment, hence sketching methods are used to reduce the computational requirements of computational biology software packages. Applications using sketches often rely on properties of the $k$-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. In particular the window guarantee ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee corresponds to a Decycling Set, aka an unavoidable sets of $k$-mers. Any long enough sequence must contain a $k$-mer from any decycling set (hence, it is unavoidable). Conversely, a decycling set defines a sketching method by selecting the $k$-mers from the set. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger, and largely unexplored. Finding decycling sets with desirable characteristics is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their small size. Only two algorithms, by Mykkeltveit and Champarnaud, are known to generate two particular MDSs, although there is a vast number of alternative MDSs. We provide a simple method that allows one to explore the space of MDSs and to find sets optimized for desirable properties. We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length.
Testing RadiX-Nets: Advances in Viable Sparse Topologies
Authors: Kevin Kwak, Zack West, Hayden Jananthan, Jeremy Kepner
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03609
Pdf link: https://arxiv.org/pdf/2311.03609
Abstract The exponential growth of data has sparked computational demands on ML research and industry use. Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data. Past research has shown that some sparse networks achieve similar performance as dense ones, reducing runtime and storage. RadiX-Nets, a subgroup of sparse DNNs, maintain uniformity which counteracts their lack of neural connections. Generation, independent of a dense network, yields faster asymptotic training and removes the need for costly pruning. However, little work has been done on RadiX-Nets, making testing challenging. This paper presents a testing suite for RadiX-Nets in TensorFlow. We test RadiX-Net performance to streamline processing in scalable models, revealing relationships between network topology, initialization, and training behavior. We also encounter "strange models" that train inconsistently and to lower accuracy while models of similar sparsity train well.
Time-Efficient Reinforcement Learning with Stochastic Stateful Policies
Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04082
Pdf link: https://arxiv.org/pdf/2311.04082
Abstract Stateful policies play an important role in reinforcement learning, such as handling partially observable environments, enhancing robustness, or imposing an inductive bias directly into the policy structure. The conventional method for training stateful policies is Backpropagation Through Time (BPTT), which comes with significant drawbacks, such as slow training due to sequential gradient propagation and the occurrence of vanishing or exploding gradients. The gradient is often truncated to address these issues, resulting in a biased policy update. We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy, jointly optimized by following the stateful policy gradient. We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning and imitation learning algorithms. Furthermore, we provide a theoretical analysis of our new gradient estimator and compare it with BPTT. We evaluate our approach on complex continuous control tasks, e.g., humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT.
HADES: Fast Singularity Detection with Local Measure Comparison
Authors: Uzu Lim, Harald Oberhauser, Vidit Nanda
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Algebraic Topology (math.AT); Differential Geometry (math.DG); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2311.04171
Pdf link: https://arxiv.org/pdf/2311.04171
Abstract We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scaleable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data.
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.04193
Pdf link: https://arxiv.org/pdf/2311.04193
Abstract Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks, ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and their representations retain task-relevant information like target object recognition while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io.
Keyword: mobile

Agile, User-Centered Design and Quality in Software Processes for Mobile Application Development Teaching
Authors: Manuel Ignacio Castillo López, Ana Libia Eslava Cervantes, Gustavo de la Cruz Martínez, Jorge Luis Ortega Arjona
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.03361
Pdf link: https://arxiv.org/pdf/2311.03361
Abstract Agile methods in undergraduate courses have been explored in an effort to close the gap between industry and professional profiles. We have structured an Android application development course based on a tailored user-centered Agile process for development of educational digital tools. This process is based on Scrum and Extreme Programming in combination with User Experience (UX) approaches. The course is executed in two phases: the first half of the semester presents theory on Agile and mobile applications development, the latter half is managed as a workshop where students develop for an actual client. The introduction of UX and user-centered design exploiting the close relationship with stakeholders expected from Agile processes allows for different quality features development. Since 2019 two of the projects have been extended and one project has been developed with the described process and course alumni. Students and stakeholders have found value in the generated products and process.
6DVF: Data Visualisation Framework for mHealth Apps
Authors: Yasmeen Anjeer Alshehhi, Khlood Ahmad, Mohamed Abdelrazek, Alessio Bonti
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2311.03657
Pdf link: https://arxiv.org/pdf/2311.03657
Abstract The widespread of data visualisation tools on smartphones has provided end users an easy way to track their health data, leading designers to put more effort into delivering suitable visualisations. Both academia and industry have developed several frameworks to guide the creation of informative and well-designed charts, such as the visualisation and design framework and Google Material Design. Despite the typical focus on design and chart types in these existing frameworks, our study highlights the need to incorporate additional components when developing data visualisations. The needs of non-expert users, the nature of the data being represented, and the mobile environment are often not prioritised in these frameworks, leading to visualisations that do not meet user needs and expectations. To address these issues, we propose our Six-Dimensions Data Visualisation Framework (6DVF) to assist in the design and evaluation of visualisations on mobile devices. Finally, we present our initial findings from a designer evaluation experiment.
SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers
Authors: Xiangyong Lu, Masanori Suganuma, Takayuki Okatani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03747
Pdf link: https://arxiv.org/pdf/2311.03747
Abstract Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although many lightweight networks have been developed for mobile/edge devices, they primarily target smartphones with more powerful processors and not SBCs with the low-end CPUs. This paper introduces a CNN-ViT hybrid network called SBCFormer, which achieves high accuracy and fast computation on such low-end CPUs. The hardware constraints of these CPUs make the Transformer's attention mechanism preferable to convolution. However, using attention on low-end CPUs presents a challenge: high-resolution internal feature maps demand excessive computational resources, but reducing their resolution results in the loss of local image details. SBCFormer introduces an architectural design to address this issue. As a result, SBCFormer achieves the highest trade-off between accuracy and speed on a Raspberry Pi 4 Model B with an ARM-Cortex A72 CPU. For the first time, it achieves an ImageNet-1K top-1 accuracy of around 80% at a speed of 1.0 frame/sec on the SBC. Code is available at https://github.com/xyongLu/SBCFormer.
Learning-Based Latency-Constrained Fronthaul Compression Optimization in C-RAN
Authors: Axel Grönland, Bleron Klaiqi, Xavier Gelabert
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03899
Pdf link: https://arxiv.org/pdf/2311.03899
Abstract The evolution of wireless mobile networks towards cloudification, where Radio Access Network (RAN) functions can be hosted at either a central or distributed locations, offers many benefits like low cost deployment, higher capacity, and improved hardware utilization. Nevertheless, the flexibility in the functional deployment comes at the cost of stringent fronthaul (FH) capacity and latency requirements. One possible approach to deal with these rigorous constraints is to use FH compression techniques. To ensure that FH capacity and latency requirements are met, more FH compression is applied during high load, while less compression is applied during medium and low load to improve FH utilization and air interface performance. In this paper, a model-free deep reinforcement learning (DRL) based FH compression (DRL-FC) framework is proposed that dynamically controls FH compression through various configuration parameters such as modulation order, precoder granularity, and precoder weight quantization that affect both FH load and air interface performance. Simulation results show that DRL-FC exhibits significantly higher FH utilization (68.7% on average) and air interface throughput than a reference scheme (i.e. with no applied compression) across different FH load levels. At the same time, the proposed DRL-FC framework is able to meet the predefined FH latency constraints (in our case set to 260 $\mu$s) under various FH loads.
Adaptive 3D Geometry-based Stochastic Channel Prediction for 3D DL Selection
Authors: Mervat Zarour, Qiuheng Zhou, Sergiy Melnyk, Hans D. Schotten
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.03975
Pdf link: https://arxiv.org/pdf/2311.03975
Abstract This paper addresses the challenges of mobile user requirements in shadowing and multi-fading environments, focusing on the Downlink (DL) radio node selection based on Uplink (UL) channel estimation. One of the key issues tackled in this research is the prediction performance in scenarios where estimated channels are integrated. An adaptive deep learning approach is proposed to improve performance, offering a compelling alternative to traditional interpolation techniques for air-to-ground link selection on demand. Moreover, our study considers a 3D channel model, which provides a more realistic and accurate representation than 2D models, particularly in the context of 3D network node distributions. This consideration becomes crucial in addressing the complex multipath fading effects within geometric stochastic 3D 3GPP channel models in urban environments. Furthermore, our research emphasises the need for adaptive prediction mechanisms that carefully balance the trade-off between DL link forecasted frequency response accuracy and the complexity requirements associated with estimation and prediction. This paper contributes to advancing 3D radio resource management by addressing these challenges, enabling more efficient and reliable communication for energy-constrained flying network nodes in dynamic environments.
Interactive Semantic Map Representation for Skill-based Visual Object Navigation
Authors: Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, Dmitry Yudin, Aleksandr Panov
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04107
Pdf link: https://arxiv.org/pdf/2311.04107
Abstract Visual object navigation using learning methods is one of the key tasks in mobile robotics. This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We have implemented this representation into a full-fledged navigation approach called SkillTron, which can select robot skills from end-to-end policies based on reinforcement learning and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conducted intensive experiments with the proposed approach in the Habitat environment, which showed a significant superiority in navigation quality metrics compared to state-of-the-art approaches. The developed code and used custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion.
Keyword: pruning

Testing RadiX-Nets: Advances in Viable Sparse Topologies
Authors: Kevin Kwak, Zack West, Hayden Jananthan, Jeremy Kepner
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03609
Pdf link: https://arxiv.org/pdf/2311.03609
Abstract The exponential growth of data has sparked computational demands on ML research and industry use. Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data. Past research has shown that some sparse networks achieve similar performance as dense ones, reducing runtime and storage. RadiX-Nets, a subgroup of sparse DNNs, maintain uniformity which counteracts their lack of neural connections. Generation, independent of a dense network, yields faster asymptotic training and removes the need for costly pruning. However, little work has been done on RadiX-Nets, making testing challenging. This paper presents a testing suite for RadiX-Nets in TensorFlow. We test RadiX-Net performance to streamline processing in scalable models, revealing relationships between network topology, initialization, and training behavior. We also encounter "strange models" that train inconsistently and to lower accuracy while models of similar sparsity train well.
Cup Curriculum: Curriculum Learning on Model Capacity
Authors: Luca Scharr, Vanessa Toborek
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03956
Pdf link: https://arxiv.org/pdf/2311.03956
Abstract Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing the possibilities to apply CL on the model capacity in natural language processing. To close this gap, we propose the cup curriculum. In a first phase of training we use a variation of iterative magnitude pruning to reduce model capacity. These weights are reintroduced in a second phase, resulting in the model capacity to show a cup-shaped curve over the training iterations. We empirically evaluate different strategies of the cup curriculum and show that it outperforms early stopping reliably while exhibiting a high resilience to overfitting.
Keyword: diffusion

Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems
Authors: Derek Lilienthal, Paul Mello, Magdalini Eirinaki, Stas Tiomkin
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03488
Pdf link: https://arxiv.org/pdf/2311.03488
Abstract While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work we introduce a Score-based Diffusion Recommendation Model (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models in synthesizing various datasets to replace or augment the original data by an average improvement of 4.30% in Recall@$n$ and 4.65% in NDCG@$n$.
3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion
Authors: Xinhao Xiang, Simon Dräger, Jiawei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03742
Pdf link: https://arxiv.org/pdf/2311.03742
Abstract Good 3D object detection performance from LiDAR-Camera sensors demands seamless feature alignment and fusion strategies. We propose the 3DifFusionDet framework in this paper, which structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes. In this framework, ground truth boxes diffuse in a random distribution for training, and the model learns to reverse the noising process. During inference, the model gradually refines a set of boxes that were generated at random to the outcomes. Under the feature align strategy, the progressive refinement method could make a significant contribution to robust LiDAR-Camera fusion. The iterative refinement process could also demonstrate great adaptability by applying the framework to various detecting circumstances where varying levels of accuracy and speed are required. Extensive experiments on KITTI, a benchmark for real-world traffic object identification, revealed that 3DifFusionDet is able to perform favorably in comparison to earlier, well-respected detectors.
Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning
Authors: Yao Zhang, Zhiwen Yu, Jun Zhang, Liang Wang, Tom H. Luan, Bin Guo, Chau Yuen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.03756
Pdf link: https://arxiv.org/pdf/2311.03756
Abstract This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcement Learning (MARL) is a promising solution. Nevertheless, existing MARL algorithms ignore effective information aggregation which is fundamental for improving the learning capacity of decentralized agents. In this paper, we design a new decentralized control architecture with improved environmental observability to capture the spatial-temporal correlation. Specifically, we first develop a topology-aware information aggregation strategy to extract correlation-related information from unstructured data gathered in the road network. Particularly, we transfer the road network topology into a graph shift operator by forming a diffusion process on the topology, which subsequently facilitates the construction of graph signals. A diffusion convolution module is developed, forming a new MARL algorithm, which endows agents with the capabilities of graph learning. Extensive experiments based on both synthetic and real-world datasets verify that our proposal outperforms existing decentralized algorithms.
Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models
Authors: Shengzhe Zhou, Zejian Lee, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Lingyun Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.03830
Pdf link: https://arxiv.org/pdf/2311.03830
Abstract Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models.
Formulating Discrete Probability Flow Through Optimal Transport
Authors: Pengze Zhang, Hubery Yin, Chen Li, Xiaohua Xie
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.03886
Pdf link: https://arxiv.org/pdf/2311.03886
Abstract Continuous diffusion models are commonly acknowledged to display a deterministic probability flow, whereas discrete diffusion models do not. In this paper, we aim to establish the fundamental theory for the probability flow of discrete diffusion models. Specifically, we first prove that the continuous probability flow is the Monge optimal transport map under certain conditions, and also present an equivalent evidence for discrete cases. In view of these findings, we are then able to define the discrete probability flow in line with the principles of optimal transport. Finally, drawing upon our newly established definitions, we propose a novel sampling method that surpasses previous discrete diffusion models in its ability to generate more certain outcomes. Extensive experiments on the synthetic toy dataset and the CIFAR-10 dataset have validated the effectiveness of our proposed discrete probability flow. Code is released at: https://github.com/PangzeCheung/Discrete-Probability-Flow.
RobustMat: Neural Diffusion for Street Landmark Patch Matching under Challenging Environments
Authors: Rui She, Qiyu Kang, Sijie Wang, Yuan-Rui Yang, Kai Zhao, Yang Song, Wee Peng Tay
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03904
Pdf link: https://arxiv.org/pdf/2311.03904
Abstract For autonomous vehicles (AVs), visual perception techniques based on sensors like cameras play crucial roles in information acquisition and processing. In various computer perception tasks for AVs, it may be helpful to match landmark patches taken by an onboard camera with other landmark patches captured at a different time or saved in a street scene image database. To perform matching under challenging driving environments caused by changing seasons, weather, and illumination, we utilize the spatial neighborhood information of each patch. We propose an approach, named RobustMat, which derives its robustness to perturbations from neural differential equations. A convolutional neural ODE diffusion module is used to learn the feature representation for the landmark patches. A graph neural PDE diffusion module then aggregates information from neighboring landmark patches in the street scene. Finally, feature similarity learning outputs the final matching score. Our approach is evaluated on several street scene datasets and demonstrated to achieve state-of-the-art matching results under environmental perturbations.
Improving the Effectiveness of Deep Generative Data
Authors: Ruyu Wang, Sabrina Schmedding, Marco F. Huber
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.03959
Pdf link: https://arxiv.org/pdf/2311.03959
Abstract Recent deep generative models (DGMs) such as generative adversarial networks (GANs) and diffusion probabilistic models (DPMs) have shown their impressive ability in generating high-fidelity photorealistic images. Although looking appealing to human eyes, training a model on purely synthetic images for downstream image processing tasks like image classification often results in an undesired performance drop compared to training on real data. Previous works have demonstrated that enhancing a real dataset with synthetic images from DGMs can be beneficial. However, the improvements were subjected to certain circumstances and yet were not comparable to adding the same number of real images. In this work, we propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset. We hypothesize that the Content Gap accounts for a large portion of the performance drop when using synthetic images from DGM and propose strategies to better utilize them in downstream tasks. Extensive experiments on multiple datasets showcase that our method outperforms baselines on downstream classification tasks both in case of training on synthetic only (Synthetic-to-Real) and training on a mix of real and synthetic data (Data Augmentation), particularly in the data-scarce scenario.
A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems
Authors: Cheng Yin, Yi Chen
Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.04014
Pdf link: https://arxiv.org/pdf/2311.04014
Abstract This paper introduces a novel operator, termed the Y operator, to elevate control performance in Actor-Critic(AC) based reinforcement learning for systems governed by stochastic differential equations(SDEs). The Y operator ingeniously integrates the stochasticity of a class of child-mother system into the Critic network's loss function, yielding substantial advancements in the control performance of RL algorithms.Additionally, the Y operator elegantly reformulates the challenge of solving partial differential equations for the state-value function into a parallel problem for the drift and diffusion functions within the system's SDEs.A rigorous mathematical proof confirms the operator's validity.This transformation enables the Y Operator-based Reinforcement Learning(YORL) framework to efficiently tackle optimal control problems in both model-based and data-driven systems.The superiority of YORL is demonstrated through linear and nonlinear numerical examples showing its enhanced performance over existing methods post convergence.
Generative Structural Design Integrating BIM and Diffusion Model
Authors: Zhili He, Yu-Hsing Wang, Jian Zhang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04052
Pdf link: https://arxiv.org/pdf/2311.04052
Abstract Intelligent structural design using AI can effectively reduce time overhead and increase efficiency. It has potential to become the new design paradigm in the future to assist and even replace engineers, and so it has become a research hotspot in the academic community. However, current methods have some limitations to be addressed, whether in terms of application scope, visual quality of generated results, or evaluation metrics of results. This study proposes a comprehensive solution. Firstly, we introduce building information modeling (BIM) into intelligent structural design and establishes a structural design pipeline integrating BIM and generative AI, which is a powerful supplement to the previous frameworks that only considered CAD drawings. In order to improve the perceptual quality and details of generations, this study makes 3 contributions. Firstly, in terms of generation framework, inspired by the process of human drawing, a novel 2-stage generation framework is proposed to replace the traditional end-to-end framework to reduce the generation difficulty for AI models. Secondly, in terms of generative AI tools adopted, diffusion models (DMs) are introduced to replace widely used generative adversarial network (GAN)-based models, and a novel physics-based conditional diffusion model (PCDM) is proposed to consider different design prerequisites. Thirdly, in terms of neural networks, an attention block (AB) consisting of a self-attention block (SAB) and a parallel cross-attention block (PCAB) is designed to facilitate cross-domain data fusion. The quantitative and qualitative results demonstrate the powerful generation and representation capabilities of PCDM. Necessary ablation studies are conducted to examine the validity of the methods. This study also shows that DMs have the potential to replace GANs and become the new benchmark for generative problems in civil engineering.
Simple Bundles of Complex Networks
Authors: Alexandre Benatti, Luciano da F. Costa
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2311.04133
Pdf link: https://arxiv.org/pdf/2311.04133
Abstract Complex networks can be used to represent and model an ample diversity of abstract and real-world systems and structures. A good deal of the research on these structures has focused on specific topological properties, including node degree, shortest paths, and modularity. In the present work, we develop an approach aimed at identifying and characterizing simple bundles of interconnections between pairs of nodes (source and destination) in complex networks. More specifically, simple bundles can be understood as corresponding to the bundle of paths obtained while traveling through successive neighborhoods after departing from a given source node. Because no node appears more than once along a given bundle, these structures have been said to be simple, in analogy to the concept of a simple path. In addition to describing simple bundles and providing a possible methodology for their identification, we also consider how their respective effective width can be estimated in terms of diffusion flow and exponential entropy of transition probabilities. The potential of the concepts and methods described in this work is then illustrated respectively to the characterization and analysis of model-theoretic networks, with several interesting results.
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Authors: Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04145
Pdf link: https://arxiv.org/pdf/2311.04145
Abstract Video synthesis has recently made remarkable strides benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity and spatio-temporal continuity. They primarily arise from the scarcity of well-aligned text-video data and the complex inherent structure of videos, making it difficult for the model to simultaneously ensure semantic and qualitative excellence. In this report, we propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors and ensures the alignment of the input data by utilizing static images as a form of crucial guidance. I2VGen-XL consists of two stages: i) the base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders, and ii) the refinement stage enhances the video's details by incorporating an additional brief text and improves the resolution to 1280$\times$720. To improve the diversity, we collect around 35 million single-shot text-video pairs and 6 billion text-image pairs to optimize the model. By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos. Through extensive experiments, we have investigated the underlying principles of I2VGen-XL and compared it with current top methods, which can demonstrate its effectiveness on diverse data. The source code and models will be publicly available at \url{https://i2vgen-xl.github.io}.
Keyword: adaptive

Towards Automated Negative Sampling in Implicit Recommendation
Authors: Fuyuan Lyu, Yaochen Hu, Xing Tang, Yingxue Zhang, Ruiming Tang, Xue Liu
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.03526
Pdf link: https://arxiv.org/pdf/2311.03526
Abstract Negative sampling methods are vital in implicit recommendation models as they allow us to obtain negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negative samples in various ways. These studies are orthogonal to the recommendation model and implicit datasets. However, such an idea contradicts the common belief in AutoML that the model and dataset should be matched. Empirical experiments suggest that the best-performing negative sampler depends on the implicit dataset and the specific recommendation model. Hence, we propose a hypothesis that the negative sampler should align with the capacity of the recommendation models as well as the statistics of the datasets to achieve optimal performance. A mismatch between these three would result in sub-optimal outcomes. An intuitive idea to address the mismatch problem is to exhaustively select the best-performing negative sampler given the model and dataset. However, such an approach is computationally expensive and time-consuming, leaving the problem unsolved. In this work, we propose the AutoSample framework that adaptively selects the best-performing negative sampler among candidates. Specifically, we propose a loss-to-instance approximation to transform the negative sampler search task into the learning task over a weighted sum, enabling end-to-end training of the model. We also designed an adaptive search algorithm to extensively and efficiently explore the search space. A specific initialization approach is also obtained to better utilize the obtained model parameters during the search stage, which is similar to curriculum learning and leads to better performance and less computation resource consumption. We evaluate the proposed framework on four benchmarks over three models. Extensive experiments demonstrate the effectiveness and efficiency of our proposed framework.
Stochastic convergence of regularized solutions for backward heat conduction problems
Authors: Zhongjian Wang, Wenlong Zhang, Zhiwen Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.03623
Pdf link: https://arxiv.org/pdf/2311.03623
Abstract In this paper, we study the stochastic convergence of regularized solutions for backward heat conduction problems. These problems are recognized as ill-posed due to the exponential decay of eigenvalues associated with the forward problems. We derive an error estimate for the least-squares regularized minimization problem within the framework of stochastic convergence. Our analysis reveals that the optimal error of the Tikhonov-type least-squares optimization problem depends on the noise level, the number of sensors, and the underlying ground truth. Moreover, we propose a self-adaptive algorithm to identify the optimal regularization parameter for the optimization problem without requiring knowledge of the noise level or any other prior information, which will be very practical in applications. We present numerical examples to demonstrate the accuracy and efficiency of our proposed method. These numerical results show that our method is efficient in solving backward heat conduction problems.
Exploring the transformation of user interactions to Adaptive Human-Machine Interfaces
Authors: Angela Carrera-Rivera, Daniel Reguera-Bakhache, Felix Larrinaga, Ganix Lasa
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2311.03806
Pdf link: https://arxiv.org/pdf/2311.03806
Abstract Human-machine interfaces (HMI) facilitate communication between humans and machines, and their importance has increased in modern technology. However, traditional HMIs are often static and do not adapt to individual user preferences or behavior. Adaptive User Interfaces (AUIs) have become increasingly important in providing personalized user experiences. Machine learning techniques have gained traction in User Experience (UX) research to provide smart adaptations that can reduce user cognitive load. This paper presents an ongoing exploration of a method for generating adaptive user interfaces by analyzing user interactions and contextual data. It also provides an illustrative example using Markov chains to predict the next step for users interacting with an app for an industrial mixing machine. Furthermore, the paper conducts an offline evaluation of the approach, focusing on the precision of the recommendations. The study emphasizes the importance of incorporating user interactions and contextual data into the design of adaptive HMIs, while acknowledging the existing challenges and potential benefits.
iACOS: Advancing Implicit Sentiment Extraction with Informative and Adaptive Negative Examples
Authors: Xiancai Xu, Jia-Dong Zhang, Lei Xiong, Zhishang Liu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.03896
Pdf link: https://arxiv.org/pdf/2311.03896
Abstract Aspect-based sentiment analysis (ABSA) have been extensively studied, but little light has been shed on the quadruple extraction consisting of four fundamental elements: aspects, categories, opinions and sentiments, especially with implicit aspects and opinions. In this paper, we propose a new method iACOS for extracting Implicit Aspects with Categories and Opinions with Sentiments. First, iACOS appends two implicit tokens at the end of a text to capture the context-aware representation of all tokens including implicit aspects and opinions. Second, iACOS develops a sequence labeling model over the context-aware token representation to co-extract explicit and implicit aspects and opinions. Third, iACOS devises a multi-label classifier with a specialized multi-head attention for discovering aspect-opinion pairs and predicting their categories and sentiments simultaneously. Fourth, iACOS leverages informative and adaptive negative examples to jointly train the multi-label classifier and the other two classifiers on categories and sentiments by multi-task learning. Finally, the experimental results show that iACOS significantly outperforms other quadruple extraction baselines according to the F1 score on two public benchmark datasets.
Temporal Graph Representation Learning with Adaptive Augmentation Contrastive
Authors: Hongjiang Chen, Pengfei Jiao, Huijun Tang, Huaming Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.03897
Pdf link: https://arxiv.org/pdf/2311.03897
Abstract Temporal graph representation learning aims to generate low-dimensional dynamic node embeddings to capture temporal information as well as structural and property information. Current representation learning methods for temporal networks often focus on capturing fine-grained information, which may lead to the model capturing random noise instead of essential semantic information. While graph contrastive learning has shown promise in dealing with noise, it only applies to static graphs or snapshots and may not be suitable for handling time-dependent noise. To alleviate the above challenge, we propose a novel Temporal Graph representation learning with Adaptive augmentation Contrastive (TGAC) model. The adaptive augmentation on the temporal graph is made by combining prior knowledge with temporal information, and the contrastive objective function is constructed by defining the augmented inter-view contrast and intra-view contrast. To complement TGAC, we propose three adaptive augmentation strategies that modify topological features to reduce noise from the network. Our extensive experiments on various real networks demonstrate that the proposed model outperforms other temporal graph representation learning methods.
Adaptive 3D Geometry-based Stochastic Channel Prediction for 3D DL Selection
Authors: Mervat Zarour, Qiuheng Zhou, Sergiy Melnyk, Hans D. Schotten
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.03975
Pdf link: https://arxiv.org/pdf/2311.03975
Abstract This paper addresses the challenges of mobile user requirements in shadowing and multi-fading environments, focusing on the Downlink (DL) radio node selection based on Uplink (UL) channel estimation. One of the key issues tackled in this research is the prediction performance in scenarios where estimated channels are integrated. An adaptive deep learning approach is proposed to improve performance, offering a compelling alternative to traditional interpolation techniques for air-to-ground link selection on demand. Moreover, our study considers a 3D channel model, which provides a more realistic and accurate representation than 2D models, particularly in the context of 3D network node distributions. This consideration becomes crucial in addressing the complex multipath fading effects within geometric stochastic 3D 3GPP channel models in urban environments. Furthermore, our research emphasises the need for adaptive prediction mechanisms that carefully balance the trade-off between DL link forecasted frequency response accuracy and the complexity requirements associated with estimation and prediction. This paper contributes to advancing 3D radio resource management by addressing these challenges, enabling more efficient and reliable communication for energy-constrained flying network nodes in dynamic environments.
mmFUSION: Multimodal Fusion for 3D Objects Detection
Authors: Javed Ahmad, Alessio Del Bue
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04058
Pdf link: https://arxiv.org/pdf/2311.04058
Abstract Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems. Camera and LiDAR are the most commonly used sensors, and usually, their fusion happens at the early or late stages of 3D detectors with the help of regions of interest (RoIs). On the other hand, fusion at the intermediate level is more adaptive because it does not need RoIs from modalities but is complex as the features of both modalities are presented from different points of view. In this paper, we propose a new intermediate-level multi-modal fusion (mmFUSION) approach to overcome these challenges. First, the mmFUSION uses separate encoders for each modality to compute features at a desired lower space volume. Second, these features are fused through cross-modality and multi-modality attention mechanisms proposed in mmFUSION. The mmFUSION framework preserves multi-modal information and learns to complement modalities' deficiencies through attention weights. The strong multi-modal features from the mmFUSION framework are fed to a simple 3D detection head for 3D predictions. We evaluate mmFUSION on the KITTI and NuScenes dataset where it performs better than available early, intermediate, late, and even two-stage based fusion schemes. The code with the mmdetection3D project plugin will be publicly available soon.
Energy-based Calibrated VAE with Test Time Free Lunch
Authors: Yihong Luo, Siya Qiu, Xingjian Tao, Yujun Cai, Jing Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04071
Pdf link: https://arxiv.org/pdf/2311.04071
Abstract In this paper, we propose a novel Energy-Calibrated Generative Model that utilizes a Conditional EBM for enhancing Variational Autoencoders (VAEs). VAEs are sampling efficient but often suffer from blurry generation results due to the lack of training in the generative direction. On the other hand, Energy-Based Models (EBMs) can generate high-quality samples but require expensive Markov Chain Monte Carlo (MCMC) sampling. To address these issues, we introduce a Conditional EBM for calibrating the generative direction during training, without requiring it for test time sampling. Our approach enables the generative model to be trained upon data and calibrated samples with adaptive weight, thereby enhancing efficiency and effectiveness without necessitating MCMC sampling in the inference phase. We also show that the proposed approach can be extended to calibrate normalizing flows and variational posterior. Moreover, we propose to apply the proposed method to zero-shot image restoration via neural transport prior and range-null theory. We demonstrate the effectiveness of the proposed method through extensive experiments in various applications, including image generation and zero-shot image restoration. Our method shows state-of-the-art performance over single-step non-adversarial generation.
Keyword: quantization

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu
Subjects: Performance (cs.PF); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03687
Pdf link: https://arxiv.org/pdf/2311.03687
Abstract Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes , i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B) on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, together with hardware platforms in choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses discover potential opportunities for future work to further optimize the runtime performance of LLMs.
Preliminary Design of Scalable Hardware Integrated Platform for LLRF Application
Authors: Lin Jiang, Jingjun Wen, Tao Xue, Xiaowei Guo, Haoyan Yang, Qiutong Pan, Jianmin Li, Yinong Liu, Liangjun Wei
Subjects: Hardware Architecture (cs.AR); Accelerator Physics (physics.acc-ph)
Arxiv link: https://arxiv.org/abs/2311.03841
Pdf link: https://arxiv.org/pdf/2311.03841
Abstract In this paper, the SHIP4LLRF (Scalable Hardware Integrated Platform for LLRF) based on 6U VPX-standard was designed preliminarily, which includes 6U mother board and two HPC FPGA mezzanine cards (FMCs). The ADC and DAC FMC is based on ADS54J60 from TI and LTC2000Y-16 form ADI, respectively. The system mother board is based on Xilinx Kintex UltraScale KU060, which also features 64-bit DDR4 SDRAM, QSFP and USB3.0 interfaces. Each FMC connector is assigned 58 pairs of LVDS standard IOs and 8 pairs of GTH high-speed serial lanes. Besides, the mother board is equipped with the self-developed ZYNQBee2 module based on ZYNQ7010 for slow control such as EPICS. All ADC or DAC raw data in each SHIP4LLEF is compressed loss-less without triggering and transmitted to the process board. A scalar quantization method which is in development is used for lossless compression of ADC raw data, the process board will decompress the ADC data and perform a digital algorithm to measure the amplitude and phase of the high frequency signal. This de-sign is scalable for testing and upgradability, mean-while, the trigger-less data transmission enable this system participate in both local (rack-scale) and accelerator-wide communication networks.
Learning-Based Latency-Constrained Fronthaul Compression Optimization in C-RAN
Authors: Axel Grönland, Bleron Klaiqi, Xavier Gelabert
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.03899
Pdf link: https://arxiv.org/pdf/2311.03899
Abstract The evolution of wireless mobile networks towards cloudification, where Radio Access Network (RAN) functions can be hosted at either a central or distributed locations, offers many benefits like low cost deployment, higher capacity, and improved hardware utilization. Nevertheless, the flexibility in the functional deployment comes at the cost of stringent fronthaul (FH) capacity and latency requirements. One possible approach to deal with these rigorous constraints is to use FH compression techniques. To ensure that FH capacity and latency requirements are met, more FH compression is applied during high load, while less compression is applied during medium and low load to improve FH utilization and air interface performance. In this paper, a model-free deep reinforcement learning (DRL) based FH compression (DRL-FC) framework is proposed that dynamically controls FH compression through various configuration parameters such as modulation order, precoder granularity, and precoder weight quantization that affect both FH load and air interface performance. Simulation results show that DRL-FC exhibits significantly higher FH utilization (68.7% on average) and air interface throughput than a reference scheme (i.e. with no applied compression) across different FH load levels. At the same time, the proposed DRL-FC framework is able to meet the predefined FH latency constraints (in our case set to 260 $\mu$s) under various FH loads.
Deep Hashing via Householder Quantization
Authors: Lucas R. Schwengber, Lucas Resende, Paulo Orenstein, Roberto I. Oliveira
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.04207
Pdf link: https://arxiv.org/pdf/2311.04207
Abstract Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a subsequent costly binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are grouped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binarized entries, e.g., -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem in two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices to efficiently leverage stochastic gradient descent. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets, and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms.

A-suozhang / GetArxivDaily

New submissions for Wed, 8 Nov 23 #195

Keyword: efficient

HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis

A Simple and Efficient Baseline for Data Attribution on Images

FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge

Training Multi-layer Neural Networks on Ising Machine

PowerFlowNet: Leveraging Message Passing GNNs for Improved Power Flow Approximation

Federated Learning for Clinical Structured Data: A Benchmark Comparison of Engineering and Statistical Approaches

Orion: A Fully Homomorphic Encryption Compiler for Private Deep Neural Network Inference

In-Context Exemplars as Clues to Retrieving from Large Associative Memory

MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network

Brain Networks and Intelligence: A Graph Neural Network Based Approach to Resting State fMRI Data

Towards Automated Negative Sampling in Implicit Recommendation

PcLast: Discovering Plannable Continuous Latent States

Indexing Techniques for Graph Reachability Queries

Time-optimal Design and Control of Electric Race Cars Equipped with Multi-speed Transmissions

Scalable and Efficient Continual Learning from Demonstration via Hypernetwork-generated Stable Dynamics Model

CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real Transfer

Stochastic convergence of regularized solutions for backward heat conduction problems

Reinforcement Twinning: from digital twins to model-based reinforcement learning

Novel data structures for label based queries specifically efficient for billion+ property graph networks using Kinetica-Graph

A Physics-Guided Bi-Fidelity Fourier-Featured Operator Learning Framework for Predicting Time Evolution of Drag and Lift Coefficients

Instruct Me More! Random Prompting for Visual In-Context Learning

On the Performance of LoRa Empowered Communication for Wireless Body Area Networks

Faster Algorithms for Cycle Hitting Problems on Disk Graphs

Incentive Design for Eco-driving in Urban Transportation Networks

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Efficient Bottom-Up Synthesis for Programs with Local Variables

Contributions of Individual Generators to Nodal Carbon Emissions

Loss Balancing for Fair Supervised Learning

Learning to Learn for Few-shot Continual Active Learning

Improved weight initialization for deep and narrow feedforward neural network

Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning

Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

Asymptotically Steerable Finite Fourier-Bessel Transforms and Closure under Convolution

Multi-Beam Forming with Movable-Antenna Array

CapST: An Enhanced and Lightweight Method for Deepfake Video Classification

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

Data-informed uncertainty quantification for laser-based powder bed fusion additive manufacturing

On Deep Reinforcement Learning for Traffic Steering Intelligent ORAN

Design and Experimental Verification of a Jumping Legged Robot for Martian Lava Tube Exploration

Hypergraphs with node attributes: structure and inference

FD-MIA: Efficient Attacks on Fairness-enhanced Models

A Comparative Study of Knowledge Transfer Methods for Misaligned Urban Building Labels

Mini but Mighty: Finetuning ViTs with Mini Adapters

FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer

Hardware Aware Evolutionary Neural Architecture Search using Representation Similarity Metric

On the Coupling of Hamilton's Principle and Thermodynamic Extremal Principles

Adaptive 3D Geometry-based Stochastic Channel Prediction for 3D DL Selection

Learned Causal Method Prediction

A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems

Over-the-Air Computation Empowered Federated Learning: A Joint Uplink-Downlink Design

Implementation and Comparison of Methods to Extract Reliability KPIs out of Textual Wind Turbine Maintenance Work Orders

Energy-based Calibrated VAE with Test Time Free Lunch

Deep Neural Network based Optimal Control of Greenhouses

A Nearly Linear-Time Distributed Algorithm for Exact Maximum Matching

What is Lost in Knowledge Distillation?

A new fast numerical method for the generalized Rosen-Zener model

Computing Approximate $\ell_p$ Sensitivities

Measure transport via polynomial density surrogates

JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Quantization-aware Neural Architectural Search for Intrusion Detection

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

Deep Hashing via Householder Quantization

Keyword: faster

PowerFlowNet: Leveraging Message Passing GNNs for Improved Power Flow Approximation

GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

Asynchronous Local Computations in Distributed Bayesian Learning

Sketching methods with small window guarantee using minimum decycling sets

Testing RadiX-Nets: Advances in Viable Sparse Topologies

Time-Efficient Reinforcement Learning with Stochastic Stateful Policies

HADES: Fast Singularity Detection with Local Measure Comparison

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Keyword: mobile

Agile, User-Centered Design and Quality in Software Processes for Mobile Application Development Teaching

6DVF: Data Visualisation Framework for mHealth Apps

SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers